---
title: "Building a Homelab Threat Intelligence Platform with ML: A Beginner's Journey"
date: 2026-02-24T23:58:00+00:00
draft: false
tags: ["threat-intelligence", "cybersecurity", "machine-learning", "osint", "telegram", "homelab", "docker-swarm", "nlp", "ner", "mitre-attack", "infosec", "docker", "devops", "gitops", "selfhosted", "opensource", "build-in-public", "french-tech", "ddos", "soc", "blue-team"]
summary: "How I built, from scratch, a production-grade threat intelligence platform that predicts DDoS attacks by monitoring hacktivist Telegram channels."
description: "A beginner's journey building a CTI platform with multilingual NER, Telegram OSINT bots and MITRE ATT&CK integration on a self-hosted Docker Swarm homelab."
author: "Bojemoi"
ShowToc: true
ShowReadingTime: true
---
Six months ago, I knew almost nothing about cybersecurity infrastructure. Today, I run a production-grade threat intelligence platform that predicts DDoS attacks by monitoring hacktivist Telegram channels. If you're reading this thinking "that sounds impossible for someone starting out" - I get it. I thought the same thing.
This post is for you: the curious beginner who wants to build real security tools but doesn't know where to start.
## What Actually Is This Thing?
Let me start with what my system does in plain English:
**The Problem**: Hacktivist groups announce their attack targets on Telegram before launching DDoS attacks. French organizations often become targets, but there's no automated way to detect these threats early.
**My Solution**: A system that:
1. Monitors Telegram channels where hacktivists hang out
2. Reads messages in multiple languages (because hacktivists don't all speak English)
3. Identifies when organizations are mentioned
4. Scores how "buzzy" or threatening the discussion is
5. Alerts me when a French organization might be under threat
Think of it like having a robot that reads threatening forums 24/7 and taps you on the shoulder when trouble's brewing.
## Why You Can Build This (Even as a Beginner)
Here's my controversial take: **you don't need to be a coding expert to build sophisticated security systems anymore**.
My approach relies on three principles:
### 1. Let AI Write Your Code
I don't write much code by hand. Instead, I manage prompts and structure. I tell Claude (or a similar AI assistant) what I want to build, and it generates the implementation. My job is architecture and integration, not syntax.
**This means**: If you can describe what you want clearly, you can build it.
### 2. Open Source Everything
Every tool I use is free and open source:
- XenServer for virtualization
- Docker Swarm for container orchestration
- PostgreSQL for databases
- Gitea for Git workflows
- Prometheus/Grafana for monitoring
**This means**: Zero licensing costs, full control, and a massive community for support.
### 3. Automate Everything
Manual processes don't scale and you'll forget them. My infrastructure uses:
- GitOps workflows (push config → automatic deployment)
- Cloud-init for VM provisioning
- Webhook automation for CI/CD
- Container orchestration for self-healing
**This means**: Once set up, the system runs itself. You maintain infrastructure as code, not by clicking through interfaces.
## The Journey: What I (and Claude) Actually Built
Let me walk you through the evolution, because it wasn't linear:
### Phase 1: The Foundation (Months 0-2)
**Started with**: Basic XenServer setup, learning virtualization concepts
**What I learned**: You need a solid base layer before anything fancy. I spent time understanding:
- Network bonding for high availability
- Storage management
- VM templates and provisioning
**Beginner trap I avoided**: Trying to do everything at once. Master one layer before adding the next.
### Phase 2: Container Orchestration (Months 2-3)
**Built**: Docker Swarm cluster with automatic deployment
**What this gives you**: Deploy new services in seconds, not hours. Services restart automatically if they crash.
**The "aha" moment**: When I pushed code to Git and watched it automatically deploy to production without touching a terminal. That's when it clicked.
### Phase 3: GitOps Workflow (Months 3-4)
**Built**: Gitea integration with custom cloud-init datasources that query PostgreSQL
**What this enables**:
- Write a YAML config describing a VM
- Push it to Git
- VM automatically gets created and configured
- All infrastructure becomes reproducible
**Why this matters**: Your homelab becomes code. Disaster recovery is just re-applying the configs stored in your Git repo.
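To make "a config describing a VM" concrete, here is a minimal sketch of rendering cloud-init user data from a VM description. The `render_user_data` helper and the spec fields (`hostname`, `packages`, `ssh_keys`) are hypothetical, and the SSH key is a placeholder. The sketch emits JSON after the `#cloud-config` header, assuming that is acceptable because JSON is, for practical purposes, valid YAML. A real setup would likely use a YAML library instead.

```python
import json


def render_user_data(vm: dict) -> str:
    """Render a cloud-init user-data document from a VM description."""
    cloud_config = {
        "hostname": vm["hostname"],
        "packages": vm.get("packages", []),
        "ssh_authorized_keys": vm.get("ssh_keys", []),
    }
    # cloud-init recognises cloud-config data by its first line.
    # The body is YAML; JSON output is used here to stay stdlib-only.
    return "#cloud-config\n" + json.dumps(cloud_config, indent=2) + "\n"


# Hypothetical VM description, as it might be stored in Git.
vm_spec = {
    "hostname": "ner-worker-01",
    "packages": ["docker.io"],
    "ssh_keys": ["ssh-ed25519 AAAA... admin@homelab"],  # placeholder key
}
print(render_user_data(vm_spec))
```

The point is that the VM spec is plain data: it can be reviewed, versioned and re-applied like any other file in the repo.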
### Phase 4: Threat Intelligence (Months 4-6)
**Built**: The actual threat intelligence platform with:
- Telegram bot monitoring with OSINT capabilities
- Multilingual Named Entity Recognition (NER)
- Entity extraction and relationship mapping
- Buzz scoring algorithms
- Integration with Maltego, TheHive, MISP
- MITRE ATT&CK framework mapping
**The breakthrough**: Realizing threat intelligence is pattern matching at scale. ML helps, but smart architecture and data streaming matter more.
## The Architecture (Simplified)
Here's how the pieces fit together without overwhelming you:
```
[Telegram Channels]
↓ (messages stream in)
[Telegram Bot Infrastructure]
↓ (raw text + metadata)
[Multilingual NER Processing]
↓ (identified entities)
[Entity Extraction & Scoring]
↓ (threat scores)
[PostgreSQL Database]
↓ (queries for analysis)
[Alert System] → [Me!]
```
Each box is a microservice in Docker. They communicate through message queues and APIs. If one crashes, the others keep running.
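Here is a rough sketch of why queues between stages help with isolation. It uses Python's in-process `queue.Queue` as a stand-in for a real message broker, and `run_stage` is a hypothetical helper, not code from the platform. A message that makes the handler fail is logged and skipped, so the stage keeps going.

```python
import queue


def run_stage(inbox: queue.Queue, outbox: queue.Queue, handler) -> None:
    """Drain inbox, apply handler to each item, and push results to outbox."""
    while True:
        try:
            item = inbox.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        try:
            outbox.put(handler(item))
        except Exception as exc:
            # One bad message must not take the whole stage down.
            print(f"skipped {item!r}: {exc}")


raw_messages: queue.Queue = queue.Queue()
processed: queue.Queue = queue.Queue()

raw_messages.put("Target: example.fr tonight")
raw_messages.put(None)  # malformed message: the handler will raise on it

run_stage(raw_messages, processed, lambda text: text.upper())
```

Only the well-formed message reaches `processed`. In a real deployment each stage would be its own container, with the broker holding messages while a crashed stage restarts.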
## Key Technologies Explained (For Beginners)
**XenServer**: Think of it like having multiple computers inside one physical computer. Each "virtual machine" acts independently.
**Docker Swarm**: Manages containers (lightweight mini-environments). If you deploy 5 containers, Swarm spreads them across your servers and restarts them if they die.
**PostgreSQL**: A database. It stores all the structured data (entities, threat scores, relationships).
**Gitea**: Like GitHub, but you host it yourself. Your code and configs live here.
**Cloud-init**: Automates VM setup. Instead of clicking through installers, you describe what you want in a file.
**NER (Named Entity Recognition)**: ML that finds entities in text. It spots "Microsoft" in a message and knows it's an organization, not just a word.
## Practical Tips If You're Starting Out
### Start Small, Think Big
Don't try to build everything at once. I started with:
1. One VM running Docker
2. One simple service (a Telegram bot)
3. One database
4. Then gradually added orchestration, monitoring, automation
### Embrace Configuration Over Code
Write YAML configs that describe what you want. Let tools like Docker Compose and cloud-init handle the implementation.
### Build in Production from Day One
Don't have a "learning environment" and a "production environment." Build production-grade from the start:
- Use container orchestration
- Set up monitoring
- Implement logging
- Design for failure
You'll learn better practices and won't need to rebuild later.
### Use AI Assistants Aggressively
I use Claude to:
- Generate FastAPI applications
- Write Docker configs
- Create database schemas
- Debug issues
- Explain concepts I don't understand
This isn't cheating - it's working smart.
### Focus on Integration, Not Implementation
Your value isn't writing Python - it's designing systems that solve problems. Let AI handle syntax. You handle architecture.
## The Threat Intelligence Platform: Deeper Dive
Since this is the cool part, let me break down how the ML/intelligence piece works:
### 1. Data Ingestion
Telegram bots monitor channels and capture:
- Message text
- Timestamp
- Sender info
- Channel metadata
This streams into a message queue for processing.
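A minimal sketch of what one captured message could look like on its way into the queue. The `CapturedMessage` class, its field names and the channel name are assumptions for illustration. The actual record depends on the Telegram client library you use.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CapturedMessage:
    text: str
    timestamp: str  # ISO 8601, UTC
    sender_id: int
    channel: str


def to_queue_payload(msg: CapturedMessage) -> bytes:
    """Serialize a message to UTF-8 JSON, keeping non-ASCII text readable."""
    return json.dumps(asdict(msg), ensure_ascii=False).encode("utf-8")


def from_queue_payload(payload: bytes) -> CapturedMessage:
    """Rebuild the message on the consumer side."""
    return CapturedMessage(**json.loads(payload.decode("utf-8")))


msg = CapturedMessage(
    text="Цель: example.fr",
    timestamp=datetime(2026, 1, 15, 20, 30, tzinfo=timezone.utc).isoformat(),
    sender_id=12345,
    channel="hypothetical_channel",
)
payload = to_queue_payload(msg)
```

Keeping the payload as plain JSON means any downstream service can consume it, whatever language it is written in.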
### 2. Multilingual Processing
Messages might be in Russian, English, French, or mixed. The NER pipeline:
- Detects language
- Applies appropriate NER model
- Extracts entities (organizations, people, locations, IPs, domains)
**Why multilingual matters**: Hacktivists often operate in Russian or use mixed languages to avoid detection.
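The detect-then-extract flow can be sketched with the standard library alone. To be clear, this is a crude stand-in: `guess_language` only checks the share of Cyrillic letters, and the regexes only catch IPv4 addresses and domain names. A real pipeline would use a proper language detector and trained NER models for organizations, people and locations.

```python
import re

CYRILLIC = re.compile(r"[\u0400-\u04FF]")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def guess_language(text: str) -> str:
    """Very rough guess based on the alphabet: 'ru', 'latin' or 'unknown'."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return "unknown"
    cyrillic = sum(1 for c in letters if CYRILLIC.match(c))
    return "ru" if cyrillic / len(letters) > 0.5 else "latin"


def extract_indicators(text: str) -> dict:
    """Extract the entities that regexes can find reliably."""
    return {
        "language": guess_language(text),
        "ips": IPV4.findall(text),
        "domains": [d.lower() for d in DOMAIN.findall(text)],
    }


result = extract_indicators("Атакуем example.fr и 203.0.113.7 сегодня")
print(result)
```

The language guess is what would decide which NER model gets the message. The regex results are the kind of entities worth extracting the same way in every language.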
### 3. Entity Extraction & Scoring
For each entity (like "Company X"), the system:
- Checks if it's French (geolocation + domain analysis)
- Counts mentions across time windows
- Analyzes sentiment and threat keywords
- Computes a "buzz score"
High buzz score = something's happening.
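As an illustration only, here is one way a buzz score could be computed: count recent mentions and add extra weight for mentions that contain threat keywords. The formula, the keyword list, the weight of 2.0 and the 24-hour window are all assumptions, and sentiment analysis is left out entirely.

```python
from datetime import datetime, timedelta

# Illustrative keyword list; a real one would be multilingual and curated.
THREAT_KEYWORDS = {"ddos", "attack", "down", "target"}


def buzz_score(mentions, now, window=timedelta(hours=24)):
    """Score one entity from a list of (timestamp, text) mentions.

    Hypothetical formula: recent mentions + 2.0 per mention with a keyword.
    """
    recent = [
        text for ts, text in mentions
        if timedelta(0) <= now - ts <= window
    ]
    keyword_hits = sum(
        1 for text in recent
        if THREAT_KEYWORDS & set(text.lower().split())
    )
    return len(recent) + 2.0 * keyword_hits


now = datetime(2026, 1, 15, 12, 0)
mentions = [
    (now - timedelta(hours=1), "DDoS example.fr tonight"),
    (now - timedelta(hours=2), "example.fr is a bank"),
    (now - timedelta(days=3), "attack example.fr"),  # outside the window
]
print(buzz_score(mentions, now))  # 2 recent mentions + 1 keyword hit -> 4.0
```

Whatever formula you pick, keep it simple enough that you can explain why a given entity scored high.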
### 4. Threat Correlation
The system maps entities to:
- Known infrastructure (via OSINT tools like Shodan and VirusTotal)
- Historical attack patterns
- MITRE ATT&CK techniques
This builds a threat graph showing relationships.
### 5. Alerting
When patterns indicate elevated risk:
- Score exceeds threshold
- Multiple channels mention the same target
- Threat keywords appear in context
→ Alert fires with supporting evidence.
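The three conditions above can be sketched as a small rule evaluator that returns its reasons, so every alert carries its own evidence. The thresholds and the "at least two conditions" policy are assumptions for the example, not tuned values.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySignal:
    entity: str
    buzz_score: float
    channels: set = field(default_factory=set)
    keyword_hits: int = 0


def alert_reasons(signal, score_threshold=10.0, min_channels=2):
    """Return the list of conditions that this signal satisfies."""
    reasons = []
    if signal.buzz_score > score_threshold:
        reasons.append(
            f"buzz score {signal.buzz_score:.1f} exceeds {score_threshold:.1f}"
        )
    if len(signal.channels) >= min_channels:
        reasons.append(f"mentioned in {len(signal.channels)} channels")
    if signal.keyword_hits > 0:
        reasons.append(f"{signal.keyword_hits} threat keyword hit(s)")
    return reasons


def should_alert(signal, min_reasons=2):
    """Hypothetical policy: fire when at least min_reasons conditions hold."""
    return len(alert_reasons(signal)) >= min_reasons


hot = EntitySignal("example.fr", 14.0, {"chan_a", "chan_b"}, 3)
quiet = EntitySignal("example.org", 1.0, {"chan_a"}, 0)

if should_alert(hot):
    print(f"ALERT {hot.entity}: " + "; ".join(alert_reasons(hot)))
```

Requiring more than one condition is one way to reduce noise from a single chatty channel.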
## OSINT Integration: Making It Smarter
The Telegram bot has integrated OSINT capabilities:
**IP Analysis**: Query VirusTotal, AbuseIPDB, and Shodan for reputation and historical data
**Domain Intelligence**: Passive DNS, WHOIS, certificate analysis
**Blockchain Enrollment**: User verification via blockchain-based systems (for access control)
**Framework Mapping**: Automatic MITRE ATT&CK technique identification
This turns raw data into actionable intelligence.
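One practical detail before calling any reputation service: filter out strings that are not valid public IP addresses, so you don't waste API quota on private or reserved ranges. This sketch uses only Python's `ipaddress` module. The helper names are hypothetical, and the actual lookups are deliberately left out because each service has its own API and terms.

```python
import ipaddress


def is_worth_enriching(candidate: str) -> bool:
    """True only for syntactically valid, globally routable IP addresses."""
    try:
        ip = ipaddress.ip_address(candidate)
    except ValueError:
        return False  # not an IP address at all
    return ip.is_global


def filter_for_lookup(candidates):
    """Deduplicate candidates and keep only those worth an external query."""
    return sorted({c for c in candidates if is_worth_enriching(c)})


extracted = ["8.8.8.8", "192.168.1.10", "8.8.8.8", "not-an-ip", "999.1.1.1"]
print(filter_for_lookup(extracted))
```

Private addresses, duplicates and invalid strings are dropped, and only what remains goes to the external services.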
## What Would I Do Differently?
**Start with better monitoring**: I added Prometheus/Grafana late. Wish I'd built it from day one. You can't debug what you can't see.
**Document as you go**: I'm rebuilding some knowledge because I didn't document decisions. Write down WHY you chose something, not just WHAT.
**Network design upfront**: I've had to refactor networking multiple times. Plan your subnets, VLANs, and firewall rules before deploying services.
**Test disaster recovery early**: I built an amazing system... then realized I hadn't tested restoring from backups. Test your failure modes.
## Resources That Actually Helped
**For learning infrastructure**:
- The Phoenix Project (book) - changed how I think about systems
- XenServer documentation - surprisingly readable
- Docker Swarm docs - shorter than the Kubernetes docs and easier to start with
**For threat intelligence**:
- MITRE ATT&CK framework - free, comprehensive
- MISP Project documentation - open source threat sharing
- TheHive Project - incident response platform
**For practical skills**:
- Claude (obviously) - for code generation and explanation
- GitHub repos of similar projects - learn from real implementations
- YouTube channels on homelab setups
## The Cost Reality
**Hardware**: I started with older server hardware (~$500 used)
**Software**: $0 (all open source)
**Cloud**: I have some AWS infrastructure, but the homelab itself is self-hosted
**Time**: Significant, but compressed by using AI assistance
You can start smaller - a decent desktop or used server is enough.
## Final Thoughts: You Can Do This
The cybersecurity field sometimes feels gatekept by complexity and jargon. But here's the truth: **if you can describe a problem clearly, you can build a solution**.
Six months ago:
- I didn't know what Docker Swarm was
- I'd never written a FastAPI app
- I couldn't explain what NER meant
- I'd never deployed a VM programmatically
Today I run a production-grade threat intelligence platform.
The difference isn't that I became a genius - it's that I:
1. Broke big problems into small steps
2. Used AI to handle implementation details
3. Focused on open source tools
4. Automated relentlessly
5. Built in public (even when it was messy)
Your threat intelligence platform might look different from mine. Maybe you care about different threats, use different data sources, or have different infrastructure. That's perfect - build what matters to you.
The tools are free. The knowledge is accessible. The AI assistants are ready to help.
Start with one VM. Deploy one service. Automate one thing.
Six months from now, you'll be writing your own "beginner's journey" post.
---
**Next in this series**: I'll break down the technical architecture with code examples, Docker configs, and the actual Telegram bot implementation. But first, I want to hear from you: what part of this interests you most?
*Hit me up on @Betty_Bombers_bot with questions, or follow along as I document the technical deep-dives.*