From c5c57fee78342fbc105002e560631fd8130d5f0f Mon Sep 17 00:00:00 2001 From: Betty Date: Wed, 25 Feb 2026 18:25:11 +0100 Subject: [PATCH] post: add English version of threat intel homelab article --- content/posts/threat-intel-homelab-post-en.md | 300 ++++++++++++++++++ 1 file changed, 300 insertions(+) create mode 100644 content/posts/threat-intel-homelab-post-en.md diff --git a/content/posts/threat-intel-homelab-post-en.md b/content/posts/threat-intel-homelab-post-en.md new file mode 100644 index 0000000..a255bb1 --- /dev/null +++ b/content/posts/threat-intel-homelab-post-en.md @@ -0,0 +1,300 @@ +--- +title: "Building a Homelab Threat Intelligence Platform with ML: A Beginner's Journey" +date: 2026-02-24T23:58:00+00:00 +draft: false +tags: ["threat-intelligence", "cybersecurity", "machine-learning", "osint", "telegram", "homelab", "docker-swarm", "nlp", "ner", "mitre-attack", "infosec", "docker", "devops", "gitops", "selfhosted", "opensource", "build-in-public", "french-tech", "ddos", "soc", "blue-team"] +summary: "How I built, from scratch, a production-grade threat intelligence platform that predicts DDoS attacks by monitoring hacktivist Telegram channels." +description: "A beginner's journey building a CTI platform with multilingual NER, Telegram OSINT bots and MITRE ATT&CK integration on a self-hosted Docker Swarm homelab." +author: "Bojemoi" +ShowToc: true +ShowReadingTime: true +--- + +Six months ago, I knew almost nothing about cybersecurity infrastructure. Today, I run a production-grade threat intelligence platform that predicts DDoS attacks by monitoring hacktivist Telegram channels. If you're reading this thinking "that sounds impossible for someone starting out" - I get it. I thought the same thing. + +This post is for you: the curious beginner who wants to build real security tools but doesn't know where to start. + +## What Actually Is This Thing? + +Let me start with what my system does in plain English: + +**The Problem**: Hacktivist groups announce their attack targets on Telegram before launching DDoS attacks. French organizations often become targets, but there's no automated way to detect these threats early. + +**My Solution**: A system that: +1. Monitors Telegram channels where hacktivists hang out +2. Reads messages in multiple languages (because hacktivists don't all speak English) +3. Identifies when organizations are mentioned +4. Scores how "buzzy" or threatening the discussion is +5. Alerts me when a French organization might be under threat + +Think of it like having a robot that reads threatening forums 24/7 and taps you on the shoulder when trouble's brewing. + +## Why You Can Build This (Even as a Beginner) + +Here's my controversial take: **you don't need to be a coding expert to build sophisticated security systems anymore**. + +My approach relies on three principles: + +### 1. Let AI Write Your Code +I don't write much code by hand. Instead, I manage prompts and structures. I tell Claude (or similar AI) what I want to build, and it generates the implementation. My job is architecture and integration, not syntax. + +**This means**: If you can describe what you want clearly, you can build it. + +### 2. Open Source Everything +Every tool I use is free and open source: +- XenServer for virtualization +- Docker Swarm for container orchestration +- PostgreSQL for databases +- Gitea for Git workflows +- Prometheus/Grafana for monitoring + +**This means**: Zero licensing costs, full control, and a massive community for support. + +### 3. Automate Everything +Manual processes don't scale and you'll forget them. My infrastructure uses: +- GitOps workflows (push config → automatic deployment) +- Cloud-init for VM provisioning +- Webhook automation for CI/CD +- Container orchestration for self-healing + +**This means**: Once set up, the system runs itself. You maintain infrastructure as code, not by clicking through interfaces. + +## The Journey: What I (and Claude) Actually Built + +Let me walk you through the evolution, because it wasn't linear: + +### Phase 1: The Foundation (Months 0-2) +**Started with**: Basic XenServer setup, learning virtualization concepts + +**What I learned**: You need a solid base layer before anything fancy. I spent time understanding: +- Network bonding for high availability +- Storage management +- VM templates and provisioning + +**Beginner trap I avoided**: Trying to do everything at once. Master one layer before adding the next. + +### Phase 2: Container Orchestration (Months 2-3) +**Built**: Docker Swarm cluster with automatic deployment + +**What this gives you**: Deploy new services in seconds, not hours. Services restart automatically if they crash. + +**The "aha" moment**: When I pushed code to Git and watched it automatically deploy to production without touching a terminal. That's when it clicked. + +### Phase 3: GitOps Workflow (Months 3-4) +**Built**: Gitea integration with custom cloud-init datasources that query PostgreSQL + +**What this enables**: +- Write a YAML config describing a VM +- Push it to Git +- VM automatically gets created and configured +- All infrastructure becomes reproducible + +**Why this matters**: Your homelab becomes code. Disaster recovery is just re-running your Git repo. + +### Phase 4: Threat Intelligence (Months 4-6) +**Built**: The actual threat intelligence platform with: +- Telegram bot monitoring with OSINT capabilities +- Multilingual Named Entity Recognition (NER) +- Entity extraction and relationship mapping +- Buzz scoring algorithms +- Integration with Maltego, TheHive, MISP +- MITRE ATT&CK framework mapping + +**The breakthrough**: Realizing threat intelligence is pattern matching at scale. ML helps, but smart architecture and data streaming matter more. + +## The Architecture (Simplified) + +Here's how the pieces fit together without overwhelming you: + +``` +[Telegram Channels] + ↓ (messages stream in) +[Telegram Bot Infrastructure] + ↓ (raw text + metadata) +[Multilingual NER Processing] + ↓ (identified entities) +[Entity Extraction & Scoring] + ↓ (threat scores) +[PostgreSQL Database] + ↓ (queries for analysis) +[Alert System] → [Me!] +``` + +Each box is a microservice in Docker. They communicate through message queues and APIs. If one crashes, the others keep running. + +## Key Technologies Explained (For Beginners) + +**XenServer**: Think of it like having multiple computers inside one physical computer. Each "virtual machine" acts independently. + +**Docker Swarm**: Manages containers (lightweight mini-environments). If you deploy 5 containers, Swarm spreads them across your servers and restarts them if they die. + +**PostgreSQL**: A database. It stores all the structured data (entities, threat scores, relationships). + +**Gitea**: Like GitHub, but you host it yourself. Your code and configs live here. + +**Cloud-init**: Automates VM setup. Instead of clicking through installers, you describe what you want in a file. + +**NER (Named Entity Recognition)**: ML that finds entities in text. It spots "Microsoft" in a message and knows it's an organization, not just a word. + +## Practical Tips If You're Starting Out + +### Start Small, Think Big +Don't try to build everything at once. I started with: +1. One VM running Docker +2. One simple service (a Telegram bot) +3. One database +4. Then gradually added orchestration, monitoring, automation + +### Embrace Configuration Over Code +Write YAML configs that describe what you want. Let tools like Docker Compose and cloud-init handle the implementation. + +### Build in Production from Day One +Don't have a "learning environment" and a "production environment." Build production-grade from the start: +- Use container orchestration +- Set up monitoring +- Implement logging +- Design for failure + +You'll learn better practices and won't need to rebuild later. + +### Use AI Assistants Aggressively +I use Claude to: +- Generate FastAPI applications +- Write Docker configs +- Create database schemas +- Debug issues +- Explain concepts I don't understand + +This isn't cheating - it's working smart. + +### Focus on Integration, Not Implementation +Your value isn't writing Python - it's designing systems that solve problems. Let AI handle syntax. You handle architecture. + +## The Threat Intelligence Platform: Deeper Dive + +Since this is the cool part, let me break down how the ML/intelligence piece works: + +### 1. Data Ingestion +Telegram bots monitor channels and capture: +- Message text +- Timestamp +- Sender info +- Channel metadata + +This streams into a message queue for processing. + +### 2. Multilingual Processing +Messages might be in Russian, English, French, or mixed. The NER pipeline: +- Detects language +- Applies appropriate NER model +- Extracts entities (organizations, people, locations, IPs, domains) + +**Why multilingual matters**: Hacktivists often operate in Russian or use mixed languages to avoid detection. + +### 3. Entity Extraction & Scoring +For each entity (like "Company X"), the system: +- Checks if it's French (geolocation + domain analysis) +- Counts mentions across time windows +- Analyzes sentiment and threat keywords +- Computes a "buzz score" + +High buzz score = something's happening. + +### 4. Threat Correlation +The system maps entities to: +- Known infrastructure (via OSINT tools like Shodan, VirusTotal) +- Historical attack patterns +- MITRE ATT&CK techniques + +This builds a threat graph showing relationships. + +### 5. Alerting +When patterns indicate elevated risk: +- Score exceeds threshold +- Multiple channels mention the same target +- Threat keywords appear in context + +→ Alert fires with supporting evidence. + +## OSINT Integration: Making It Smarter + +The Telegram bot has integrated OSINT capabilities: + +**IP Analysis**: Query VirusTotal, AbuseIPDB, Shodan for reputation and historical data + +**Domain Intelligence**: Passive DNS, WHOIS, certificate analysis + +**Blockchain Enrollment**: User verification via blockchain-based systems (for access control) + +**Framework Mapping**: Automatic MITRE ATT&CK technique identification + +This turns raw data into actionable intelligence. + +## What Would I Do Differently? + +**Start with better monitoring**: I added Prometheus/Grafana late. Wish I'd built it from day one. You can't debug what you can't see. + +**Document as you go**: I'm rebuilding some knowledge because I didn't document decisions. Write down WHY you chose something, not just WHAT. + +**Network design upfront**: I've had to refactor networking multiple times. Plan your subnets, VLANs, and firewall rules before deploying services. + +**Test disaster recovery early**: I built an amazing system... then realized I hadn't tested restoring from backups. Test your failure modes. + +## Resources That Actually Helped + +**For learning infrastructure**: +- The Phoenix Project (book) - changed how I think about systems +- XenServer documentation - surprisingly readable +- Docker Swarm docs - shorter than Kubernetes, easier to start + +**For threat intelligence**: +- MITRE ATT&CK framework - free, comprehensive +- MISP Project documentation - open source threat sharing +- TheHive Project - incident response platform + +**For practical skills**: +- Claude (obviously) - for code generation and explanation +- GitHub repos of similar projects - learn from real implementations +- YouTube channels on homelab setups + +## The Cost Reality + +**Hardware**: I started with older server hardware (~$500 used) +**Software**: $0 (all open source) +**Cloud**: I have some AWS infrastructure, but homelab is self-hosted +**Time**: Significant, but compressed by using AI assistance + +You can start smaller - a decent desktop or used server is enough. + +## Final Thoughts: You Can Do This + +The cybersecurity field sometimes feels gatekept by complexity and jargon. But here's the truth: **if you can describe a problem clearly, you can build a solution**. + +Six months ago: +- I didn't know what Docker Swarm was +- I'd never written a FastAPI app +- I couldn't explain what NER meant +- I'd never deployed a VM programmatically + +Today I run a production-grade threat intelligence platform. + +The difference isn't that I became a genius - it's that I: +1. Broke big problems into small steps +2. Used AI to handle implementation details +3. Focused on open source tools +4. Automated relentlessly +5. Built in public (even when it was messy) + +Your threat intelligence platform might look different than mine. Maybe you care about different threats, use different data sources, or have different infrastructure. That's perfect - build what matters to you. + +The tools are free. The knowledge is accessible. The AI assistants are ready to help. + +Start with one VM. Deploy one service. Automate one thing. + +Six months from now, you'll be writing your own "beginner's journey" post. + +--- + +**Next in this series**: I'll break down the technical architecture with code examples, Docker configs, and the actual Telegram bot implementation. But first, I want to hear from you: what part of this interests you most? + +*Hit me up on @Betty_Bombers_bot with questions, or follow along as I document the technical deep-dives.*