301 lines
12 KiB
Markdown
301 lines
12 KiB
Markdown
---
|
|
title: "Building a Homelab Threat Intelligence Platform with ML: A Beginner's Journey"
|
|
date: 2026-02-24T23:58:00+00:00
|
|
draft: false
|
|
tags: ["threat-intelligence", "cybersecurity", "machine-learning", "osint", "telegram", "homelab", "docker-swarm", "nlp", "ner", "mitre-attack", "infosec", "docker", "devops", "gitops", "selfhosted", "opensource", "build-in-public", "french-tech", "ddos", "soc", "blue-team"]
|
|
summary: "How I built, from scratch, a production-grade threat intelligence platform that predicts DDoS attacks by monitoring hacktivist Telegram channels."
|
|
description: "A beginner's journey building a CTI platform with multilingual NER, Telegram OSINT bots and MITRE ATT&CK integration on a self-hosted Docker Swarm homelab."
|
|
author: "Bojemoi"
|
|
ShowToc: true
|
|
ShowReadingTime: true
|
|
---
|
|
|
|
Six months ago, I knew almost nothing about cybersecurity infrastructure. Today, I run a production-grade threat intelligence platform that predicts DDoS attacks by monitoring hacktivist Telegram channels. If you're reading this thinking "that sounds impossible for someone starting out" - I get it. I thought the same thing.
|
|
|
|
This post is for you: the curious beginner who wants to build real security tools but doesn't know where to start.
|
|
|
|
## What Actually Is This Thing?
|
|
|
|
Let me start with what my system does in plain English:
|
|
|
|
**The Problem**: Hacktivist groups announce their attack targets on Telegram before launching DDoS attacks. French organizations often become targets, but there's no automated way to detect these threats early.
|
|
|
|
**My Solution**: A system that:
|
|
1. Monitors Telegram channels where hacktivists hang out
|
|
2. Reads messages in multiple languages (because hacktivists don't all speak English)
|
|
3. Identifies when organizations are mentioned
|
|
4. Scores how "buzzy" or threatening the discussion is
|
|
5. Alerts me when a French organization might be under threat
|
|
|
|
Think of it like having a robot that reads threatening forums 24/7 and taps you on the shoulder when trouble's brewing.
|
|
|
|
## Why You Can Build This (Even as a Beginner)
|
|
|
|
Here's my controversial take: **you don't need to be a coding expert to build sophisticated security systems anymore**.
|
|
|
|
My approach relies on three principles:
|
|
|
|
### 1. Let AI Write Your Code
|
|
I don't write much code by hand. Instead, I manage prompts and structures. I tell Claude (or similar AI) what I want to build, and it generates the implementation. My job is architecture and integration, not syntax.
|
|
|
|
**This means**: If you can describe what you want clearly, you can build it.
|
|
|
|
### 2. Open Source Everything
|
|
Every tool I use is free and open source:
|
|
- XenServer for virtualization
|
|
- Docker Swarm for container orchestration
|
|
- PostgreSQL for databases
|
|
- Gitea for Git workflows
|
|
- Prometheus/Grafana for monitoring
|
|
|
|
**This means**: Zero licensing costs, full control, and a massive community for support.
|
|
|
|
### 3. Automate Everything
|
|
Manual processes don't scale and you'll forget them. My infrastructure uses:
|
|
- GitOps workflows (push config → automatic deployment)
|
|
- Cloud-init for VM provisioning
|
|
- Webhook automation for CI/CD
|
|
- Container orchestration for self-healing
|
|
|
|
**This means**: Once set up, the system runs itself. You maintain infrastructure as code, not by clicking through interfaces.
|
|
|
|
## The Journey: What I (and Claude) Actually Built
|
|
|
|
Let me walk you through the evolution, because it wasn't linear:
|
|
|
|
### Phase 1: The Foundation (Months 0-2)
|
|
**Started with**: Basic XenServer setup, learning virtualization concepts
|
|
|
|
**What I learned**: You need a solid base layer before anything fancy. I spent time understanding:
|
|
- Network bonding for high availability
|
|
- Storage management
|
|
- VM templates and provisioning
|
|
|
|
**Beginner trap I avoided**: Trying to do everything at once. Master one layer before adding the next.
|
|
|
|
### Phase 2: Container Orchestration (Months 2-3)
|
|
**Built**: Docker Swarm cluster with automatic deployment
|
|
|
|
**What this gives you**: Deploy new services in seconds, not hours. Services restart automatically if they crash.
|
|
|
|
**The "aha" moment**: When I pushed code to Git and watched it automatically deploy to production without touching a terminal. That's when it clicked.
|
|
|
|
### Phase 3: GitOps Workflow (Months 3-4)
|
|
**Built**: Gitea integration with custom cloud-init datasources that query PostgreSQL
|
|
|
|
**What this enables**:
|
|
- Write a YAML config describing a VM
|
|
- Push it to Git
|
|
- VM automatically gets created and configured
|
|
- All infrastructure becomes reproducible
|
|
|
|
**Why this matters**: Your homelab becomes code. Disaster recovery is just re-running your Git repo.
|
|
|
|
### Phase 4: Threat Intelligence (Months 4-6)
|
|
**Built**: The actual threat intelligence platform with:
|
|
- Telegram bot monitoring with OSINT capabilities
|
|
- Multilingual Named Entity Recognition (NER)
|
|
- Entity extraction and relationship mapping
|
|
- Buzz scoring algorithms
|
|
- Integration with Maltego, TheHive, MISP
|
|
- MITRE ATT&CK framework mapping
|
|
|
|
**The breakthrough**: Realizing threat intelligence is pattern matching at scale. ML helps, but smart architecture and data streaming matter more.
|
|
|
|
## The Architecture (Simplified)
|
|
|
|
Here's how the pieces fit together without overwhelming you:
|
|
|
|
```
|
|
[Telegram Channels]
|
|
↓ (messages stream in)
|
|
[Telegram Bot Infrastructure]
|
|
↓ (raw text + metadata)
|
|
[Multilingual NER Processing]
|
|
↓ (identified entities)
|
|
[Entity Extraction & Scoring]
|
|
↓ (threat scores)
|
|
[PostgreSQL Database]
|
|
↓ (queries for analysis)
|
|
[Alert System] → [Me!]
|
|
```
|
|
|
|
Each box is a microservice in Docker. They communicate through message queues and APIs. If one crashes, the others keep running.
|
|
|
|
## Key Technologies Explained (For Beginners)
|
|
|
|
**XenServer**: Think of it like having multiple computers inside one physical computer. Each "virtual machine" acts independently.
|
|
|
|
**Docker Swarm**: Manages containers (lightweight mini-environments). If you deploy 5 containers, Swarm spreads them across your servers and restarts them if they die.
|
|
|
|
**PostgreSQL**: A database. It stores all the structured data (entities, threat scores, relationships).
|
|
|
|
**Gitea**: Like GitHub, but you host it yourself. Your code and configs live here.
|
|
|
|
**Cloud-init**: Automates VM setup. Instead of clicking through installers, you describe what you want in a file.
|
|
|
|
**NER (Named Entity Recognition)**: ML that finds entities in text. It spots "Microsoft" in a message and knows it's an organization, not just a word.
|
|
|
|
## Practical Tips If You're Starting Out
|
|
|
|
### Start Small, Think Big
|
|
Don't try to build everything at once. I started with:
|
|
1. One VM running Docker
|
|
2. One simple service (a Telegram bot)
|
|
3. One database
|
|
4. Then gradually added orchestration, monitoring, automation
|
|
|
|
### Embrace Configuration Over Code
|
|
Write YAML configs that describe what you want. Let tools like Docker Compose and cloud-init handle the implementation.
|
|
|
|
### Build in Production from Day One
|
|
Don't have a "learning environment" and a "production environment." Build production-grade from the start:
|
|
- Use container orchestration
|
|
- Set up monitoring
|
|
- Implement logging
|
|
- Design for failure
|
|
|
|
You'll learn better practices and won't need to rebuild later.
|
|
|
|
### Use AI Assistants Aggressively
|
|
I use Claude to:
|
|
- Generate FastAPI applications
|
|
- Write Docker configs
|
|
- Create database schemas
|
|
- Debug issues
|
|
- Explain concepts I don't understand
|
|
|
|
This isn't cheating - it's working smart.
|
|
|
|
### Focus on Integration, Not Implementation
|
|
Your value isn't writing Python - it's designing systems that solve problems. Let AI handle syntax. You handle architecture.
|
|
|
|
## The Threat Intelligence Platform: Deeper Dive
|
|
|
|
Since this is the cool part, let me break down how the ML/intelligence piece works:
|
|
|
|
### 1. Data Ingestion
|
|
Telegram bots monitor channels and capture:
|
|
- Message text
|
|
- Timestamp
|
|
- Sender info
|
|
- Channel metadata
|
|
|
|
This streams into a message queue for processing.
|
|
|
|
### 2. Multilingual Processing
|
|
Messages might be in Russian, English, French, or mixed. The NER pipeline:
|
|
- Detects language
|
|
- Applies appropriate NER model
|
|
- Extracts entities (organizations, people, locations, IPs, domains)
|
|
|
|
**Why multilingual matters**: Hacktivists often operate in Russian or use mixed languages to avoid detection.
|
|
|
|
### 3. Entity Extraction & Scoring
|
|
For each entity (like "Company X"), the system:
|
|
- Checks if it's French (geolocation + domain analysis)
|
|
- Counts mentions across time windows
|
|
- Analyzes sentiment and threat keywords
|
|
- Computes a "buzz score"
|
|
|
|
High buzz score = something's happening.
|
|
|
|
### 4. Threat Correlation
|
|
The system maps entities to:
|
|
- Known infrastructure (via OSINT tools like Shodan, VirusTotal)
|
|
- Historical attack patterns
|
|
- MITRE ATT&CK techniques
|
|
|
|
This builds a threat graph showing relationships.
|
|
|
|
### 5. Alerting
|
|
When patterns indicate elevated risk:
|
|
- Score exceeds threshold
|
|
- Multiple channels mention the same target
|
|
- Threat keywords appear in context
|
|
|
|
→ Alert fires with supporting evidence.
|
|
|
|
## OSINT Integration: Making It Smarter
|
|
|
|
The Telegram bot has integrated OSINT capabilities:
|
|
|
|
**IP Analysis**: Query VirusTotal, AbuseIPDB, Shodan for reputation and historical data
|
|
|
|
**Domain Intelligence**: Passive DNS, WHOIS, certificate analysis
|
|
|
|
**Blockchain Enrollment**: User verification via blockchain-based systems (for access control)
|
|
|
|
**Framework Mapping**: Automatic MITRE ATT&CK technique identification
|
|
|
|
This turns raw data into actionable intelligence.
|
|
|
|
## What Would I Do Differently?
|
|
|
|
**Start with better monitoring**: I added Prometheus/Grafana late. Wish I'd built it from day one. You can't debug what you can't see.
|
|
|
|
**Document as you go**: I'm rebuilding some knowledge because I didn't document decisions. Write down WHY you chose something, not just WHAT.
|
|
|
|
**Network design upfront**: I've had to refactor networking multiple times. Plan your subnets, VLANs, and firewall rules before deploying services.
|
|
|
|
**Test disaster recovery early**: I built an amazing system... then realized I hadn't tested restoring from backups. Test your failure modes.
|
|
|
|
## Resources That Actually Helped
|
|
|
|
**For learning infrastructure**:
|
|
- The Phoenix Project (book) - changed how I think about systems
|
|
- XenServer documentation - surprisingly readable
|
|
- Docker Swarm docs - shorter than Kubernetes, easier to start
|
|
|
|
**For threat intelligence**:
|
|
- MITRE ATT&CK framework - free, comprehensive
|
|
- MISP Project documentation - open source threat sharing
|
|
- TheHive Project - incident response platform
|
|
|
|
**For practical skills**:
|
|
- Claude (obviously) - for code generation and explanation
|
|
- GitHub repos of similar projects - learn from real implementations
|
|
- YouTube channels on homelab setups
|
|
|
|
## The Cost Reality
|
|
|
|
**Hardware**: I started with older server hardware (~$500 used)
|
|
**Software**: $0 (all open source)
|
|
**Cloud**: I have some AWS infrastructure, but homelab is self-hosted
|
|
**Time**: Significant, but compressed by using AI assistance
|
|
|
|
You can start smaller - a decent desktop or used server is enough.
|
|
|
|
## Final Thoughts: You Can Do This
|
|
|
|
The cybersecurity field sometimes feels gatekept by complexity and jargon. But here's the truth: **if you can describe a problem clearly, you can build a solution**.
|
|
|
|
Six months ago:
|
|
- I didn't know what Docker Swarm was
|
|
- I'd never written a FastAPI app
|
|
- I couldn't explain what NER meant
|
|
- I'd never deployed a VM programmatically
|
|
|
|
Today I run a production-grade threat intelligence platform.
|
|
|
|
The difference isn't that I became a genius - it's that I:
|
|
1. Broke big problems into small steps
|
|
2. Used AI to handle implementation details
|
|
3. Focused on open source tools
|
|
4. Automated relentlessly
|
|
5. Built in public (even when it was messy)
|
|
|
|
Your threat intelligence platform might look different than mine. Maybe you care about different threats, use different data sources, or have different infrastructure. That's perfect - build what matters to you.
|
|
|
|
The tools are free. The knowledge is accessible. The AI assistants are ready to help.
|
|
|
|
Start with one VM. Deploy one service. Automate one thing.
|
|
|
|
Six months from now, you'll be writing your own "beginner's journey" post.
|
|
|
|
---
|
|
|
|
**Next in this series**: I'll break down the technical architecture with code examples, Docker configs, and the actual Telegram bot implementation. But first, I want to hear from you: what part of this interests you most?
|
|
|
|
*Hit me up on @Betty_Bombers_bot with questions, or follow along as I document the technical deep-dives.*
|