{"id":34694131,"url":"https://github.com/laugiov/scambuster-preview","last_synced_at":"2026-05-01T08:32:58.395Z","repository":{"id":329829163,"uuid":"1118222447","full_name":"laugiov/scambuster-preview","owner":"laugiov","description":"Defensive engagement \u0026 threat intelligence research laboratory. Converts inbound scam emails into actionable IOCs through controlled, policy-driven AI engagement. Multi-agent LLM architecture with adaptive strategy selection. Docs-only preview.","archived":false,"fork":false,"pushed_at":"2025-12-22T10:11:02.000Z","size":62,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-12-23T09:57:26.588Z","etag":null,"topics":["cybersecurity","email-security","fraud-prevention","honeypot","misp","multi-agent-llm","php","reinforcement-learning","siem","soar","soc","stix","symfony","threat-intelligence"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/laugiov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-17T12:43:13.000Z","updated_at":"2025-12-22T10:11:05.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/laugiov/scambuster-preview","commit_stats":null,"previous_names":["laugiov/scambuster-preview"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/laugiov/scambuster-pr
eview","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fscambuster-preview","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fscambuster-preview/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fscambuster-preview/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fscambuster-preview/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/laugiov","download_url":"https://codeload.github.com/laugiov/scambuster-preview/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fscambuster-preview/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32490810,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cybersecurity","email-security","fraud-prevention","honeypot","misp","multi-agent-llm","php","reinforcement-learning","siem","soar","soc","stix","symfony","threat-intelligence"],"created_at":"2025-12-24T22:26:37.088Z","updated_at":"2026-05-01T08:32:58.389Z","avatar_url":"https://github.com/laugiov.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# ScamBuster\n\n**A Defensive 
Engagement \u0026 Threat Intelligence Research Laboratory (Email-first)**\n\n![Status](https://img.shields.io/badge/status-private_preview-blue)\n![Stack](https://img.shields.io/badge/stack-PHP%208.3%20|%20Symfony%207%20|%20PostgreSQL%20|%20LLM-green)\n![Tests](https://img.shields.io/badge/tests-955%20passing-brightgreen)\n![License](https://img.shields.io/badge/license-code%20private%20|%20docs%20CC%20BY--NC--SA-lightgrey)\n\n\u003e **Last updated**: 2026-02-02 | **Data period**: December 2025 - February 2026\n\nScamBuster turns inbound scam emails into **actionable threat intelligence** through **controlled, policy-driven engagement**.\n\nThe project serves defensive security, fraud prevention, and applied research purposes (not offensive use). It extracts IOCs, maps campaigns, measures engagement effectiveness, and exports intelligence in STIX/MISP formats. All workflows are safety-gated, cost-aware, and fully auditable.\n\n\u003e This repository is a **public preview** (documentation only). Operational assets remain private to prevent misuse.\n\n---\n\n## The Problem: Email Scams Are High-Volume, and Mostly \"Invisible\" to Defenders\n\nEmail scams operate at massive scale. Most security programs are forced into a **block-and-forget** posture: the message is removed, but the attacker infrastructure, financial rails, and campaign signals remain largely unobserved. Industry estimates and sourced figures are documented in [Problem Statement](docs/01_problem_statement.md).\n\nThis creates a structural gap. There is little to no attribution across messages and campaigns, limited visibility into evolving TTPs and infrastructure reuse, and slow feedback loops on what actually works. 
Most organizations miss opportunities to generate intelligence from real-world interaction with threat actors.\n\nScamBuster explores this gap by converting scam emails into measurable threat intelligence, safely and at scale.\n\n---\n\n## ScamBuster: From Blocking to Understanding\n\nScamBuster is a **research laboratory** that transforms email scams into actionable intelligence through controlled AI engagement.\n\n### The Vision: A Scam Observatory\n\nInstead of discarding scam emails, ScamBuster creates an **observatory** that answers critical questions:\n\n| Question | ScamBuster Insight |\n|----------|-------------------|\n| **What scam types are trending?** | Real-time classification across 13 categories |\n| **Which personas maximize engagement?** | Adaptive learning identifies optimal strategies per scam type |\n| **What IOCs do scammers reveal?** | Automatic extraction of 34 indicator types |\n| **How do campaigns evolve?** | Clustering and attribution over time |\n| **What works against different scammers?** | Data-driven optimization, not intuition |\n\n### Three Research Dimensions\n\n```\n┌─────────────────────────────────────────────────────────────────────────┐\n│                    SCAMBUSTER RESEARCH LABORATORY                        │\n├─────────────────────────────────────────────────────────────────────────┤\n│                                                                          │\n│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐       │\n│  │  CONVERSATIONAL  │  │   INTELLIGENCE   │  │    ADAPTIVE      │       │\n│  │    LABORATORY    │  │    EXTRACTION    │  │    LEARNING      │       │\n│  ├──────────────────┤  ├──────────────────┤  ├──────────────────┤       │\n│  │                  │  │                  │  │                  │       │\n│  │ Test which       │  │ Analyze how \u0026    │  │ Automatically    │       │\n│  │ personas work    │  │ when IOCs are    │  │ optimize         │       │\n│  │ best for each    │  │ 
revealed during  │  │ strategies via   │       │\n│  │ scam type        │  │ conversations    │  │ reinforcement    │       │\n│  │                  │  │                  │  │ learning         │       │\n│  └──────────────────┘  └──────────────────┘  └──────────────────┘       │\n│                                                                          │\n└─────────────────────────────────────────────────────────────────────────┘\n```\n\n---\n\n## Pilot Results (February 2026)\n\n### Controlled Live Deployment (60 Days)\n\n| Metric | Value | Notes |\n|--------|-------|-------|\n| **Conversations** | +1K | Real scammers engaged |\n| **IOCs Extracted** | +20K | Emails, phones, IBANs, crypto wallets |\n| **IOC Precision** | 100% on audited sample (N=107) | vs 44% with regex-only baseline |\n| **System Uptime** | 60 days | Zero incidents, fully automated |\n| **Operational Cost** | €5.2 | Total LLM API cost |\n| **Cost per IOC** | €0.0002 | Negligible operational expense |\n\n\u003e **Metrics scope \u0026 definitions**\n\u003e\n\u003e Figures come from **controlled live deployment** (December 2025 - February 2026):\n\u003e - **60-day run**: Used for stability, scale, and ROI indicators (+1K conversations, +20K IOCs, 60 days uptime)\n\u003e - **Controlled validation run**: Used for precision analysis and campaign-level attribution\n\u003e\n\u003e **IOC precision (100%)** = no false positives in audited sample (precision = TP / (TP + FP), N=107 messages).\n\u003e Sample-based validation details are documented in [Evaluation Methodology](docs/05_evaluation_methodology.md).\n\n### Validation Summary\n\nAdaptive strategy selection was validated on 2,221 synthetic conversations with statistically significant results (p \u003c 0.001). 
Full methodology and statistical details are available in [Evaluation Methodology](docs/05_evaluation_methodology.md).\n\n### Key Discoveries\n\n**Strategy Performance Varies Significantly by Scam Type**\n\nThe adaptive system discovered that:\n- Optimal strategy differs significantly across scam categories\n- Human intuition about \"best\" approaches is often wrong\n- Data-driven selection outperforms random assignment\n\n**Campaign Attribution**\n\nFrom +1K conversations, identified **coordinated operations**:\n- Shared infrastructure (same IBANs across conversations)\n- Common TTPs (message templates, escalation patterns)\n- Geographic clustering (phone number prefixes)\n\n---\n\n## How It Works\n\n### Multi-Agent LLM Architecture\n\nFive specialized AI agents work in concert:\n\n| Agent | Role | Achievement |\n|-------|------|-------------|\n| **ScamClassifier** | Categorize incoming scams | 82% auto-classification, 13 types |\n| **IocExtractor** | Extract threat indicators | 100% precision on audited sample, 34 IOC types |\n| **Generator** | Create contextual responses | +35% IOCs post-IBAN detection |\n| **Validator** | Ensure safety \u0026 quality | 95% approval rate |\n| **Orchestrator** | Coordinate \u0026 optimize costs | \u003c€0.0002/message |\n\n### Adaptive Strategy Selection (Applied Research)\n\nScamBuster does not rely on a single fixed \"best\" conversational approach. Instead, it uses **adaptive strategy selection** to learn, per scam category, which safe persona/response patterns maximize **intelligence yield** under strict constraints.\n\nStrategies are selected based on scam type (BEC, lottery, romance, refund, etc.). The system optimizes for defensive signals such as indicators revealed, validated artifacts, and sustained interaction, while controlling cost and safety. Every response is gated by validation rules and policy checks before being sent. 
Performance is monitored over time, enabling data-driven iteration rather than intuition.\n\n| Aspect | Summary |\n|--------|---------|\n| Approach | Contextual bandit / adaptive experimentation |\n| Context | One policy per scam category (extensible) |\n| Strategy space | Persona \u0026 response patterns (kept private to prevent misuse) |\n| Objectives | Intelligence yield, safety compliance, and cost efficiency |\n\n---\n\n## Value for Stakeholders\n\n### For SOC/CERT Teams\n\n| Capability | Benefit |\n|------------|---------|\n| **Automated IOC feeds** | STIX 2.1 / MISP-compatible exports |\n| **Campaign attribution** | Link individual scams to organized operations |\n| **Early warning** | Identify emerging threats before they scale |\n| **Reduced analyst workload** | Automated extraction vs manual review |\n\n### For MSSPs\n\n| Capability | Benefit |\n|------------|---------|\n| **Differentiation** | Proactive TI service vs reactive blocking |\n| **Scalability** | One deployment serves multiple clients |\n| **ROI demonstration** | Quantifiable intelligence value |\n\n### For Financial Institutions\n\n| Capability | Benefit |\n|------------|---------|\n| **BEC detection** | Early identification of business email compromise |\n| **Account protection** | Report fraudulent accounts to consortium |\n| **Fraud prevention** | Intelligence on active money mule networks |\n\n### For Research\n\n| Capability | Benefit |\n|------------|---------|\n| **Reproducible methodology** | Published protocol for evaluation |\n| **Dataset** | Anonymized corpus (Feb 2026) |\n| **Collaboration** | Open platform for strategy experimentation |\n\n---\n\n## Documentation\n\n| Document | Description |\n|----------|-------------|\n| [Problem Statement](docs/01_problem_statement.md) | The €12.5B scam problem in depth |\n| [Value Proposition](docs/02_value_proposition.md) | Technical differentiators and ROI |\n| [Architecture](docs/03_high_level_architecture.md) | High-level system design 
|\n| [Security \u0026 Ethics](docs/04_security_guardrails.md) | Defensive principles, GDPR, safety |\n| [Evaluation](docs/05_evaluation_methodology.md) | Metrics, validation, statistical methods |\n| [Roadmap](docs/06_roadmap.md) | Timeline and milestones |\n| [FAQ](docs/07_faq.md) | Common questions |\n\n---\n\n## What's NOT Included (Operational Security)\n\nTo prevent misuse by adversaries, this repository contains **documentation only**:\n\n- No engagement prompts or persona definitions\n- No automation workflows or scripts\n- No operational playbooks or tactics\n- No real conversation data or scammer identifiers\n- No API keys, secrets, or operational configurations\n- No information enabling offensive use or replication without governance\n\n---\n\n## Project Status\n\n| Phase | Status | Timeline |\n|-------|--------|----------|\n| **Phase 1**: Multi-agent LLM architecture | ✅ Complete | Oct-Nov 2025 |\n| **Phase 2**: Adaptive engagement (ε-greedy) | ✅ Complete | Nov-Dec 2025 |\n| **Phase 3**: Thompson Sampling V2 | ✅ Feature-complete (rollout in progress) | Dec 2025 |\n| **Phase 4**: Scale \u0026 Dashboards | 🔄 In Progress | Dec 2025 |\n| **Phase 5**: A/B Testing | 📅 Planned | Jan 2026 |\n| **Phase 6**: Publication \u0026 Dataset Release | 📅 Planned | Feb 2026 |\n\n\u003e **Status note**: \"Feature-complete\" means core functionality is implemented and tested. \"Rollout in progress\" means gradual activation in production is ongoing. 
See [Roadmap](docs/06_roadmap.md) for week-by-week detail.\n\n---\n\n## Request Access\n\n### Private Demo (45 min)\n\n**What you'll see:**\n- End-to-end flow (ingestion → engagement → extraction → export)\n- Live dashboard with convergence visualization\n- Sanitized sample outputs and IOC examples\n\n**What we need from you:**\n- Your role and organization context\n- Specific use case or evaluation criteria\n- Any compliance constraints (optional)\n\n\u003e **Eligibility**: Access is granted for defensive security, research, or fraud prevention purposes only. No access for offensive use, scam operations, or purposes that conflict with the project's ethical guidelines.\n\n**Operational boundaries:** The system only responds to scam emails already received and never initiates contact. There is no impersonation of real organizations, brands, or individuals (personas are synthetic role patterns, non-identifying). There is no unauthorized access, no hack-back, and no exploitation of scammer infrastructure.\n\n### Pilot Program\n\n**Evaluate in your environment:**\n- Time-boxed deployment (4-8 weeks typical)\n- Defined scope and success criteria\n- Security and compliance review available\n- Integration assessment with existing tools\n\n### Partnership Opportunities\n\n- **SOC/MSSP**: SIEM/SOAR integration pilots\n- **Research**: Dataset sharing, methodology validation\n- **Commercial**: Enterprise licensing discussions\n\n### Contact\n\n| | |\n|---|---|\n| **Project lead** | Laurent Giovannoni |\n| **LinkedIn** | [linkedin.com/in/giovannonilaurent](https://linkedin.com/in/giovannonilaurent) |\n| **Context** | E-MSc Cybersecurity, Master's Thesis |\n| **Demo request** | Open a [GitHub Issue](../../issues) (private requests welcome) |\n| **Security** | See [SECURITY.md](SECURITY.md) for responsible disclosure |\n\n---\n\n## Technology Stack\n\n| Layer | Technology |\n|-------|------------|\n| **Backend** | PHP 8.3, Symfony 7, DDD architecture |\n| **Database** | 
PostgreSQL, Redis |\n| **LLM** | OpenAI API (GPT-4o-mini) |\n| **Orchestration** | n8n workflow automation |\n| **Infrastructure** | Docker, GitLab CI |\n| **Security** | Industry-standard encryption, secrets management |\n\n---\n\n## Academic Context\n\n### Research Contributions\n\n1. **Methodological**: Reproducible protocol for adaptive honeypot evaluation\n2. **Technical**: Multi-agent LLM with double validation (95% approval vs 60-70% baseline)\n3. **Scientific**: Empirically validated adaptive engagement (p \u003c 0.001, N=2,221)\n4. **Practical**: Demonstrated efficiency at pilot scale (€5.2 for +20K IOCs)\n\n### Citation\n\n```bibtex\n@mastersthesis{giovannoni2025scambuster,\n  author = {Giovannoni, Laurent},\n  title = {ScamBuster: Adaptive Controlled Engagement via Multi-Armed Bandits\n           for Automated Threat Intelligence Extraction},\n  school = {E-MSc Cybersecurity},\n  year = {2025}\n}\n```\n\n---\n\n## License\n\n- **Documentation**: CC BY-NC-SA 4.0\n- **Code**: Private (commercial/research license available)\n- **Dataset**: CC BY-NC-SA 4.0 (anonymized, February 2026)\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"docs/01_problem_statement.md\"\u003eLearn More\u003c/a\u003e •\n  \u003ca href=\"#request-access\"\u003eRequest Demo\u003c/a\u003e •\n  \u003ca href=\"docs/06_roadmap.md\"\u003eView Roadmap\u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaugiov%2Fscambuster-preview","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flaugiov%2Fscambuster-preview","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaugiov%2Fscambuster-preview/lists"}