{"id":48835154,"url":"https://github.com/rolandpg/zettelforge","last_synced_at":"2026-05-26T06:01:34.805Z","repository":{"id":350904021,"uuid":"1203202149","full_name":"rolandpg/zettelforge","owner":"rolandpg","description":"Agentic memory for CTI in Python — STIX knowledge graphs, threat-actor alias resolution, offline-first RAG, MCP server for Claude Code and LangChain agents","archived":false,"fork":false,"pushed_at":"2026-05-26T00:54:25.000Z","size":22785,"stargazers_count":39,"open_issues_count":13,"forks_count":6,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-05-26T02:32:48.113Z","etag":null,"topics":["agentic-memory","ai-agent","claude-code","cti","cybersecurity","knowledge-graph","langchain","llm","llm-memory","mcp","mcp-server","mitre-attack","offline-first","python","rag","soc-automation","stix","threat-hunting","threat-intelligence","zettelkasten"],"latest_commit_sha":null,"homepage":"https://docs.threatrecall.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rolandpg.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":".github/SECURITY.md","support":null,"governance":"GOVERNANCE.md","roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-06T20:26:58.000Z","updated_at":"2026-05-26T00:54:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/rolandpg/zettelforge","commit_stats":null,"previous_names":["rolandpg/zettelforge"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/rolandpg/zettelforge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rolandpg%2Fzettelforge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rolandpg%2Fzettelforge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rolandpg%2Fzettelforge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rolandpg%2Fzettelforge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rolandpg","download_url":"https://codeload.github.com/rolandpg/zettelforge/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rolandpg%2Fzettelforge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33506510,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T03:12:49.672Z","status":"ssl_error","status_checked_at":"2026-05-26T03:12:47.976Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-memory","ai-agent","claude-code","cti","cybersecurity","knowledge-graph","langchain","llm","llm-memory","mcp","mcp-server","mitre-attack","offline-first","python","rag","soc-automation","stix","threat-hunting","threat-intelligence","zettelkasten"],"created_at":"2026-04-14T23:02:17.493Z","updated_at":"2026-05-26T06:01:34.797Z","avatar_url":"https://github.com/rolandpg.png","language":"Python","funding_links":[],"categories":["Concepts \u0026 Frameworks","Tools","Using AI for Pentesting"],"sub_categories":["Monitoring/Scanning"],"readme":"# ZettelForge\n\n\u003c!-- mcp-name: io.github.rolandpg/zettelforge --\u003e\n\n**The only agentic memory system built for cyber threat intelligence.**\n\nWhen a senior analyst leaves, two or three years of context walks out with them — customer environments, prior investigations, actor TTPs, false-positive patterns, every hard-won \"wait, we've seen this before.\" ZettelForge is an agentic memory system built so that context stays with the team.\n\nIt extracts CVEs, threat actors, IOCs, and ATT\u0026CK techniques from analyst notes and threat reports, resolves aliases (APT28 = Fancy Bear = STRONTIUM = Sofacy), builds a STIX 2.1 knowledge graph, and serves every past investigation back to your analysts — and to Claude Code via MCP — in natural language. Runs entirely in-process. No API keys. No cloud. No data leaves the host.\n\n[![PyPI](https://img.shields.io/pypi/v/zettelforge)](https://pypi.org/project/zettelforge/)\n[![Downloads/month](https://static.pepy.tech/personalized-badge/zettelforge?period=month\u0026units=international_system\u0026left_color=grey\u0026right_color=blue\u0026left_text=downloads%2Fmonth)](https://pepy.tech/projects/zettelforge)\n[![Star History](https://api.star-history.com/svg?repos=rolandpg/zettelforge\u0026type=Date)](https://star-history.com/#rolandpg/zettelforge\u0026Date)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green)](https://opensource.org/licenses/MIT)\n[![CI](https://github.com/rolandpg/zettelforge/actions/workflows/ci.yml/badge.svg)](https://github.com/rolandpg/zettelforge/actions)\n[![Open Issues](https://img.shields.io/github/issues/rolandpg/zettelforge?color=blue)](https://github.com/rolandpg/zettelforge/issues)\n\n**[Star](https://github.com/rolandpg/zettelforge) · [`pip install zettelforge`](https://pypi.org/project/zettelforge/) · [Docs](https://docs.threatrecall.ai/) · [ThreatRecall (hosted)](https://threatrecall.ai) · [Changelog](CHANGELOG.md)**\n\n\u003e **v2.6.2** (2026-04-27): Config web editor ships with working dropdowns for all enum fields (LLM/embedding provider, log level, PII action, synthesis format) and a working Apply button. New `[crewai]` extra exposes ZettelForge as CrewAI tools -- `pip install zettelforge[crewai]`. [Full changelog](CHANGELOG.md)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/rolandpg/zettelforge/master/docs/assets/demo.gif\" width=\"720\" alt=\"ZettelForge demo -- CTI agentic memory in action\"\u003e\n\u003c/p\u003e\n\n\u003e If ZettelForge fits a CTI workflow you run, a star is the fastest signal that this category is worth continuing to invest in.\n\n## The problem\n\nEvery SOC loses analysts. When they leave, investigation context, actor attribution, and environment-specific false-positive patterns go with them. Their replacements re-open the same tickets, re-read the same reports, and re-build the same mental models from scratch.\n\nGeneral-purpose AI memory systems don't fix this for security teams. They can't tell APT28 from Fancy Bear, don't know that CVE-2024-3094 is the XZ Utils backdoor, can't parse Sigma or YARA, and have no concept of MITRE ATT\u0026CK technique IDs. When a CTI analyst gives them a year of intel reports, they get back fuzzy semantic search over chat history.\n\nZettelForge was built for analysts who think in threat graphs. It extracts CVEs, threat actors, IOCs, and ATT\u0026CK techniques automatically, resolves aliases across naming conventions, builds a knowledge graph with causal relationships, and retrieves memories using intent-aware blended search -- all in-process, with no external API dependency.\n\n\u003e Memory augmentation closes 33% of the gap between small and large models on CTI tasks ([CTI-REALM, Microsoft 2026](https://www.microsoft.com/en-us/security/blog/2026/03/20/cti-realm-a-new-benchmark-for-end-to-end-detection-rule-generation-with-ai-agents/), using GPT-4 as the large-model baseline). See [full benchmark report](benchmarks/BENCHMARK_REPORT.md) for methodology and comparisons.\n\n| Capability | ZettelForge | Mem0 | Graphiti | Cognee |\n|---|---|---|---|---|\n| CTI entity extraction (CVEs, actors, IOCs) | Yes | No | No | No |\n| STIX 2.1 ontology | Yes | No | No | No |\n| Threat actor alias resolution | Yes (APT28 = Fancy Bear) | No | No | No |\n| Knowledge graph with causal triples | Yes | No | Yes | Yes |\n| Intent-classified retrieval (5 types) | Yes | No | No | No |\n| In-process / no external API required | Yes | No | No | No |\n| Audit logs in OCSF schema | Yes | No | No | No |\n| MCP server (Claude Code) | Yes | No | No | No |\n\n## Data Pipeline\n\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://raw.githubusercontent.com/rolandpg/zettelforge/master/docs/assets/zettelforge_architecture.svg\"\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://raw.githubusercontent.com/rolandpg/zettelforge/master/docs/assets/zettelforge_architecture-light.svg\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/rolandpg/zettelforge/master/docs/assets/zettelforge_architecture.svg\" width=\"720\" alt=\"ZettelForge architecture -- neural recall loop: ingest, enrich, retrieve, synthesize, backed by SQLite + LanceDB\"\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n\n## Features\n\n**Entity Extraction** -- Automatically identifies CVEs, threat actors, IOCs (IPs, domains, hashes, URLs, emails), MITRE ATT\u0026CK techniques, campaigns, intrusion sets, tools, people, locations, and organizations. Regex + LLM NER with STIX 2.1 types throughout.\n\n**Knowledge Graph** -- Entities become nodes, co-occurrence becomes edges. LLM infers causal triples (\"APT28 *uses* Cobalt Strike\"). Temporal edges and supersession track how intelligence evolves.\n\n**Alias Resolution** -- APT28, Fancy Bear, Sofacy, STRONTIUM all resolve to the same actor node. Works automatically on store and recall.\n\n**Blended Retrieval** -- Vector similarity (768-dim fastembed, ONNX) + graph traversal (BFS over knowledge graph edges), weighted by intent classification. Five intent types: factual, temporal, relational, exploratory, causal.\n\n**Memory Evolution** -- With `evolve=True`, new intel is compared to existing memory. LLM decides ADD, UPDATE, DELETE, or NOOP. Stale intel gets superseded. Contradictions get resolved. Duplicates get skipped.\n\n**RAG Synthesis** -- Synthesize answers across all stored memories with `direct_answer` format.\n\n**In-process by architecture** -- fastembed (ONNX) for embeddings, llama-cpp-python for optional local LLM inference, SQLite + LanceDB for storage, and Ollama on localhost by default. No external API keys are required. Outbound network access may occur on first run when embedding/LLM models are downloaded; after models are preloaded, it can run fully offline (including on air-gapped hosts).\n\n**Audit logging in OCSF schema** -- Every operation emits a structured event in the Open Cybersecurity Schema Framework format. What you do with the log stream (SIEM, WORM store, nothing) is up to you.\n\n## Quick Start\n\n### 30-second hello world (no LLM required)\n\n```bash\npip install zettelforge\n```\n\n```python\nfrom zettelforge import MemoryManager\n\nmm = MemoryManager()\n\n# Store CTI -- entities (CVEs, actors, ATT\u0026CK IDs, IOCs) extracted via regex\nmm.remember(\"APT28 uses Cobalt Strike for lateral movement via T1021\")\nmm.remember(\"APT28 (Fancy Bear) targets NATO defense contractors with spear-phishing\")\nmm.remember(\"CVE-2024-3094 is the XZ Utils backdoor (CVSS 10.0) affecting sshd\")\n\n# Recall blends vector + graph search; alias resolution kicks in (Fancy Bear -\u003e APT28)\nfor note in mm.recall(\"What tools does Fancy Bear use?\", k=3):\n    print(f\"[{note.metadata.tier}] {note.content.raw}\")\n```\n\nThat works on a fresh `pip install` with no external services. Embeddings run in-process via fastembed (~80MB ONNX model downloaded on first call). `MemoryManager()` writes to `~/.amem/` by default; override with `ZETTELFORGE_DATA_DIR` or via config. A runnable copy lives at [`examples/quickstart.py`](examples/quickstart.py).\n\n### Add an LLM for synthesis and richer extraction\n\n```bash\nollama pull qwen3.5:9b \u0026\u0026 ollama serve\n```\n\n```python\n# With Ollama running, synthesize() returns a real summary across stored notes\nanswer = mm.synthesize(\"Summarize known APT28 TTPs\")\nprint(answer[\"synthesis\"][\"answer\"])\n# Background LLM NER also enriches stored notes with additional entities\n```\n\nZettelForge auto-detects Ollama. To use a different provider (`local` llama-cpp, `litellm` for 100+ providers, `mock` for tests), see [Configuration](#configuration). Without an LLM, `synthesize()` still returns a structured response but the `answer` field is a fallback placeholder -- only `remember` and `recall` produce useful results in pip-only mode.\n\n### Memory Evolution\n\n```python\n# New intel arrives -- evolve=True enables memory evolution:\n# LLM extracts facts, compares to existing notes, decides ADD/UPDATE/DELETE/NOOP\nmm.remember(\n    \"APT28 has shifted tactics. They dropped DROPBEAR and now exploit edge devices.\",\n    domain=\"cti\",\n    evolve=True,   # existing APT28 note gets superseded, not duplicated\n)\n```\n\n## How It Works\n\nEvery `remember()` call triggers a pipeline:\n\n1. **Entity Extraction** -- regex + LLM NER identifies CVEs, intrusion sets, threat actors, tools, campaigns, ATT\u0026CK techniques, IOCs (IPv4, domain, URL, MD5/SHA1/SHA256, email), people, locations, organizations, events, activities, and temporal references (19 types)\n2. **Knowledge Graph Update** -- entities become nodes, co-occurrence becomes edges, LLM infers causal triples\n3. **Vector Embedding** -- 768-dim fastembed (ONNX, in-process, 7ms/embed) stored in LanceDB\n4. **Supersession Check** -- entity overlap detection marks stale notes as superseded\n5. **Dual-Stream Write** -- fast path returns in ~45ms; causal enrichment is deferred to a background worker\n\nEvery `recall()` call blends two retrieval strategies:\n\n1. **Vector similarity** -- semantic search over embeddings\n2. **Graph traversal** -- BFS over knowledge graph edges, scored by hop distance\n3. **Intent routing** -- query classified as factual/temporal/relational/causal/exploratory, weights adjusted per type\n4. **Cross-encoder reranking** -- ms-marco-MiniLM reorders final results by relevance\n\n## Use ZettelForge in Claude Desktop in 60 seconds\n\n```bash\npip install zettelforge\n```\n\nCreate or edit `.claude.json` in your project root (or `~/.claude/.claude.json` for global access):\n\n```json\n{\n  \"mcpServers\": {\n    \"zettelforge\": {\n      \"command\": \"python3\",\n      \"args\": [\"-m\", \"zettelforge.mcp\"]\n    }\n  }\n}\n```\n\nIf ZettelForge is installed in a virtual environment, use the full path to that Python interpreter:\n\n```json\n{\n  \"mcpServers\": {\n    \"zettelforge\": {\n      \"command\": \"/home/user/.venvs/zettelforge/bin/python\",\n      \"args\": [\"-m\", \"zettelforge.mcp\"]\n    }\n  }\n}\n```\n\nStart Claude Code and verify the tools are available:\n\n```bash\nclaude\n# Inside the session, ask: \"What tools do you have available from zettelforge?\"\n```\n\nSeven tools are exposed: `zettelforge_remember`, `zettelforge_recall`, `zettelforge_synthesize`, `zettelforge_entity`, `zettelforge_graph`, `zettelforge_stats`, and `zettelforge_sync` (requires enterprise package). See the [MCP protocol reference](docs/reference/mcp-protocol.md) for full schemas, JSON-RPC request/response examples, error codes, and the lazy-singleton lifecycle. For troubleshooting, virtualenv paths, and manual tool testing, see [set-up-mcp-server](docs/how-to/set-up-mcp-server.md).\n\n## Benchmarks\n\nEvaluated against published academic benchmarks:\n\n| Benchmark | What it measures | Score |\n|---|---|---|\n| **CTI Retrieval** (CTIBench subset) | Attribution, CVE linkage, multi-hop | **75.0%** |\n| **RAGAS** | Retrieval quality (keyword presence) | **78.1%** |\n| **LOCOMO** (ACL 2024) | Conversational memory recall | **22.0%** |\n\nThe **Score** column reports ZettelForge measurements run with Ollama-hosted models, with one exception: the LOCOMO row was re-measured at v2.1.1 using an Ollama cloud judge for evaluation grading (not local generation). See the [full benchmark report](benchmarks/BENCHMARK_REPORT.md) for benchmark-specific methodology, version history, and per-suite judge configuration.\n\n## Detection Rules as Memory (Sigma + YARA)\n\nSigma and YARA rules are first-class memory primitives. Parse, validate, and ingest a rule and its tags become graph edges: MITRE ATT\u0026CK techniques, CVEs, threat-actor aliases, tools, and malware families resolve against the same ontology as every other note. A shared `DetectionRule` supertype carries `SigmaRule` and `YaraRule` subtypes, so a single rule UUID is addressable across both formats.\n\nSigma rules are validated against the vendored [SigmaHQ JSON schema](https://github.com/SigmaHQ/sigma-specification). YARA rules are parsed with plyara and checked against the [CCCS YARA metadata standard](https://github.com/CybercentreCanada/CCCS-Yara) (tiers: `strict`, `warn`, `non_cccs`). Ingest is idempotent -- re-ingesting an unchanged rule returns the original note via a content-hashed `source_ref`.\n\n```python\nfrom zettelforge import MemoryManager\nfrom zettelforge.sigma import ingest_rule as ingest_sigma\nfrom zettelforge.yara import ingest_rule as ingest_yara\n\nmm = MemoryManager()\ningest_sigma(\"rules/proc_creation_win_office_macro.yml\", mm)\ningest_yara(\"rules/webshell_china_chopper.yar\", mm, tier=\"warn\")\n```\n\n```bash\n# Bulk ingest from SigmaHQ or a private rule repo\npython -m zettelforge.sigma.ingest /path/to/sigma/rules/\npython -m zettelforge.yara.ingest /path/to/yara/rules/ --tier warn\n\n# CI fixture check -- parse + validate, no writes\npython -m zettelforge.sigma.ingest rules/ --dry-run\n```\n\nAn LLM rule explainer (`zettelforge.detection.explainer.explain`) produces a structured JSON summary -- intent, key fields, evasion notes, false-positive hypotheses -- for any `DetectionRule`. It runs synchronously on demand in v1; async enrichment-queue wiring is v1.1. Rate-limited via `ZETTELFORGE_EXPLAIN_RPM` (default 60 calls/minute).\n\nReferences: [Sigma spec](https://github.com/SigmaHQ/sigma-specification), [SigmaHQ rules](https://github.com/SigmaHQ/sigma), [CCCS YARA](https://github.com/CybercentreCanada/CCCS-Yara), [YARA docs](https://yara.readthedocs.io).\n\n## Integrations\n\n### ATHF (Agentic Threat Hunting Framework)\n\nIngest completed [ATHF](https://github.com/Nebulock-Inc/agentic-threat-hunting-framework) hunts into ZettelForge memory. MITRE techniques and IOCs are extracted and linked in the knowledge graph.\n\n```bash\npython examples/athf_bridge.py /path/to/hunts/\n# 12 hunt(s) parsed\n# Ingested 12/12 hunts into ZettelForge\n```\n\nSee [examples/athf_bridge.py](examples/athf_bridge.py).\n\n\n## ThreatRecall (Hosted)\n\n[ThreatRecall](https://threatrecall.ai) is the commercial distribution of ZettelForge with enterprise extensions enabled. It is offered as managed SaaS by default, with optional self-hosted on-prem and air-gapped deployments for classified environments. Enterprise add-ons:\n\n- **TypeDB STIX 2.1 backend** -- schema-enforced ontology with inference rules\n- **OpenCTI sync** -- bi-directional sync with your OpenCTI instance\n- **Multi-tenant auth** -- OAuth/JWT with per-tenant data isolation\n- **Sigma rule generation** -- detection rules from extracted IOCs (upcoming)\n\nSaaS deploys in minutes with no infrastructure to maintain. Self-hosted ships as a deployable bundle for environments where outbound network egress is restricted or prohibited.\n\n**[Join the waitlist](https://threatrecall.ai)** -- currently onboarding design partners.\n\n## Configuration\n\n| Variable | Default | Description |\n|---|---|---|\n| `AMEM_DATA_DIR` | `~/.amem` | Data directory |\n| `ZETTELFORGE_BACKEND` | `sqlite` | SQLite community backend. TypeDB available via extension. |\n| `ZETTELFORGE_LLM_PROVIDER` | `local` | `local` (llama-cpp) or `ollama` |\n\nSee [config.default.yaml](config.default.yaml) for all options.\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for development setup.\n\n## License\n\nMIT -- See [LICENSE](LICENSE).\n\nBuilt by **Patrick Roland** -- [LinkedIn](https://www.linkedin.com/in/patrickgroland/) | Director of SOC Services, Summit 7 Systems | Navy nuclear veteran | CISSP, CCP (CMMC 2.0 Professional)\n\n## Support the Project\n\nZettelForge is MIT-licensed. Star the repo, open issues, and submit PRs — all contributions are welcome.\n\n## Acknowledgments\n\n- Inspired by [Zettelkasten](https://en.wikipedia.org/wiki/Zettelkasten) and [A-Mem](https://arxiv.org/abs/2602.10715) (NeurIPS 2025)\n- Two-phase pipeline inspired by [Mem0](https://mem0.ai/research)\n- STIX 2.1 schema informed by [typedb-cti](https://github.com/typedb-osi/typedb-cti)\n- Benchmarked against [LOCOMO](https://snap-research.github.io/locomo/) (ACL 2024) and [CTIBench](https://arxiv.org/abs/2406.07599) (NeurIPS 2024)\n- [LanceDB](https://lancedb.com) | [fastembed](https://github.com/qdrant/fastembed) | [Pydantic](https://pydantic.dev) | [TypeDB](https://typedb.com)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frolandpg%2Fzettelforge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frolandpg%2Fzettelforge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frolandpg%2Fzettelforge/lists"}