{"id":48963835,"url":"https://github.com/paperfoot/engram-cli","last_synced_at":"2026-04-18T03:04:33.926Z","repository":{"id":349900760,"uuid":"1204401551","full_name":"paperfoot/engram-cli","owner":"paperfoot","description":"Persistent memory for AI agents. Single Rust CLI, hybrid Gemini + FTS5 + RRF retrieval. R@5 = 0.99 on LongMemEval S (beats MemPalace). Agent-native: no MCP, no server, just shell out.","archived":false,"fork":false,"pushed_at":"2026-04-11T16:11:43.000Z","size":7126,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-17T22:29:22.248Z","etag":null,"topics":["ai-agents","claude","cli","gemini","hybrid-search","knowledge-graph","llm","longmemeval","memory","rag","retrieval-augmented-generation","rust","scientific-papers","semantic-search","vector-search"],"latest_commit_sha":null,"homepage":"https://github.com/199-biotechnologies/engram-2","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paperfoot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-08T01:22:19.000Z","updated_at":"2026-04-17T20:19:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/paperfoot/engram-cli","commit_stats":null,"previous_names":["199-biotechnologies/engram-2","paperfoot/engram-cli"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/paperfoot/engram-cli","repository_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperfoot%2Fengram-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperfoot%2Fengram-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperfoot%2Fengram-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperfoot%2Fengram-cli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paperfoot","download_url":"https://codeload.github.com/paperfoot/engram-cli/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paperfoot%2Fengram-cli/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31954738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"online","status_checked_at":"2026-04-18T02:00:07.018Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","claude","cli","gemini","hybrid-search","knowledge-graph","llm","longmemeval","memory","rag","retrieval-augmented-generation","rust","scientific-papers","semantic-search","vector-search"],"created_at":"2026-04-18T03:04:30.174Z","updated_at":"2026-04-18T03:04:33.920Z","avatar_url":"https://github.com/paperfoot.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# engram\n\n\u003e **Persistent memory for AI agents.** A single Rust CLI 
that gives Claude, Codex, Gemini — anything that can shell out — a hybrid-retrieval knowledge store with real benchmarks. No MCP server. No web service. No cloud dependency for the store itself.\n\n[![rust](https://img.shields.io/badge/rust-1.80%2B-orange?logo=rust)](https://www.rust-lang.org/)\n[![license](https://img.shields.io/badge/license-MIT-blue)](LICENSE)\n[![LongMemEval S R@5](https://img.shields.io/badge/LongMemEval_S_R%405-0.99-brightgreen)](#benchmarks)\n[![vs MemPalace](https://img.shields.io/badge/vs%20MemPalace-0.984-green)](#benchmarks)\n[![tests](https://img.shields.io/badge/tests-45%20passing-brightgreen)](crates/engram-cli/tests/cli.rs)\n\n```bash\ngit clone https://github.com/199-biotechnologies/engram-2\ncd engram-2\ncargo install --path crates/engram-cli --locked\nengram skill install          # tells Claude/Codex/Gemini it exists\nengram config set keys.gemini $GEMINI_API_KEY\nengram remember \"Rapamycin extends mouse lifespan via mTORC1 inhibition.\"\nengram recall \"what drug extends lifespan\"    # finds it\n```\n\n---\n\n## The problem engram solves\n\nEvery LLM chat forgets everything when the window closes. The community's answer has been **MCP servers**: long-lived processes your agent connects to over a structured protocol. The problem is that MCP tool discovery costs **~44,000 tokens** per session per server, the server has to be running, and every chat replays the whole thing.\n\nengram takes the opposite bet: **the binary is the interface**. Your agent runs `engram agent-info` once (~1,400 tokens, 32× cheaper) to learn every command, then shells out to `engram recall` / `engram remember` / `engram ingest` exactly like it already uses `gh` and `jq`. Nothing to start, nothing to keep alive, nothing to crash.\n\nThe cost of this bet is that engram has to be *demonstrably better* at retrieval than the MCP alternatives. 
So we benchmarked it.\n\n## Benchmarks\n\n### Retrieval — LongMemEval S (500 questions, 96% distractors)\n\nFull 500-question **[LongMemEval S split](https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned)** — 48 sessions per question, 96% distractors. Same dataset [MemPalace](https://github.com/kw27claw/mempalace) reports against.\n\n| Pipeline | R@1 | R@5 | R@10 | MRR |\n|---|---|---|---|---|\n| **MemPalace (published `hybrid_v4`)** | — | **0.984** | 0.998 | — |\n| **engram — hybrid only** (Gemini Embed 2 + FTS5 + RRF) | 0.910 | **0.990** | 0.998 | 0.946 |\n| **engram — hybrid + Cohere Rerank** (first 100 Qs) | 0.930 | 0.980 | 1.000 | 0.957 |\n\n**engram beats MemPalace on R@5 by 0.6 points** on retrieval alone — no reranking, no graph traversal, no AAAK compression, no PageRank. Adding Cohere rerank gains another ~4 points on R@1.\n\n### End-to-end QA (retrieve → LLM answer → LLM judge)\n\nRetrieval numbers alone hide the real bottleneck. [@parcadei tested MemPalace](https://x.com/parcadei/status/2041479166764196206) with an actual LLM answering questions using MemPalace's retrieved context, and got **only 17% correct answers** — despite the published R@5 of 0.984.\n\nWe implemented the same end-to-end evaluation for engram: retrieve top-k → pass to `openai/gpt-5.4` to answer → judge correctness with `openai/gpt-5.4`. Per-question results, token counts, and cost are saved to [`benchmarks/`](benchmarks/).\n\n| Suite | Sample | Correct | Accuracy | R@5 | MRR | Notes |\n|---|---|---|---|---|---|---|\n| **LongMemEval-QA** | 2 | 2 | **100%** | 1.00 | 1.00 | Easy single-session questions |\n| **LongMemEval-QA** | 3 | 1 | **33%** | 1.00 | 1.00 | Retrieval perfect, 1 interpretation error + 1 false refusal |\n| **LoCoMo-QA** | 5 | 2 | **40%** | — | — | Short multi-session test |\n| **LoCoMo-QA** | 50 | 14 | **28%** | — | — | First stable QA number on a harder dataset |\n\n**The 17% gap is real for everyone** — not just MemPalace. 
Our own retrieval is near-perfect (MRR = 1.0 on LongMemEval-QA), but the answerer LLM:\n- Interprets \"daily commute\" as round-trip (90 min) when the reference is one-way (45 min)\n- Refuses to answer, replying \"I don't know\" even when the answer is in the retrieved context\n- Fails on LoCoMo's harder multi-session reasoning\n\nThese aren't engram bugs; they're the state of the art. Retrieval R@5 ≠ answer accuracy. Measuring only retrieval — as MemPalace did — hides the real problem.\n\n**What this shows about MemPalace's claims:** their published 0.984 R@5 is probably real as a retrieval number, but the claim that \"MemPalace is the best agent memory system\" rests on conflating retrieval with end-to-end correctness. The [critical thread from Han Xiao (Jina AI)](https://x.com/hxiao/status/2041821141006971232) dissects this further.\n\n### RAGAS metrics (LLM-as-judge, four orthogonal dimensions)\n\nRun `engram bench longmemeval-qa --ragas` to compute four additional metrics on top of correctness: **faithfulness** (no hallucination), **answer relevance** (on-topic), **context precision** (retrieved chunks are all useful), **context recall** (every fact in the gold answer is in the retrieved chunks). 
Each adds 4 LLM calls per question, so run sparingly.\n\n### Reproducing\n\n```bash\n# Retrieval only (fast, no LLM judge):\nengram bench longmemeval --json                          # full 500\nengram bench longmemeval --limit 50 --json               # first 50\nengram bench mini --json                                 # 10-question smoke\n\n# End-to-end QA (requires OPENROUTER_API_KEY for answerer + judge):\nengram bench longmemeval-qa --limit 20 --json            # ~50 minutes on free Gemini tier\nengram bench longmemeval-qa --limit 20 --ragas --json    # + 4 extra LLM calls/question\nengram bench locomo-qa --limit 50 --json                 # ~3 minutes\n\n# Every run saves a timestamped report to benchmarks/\nls benchmarks/\n```\n\nAll runs are logged with full per-question detail, token counts, and model IDs to [`benchmarks/`](benchmarks/) so you can audit failures or rerun the judge with a different prompt without re-embedding. See [`benchmarks/README.md`](benchmarks/README.md) for the report schema.\n\n## Install\n\n```bash\n# Prerequisite: Rust 1.80+ (install via rustup.rs if needed)\ngit clone https://github.com/199-biotechnologies/engram-2\ncd engram-2\ncargo install --path crates/engram-cli --locked\n```\n\nOne binary at `~/.cargo/bin/engram`. No runtime, no Python, no Docker, no services. `engram --version` should print `engram 0.1.0`.\n\n### Configure keys\n\n```bash\n# Required for real hybrid retrieval. Free tier at https://aistudio.google.com/apikey\nengram config set keys.gemini $GEMINI_API_KEY\n\n# Optional — adds ~4 R@1 points via reranking. https://dashboard.cohere.com/api-keys\nengram config set keys.cohere $COHERE_API_KEY\n\nengram config check\n# -\u003e { \"gemini\": \"configured\", \"cohere\": \"configured (optional)\", \"ok\": true }\n```\n\nKeys are resolved in order: **explicit env var → `~/.config/engram/config.toml` → none**. Config file is written with `0600` perms (user-only). 
Without Gemini, recall falls back to a deterministic offline stub — useful for CI, unusable for real quality.\n\n### Tell your agents about it\n\n```bash\nengram skill install\n```\n\nThis writes a `SKILL.md` signpost to `~/.claude/skills/engram/`, `~/.codex/skills/engram/`, and `~/.gemini/skills/engram/`. Any agent that reads those directories will discover `engram`, learn the memory loop pattern, and start using it autonomously.\n\n## The memory loop (how agents should use engram)\n\nThe installed skill teaches your agent to do this every task:\n\n```bash\n# 1. LOAD — recall anything relevant before answering\nengram recall \"user's task in 4-6 words\" --top-k 5 --json\n\n# 2. WORK — do the task, citing recalled chunks when they matter\n\n# 3. SAVE — whatever the user told you that will matter later\nengram remember \"Boris prefers Rust over Go for CLI tools.\"           --importance 7 --tag preference\nengram remember \"Decision 2026-04-08: use BLOB embeddings in SQLite.\" --importance 9 --tag decision\n```\n\nRule of thumb: save preferences, explicit decisions with rationale, stable facts, and corrections. Don't save task-local state or conversation filler.\n\n## Scientific papers workflow\n\nengram is purpose-built for ingesting and querying research papers with real citations.\n\n```bash\n# Drop PDFs in a directory\ncurl -sL -o paper.pdf https://arxiv.org/pdf/2405.14831.pdf   # HippoRAG\ncurl -sL -o bert.pdf  https://arxiv.org/pdf/1810.04805.pdf   # BERT\n\n# Ingest. This runs pdf-extract -\u003e section-aware chunking (preserves\n# \"Methods \u003e Cell Culture\" breadcrumbs) -\u003e Gemini Embedding 2 (batched,\n# token-budgeted) -\u003e SQLite BLOBs. Embeddings persist forever.\nengram ingest . --mode papers\n\n# Ask questions. 
Returns the exact chunks with scores and sources.\nengram recall \"personalized pagerank for multi-hop retrieval\" --top-k 3 --json\n\n# Browse what engram extracted from the corpus\nengram entities list --limit 10\n# -\u003e BERT (58), HippoRAG (56), LightRAG (52), LLM (39), RAG (36), ...\n```\n\nEach result has `chunk_id`, `score`, `content`, and `sources: [\"dense\",\"lexical\",\"reranker\"]`. **Your agent should quote the content and cite the chunk_id** so you can always re-run `engram recall` to verify a claim.\n\nTested on 5 arXiv papers (Attention, BERT, HippoRAG, LightRAG, RAG — 1,171 chunks) in 21 seconds end-to-end.\n\n## Architecture\n\n```\n        query\n          │\n ┌────────┴────────┐\n │                 │\n ▼                 ▼\nDense          Lexical\n(Gemini        (FTS5\n Embed 2        BM25 over\n batched +      chunks.content\n cached)        in SQLite)\n │                 │\n └────────┬────────┘\n          │\n          ▼\n Reciprocal Rank Fusion\n (k=60, deterministic tiebreak)\n          │\n          ▼\n (optional) Cohere Rerank 4 Pro\n reranks the top 50 candidates\n          │\n          ▼\n Memory layer budgeting\n (L0 identity / L1 critical /\n  L2 topic / L3 deep)\n          │\n          ▼\n JSON envelope on stdout,\n errors on stderr,\n exit codes 0-4\n```\n\n- **SQLite** is the source of truth. Chunks store their embedding as a little-endian `f32` BLOB plus an `embed_model` tag.\n- **FTS5** is the lexical index, included in the same database file.\n- **No separate vector server** — at personal scale (\u003c100K vectors) brute-force cosine in Rust is fast enough. We skipped Qdrant and LanceDB on purpose.\n- **Deterministic everything**: UUID v5 for IDs, stable sort tiebreak in fusion, reproducible bench runs.\n\nCargo workspace layout:\n\n| Crate | Purpose |\n|---|---|\n| `engram-core` | Pure types, fusion (RRF), memory layers, AAAK compression, temporal validity. Zero I/O. 
|\n| `engram-storage` | SQLite source of truth + FTS5 + chunk-embedding BLOBs. |\n| `engram-embed` | `Embedder` trait + Gemini Embed 2 (batch + single) + deterministic offline stub. |\n| `engram-rerank` | `Reranker` trait + Cohere Rerank 4 Pro + passthrough. |\n| `engram-ingest` | Mining modes: papers (PDF + section-aware), conversations, repos, general, auto. |\n| `engram-graph` | Deterministic entity extraction + graph scaffolding. |\n| `engram-bench` | LongMemEval harness + inline mini bench. |\n| `engram-cli` | The single `engram` binary and the shared hybrid retrieval pipeline. |\n\n## Framework compliance\n\nengram follows the **[agent-cli-framework](https://github.com/199-biotechnologies/agent-cli-framework)** verbatim:\n\n- `agent-info` returns a raw JSON manifest (not enveloped) so agents can discover every command in one call\n- JSON envelope on every other stdout path (`version`, `status`, `data`, `metadata`)\n- Errors on stderr with `code`, `message`, `suggestion`, `exit_code`\n- Semantic exit codes: `0` success, `1` transient (retry), `2` config (fix setup), `3` bad input (fix args), `4` rate limited (back off)\n- No interactive prompts. Destructive ops like `forget` require `--confirm`\n- XDG paths everywhere (`~/.config/engram/`, `~/.local/share/engram/`, `~/.cache/engram/`)\n- Skill file embedded in the binary as a compile-time constant and deployed via `engram skill install`\n- Secrets resolved in order: env var → config file → none. Always masked on display (`AIzaSy...DW58`)\n\n## All the commands (`engram agent-info` for the full manifest)\n\n| | |\n|---|---|\n| `engram remember \u003ccontent\u003e` | Store a memory. Flags: `--importance 0-10`, `--tag` (repeatable), `--diary` |\n| `engram recall \u003cquery\u003e` | Hybrid search. Flags: `--top-k`, `--layer identity\\|critical\\|topic\\|deep`, `--diary`, `--since`, `--until` |\n| `engram ingest \u003cpath\u003e` | Mine a file or directory. 
`--mode papers\\|conversations\\|repos\\|general\\|auto` |\n| `engram edit \u003cid\u003e` | Update memory content or importance |\n| `engram forget \u003cid\u003e --confirm` | Soft-delete (destructive, requires `--confirm`) |\n| `engram entities list \\| show \u003cname\u003e` | Browse extracted entities |\n| `engram export` / `engram import \u003cfile\u003e` | JSON backup / restore |\n| `engram bench \u003cmini\\|mini-fts\\|longmemeval\\|longmemeval-qa\\|locomo-qa\u003e` | Run benchmarks |\n| `engram config show \\| set \\| check` | Configuration |\n| `engram skill install \\| uninstall` | Deploy agent skill signpost |\n| `engram agent-info` | Self-describing manifest (start here) |\n\n## Development\n\n```bash\ncargo build --release                         # build\ncargo test                                    # 27 unit + 18 integration tests\n./target/release/engram bench mini --json     # fast smoke bench (\u003c1s)\n./target/release/engram bench longmemeval     # real benchmark (~5 min with Cohere)\n```\n\nResearch direction for contributors: [`program.md`](program.md) — enumerates the hyperparameters and architecture experiments worth running via [autoresearch](https://github.com/199-biotechnologies/autoresearch) loops. 
Design rationale: [`docs/superpowers/specs/2026-04-07-engram-v2-design.md`](docs/superpowers/specs/2026-04-07-engram-v2-design.md).\n\n## Roadmap\n\n**Shipped (v0.1.0)**\n- Single-binary install, hybrid Gemini + FTS5 + RRF retrieval\n- Persistent SQLite store with chunk-embedding BLOBs\n- Full CRUD (`remember`, `recall`, `edit`, `forget`, `export`, `import`)\n- Mining modes for papers, conversations, repos, general\n- PDF ingestion via `pdf-extract`\n- Section-aware chunking, AAAK compression prototype\n- Cohere Rerank 4 Pro wired as optional lift\n- Memory layers (L0–L3) with token budgeting\n- Diary namespaces for specialist agents\n- Entity extraction and browsing\n- LongMemEval harness (Oracle + S splits)\n- 45 unit + integration tests\n\n**Next up**\n- GitHub Actions CI releasing prebuilt macOS + Linux binaries\n- `cargo install engram-cli` from crates.io\n- `engram update --check` wired to real GitHub Releases\n- Local embedding fallback via `candle` + `bge-small-en-v1.5` (zero API, p95 \u003c 10 ms)\n- `ENGRAM_RERANK_TOP_N` knob to cut Cohere cost ~60% with minimal quality loss\n- Graph expansion on retrieval (deterministic edges already extracted)\n\n## Credits\n\nInspired by:\n\n- **[MemPalace](https://github.com/kw27claw/mempalace)** — spatial memory + AAAK compression philosophy\n- **[HippoRAG 2](https://github.com/OSU-NLP-Group/HippoRAG)** — \"return verbatim passages, don't paraphrase\"\n- **[LongMemEval](https://github.com/xiaowu0162/LongMemEval)** — the benchmark we aimed at\n- **[agent-cli-framework](https://github.com/199-biotechnologies/agent-cli-framework)** — the principles engram follows verbatim\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n---\n\nBuilt by **[199 Biotechnologies](https://github.com/199-biotechnologies)**.\nQuestions? Open an issue. 
Pull requests welcome.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaperfoot%2Fengram-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaperfoot%2Fengram-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaperfoot%2Fengram-cli/lists"}