https://github.com/paperfoot/engram-cli

Persistent memory for AI agents. Single Rust CLI, hybrid Gemini + FTS5 + RRF retrieval. R@5 = 0.99 on LongMemEval S (beats MemPalace). Agent-native: no MCP, no server, just shell out.
https://github.com/paperfoot/engram-cli
ai-agents claude cli gemini hybrid-search knowledge-graph llm longmemeval memory rag retrieval-augmented-generation rust scientific-papers semantic-search vector-search
Last synced: 3 months ago
JSON representation
Persistent memory for AI agents. Single Rust CLI, hybrid Gemini + FTS5 + RRF retrieval. R@5 = 0.99 on LongMemEval S (beats MemPalace). Agent-native: no MCP, no server, just shell out.
Host: GitHub
URL: https://github.com/paperfoot/engram-cli
Owner: paperfoot
Created: 2026-04-08T01:22:19.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-11T16:11:43.000Z (3 months ago)
Last Synced: 2026-04-17T22:29:22.248Z (3 months ago)
Topics: ai-agents, claude, cli, gemini, hybrid-search, knowledge-graph, llm, longmemeval, memory, rag, retrieval-augmented-generation, rust, scientific-papers, semantic-search, vector-search
Language: Rust
Homepage: https://github.com/199-biotechnologies/engram-2
Size: 6.8 MB
Stars: 5
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          # engram

> **Persistent memory for AI agents.** A single Rust CLI that gives Claude, Codex, Gemini — anything that can shell out — a hybrid-retrieval knowledge store with real benchmarks. No MCP server. No web service. No cloud dependency for the store itself.

[![rust](https://img.shields.io/badge/rust-1.80%2B-orange?logo=rust)](https://www.rust-lang.org/)

[![license](https://img.shields.io/badge/license-MIT-blue)](LICENSE)

[![LongMemEval S R@5](https://img.shields.io/badge/LongMemEval_S_R%405-0.99-brightgreen)](#benchmarks)

[![vs MemPalace](https://img.shields.io/badge/vs%20MemPalace-0.984-green)](#benchmarks)

[![tests](https://img.shields.io/badge/tests-45%20passing-brightgreen)](crates/engram-cli/tests/cli.rs)

```bash

git clone https://github.com/199-biotechnologies/engram-2

cd engram-2

cargo install --path crates/engram-cli --locked

engram skill install          # tells Claude/Codex/Gemini it exists

engram config set keys.gemini $GEMINI_API_KEY

engram remember "Rapamycin extends mouse lifespan via mTORC1 inhibition."

engram recall "what drug extends lifespan"    # finds it

```

---

## The problem engram solves

Every LLM chat forgets everything when the window closes. The community's answer has been **MCP servers**: long-lived processes your agent connects to over a structured protocol. The problem is that MCP tool discovery costs **~44,000 tokens** per session per server, the server has to be running, and every chat replays the whole thing.

engram takes the opposite bet: **the binary is the interface**. Your agent runs `engram agent-info` once (~1,400 tokens, 32× cheaper) to learn every command, then shells out to `engram recall` / `engram remember` / `engram ingest` exactly like it already uses `gh` and `jq`. Nothing to start, nothing to keep alive, nothing to crash.

The cost of this bet is that engram has to be *demonstrably better* at retrieval than the MCP alternatives. So we benchmarked it.

## Benchmarks

### Retrieval — LongMemEval S (500 questions, 96% distractors)

Full 500-question **[LongMemEval S split](https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned)** — 48 sessions per question, 96% distractors. Same dataset [MemPalace](https://github.com/kw27claw/mempalace) reports against.

| Pipeline | R@1 | R@5 | R@10 | MRR |

|---|---|---|---|---|

| **MemPalace (published `hybrid_v4`)** | — | **0.984** | 0.998 | — |

| **engram — hybrid only** (Gemini Embed 2 + FTS5 + RRF) | 0.910 | **0.990** | 0.998 | 0.946 |

| **engram — hybrid + Cohere Rerank** (first 100 Qs) | 0.930 | 0.980 | 1.000 | 0.957 |

**engram beats MemPalace on R@5 by 0.6 points** on retrieval alone — no reranking, no graph traversal, no AAAK compression, no PageRank. Adding Cohere rerank gains another ~4 points on R@1.

### End-to-end QA (retrieve → LLM answer → LLM judge)

Retrieval numbers alone hide the real bottleneck. [@parcadei tested MemPalace](https://x.com/parcadei/status/2041479166764196206) with an actual LLM answering questions using MemPalace's retrieved context, and got **only 17% correct answers** — despite the published R@5 of 0.984.

We implemented the same end-to-end evaluation for engram: retrieve top-k → pass to `openai/gpt-5.4` to answer → judge correctness with `openai/gpt-5.4`. Per-question results, token counts, and cost are saved to [`benchmarks/`](benchmarks/).

| Suite | Sample | Correct | Accuracy | R@5 | MRR | Notes |

|---|---|---|---|---|---|---|

| **LongMemEval-QA** | 2 | 2 | **100%** | 1.00 | 1.00 | Easy single-session questions |

| **LongMemEval-QA** | 3 | 1 | **33%** | 1.00 | 1.00 | Retrieval perfect, 1 interpretation error + 1 false refusal |

| **LoCoMo-QA** | 5 | 2 | **40%** | — | — | Short multi-session test |

| **LoCoMo-QA** | 50 | 14 | **28%** | — | — | First stable QA number on a harder dataset |

**The 17% gap is real for everyone** — not just MemPalace. Our own retrieval is near-perfect (MRR = 1.0 on LongMemEval-QA), but the answerer LLM:

- Interprets "daily commute" as round-trip (90 min) when the reference is one-way (45 min)

- Refuses to answer with "I don't know" even when the answer is in the retrieved context

- Fails on LoCoMo's harder multi-session reasoning

These aren't engram bugs, they're the state of the art. Retrieval R@5 ≠ answer accuracy. Measuring only retrieval — as MemPalace did — hides the real problem.

**What this shows about MemPalace's claims:** their published 0.984 R@5 is probably real as a retrieval number, but the claim that "MemPalace is the best agent memory system" rests on conflating retrieval with end-to-end correctness. The [critical thread from Han Xiao (Jina AI)](https://x.com/hxiao/status/2041821141006971232) dissects this further.

### RAGAS metrics (LLM-as-judge, four orthogonal dimensions)

Run `engram bench longmemeval-qa --ragas` to compute four additional metrics on top of correctness: **faithfulness** (no hallucination), **answer relevance** (on-topic), **context precision** (retrieved chunks are all useful), **context recall** (every fact in the gold answer is in the retrieved chunks). Each adds 4 LLM calls per question, so run sparingly.

### Reproducing

```bash

# Retrieval only (fast, no LLM judge):

engram bench longmemeval --json                          # full 500

engram bench longmemeval --limit 50 --json               # first 50

engram bench mini --json                                 # 10-question smoke

# End-to-end QA (requires OPENROUTER_API_KEY for answerer + judge):

engram bench longmemeval-qa --limit 20 --json            # ~50 minutes on free Gemini tier

engram bench longmemeval-qa --limit 20 --ragas --json    # + 4 extra LLM calls/question

engram bench locomo-qa --limit 50 --json                 # ~3 minutes

# Every run saves a timestamped report to benchmarks/

ls benchmarks/

```

All runs are logged with full per-question detail, token counts, and model IDs to [`benchmarks/`](benchmarks/) so you can audit failures or rerun the judge with a different prompt without re-embedding. See [`benchmarks/README.md`](benchmarks/README.md) for the report schema.

## Install

```bash

# Prerequisite: Rust 1.80+ (install via rustup.rs if needed)

git clone https://github.com/199-biotechnologies/engram-2

cd engram-2

cargo install --path crates/engram-cli --locked

```

One binary at `~/.cargo/bin/engram`. No runtime, no Python, no Docker, no services. `engram --version` should print `engram 0.1.0`.

### Configure keys

```bash

# Required for real hybrid retrieval. Free tier at https://aistudio.google.com/apikey

engram config set keys.gemini $GEMINI_API_KEY

# Optional — adds ~4 R@1 points via reranking. https://dashboard.cohere.com/api-keys

engram config set keys.cohere $COHERE_API_KEY

engram config check

# -> { "gemini": "configured", "cohere": "configured (optional)", "ok": true }

```

Keys are resolved in order: **explicit env var → `~/.config/engram/config.toml` → none**. Config file is written with `0600` perms (user-only). Without Gemini, recall falls back to a deterministic offline stub — useful for CI, unusable for real quality.

### Tell your agents about it

```bash

engram skill install

```

This writes a `SKILL.md` signpost to `~/.claude/skills/engram/`, `~/.codex/skills/engram/`, and `~/.gemini/skills/engram/`. Any agent that reads those directories will discover `engram`, learn the memory loop pattern, and start using it autonomously.

## The memory loop (how agents should use engram)

The installed skill teaches your agent to do this every task:

```bash

# 1. LOAD — recall anything relevant before answering

engram recall "user's task in 4-6 words" --top-k 5 --json

# 2. WORK — do the task, citing recalled chunks when they matter

# 3. SAVE — whatever the user told you that will matter later

engram remember "Boris prefers Rust over Go for CLI tools."           --importance 7 --tag preference

engram remember "Decision 2026-04-08: use BLOB embeddings in SQLite." --importance 9 --tag decision

```

Rule of thumb: save preferences, explicit decisions with rationale, stable facts, and corrections. Don't save task-local state or conversation filler.

## Scientific papers workflow

engram is purpose-built for ingesting and querying research papers with real citations.

```bash

# Drop PDFs in a directory

curl -sL -o paper.pdf https://arxiv.org/pdf/2405.14831.pdf   # HippoRAG

curl -sL -o bert.pdf  https://arxiv.org/pdf/1810.04805.pdf   # BERT

# Ingest. This runs pdf-extract -> section-aware chunking (preserves

# "Methods > Cell Culture" breadcrumbs) -> Gemini Embedding 2 (batched,

# token-budgeted) -> SQLite BLOBs. Embeddings persist forever.

engram ingest . --mode papers

# Ask questions. Returns the exact chunks with scores and sources.

engram recall "personalized pagerank for multi-hop retrieval" --top-k 3 --json

# Browse what engram extracted from the corpus

engram entities list --limit 10

# -> BERT (58), HippoRAG (56), LightRAG (52), LLM (39), RAG (36), ...

```

Each result has `chunk_id`, `score`, `content`, and `sources: ["dense","lexical","reranker"]`. **Your agent should quote the content and cite the chunk_id** so you can always re-run `engram recall` to verify a claim.

Tested on 5 arXiv papers (Attention, BERT, HippoRAG, LightRAG, RAG — 1,171 chunks) in 21 seconds end-to-end.

## Architecture

```

        query

          │

 ┌────────┴────────┐

 │                 │

 ▼                 ▼

Dense          Lexical

(Gemini        (FTS5

 Embed 2        BM25 over

 batched +      chunks.content

 cached)        in SQLite)

 │                 │

 └────────┬────────┘

          │

          ▼

 Reciprocal Rank Fusion

 (k=60, deterministic tiebreak)

          │

          ▼

 (optional) Cohere Rerank 4 Pro

 reranks the top 50 candidates

          │

          ▼

 Memory layer budgeting

 (L0 identity / L1 critical /

  L2 topic / L3 deep)

          │

          ▼

 JSON envelope on stdout,

 errors on stderr,

 exit codes 0-4

```

- **SQLite** is the source of truth. Chunks store their embedding as a little-endian `f32` BLOB plus an `embed_model` tag.

- **FTS5** is the lexical index, included in the same database file.

- **No separate vector server** — at personal scale (<100K vectors) brute-force cosine in Rust is fast enough. We skipped Qdrant and LanceDB on purpose.

- **Deterministic everything**: UUID v5 for IDs, stable sort tiebreak in fusion, reproducible bench runs.

Cargo workspace layout:

| Crate | Purpose |

|---|---|

| `engram-core` | Pure types, fusion (RRF), memory layers, AAAK compression, temporal validity. Zero I/O. |

| `engram-storage` | SQLite source of truth + FTS5 + chunk-embedding BLOBs. |

| `engram-embed` | `Embedder` trait + Gemini Embed 2 (batch + single) + deterministic offline stub. |

| `engram-rerank` | `Reranker` trait + Cohere Rerank 4 Pro + passthrough. |

| `engram-ingest` | Mining modes: papers (PDF + section-aware), conversations, repos, general, auto. |

| `engram-graph` | Deterministic entity extraction + graph scaffolding. |

| `engram-bench` | LongMemEval harness + inline mini bench. |

| `engram-cli` | The single `engram` binary and the shared hybrid retrieval pipeline. |

## Framework compliance

engram follows the **[agent-cli-framework](https://github.com/199-biotechnologies/agent-cli-framework)** verbatim:

- `agent-info` returns a raw JSON manifest (not enveloped) so agents can discover every command in one call

- JSON envelope on every other stdout path (`version`, `status`, `data`, `metadata`)

- Errors on stderr with `code`, `message`, `suggestion`, `exit_code`

- Semantic exit codes: `0` success, `1` transient (retry), `2` config (fix setup), `3` bad input (fix args), `4` rate limited (back off)

- No interactive prompts. Destructive ops like `forget` require `--confirm`

- XDG paths everywhere (`~/.config/engram/`, `~/.local/share/engram/`, `~/.cache/engram/`)

- Skill file embedded in the binary as a compile-time constant and deployed via `engram skill install`

- Secrets resolved in order: env var → config file → none. Always masked on display (`AIzaSy...DW58`)

## All the commands (`engram agent-info` for the full manifest)

| | |

|---|---|

| `engram remember ` | Store a memory. Flags: `--importance 0-10`, `--tag` (repeatable), `--diary` |

| `engram recall ` | Hybrid search. Flags: `--top-k`, `--layer identity\|critical\|topic\|deep`, `--diary`, `--since`, `--until` |

| `engram ingest ` | Mine a file or directory. `--mode papers\|conversations\|repos\|general\|auto` |

| `engram edit ` | Update memory content or importance |

| `engram forget  --confirm` | Soft-delete (destructive, requires `--confirm`) |

| `engram entities list \| show ` | Browse extracted entities |

| `engram export` / `engram import ` | JSON backup / restore |

| `engram bench ` | Run benchmarks |

| `engram config show \| set \| check` | Configuration |

| `engram skill install \| uninstall` | Deploy agent skill signpost |

| `engram agent-info` | Self-describing manifest (start here) |

## Development

```bash

cargo build --release                         # build

cargo test                                    # 27 unit + 18 integration tests

./target/release/engram bench mini --json     # fast smoke bench (<1s)

./target/release/engram bench longmemeval     # real benchmark (~5 min with Cohere)

```

Research direction for contributors: [`program.md`](program.md) — enumerates the hyperparameters and architecture experiments worth running via [autoresearch](https://github.com/199-biotechnologies/autoresearch) loops. Design rationale: [`docs/superpowers/specs/2026-04-07-engram-v2-design.md`](docs/superpowers/specs/2026-04-07-engram-v2-design.md).

## Roadmap

**Shipped (v0.1.0)**

- Single-binary install, hybrid Gemini + FTS5 + RRF retrieval

- Persistent SQLite store with chunk-embedding BLOBs

- Full CRUD (`remember`, `recall`, `edit`, `forget`, `export`, `import`)

- Mining modes for papers, conversations, repos, general

- PDF ingestion via `pdf-extract`

- Section-aware chunking, AAAK compression prototype

- Cohere Rerank 4 Pro wired as optional lift

- Memory layers (L0–L3) with token budgeting

- Diary namespaces for specialist agents

- Entity extraction and browsing

- LongMemEval harness (Oracle + S splits)

- 45 unit + integration tests

**Next up**

- GitHub Actions CI releasing prebuilt macOS + Linux binaries

- `cargo install engram-cli` from crates.io

- `engram update --check` wired to real GitHub Releases

- Local embedding fallback via `candle` + `bge-small-en-v1.5` (zero API, p95 < 10 ms)

- `ENGRAM_RERANK_TOP_N` knob to cut Cohere cost ~60% with minimal quality loss

- Graph expansion on retrieval (deterministic edges already extracted)

## Credits

Inspired by:

- **[MemPalace](https://github.com/kw27claw/mempalace)** — spatial memory + AAAK compression philosophy

- **[HippoRAG 2](https://github.com/OSU-NLP-Group/HippoRAG)** — "return verbatim passages, don't paraphrase"

- **[LongMemEval](https://github.com/xiaowu0162/LongMemEval)** — the benchmark we aimed at

- **[agent-cli-framework](https://github.com/199-biotechnologies/agent-cli-framework)** — the principles engram follows verbatim

## License

MIT — see [LICENSE](LICENSE).

---

Built by **[199 Biotechnologies](https://github.com/199-biotechnologies)**.

Questions? Open an issue. Pull requests welcome.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/paperfoot/engram-cli

Awesome Lists containing this project

README