https://github.com/toonight/mnemoscope

Observability and predictive memory tooling for LLM agents on Markdown vaults — predictive context-rot scoring, Ed25519-signed hash-chained journal, hierarchical tiering. 100% local. MCP server + Obsidian plugin.
agent-memory claude-code context-rot ed25519 llm-observability markdown mcp model-context-protocol obsidian obsidian-plugin opentimestamps typescript
Host: GitHub
URL: https://github.com/toonight/mnemoscope
Owner: toonight
License: apache-2.0
Created: 2026-04-27T10:59:20.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-04-27T19:17:57.000Z (about 2 months ago)
Last Synced: 2026-04-27T21:14:04.413Z (about 2 months ago)
Topics: agent-memory, claude-code, context-rot, ed25519, llm-observability, markdown, mcp, model-context-protocol, obsidian, obsidian-plugin, opentimestamps, typescript
Language: TypeScript
Homepage: https://github.com/toonight/Mnemoscope
Size: 592 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

An open-source observability layer for LLM agent memory on Markdown vaults.


Predict context rot before it happens · audit every agent write with a signed journal · tier your knowledge the way the science says you should.


Demo · Workflow · Tools · Quickstart · Architecture · Science



---

> [!NOTE]
> The dominant 2025–2026 narrative on X, *"Markdown trips up the LLM at scale"*, is partially wrong. **Markdown** does not trip up the LLM. **Long-context loading** trips up the LLM ([Chroma, *Context Rot*, July 2025](https://www.trychroma.com/research/context-rot)). Mnemoscope is built on that distinction.

## 👀 See it in action

Real output, captured from the bundled [`examples/demo-vault`](./examples/demo-vault) — a 13-note synthetic vault built so every rot factor moves. Reproduce locally with `mnemoscope-init examples/demo-vault` (full transcript: [SAMPLE-OUTPUT.md](./examples/demo-vault/SAMPLE-OUTPUT.md)).

**`predict_rot` — score, factors, top-risk notes**



**`mnemoscope-verify` — clean run vs. tamper detection**



Full overview (click) — gauge, factors, top-risk notes, both verify states, tier counts, hash chain — all on one page



## ✨ What is Mnemoscope?

Mnemoscope is **not** another memory store. It is an **instrument** that sits between your LLM agent and your Markdown vault and gives you three things nobody else gives you in one tool:

- 🎯 **Predict** the rot risk of a corpus *before* injection, with a citation-backed score across 5 factors.

- 📝 **Witness** every read and write your agent performs, in an Ed25519-signed, hash-chained journal that detects field-level tampering, deletion, and reordering.

- 🧱 **Tier** the corpus into a working / episodic / semantic hierarchy, drawing on the 2025–2026 science instead of the GraphRAG hype.

It ships as an **MCP server** (Claude Code, Cursor, ChatGPT desktop, anything MCP-compatible), an **Obsidian plugin**, and a **Claude Code `PostToolUse` hook**. Everything runs **100% locally**. No cloud. No telemetry without explicit opt-in.

## 🔄 How it fits your workflow

Imagine you start a brand-new project — a folder of Markdown notes you'll grow with Claude Code over the next year. Mnemoscope plugs into the lifecycle in five places:

```
   [create project]
        │
        ▼
   mnemoscope-init                     ◄─── 1× at the very start
        │                                   creates .mnemoscope/, generates Ed25519 keypair
        ▼
   ┌───────────────────────────────────────────────────────────┐
   │  [you work with Claude Code on the vault]                 │
   │                                                           │
   │   predict_rot       ──┐                                   │
   │                       ├─►  on demand (or before sessions) │
   │   get_tiered_read   ──┘    "is the vault healthy?"        │
   │                            "what should the agent read?"  │
   │                                                           │
   │   PostToolUse hook  ────►  passive, on every Write/Edit   │
   │                            "what did the agent just do?"  │
   └───────────────────────────────────────────────────────────┘
        │
        ▼
   mnemoscope-verify                   ◄─── on demand, or in CI
                                            "has anyone tampered?"
```

| Phase | Tool / command | When to use it | What you get |
|---|---|---|---|
| 1. **Bootstrap** | `mnemoscope-init` | Once, at project creation | `.mnemoscope/` + per-vault Ed25519 keypair |
| 2. **Predict** | `predict_rot` (MCP tool) | Before injecting a vault into the LLM | A 0–100 risk score + factor breakdown + top-risk notes |
| 3. **Compact** | `get_tiered_read` (MCP tool) | When the vault grows past your model's effective context | Working / episodic / semantic split |
| 4. **Witness** | `mnemoscope-record-hook` (Claude Code PostToolUse hook) | Wired once in `~/.claude/settings.json`, then **passive** | Every agent write becomes a signed journal entry |
| 5. **Audit** | `mnemoscope-verify` | Any time, or as a pre-commit / CI step | Exit 0 if all entries verify, exit 1 if tampered |

## 🛠️ The four MCP tools

| Tool | Input | What it returns |
|---|---|---|
| `predict_rot` | `vault_path` | Score 0–100, dominant factor, full factor breakdown, top 5 risk notes, vault stats |
| `get_tiered_read` | `vault_path`, optional age thresholds | Note paths grouped into `working` / `episodic` / `semantic` |
| `record_journal` | `vault_path`, `session_id`, `op`, `target_path`, optional content | The signed entry, including its `sig`, `keyFingerprint`, and `prevHash` |
| `read_journal` | `vault_path`, optional `session_id` | All journal entries, or a single session's entries |
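
To make the `get_tiered_read` row concrete, here is a minimal sketch of freshness-based tiering. Only the three tier names come from this README. The `NoteInfo` and `AgeThresholds` shapes, the 7-day and 90-day defaults, and the mapping of newest notes to `working` are assumptions made for illustration, not Mnemoscope's actual implementation.

```typescript
// Sketch only: thresholds and the "newest = working" mapping are assumptions.
type Tier = "working" | "episodic" | "semantic";

interface NoteInfo {
  relPath: string;
  modifiedAt: Date; // last modification time of the note
}

interface AgeThresholds {
  workingMaxDays: number; // hypothetical: notes at most this old are "working"
  episodicMaxDays: number; // hypothetical: notes at most this old are "episodic"
}

const DAY_MS = 24 * 60 * 60 * 1000;

function tierNotes(
  notes: NoteInfo[],
  now: Date,
  thresholds: AgeThresholds = { workingMaxDays: 7, episodicMaxDays: 90 },
): Record<Tier, string[]> {
  const tiers: Record<Tier, string[]> = { working: [], episodic: [], semantic: [] };
  for (const note of notes) {
    const ageDays = (now.getTime() - note.modifiedAt.getTime()) / DAY_MS;
    if (ageDays <= thresholds.workingMaxDays) tiers.working.push(note.relPath);
    else if (ageDays <= thresholds.episodicMaxDays) tiers.episodic.push(note.relPath);
    else tiers.semantic.push(note.relPath);
  }
  return tiers;
}

// Usage: three notes of different ages, evaluated at a fixed "now".
const now = new Date("2026-04-27T00:00:00Z");
const tiers = tierNotes(
  [
    { relPath: "daily/today.md", modifiedAt: new Date("2026-04-26T00:00:00Z") },
    { relPath: "projects/q1.md", modifiedAt: new Date("2026-03-01T00:00:00Z") },
    { relPath: "archive/2024.md", modifiedAt: new Date("2024-12-31T00:00:00Z") },
  ],
  now,
);
```

Passing a different `AgeThresholds` object plays the role of the tool's optional age thresholds.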

### Example: `predict_rot` on a real vault

```json
{
  "rot_risk": 41,
  "dominant_factor": "tokenVolume",
  "factors": {
    "tokenVolume": 100, "semanticRedundancy": 0,
    "distractorDensity": 2.65, "structuralCoherence": 100, "freshnessSpread": 0
  },
  "top_risk_notes": [
    { "relPath": "brainstorms/.../transcript.md", "approxTokens": 13439, "reason": "very large note" },
    { "relPath": "brainstorms/.../sylvie-signaux.md", "approxTokens": 12605, "reason": "very large note" }
  ],
  "vault_stats": { "noteCount": 113, "approxTokens": 506823 },
  "baseline_model": "v0-heuristic",
  "version": "0.2.0"
}
```
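
A client that consumes this response may want a typed view of it. The sketch below mirrors the field names in the sample above. The helper functions are illustrative and are not part of the published packages. For this one sample, the unweighted mean of the five factors is 40.53, which rounds to the reported 41. The README does not say how the v0 heuristic weights its factors, so treat that match as an observation rather than a specification.

```typescript
// Field names mirror the sample response; the helpers are hypothetical.
interface RotFactors {
  tokenVolume: number;
  semanticRedundancy: number;
  distractorDensity: number;
  structuralCoherence: number;
  freshnessSpread: number;
}

interface TopRiskNote {
  relPath: string;
  approxTokens: number;
  reason: string;
}

interface PredictRotResult {
  rot_risk: number;
  dominant_factor: keyof RotFactors;
  factors: RotFactors;
  top_risk_notes: TopRiskNote[];
  vault_stats: { noteCount: number; approxTokens: number };
  baseline_model: string;
  version: string;
}

// Highest-valued factor; on a tie, the factor listed first wins.
function dominantFactor(factors: RotFactors): keyof RotFactors {
  const names = Object.keys(factors) as (keyof RotFactors)[];
  return names.reduce((best, name) => (factors[name] > factors[best] ? name : best));
}

// Plain average of the five factors (an assumption, not the documented formula).
function unweightedMean(factors: RotFactors): number {
  const values = Object.values(factors) as number[];
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const sampleFactors: RotFactors = {
  tokenVolume: 100,
  semanticRedundancy: 0,
  distractorDensity: 2.65,
  structuralCoherence: 100,
  freshnessSpread: 0,
};

console.log(dominantFactor(sampleFactors), Math.round(unweightedMean(sampleFactors)));
```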

## 🚀 Quickstart

```bash
git clone https://github.com/toonight/Mnemoscope
cd Mnemoscope
npm install
npm run build
npm test                           # 47 tests across core + mcp-server
npm audit                          # 0 vulnerabilities

# Make the CLI binaries available on your PATH
npm link --workspace @mnemoscope/cli
```

### Bootstrap a vault

```bash
mnemoscope-init /path/to/your/vault
# → state dir, Ed25519 keypair, fingerprint
```

> Add `.mnemoscope/` to your vault's `.gitignore` — the per-vault private key must never be committed.

### Connect the MCP server to Claude Code (or Cursor / any MCP client)

```json
// ~/.claude/settings.json
{
  "mcpServers": {
    "mnemoscope": {
      "command": "node",
      "args": ["/absolute/path/to/Mnemoscope/packages/mcp-server/dist/index.js"]
    }
  }
}
```

The four tools (`predict_rot`, `get_tiered_read`, `record_journal`, `read_journal`) become available to the agent immediately.
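
For readers writing their own client, the sketch below builds the JSON-RPC message that an MCP client sends to invoke one of these tools. The `tools/call` method and the `name` / `arguments` parameters come from the MCP specification, and `vault_path` is the input listed in the tools table. The `buildToolCall` helper and the request id are illustrative. A real client must also complete the MCP `initialize` handshake before calling tools, which existing MCP clients handle for you.

```typescript
// Minimal JSON-RPC 2.0 request shape used by MCP tool calls.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Hypothetical helper: wraps a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  tool: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool, arguments: args } };
}

const request = buildToolCall(2, "predict_rot", { vault_path: "/path/to/your/vault" });

// Over the stdio transport, each message is one line of JSON terminated by a newline.
const wireLine = JSON.stringify(request) + "\n";
console.log(wireLine);
```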

### (Optional) wire the auto-journal hook

Asking the agent to call `record_journal` on every write is a recipe for forgetting. Wire the bundled hook instead:

```json
// ~/.claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [{ "type": "command", "command": "mnemoscope-record-hook" }]
      }
    ]
  }
}
```

The hook resolves the vault root via `MNEMOSCOPE_VAULT_PATH` or by walking up to the closest `.mnemoscope/` directory. It **never blocks** the tool call: any internal error is caught, logged to stderr, and the process exits 0. Full setup including safety properties: [docs/claude-code-hook.md](./docs/claude-code-hook.md).
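
The vault-root lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the hook's source: the `resolveVaultRoot` name is hypothetical, and giving `MNEMOSCOPE_VAULT_PATH` priority over the directory walk is a guess about precedence.

```typescript
import { existsSync, mkdirSync, mkdtempSync, statSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join, resolve } from "node:path";

// Hypothetical helper: returns the vault root, or null when none is found.
function resolveVaultRoot(
  startDir: string,
  env: Record<string, string | undefined> = process.env,
): string | null {
  // Assumption: the environment variable wins when it is set.
  const fromEnv = env.MNEMOSCOPE_VAULT_PATH;
  if (fromEnv) return resolve(fromEnv);

  // Otherwise walk up until a directory containing .mnemoscope/ is found.
  let dir = resolve(startDir);
  for (;;) {
    const candidate = join(dir, ".mnemoscope");
    if (existsSync(candidate) && statSync(candidate).isDirectory()) return dir;
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Usage: a throwaway vault with a nested notes directory.
const vault = mkdtempSync(join(tmpdir(), "vault-"));
mkdirSync(join(vault, ".mnemoscope"));
const nested = join(vault, "notes", "deep");
mkdirSync(nested, { recursive: true });

const foundByWalk = resolveVaultRoot(nested, {});
const foundByEnv = resolveVaultRoot(nested, { MNEMOSCOPE_VAULT_PATH: "/some/other/vault" });
console.log(foundByWalk, foundByEnv);
```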

### Verify the journal

```bash
mnemoscope-verify /path/to/vault
# ok    2026-04-26T19:42:13.001Z  write  /vault/notes/foo.md
# ok    2026-04-26T19:43:01.220Z  write  /vault/notes/bar.md
# 2 entries; 2 valid; 0 invalid
```

`mnemoscope-verify` exits non-zero on any of:

- field-level tampering (signature mismatch),

- deletion or reordering (`prevHash` chain break),

- entries signed by a key the current vault does not own.

### (Optional) back up the per-vault private key

If you lose `<vault>/.mnemoscope/keys/ed25519.key`, the journal becomes unverifiable. The bundled backup CLIs encrypt the key with a passphrase (scrypt + AES-256-GCM, no extra deps) and let you restore it later:

```bash
mnemoscope-backup-key /path/to/vault /path/to/off-vault-backup.enc.json
# … prompts for a passphrase, writes chmod 0600 …

mnemoscope-restore-key /path/to/vault /path/to/off-vault-backup.enc.json
# … prompts for the same passphrase, writes the key back into the vault …
```

Full flow including threat model: [docs/key-escrow.md](./docs/key-escrow.md).
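
As a rough illustration of the scrypt + AES-256-GCM combination named above, the sketch below encrypts and restores a key using only Node's built-in `crypto` module. The `EncryptedKeyFile` envelope, the salt and IV sizes, and the default scrypt cost parameters are assumptions. The actual backup file format is defined by the CLI and documented in `docs/key-escrow.md`.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Hypothetical envelope; every binary field is base64-encoded.
interface EncryptedKeyFile {
  kdf: "scrypt";
  cipher: "aes-256-gcm";
  salt: string;
  iv: string;
  tag: string;
  ciphertext: string;
}

function encryptKey(privateKey: Buffer, passphrase: string): EncryptedKeyFile {
  const salt = randomBytes(16);
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const key = scryptSync(passphrase, salt, 32); // derive a 256-bit key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(privateKey), cipher.final()]);
  return {
    kdf: "scrypt",
    cipher: "aes-256-gcm",
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Throws if the passphrase is wrong or the file was modified (GCM tag check fails).
function decryptKey(file: EncryptedKeyFile, passphrase: string): Buffer {
  const key = scryptSync(passphrase, Buffer.from(file.salt, "base64"), 32);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(file.iv, "base64"));
  decipher.setAuthTag(Buffer.from(file.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(file.ciphertext, "base64")),
    decipher.final(),
  ]);
}

// Usage: random bytes stand in for the real private key material.
const secret = randomBytes(32);
const backup = encryptKey(secret, "correct horse battery staple");
const restored = decryptKey(backup, "correct horse battery staple");
console.log(restored.equals(secret));
```

Because GCM is authenticated, a wrong passphrase or a corrupted backup fails loudly at `decipher.final()` instead of silently returning garbage.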

### (Optional) anchor the journal in time with OpenTimestamps

The signed hash chain proves *order*. To prove *absolute time* and stay safe against retroactive rewrites if the per-vault key is ever compromised, anchor each entry's signature to a public Bitcoin-backed OTS calendar:

```bash
mnemoscope-timestamp /path/to/vault
# … POSTs SHA-256(sig) per entry to the calendar, writes .ots proofs
# under <vault>/.mnemoscope/timestamps/. Idempotent on re-run.
```

Pending proofs are upgraded to fully self-verifying Bitcoin proofs with the upstream `ots upgrade` / `ots verify` CLIs — that part is intentionally not reimplemented. Full threat model and flow: [docs/timestamping.md](./docs/timestamping.md).

## ✅ What works today

| | What | How verified |
|---|---|---|
| ✅ | `predict_rot` returns a 5-factor breakdown, each factor citation-backed in source | 14 unit tests; smoke-tested on a real 506 K-token vault, with sensible top-risk ordering |
| ✅ | `get_tiered_read` splits a vault into working / episodic / semantic by freshness | integration test on fixture vault; freshness-based today, with access-frequency awareness planned for a future revision |
| ✅ | `record_journal` produces a real **Ed25519** signature with **prevHash** chaining | 9 journal tests, including 4 tamper tests + 2 chain-integrity tests (truncation, reordering) |
| ✅ | `mnemoscope-init` bootstraps a vault idempotently | manual run on multiple fresh + existing vaults |
| ✅ | `mnemoscope-verify` CLI replays and exits non-zero on any invalid entry | wired to the same `verifyAll` |
| ✅ | `mnemoscope-record-hook` Claude Code `PostToolUse` hook auto-journals every Write/Edit/MultiEdit | [docs/claude-code-hook.md](./docs/claude-code-hook.md), never blocks |
| ✅ | `mnemoscope-backup-key` / `mnemoscope-restore-key` encrypt the per-vault Ed25519 key with scrypt + AES-256-GCM | 7 unit tests, full flow in [docs/key-escrow.md](./docs/key-escrow.md) |
| ✅ | `mnemoscope-timestamp` anchors each entry's signature to a Bitcoin-backed OpenTimestamps calendar; pending `.ots` proofs upgraded with the official `ots` CLI | 12 unit tests + smoke-tested 3 entries → 3 `.ots` files round-trip through `verifyOtsHeaderForDigest`; full flow in [docs/timestamping.md](./docs/timestamping.md) |
| ✅ | MCP server passes 5 end-to-end tests over real JSON-RPC stdio | `server.test.ts` spawns the binary |
| ✅ | Obsidian plugin: sidebar view with SVG rot gauge, factor bars, top-risk list, settings tab, auto-onboarding modal on first launch | single-file bundle, no runtime deps; `eslint-plugin-obsidianmd` clean in CI |
| ✅ | Research sub-project: predictive classifier **calibrated on real LLM measurements** (Random Forest R² = 0.58 on 50 rows graded by Gemma 4 26B), MarkdownMemBench v0.1 schema + sample dataset + harness, Chroma replication protocol with position-of-needle sweep | self-contained Python project under [`research/`](./research); CI runs `ruff` + 14 pytest cases on every push; classifier metadata audited in [`research/classifier/model.json`](./research/classifier/model.json) |
| ✅ | CI green on Node 22 + Python 3.11, **0 npm vulnerabilities**, `npm audit --audit-level=moderate` and `eslint-plugin-obsidianmd` enforced on every push | GitHub Actions on every push and PR |
| ✅ | Three npm packages (`@mnemoscope/{core,mcp-server,cli}@0.2.0`) live on the public npm registry, published via [OIDC Trusted Publishing](https://docs.npmjs.com/trusted-publishers/) (no rotating token, automatic provenance) | `npm view @mnemoscope/core` etc.; release workflow at `.github/workflows/release.yml` |
| ✅ | The MCP server is listed on the [Official MCP Registry](https://registry.modelcontextprotocol.io/) under `io.github.toonight/mnemoscope @ 0.2.0`, with automatic fan-out to PulseMCP and other downstream catalogs | [`server.json`](./server.json) at repo root, registered via `mcp-publisher` CLI |

## 🏗️ Architecture

```mermaid
flowchart LR
    A["Obsidian vault<br/>Markdown files"] --> B["mnemoscope/core<br/>signatures · rot · tiering · Ed25519 chained journal"]
    B --> C["mnemoscope/mcp-server<br/>stdio MCP - 4 tools"]
    B --> D["mnemoscope/obsidian-plugin<br/>UI · rot gauge"]
    B --> G["mnemoscope/cli<br/>init · record-hook · verify"]
    C -->|tools| E(("Claude Code<br/>Cursor<br/>ChatGPT desktop"))
    G -->|PostToolUse hook| E
    F["research/<br/>classifier · benchmark · replication"] -.->|trained ONNX classifier| B
    style A fill:#1a2444,stroke:#a78bfa,color:#cbd5e1
    style B fill:#0e1530,stroke:#5fd9d1,color:#cbd5e1
    style C fill:#0e1530,stroke:#5fd9d1,color:#cbd5e1
    style D fill:#0e1530,stroke:#5fd9d1,color:#cbd5e1
    style G fill:#0e1530,stroke:#5fd9d1,color:#cbd5e1
    style E fill:#1a2444,stroke:#7cf09d,color:#cbd5e1
    style F fill:#1a2444,stroke:#fbbf24,color:#cbd5e1
```

```
mnemoscope/
├── packages/
│   ├── core/              # rot scoring, tiering, Ed25519 hash-chained journal, signatures
│   ├── mcp-server/        # MCP server (stdio); 4 tools, integration-tested via spawn
│   ├── obsidian-plugin/   # Obsidian plugin: rot gauge, factor bars, top-risk list, settings
│   └── cli/               # mnemoscope-init, mnemoscope-record-hook, mnemoscope-verify
├── examples/
│   └── demo-vault/        # 13-note synthetic vault; every rot factor moves
├── research/              # Python (uv): classifier, MarkdownMemBench v0.1, Chroma replication
└── docs/                  # banner, logo, claude-code-hook setup, demo page, screenshots
```

## 🔐 The signed journal in one diagram

```mermaid
flowchart TD
    K["Per-vault Ed25519 keypair<br/>.mnemoscope/keys/ed25519.key (mode 0600)"]
    E1["Entry 1<br/>prevHash = GENESIS<br/>sig = σ1"]
    E2["Entry 2<br/>prevHash = SHA256 of σ1<br/>sig = σ2"]
    E3["Entry 3<br/>prevHash = SHA256 of σ2<br/>sig = σ3"]
    K -->|signs| E1
    K -->|signs| E2
    K -->|signs| E3
    E1 -.->|chain| E2
    E2 -.->|chain| E3
    style K fill:#1a2444,stroke:#a78bfa,color:#cbd5e1
    style E1 fill:#0e1530,stroke:#5fd9d1,color:#cbd5e1
    style E2 fill:#0e1530,stroke:#5fd9d1,color:#cbd5e1
    style E3 fill:#0e1530,stroke:#5fd9d1,color:#cbd5e1
```

| Attack | Detected by |
|---|---|
| Edit a field of any single entry | per-entry signature mismatch |
| Delete an entry | next entry's `prevHash` no longer matches |
| Reorder two entries | both signatures still verify, but the chain breaks |
| Forge an entry with a different key | `keyFingerprint` flagged as foreign |
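
The first three rows of the table can be reproduced with a small sketch built on Node's `crypto` module. It follows the diagram in that each `prevHash` is the SHA-256 of the previous signature and the first entry uses `GENESIS`. Everything else is an assumption made for illustration: the `Entry` fields, the JSON payload encoding, and the `appendEntry` / `verifyChain` names are hypothetical and are not the `@mnemoscope/core` API.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical entry shape; the real journal has more fields (e.g. keyFingerprint).
interface Entry {
  ts: string;
  op: string;
  targetPath: string;
  prevHash: string;
  sig: string; // hex-encoded Ed25519 signature
}

const GENESIS = "GENESIS";

const sha256OfSig = (sigHex: string): string =>
  createHash("sha256").update(Buffer.from(sigHex, "hex")).digest("hex");

// The signed bytes cover every field, including prevHash (encoding is an assumption).
const payload = (e: Omit<Entry, "sig">): Buffer =>
  Buffer.from(JSON.stringify([e.ts, e.op, e.targetPath, e.prevHash]));

function appendEntry(
  journal: Entry[],
  fields: { ts: string; op: string; targetPath: string },
  privateKey: KeyObject,
): Entry[] {
  const last = journal[journal.length - 1];
  const unsigned = { ...fields, prevHash: last ? sha256OfSig(last.sig) : GENESIS };
  const sig = sign(null, payload(unsigned), privateKey).toString("hex");
  return [...journal, { ...unsigned, sig }];
}

function verifyChain(journal: Entry[], publicKey: KeyObject): string[] {
  const problems: string[] = [];
  let expectedPrev = GENESIS;
  journal.forEach((e, i) => {
    if (!verify(null, payload(e), publicKey, Buffer.from(e.sig, "hex"))) {
      problems.push(`entry ${i}: signature mismatch`);
    }
    if (e.prevHash !== expectedPrev) problems.push(`entry ${i}: chain break`);
    expectedPrev = sha256OfSig(e.sig);
  });
  return problems;
}

// Usage: build three entries, then simulate the attacks from the table.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
let journal: Entry[] = [];
for (const name of ["foo.md", "bar.md", "baz.md"]) {
  const fields = { ts: new Date().toISOString(), op: "write", targetPath: `notes/${name}` };
  journal = appendEntry(journal, fields, privateKey);
}

const clean = verifyChain(journal, publicKey);
const edited = verifyChain([{ ...journal[0], op: "delete" }, journal[1], journal[2]], publicKey);
const deleted = verifyChain([journal[0], journal[2]], publicKey);
const reordered = verifyChain([journal[0], journal[2], journal[1]], publicKey);
console.log({ clean, edited, deleted, reordered });
```

In this sketch, removing the final entry leaves a chain that still verifies, so tail truncation would need an additional check beyond what is shown here.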

## 🤝 Neighbors (not competitors)

| Project | What it does | Where Mnemoscope sits |
|---|---|---|
| [Anthropic Memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool) | Official, file-based, primitive | We add the rot scoring + signed chained journal Anthropic does not provide |
| [Letta](https://letta.com) / [MemGPT](https://github.com/letta-ai/letta) | Runtime-stateful agents | Different layer: we sit *under* the agent |
| [Mem0](https://mem0.ai), [Zep](https://getzep.com), [Cognee](https://cognee.ai) | Generic memory stores | Different scope: we are MD-vault-native |
| [MemPalace](https://mempalace.tech) | Viral OSS memory MCP | Not Obsidian-specific; complementary |
| [Smart Connections](https://smartconnections.app) | RAG-vector for Obsidian | Co-installable; we are runtime / forensics, they are search |
| [Basic Memory MCP](https://github.com/basicmachines-co/basic-memory) | Semantic graph over markdown | Closest in spirit; we want to interop, not duplicate |
| [claude-memory-compiler](https://github.com/coleam00/claude-memory-compiler) | MD-compiler approach | Possible overlap; we intend to reach out before duplicating any of it |

> [!IMPORTANT]
> If you maintain one of these projects and see overlap or complementarity, please [open an issue](https://github.com/toonight/Mnemoscope/issues/new). Collaboration is the explicit design goal.

## 🔬 Scientific posture

Mnemoscope is meant to be a tool **and** a contribution to the public empirical record on agent memory.

| Research thread | Status | Why it matters |
|---|---|---|
| **MarkdownMemBench v0.1** | 🟢 schema + sample dataset + harness shipping | Today's benchmarks ([LongMemEval](https://arxiv.org/pdf/2410.10813), [LoCoMo](https://snap-research.github.io/locomo/)) are conversational and English-only. There is no public bench for vault-native, MD-native agent memory. |
| **Predictive Context Rot classifier** | 🟢 trained on **50 real `(signature, observed_loss)` rows** graded by `gemma4:26b` (Q4_K_M, `num_ctx=40000`). Random Forest wins out (**R² = 0.58, MAE = 0.14** on a held-out 10-row split), confirming the rot surface has interactions a linear model can't capture (Ridge collapses from 0.85 on the synthetic baseline to 0.14 on real data). First public observation of Chroma 2025's "structured > shuffled is worse" effect on real Markdown vaults graded by a real LLM (`structural_coherence` r = +0.30 vs observed loss). [Audit metadata](./research/classifier/model.json) | Every existing benchmark measures degradation *after* injection. We predict it *before*, with a calibrated baseline anyone can extend by dropping a fresh `measurements.csv` next to the existing one and re-training. |
| **Replication of Chroma's *"structured > shuffled is worse"*** | 🟢 runner + offline & online grading shipping; real-corpus runs pending vault contributions | Chroma showed coherent haystacks underperform shuffled ones on NIAH. Nobody has replicated or refuted this on real Obsidian vaults yet. The runner ([`research/replication/`](./research/replication)) needs only an API key and a vault path. |

Each thread lives in [`research/`](./research) and will produce a preprint alongside the code.

## 🛣️ Roadmap

### Done

- [x] **Publish the three packages on npm at v0.2.0** with [OIDC Trusted Publishing](https://docs.npmjs.com/trusted-publishers/) — `@mnemoscope/{core,mcp-server,cli}` are live on the npm registry. CI publishes automatically on tag push, no rotating token required, provenance attestations emitted on every publish.

- [x] **List the MCP server on the [Official MCP Registry](https://registry.modelcontextprotocol.io/)** — `io.github.toonight/mnemoscope @ 0.2.0` is indexed. PulseMCP ingests the official registry daily, so the server appears there too within ~7 days, no separate submission required.

- [x] **Submit the Obsidian plugin to the community plugins directory** — [obsidianmd/obsidian-releases#12354](https://github.com/obsidianmd/obsidian-releases/pull/12354) passes automated validation; awaiting human review (typical 2–4 weeks).

- [x] **Periodic remote attestation** — OpenTimestamps anchoring of every journal-entry signature, upgradable to a Bitcoin-backed proof via the upstream `ots` CLI ([docs/timestamping.md](./docs/timestamping.md)).

- [x] **Calibrate the predictive classifier on real LLM measurements** — 50 `(signature, observed_loss)` rows graded by `gemma4:26b`, Random Forest wins at R² = 0.58 / MAE = 0.14 on the held-out split; first public observation of the Chroma 2025 "structured > shuffled is worse" effect on real Markdown vaults graded by a real LLM ([model.json](./research/classifier/model.json)).

- [x] **Lint locally with the same plugin the Obsidian reviewer uses** — `eslint-plugin-obsidianmd` is wired into the project (root `eslint.config.mjs`, `npm run lint`) and gates CI, so reviewer-bot findings land at commit time instead of review time.

### Next

- [ ] Dogfood the auto-journal hook on the author's vault for two full weeks; tune heuristics against observed Claude Code session outcomes

- [ ] Wire the calibrated `model.onnx` into `@mnemoscope/core` via `onnxruntime-node` (optional dependency) so `predict_rot` returns the model's prediction next to the v0 heuristic

- [ ] Release **MarkdownMemBench v1** with 50–200 contributed real vaults

- [ ] Preprint #1: replication of Chroma *Context Rot* on real Obsidian vaults

- [ ] List on [Glama](https://glama.ai/mcp) (catalog ingestion path complementary to PulseMCP)

Full history: [CHANGELOG.md](./CHANGELOG.md).

## 🧑‍🤝‍🧑 Contributing

PRs are welcome but the most useful first step is opening an issue describing what you want to do. See [CONTRIBUTING.md](./CONTRIBUTING.md) for code style and process.

If you are a **researcher** at Letta, Chroma, Mem0, Cognee, OSU-NLP, Snap Research or any related lab and you see overlap with the *Predictive Context Rot* or *MarkdownMemBench* axes, please reach out — the project is explicitly designed for this.

## 📜 License

[Apache License 2.0](./LICENSE). Apache-2.0 was chosen over MIT for its explicit patent grant, which we believe is appropriate for a project introducing novel scoring methods in an active research area.

## 🙏 Acknowledgements

Mnemoscope's framing borrows directly from public work by:

- [Chroma Research — *Context Rot* (July 2025)](https://www.trychroma.com/research/context-rot)

- [Letta — *Is a Filesystem All You Need?* (August 2025)](https://www.letta.com/blog/benchmarking-ai-agent-memory)

- [Letta — *Sleep-time Compute* (2025)](https://www.letta.com/blog/sleep-time-compute)

- [Microsoft — *LazyGraphRAG* (June 2025)](https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/)

- [HippoRAG (NeurIPS'24, OSU-NLP)](https://github.com/osu-nlp-group/hipporag)

- [LongMemEval (ICLR 2025)](https://arxiv.org/pdf/2410.10813)

- [LoCoMo (Snap Research)](https://snap-research.github.io/locomo/)

- [Liu et al., *Lost in the Middle* (2023)](https://arxiv.org/abs/2307.03172)

- [Andrej Karpathy's LLM Wiki proposal (April 2026)](https://gist.github.com/rohitg00/2067ab416f7bbe447c1977edaaa681e2)

Without their public artifacts, this project would not be possible.



🧠 predict · witness · tier 🧠