https://github.com/avanrossum/pmem-project-memory-tool-for-claude-code
Local-first RAG memory for Claude Code. Semantic search over your project's docs, decisions, and history. No external APIs. Setup in 2 minutes.
https://github.com/avanrossum/pmem-project-memory-tool-for-claude-code
ai-tools anthropic chromadb claude claude-code developer-tools local-llm mcp memory model-context-protocol ollama rag semantic-search vector-search
Last synced: 3 months ago
JSON representation
Local-first RAG memory for Claude Code. Semantic search over your project's docs, decisions, and history. No external APIs. Setup in 2 minutes.
- Host: GitHub
- URL: https://github.com/avanrossum/pmem-project-memory-tool-for-claude-code
- Owner: avanrossum
- License: mit
- Created: 2026-03-25T16:34:58.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-26T15:08:14.000Z (3 months ago)
- Last Synced: 2026-03-27T05:34:46.530Z (3 months ago)
- Topics: ai-tools, anthropic, chromadb, claude, claude-code, developer-tools, local-llm, mcp, memory, model-context-protocol, ollama, rag, semantic-search, vector-search
- Language: Python
- Size: 101 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Roadmap: ROADMAP.md
Awesome Lists containing this project
README
# pmem — Project Memory Tool
A portable, local-first RAG memory layer for [Claude Code](https://docs.anthropic.com/en/docs/claude-code) projects. Gives Claude semantic search over your project's documentation, decisions, and history — using local models, with no external API dependencies.
Think of it as long-term memory that persists across Claude Code sessions, queryable via MCP.
## Why this exists
I use Claude Code for more than writing code. I run specialized agents that maintain context across infrastructure, documentation, content pipelines, and operational workflows — sometimes six or more projects simultaneously. Each project accumulates hundreds of markdown files: architecture decisions, task logs, lessons learned, archived roadmaps.
Grepping through all of that wastes tokens and misses semantic matches. "What did we decide about the auth flow?" doesn't match "JWT was chosen over session tokens because..." — not with grep, anyway.
So I built institutional memory for AI agents. pmem indexes your project's documentation into a local vector store, and Claude queries it by meaning instead of by keyword. No data leaves your machine. Setup takes two minutes.
Read more about the methodology behind this: [Cognitive Offloading](https://mipyip.com/blog/cognitive-offloading) and [The Governance Documents](https://mipyip.com/blog/the-governance-documents).
### How pmem differs from session memory tools
Most Claude Code memory tools — claude-mem, claude-brain, supermemory — solve session continuity: what did Claude do last time? They capture Claude's actions, compress conversation history, and replay it into future sessions.
pmem solves a different problem: **what does the project know?**
Your project has architecture decisions, task logs, lessons learned, archived roadmaps, and governance documents accumulated over months. That institutional knowledge exists in files, not in session transcripts. When you ask "why did we choose this auth approach?" the answer isn't in what Claude did yesterday — it's in an ADR you wrote three months ago.
| | Session memory tools | pmem |
|---|---|---|
| **Remembers** | What Claude did | What the project documented |
| **Source data** | Session transcripts, tool usage | Markdown, text, code files in your repo |
| **Search method** | Keyword / hybrid over sessions | Semantic (vector) search over project docs |
| **Requires** | Cloud API or session capture hooks | Local only — Ollama + ChromaDB, no API keys |
| **Use case** | "Continue where we left off" | "What did we decide about X six months ago?" |
pmem doesn't replace session memory. It fills the gap that session memory can't: retrieving decisions, context, and rationale from your project's documentation by meaning, not by keyword.
### Real-world comparison
Same query — "identify governance-related blog posts" — run against a project with 500+ markdown files:
| | **pmem (index-based)** | **Fresh search (Explore agent)** |
|---|---|---|
| **Results found** | 18 posts | 11 posts |
| **Time** | ~20 seconds | ~90 seconds |
| **Token cost** | ~5,500 | ~20,000–24,000 |
| **Missed** | — | 7 posts (governance as supporting theme) |
The fresh search cost roughly **4× the tokens** and found **7 fewer results**. The posts it missed were the ones where governance was woven into the argument without being the headline topic — exactly the kind of semantic connection that keyword search can't make.
For the full breakdown — architecture decisions, the prompt that built it, and token cost analysis — read the [build story on the blog](https://mipyip.com/blog/project-memory-for-claude-code).
## How it works
```
Claude Code → MCP tool call → pmem server
↓
embed query (Ollama)
↓
search ChromaDB (local)
↓
(optional) synthesize answer via local LLM
↓
return answer + sources to Claude
```
pmem indexes your project's markdown and text files into a local vector database (ChromaDB). When Claude needs context, it queries the memory via MCP tools — no copy-pasting, no manual file pointing.
## Quick start
### 1. Prerequisites
- Python 3.11+
- [Ollama](https://ollama.ai) installed and running
Pull the embedding model (~274MB, one-time):
```bash
ollama pull nomic-embed-text
```
### 2. Install pmem
> **PyPI package coming soon.** Once published, installation will be just `pip install pmem`. For now, install from source:
```bash
git clone https://github.com/avanrossum/pmem-project-memory-tool-for-claude.git
cd pmem-project-memory-tool-for-claude
pip install -e .
pmem install-skills
```
### 3. Register the MCP server
Add to `~/.claude.json` (global, all projects) or `.mcp.json` (per-project):
```json
{
"mcpServers": {
"project-memory": {
"command": "/full/path/to/pmem",
"args": ["serve"]
}
}
}
```
> **Important:** Use the full path to `pmem`, not just `"pmem"`. Claude Code spawns MCP servers as subprocesses without your shell profile, so pyenv shims and other version managers won't work. Run `which pmem` to get the path. `pmem init` prints the correct snippet automatically.
>
> **Note:** MCP servers go in `~/.claude.json` or `.mcp.json`, NOT in `~/.claude/settings.json` (which is for permissions and hooks only).
### 4. Initialize in your project
```bash
cd ~/your-project
pmem init
pmem index
```
That's it. Claude Code can now query your project's memory.
## CLI reference
```
pmem init Create .memory/config.json with sensible defaults
pmem index Incremental index (only changed files)
pmem index --force Full reindex (re-embed everything)
pmem index --dry-run Show what would be indexed
pmem query "your question" Query memory from the terminal
pmem query "..." --no-llm Return raw chunks (no LLM synthesis)
pmem status Show index state, stale files, config
pmem exclude "snapshots/**" Add a pattern to the exclude list
pmem include "**/*.py" Add a pattern to the include list
pmem serve Start the MCP server (used by Claude Code)
pmem config Print current config
pmem config --edit Open config in $EDITOR
pmem config --global Show global config
pmem config --init-global Create global config at ~/.config/pmem/config.json
pmem watch Poll for changes and reindex automatically (every 5s)
pmem install-skills Install /welcome, /sleep, /reindex to Claude Code
pmem install-skills --link Symlink instead of copy (macOS/Linux)
```
> **Note:** Don't run `pmem index` from the terminal while Claude Code is active on the same project — use the `memory_reindex` MCP tool (or `/reindex` skill) instead. `pmem watch` uses polling (not filesystem events) so it works reliably on all platforms.
## MCP tools
Once registered, Claude Code has access to four tools:
| Tool | Description |
|------|-------------|
| `memory_query` | Ask a natural language question — retrieves relevant chunks and optionally synthesizes an answer via a local LLM |
| `memory_search` | Search for matching chunks with source locations (no synthesis) |
| `memory_status` | Check index state: file count, chunk count, stale files, config |
| `memory_reindex` | Trigger a reindex from within Claude Code |
## Configuration
`pmem init` creates `.memory/config.json` in your project root:
```json
{
"project_name": "my-project",
"embedding": {
"endpoint": "http://localhost:11434",
"model": "nomic-embed-text",
"provider": "ollama"
},
"llm": {
"endpoint": "http://localhost:1234/v1",
"model": "local-model",
"provider": "openai_compatible",
"enabled": false
},
"indexing": {
"include": ["**/*.md", "**/*.txt"],
"exclude": [".memory/**", "**/.git/**", "**/node_modules/**", "*.lock"],
"chunk_size": 400,
"chunk_overlap": 80,
"split_on_headers": true
},
"query": {
"top_k": 8,
"auto_reindex_on_query": false
},
"update_channel": "stable"
}
```
### Embedding providers
| Provider | Config | Notes |
|----------|--------|-------|
| `ollama` (default) | `endpoint: "http://localhost:11434"` | Uses `/api/embed` (batch). Free, local. |
| `openai_compatible` | Any OpenAI-compatible endpoint | Uses `/v1/embeddings`. Works with LMStudio, vLLM, etc. |
### LLM synthesis (optional, disabled by default)
When used via MCP with Claude Code, LLM synthesis is unnecessary — Claude interprets the raw chunks directly. Synthesis is disabled by default.
For standalone terminal use (`pmem query`), you can enable synthesis by setting `llm.enabled: true` and pointing at any OpenAI-compatible endpoint (LMStudio, Ollama's OpenAI mode, vLLM, etc.). This sends retrieved chunks to a local LLM for a summarized answer.
### Indexing options
- **`include`** — glob patterns for files to index (default: `**/*.md`, `**/*.txt`)
- **`exclude`** — glob patterns to skip (default: `.memory/**`, `.git/**`, `node_modules/**`, `*.lock`)
- **`chunk_size`** — target chunk size in words (default: 400)
- **`chunk_overlap`** — overlap between chunks in words (default: 80)
- **`split_on_headers`** — split markdown at H1/H2/H3 boundaries before splitting by size (default: true). When a section is too large for a single chunk, it's split by size — but each sub-chunk retains the heading path from its parent section, so query results always show where in the document a chunk came from.
### Indexing non-markdown files
pmem indexes markdown and text files by default, but you can add any file type:
```bash
pmem include "**/*.py"
pmem include "**/*.js"
pmem include "**/*.apex"
```
This writes to your project's `.memory/config.json` — it only affects the current project, not other projects using pmem.
Non-markdown files are chunked by size (word count with overlap), since there are no header boundaries to split on. This works well for most code and documentation formats. Language-aware chunking (splitting on function/class boundaries) is on the roadmap but not yet implemented — size-based splitting is good enough for semantic retrieval in practice.
After adding new patterns, reindex to pick up the new files:
```bash
pmem index
```
### Query options
- **`top_k`** — number of chunks to retrieve per query (default: 8)
- **`auto_reindex_on_query`** — check for stale files before every query and re-embed if needed (default: false — `/welcome`, `/sleep`, and `pmem watch` handle freshness)
### Update notifications
pmem checks GitHub for new releases once per day and shows a notice when an update is available — both in `pmem status` output and in MCP tool responses (so Claude will tell you).
By default, only stable releases trigger notifications. To opt into beta (pre-release) notifications:
```json
{
"update_channel": "beta"
}
```
Set this in `.memory/config.json` (per-project) or `~/.config/pmem/config.json` (global).
> **Warning:** Beta releases may contain breaking changes, incomplete features, or bugs. Use at your own risk. If something breaks, pin back to the last stable version with `git checkout v && pip install -e .`
## What gets created in your project
```
your-project/
└── .memory/
├── config.json ← commit this (your project's memory config)
├── chroma/ ← gitignore (generated vector store)
└── index_state.json ← gitignore (file hash registry)
```
Add to your `.gitignore` (done automatically by `pmem init`):
```
.memory/
```
> **Note:** Older versions of pmem added individual entries (`.memory/chroma/`, `.memory/index_state.json`). The single `.memory/` entry is preferred — it catches transient files like lock files that the specific entries miss.
## Skills (optional)
pmem ships with three Claude Code slash command skills:
- **`/welcome`** — Run at the start of each session. Reads governance files, runs incremental reindex, confirms readiness.
- **`/sleep`** — Run at the end of each session. Full governance pass: updates tasks, docs, changelog, memory, and reindexes.
- **`/reindex`** — Quick trigger to refresh the memory index mid-session.
### Install skills
```bash
# Recommended: use the built-in installer
pmem install-skills
# Or with symlinks (stays in sync with repo, macOS/Linux only)
pmem install-skills --link
```
Or manually:
```bash
# Copy
cp skills/welcome.md ~/.claude/commands/welcome.md
cp skills/sleep.md ~/.claude/commands/sleep.md
cp skills/reindex.md ~/.claude/commands/reindex.md
# Or symlink (macOS/Linux only)
ln -sf "$(pwd)/skills/welcome.md" ~/.claude/commands/welcome.md
ln -sf "$(pwd)/skills/sleep.md" ~/.claude/commands/sleep.md
ln -sf "$(pwd)/skills/reindex.md" ~/.claude/commands/reindex.md
```
## Recommended CLAUDE.md snippet
Add this to any project using pmem so Claude knows it's available:
```markdown
## Project Memory
This project has a local RAG memory index via `pmem`. Use the `memory_query` MCP tool when:
- Looking for past decisions, context, or rationale ("why did we do X?")
- Searching for historical task context or outcomes
- Finding documented gotchas or lessons learned
Do NOT use memory_query for: reading specific known files, checking current code
state, or anything derivable from `git log`. The index updates at session start
(`/welcome`) and session end (`/sleep`), so it may be slightly behind mid-session.
If results seem stale, run `memory_reindex` to refresh.
```
## Hardware notes
| Setup | Embedding | LLM synthesis |
|-------|-----------|---------------|
| Any Mac (even 8GB) | Runs locally — nomic-embed-text is tiny | Point at a remote machine or disable |
| 32GB+ Mac | Runs locally | Run 8B–32B model locally via Ollama/LMStudio |
| Dedicated server (Mac Studio, etc.) | Runs locally | Run 70B+ model, expose via Cloudflare tunnel |
## Design principles
- **Local-first** — no data leaves your machine. No API keys required.
- **Portable** — install once globally, `pmem init` in any project.
- **Low friction** — setup takes under 2 minutes. Querying is automatic via MCP.
- **Minimal dependencies** — no LangChain, no LlamaIndex. Just ChromaDB, httpx, click, pathspec, and the MCP SDK.
## Related reading
- [Cognitive Offloading](https://mipyip.com/blog/cognitive-offloading) — The methodology behind deliberate memory externalization
- [The Governance Documents](https://mipyip.com/blog/the-governance-documents) — ROADMAP.md, ARCHITECTURE.md, CLAUDE.md, CHANGELOG.md — the files pmem was built to index
- [What Is Pass@1?](https://mipyip.com/blog/what-is-pass-at-1) — The development methodology where governance documents are thorough enough that AI generates correct implementations on the first attempt
## Author
Built by [Alex van Rossum](https://mipyip.com/about) — systems architect, fractional CTO, and the kind of person who builds tools when the existing ones waste too many tokens.
## License
MIT