{"id":50796349,"url":"https://github.com/memstem/memstem","last_synced_at":"2026-06-12T15:01:06.664Z","repository":{"id":354744609,"uuid":"1220952538","full_name":"Memstem/memstem","owner":"Memstem","description":"Unified memory and skill infrastructure for AI agents — pull-based ingestion across Claude Code, OpenClaw, Codex, Cursor, and more. Markdown canonical + SQLite hybrid index + MCP-native API.","archived":false,"fork":false,"pushed_at":"2026-06-05T12:33:25.000Z","size":1509,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-05T14:12:38.835Z","etag":null,"topics":["ai-agent","anthropic","claude-code","knowledge-base","mcp","mcp-server","memory","python","second-brain"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Memstem.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-25T15:00:11.000Z","updated_at":"2026-06-05T12:33:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Memstem/memstem","commit_stats":null,"previous_names":["memstem/memstem"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/Memstem/memstem","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Memstem%2Fmemstem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Memstem%2Fmemstem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Memstem%2Fmemstem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Memstem%2Fmemstem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Memstem","download_url":"https://codeload.github.com/Memstem/memstem/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Memstem%2Fmemstem/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34249561,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","anthropic","claude-code","knowledge-base","mcp","mcp-server","memory","python","second-brain"],"created_at":"2026-06-12T15:00:21.970Z","updated_at":"2026-06-12T15:01:06.653Z","avatar_url":"https://github.com/Memstem.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Memstem\n\n[![Stars](https://img.shields.io/github/stars/Memstem/memstem?style=social)](https://github.com/Memstem/memstem/stargazers)\n[![PyPI](https://img.shields.io/pypi/v/memstem)](https://pypi.org/project/memstem/)\n[![CI](https://github.com/Memstem/memstem/actions/workflows/ci.yml/badge.svg)](https://github.com/Memstem/memstem/actions/workflows/ci.yml)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)\n\nUnified memory and skill infrastructure for AI agents. One canonical knowledge store. Many AI clients. No version-fragility.\n\n\u003e A central memory with stems reaching out to other systems, drawing their memories in.\n\n![Memstem — one memory layer for every AI agent](./docs/images/hero.png)\n\n**If memstem helps you, please ⭐ [the repo](https://github.com/Memstem/memstem)** — there's no telemetry here, so stars are the only signal I have for whether to keep building this in the open.\n\n## What it is\n\nMemstem is a **standalone memory service** that acts as the single source of truth for memories and skills shared across multiple AI environments. Unlike traditional memory layers that you push to from each AI, Memstem **pulls** from the filesystem of each connected AI — so it's immune to upgrade churn in any of them.\n\nConnect Claude Code, OpenClaw, Codex, Cursor, Aider, Hermes — Memstem watches each system's session and memory files, ingests new content within seconds, and exposes one unified search API via MCP.\n\n## Why\n\nExisting AI memory systems break when their host upgrades. Push-based hooks fail silently across version changes. Each AI has its own memory format, and there's no clean way to share knowledge across them.\n\nMemstem solves this by:\n\n- **Pull-based ingestion** via `inotify` / FSEvents filesystem watchers — no hooks, no push APIs to break\n- **Markdown-canonical storage** — files are the truth, the index is rebuildable\n- **Hybrid search** — BM25 (FTS5) + cosine similarity (sqlite-vec) + reciprocal rank fusion\n- **Multi-AI adapters** — pluggable per-system ingestion (Claude Code, OpenClaw, Codex, etc.)\n- **MCP-native API** — every modern AI agent can call it\n\n## Architecture (one paragraph)\n\nMarkdown files in a structured tree are the canonical store. A SQLite database with FTS5 and sqlite-vec is the rebuildable index. A daemon watches each connected AI's filesystem and ingests deltas. An MCP server exposes search, get, and skill retrieval to clients. A hygiene loop runs inside the daemon — distilling sessions, judging duplicates, scoring importance, and building project records on configurable intervals.\n\nSee [ARCHITECTURE.md](./ARCHITECTURE.md) for the full design and [ROADMAP.md](./ROADMAP.md) for the phase plan.\n\n## Status\n\n**v0.16.1 — actively developed, running in production.**\nLive on the maintainer's infrastructure, ingesting from multi-agent\nOpenClaw, Claude Code, and Codex in real time. The 0.13 line added the\nrecall-quality stack (cross-encoder reranking + MMR, multimodal\nembeddings, a validated fully self-hosted Qwen3 recall setup); 0.14\nthrough 0.16 were three reliability batches from a full-codebase\nreview — durability, concurrency, failure visibility, embed-queue\nclaim/lease, and dedup-judge correctness. See\n[CHANGELOG.md](./CHANGELOG.md) for the release-by-release history.\nShipping:\n\n- **Hybrid search** (FTS5 BM25 + sqlite-vec cosine, merged with RRF) over a\n  markdown-canonical vault. Index is rebuildable from the files.\n- **Five MCP tools** (`memstem_search`, `_get`, `_list_skills`, `_get_skill`,\n  `_upsert`) plus a co-hosted local HTTP API on `127.0.0.1:7821` for\n  first-party clients (CLI tools, future editor extensions).\n- **Pluggable embedders** — Ollama (local default), OpenAI, Gemini, Voyage, or\n  any OpenAI-compatible server — selectable via `_meta/config.yaml`. For a\n  **self-hosted, no-cloud** setup the recommended embedder is\n  **Qwen3-Embedding-8B** (4096-dim, instruction-tuned); see\n  [Embedding provider](#embedding-provider--pick-one). Always-on embed queue\n  with retry/backoff and idle-timeout self-exit.\n- **Cross-encoder reranking + MMR** — opt-in recall-quality\n  pass that re-orders hybrid-search candidates with an LLM reranker and\n  diversifies near-duplicates with MMR, wired into config, the daemon, MCP, and\n  CLI (`--rerank`, `--mmr`, `--rerank-top-n`). Off by default; pair it with a\n  self-hosted Gemma/Qwen reranker for zero per-query cloud cost. See\n  [Search \u0026 reranking](#search--reranking-recall-quality).\n- **Derived records** — `memstem hygiene\n  distill-sessions` produces `type: distillation` companion records\n  for meaningful sessions, and `memstem hygiene project-records`\n  aggregates per-project-tag sessions into `type: project` rollups.\n  Both are CLI-driven, idempotent, opt-in (NoOp default; pluggable\n  OpenAI / Ollama summarizer). Direct fix for \"the project where we\n  did X\" queries that today fail to surface project work that\n  exists in the vault. See\n  [docs/distillation-verification.md](./docs/distillation-verification.md).\n- **In-daemon hygiene loop** — `memstem daemon` runs\n  the four hygiene stages (distill-sessions, dedup-judge, importance,\n  project-records) as background tasks alongside the watchers and embed\n  workers, each on its own configurable interval with per-stage locking\n  and failure isolation. `GET /health` exposes per-stage `last_run`\n  timestamps for fleet monitoring; set `loop_enabled: false` on\n  multi-tenant hosts where the customer hasn't authorized LLM spend.\n  See [ADR 0023](./docs/decisions/0023-in-daemon-hygiene-loop.md).\n- **OpenAI-compatible LLM backends for hygiene** — the\n  dedup judge and summarizer speak the OpenAI chat-completions protocol,\n  so dedup judging, distillation, and project-records can run against a\n  self-hosted vLLM / TGI / LM Studio / LiteLLM endpoint via a `base_url`\n  override — no per-customer cloud billing. The audit log and provenance\n  honestly label which service produced each verdict (`openai:gpt-…` for\n  OpenAI Inc., `openai-compat:gemma-…` for a self-hosted endpoint).\n- **Codex adapter** — third filesystem adapter (after\n  Claude Code and OpenClaw), watching `~/.codex/sessions|skills|memories`;\n  enabled by default and no-ops silently on hosts without Codex. Codex\n  sessions group by project tag alongside Claude Code's. See\n  [ADR 0022](./docs/decisions/0022-codex-adapter.md).\n- **Post-cleanup operator workflow** — `memstem hygiene\n  verify` is a single read-only command that summarizes vault state\n  after a cleanup + backfill sweep: per-type counts, distillation\n  coverage, undistilled-eligible sessions remaining, dedup /\n  noise findings cleanup-retro would still flag, open skill review\n  tickets, and parser/validation skips. Optional `--json-out`\n  emits a machine-readable payload for CI / monitoring. Replaces\n  ad-hoc SQLite inspection. See the [post-cleanup playbook in\n  docs/operations.md](./docs/operations.md#post-cleanup-operator-playbook).\n- **Explicit ranking policy** — `SearchConfig.type_bias`\n  multiplies each result's score by a small per-type weight so\n  default search clearly prefers curated/derived records (distillation\n  1.10, memory/skill/project 1.05) over raw conversational sessions\n  (0.85). Bounds are intentionally tight (`[0.85, 1.10]`) — the bias\n  breaks ties without overriding relevance. Tunable per-vault in\n  `_meta/config.yaml`; an empty mapping recovers pre-0.10 behaviour.\n- **Quality pipeline** — write-time noise filter, exact-body hash dedup\n  (Layer 1), TTL tagging for transient kinds, boot-echo hash filter —\n  keeps the vault from being polluted by AI-session firehose.\n- **`memstem auth`** for persistent embedder API keys\n  (`~/.config/memstem/secrets.yaml`, mode 0600), so cron, PM2, systemd,\n  and headless servers don't need per-shell exports.\n- **Secret handling (architecture and policy locked, implementation\n  in phases).** Memstem is being extended with a `SecretBackend`\n  interface, agent-side `vault.put` / `vault.get` tools, system-prompt\n  guidance, and an ingest-time regex pack that redacts known-format\n  secrets to vault placeholders before they enter the index. Scope and\n  limits are documented up front so customers know what Memstem will\n  and will not commit to — it is not a guaranteed secret scanner. See\n  [docs/secrets.md](./docs/secrets.md) for the full responsibility\n  boundary and shipping-status table.\n- **Operational tooling** — `memstem init`, `doctor`, `connect-clients`\n  (idempotent wiring into `~/.claude.json` and each OpenClaw agent's\n  `openclaw.json`), `migrate` (FlipClaw → Memstem one-shot), a\n  one-line `install.sh`, and a 15-second e2e smoke test\n  (`scripts/e2e-smoke.sh`).\n\nCross-platform CI runs Linux (gating) plus macOS and Windows\n(experimental, `continue-on-error: true` — sqlite-vec needs\n`enable_load_extension`, which `actions/setup-python`'s macOS build\ndoesn't ship; native Windows is WSL2-only by design for v0.x).\n1,400+ tests passing. See [ROADMAP.md](./ROADMAP.md) for what's\nnext.\n\n## Quickstart\n\nThe full one-liner. Installs everything (memstem, Ollama, embedding model), scaffolds the vault, imports your existing Claude Code + OpenClaw memory, wires Memstem into Claude Code, and starts the daemon under PM2:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/Memstem/memstem/main/scripts/install.sh | bash -s -- \\\n  --yes --connect-clients --migrate --migrate-no-embed --start-daemon\n```\n\nThe default uses **Ollama** (local, no API key, no network call). To install with a cloud embedder in one go:\n\n```bash\n# OpenAI (text-embedding-3-large at 3072 dimensions)\ncurl -fsSL https://raw.githubusercontent.com/Memstem/memstem/main/scripts/install.sh | bash -s -- \\\n  --yes --embedder openai --openai-key \"$OPENAI_API_KEY\" \\\n  --connect-clients --migrate --start-daemon\n\n# Or Voyage / Gemini — same shape:\n#   --embedder voyage --voyage-key \"$VOYAGE_API_KEY\"\n#   --embedder gemini --gemini-key \"$GEMINI_API_KEY\"\n```\n\nPicking `--embedder openai|gemini|voyage` implies `--no-ollama` (cloud doesn't need a local daemon). The key gets stored via `memstem auth set \u003cprovider\u003e`, so cron, PM2, and fresh shells all pick it up afterward without per-shell exports.\n\nThe `--migrate-no-embed` flag is the practical default on a CPU-only Ollama box: it imports records to vault + FTS5 in minutes instead of hours. After it returns:\n\n```bash\nmemstem search \"what did we decide about pricing\"   # FTS5 hits work immediately\npm2 logs memstem --lines 20                          # watch ingestion + embed worker\nmemstem doctor                                       # `Embed queue: N pending` shows backfill progress\n```\n\nEmbedding is **always queued** rather than inline (see ADR 0009): the migrate finishes in seconds and the daemon's embed worker drains the queue at its own pace. On CPU-only Ollama that means semantic search becomes \"good\" over an hour or two; on the API providers above it's done in seconds.\n\nManual install if you'd rather not pipe a script:\n\n```bash\npipx install memstem                         # or: pip install memstem\nollama pull nomic-embed-text                 # 768-dim local embedder\nmemstem init ~/memstem-vault                 # interactive wizard\nmemstem migrate --apply                      # one-shot history import\nmemstem connect-clients                      # patch settings + CLAUDE.md\nmemstem doctor                               # verify\nmemstem daemon                               # ingest + watch\n```\n\n**On macOS,** use a Homebrew or pyenv Python — the system Python ships a SQLite that can't load the `sqlite-vec` extension.\n\nThe full install reference — every installer flag, API-key handling, the macOS detail, and exactly what `connect-clients` edits — is in [docs/install.md](./docs/install.md).\n\n## Querying from an agent\n\nOnce `memstem connect-clients` has run, an MCP-aware client (Claude Code, etc.) sees five tools:\n\n| Tool | Purpose |\n|---|---|\n| `memstem_search` | Hybrid (FTS5 + vector) search across the vault |\n| `memstem_get` | Fetch a memory by id or vault path |\n| `memstem_list_skills` | List skills, optionally filtered by scope |\n| `memstem_get_skill` | Fetch a skill by title |\n| `memstem_upsert` | Create or update a memory record |\n\nSee [docs/mcp-api.md](./docs/mcp-api.md) for the full schema.\n\nEvery search runs in parallel down two paths and is merged with Reciprocal Rank Fusion, so exact-keyword hits and semantic neighbours both surface in one ranked list:\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./docs/images/hybrid-search.png\" alt=\"Hybrid search — FTS5 BM25 + sqlite-vec cosine, merged with RRF\" width=\"540\"\u003e\u003c/p\u003e\n\n## Configuration\n\n`~/memstem-vault/_meta/config.yaml` controls embedding, search, and adapters. The wizard writes a sensible default; common edits:\n\n### Embedding provider — pick one\n\nMemstem ships several providers. **Default is local Ollama** (zero-config, no API key). **For a high-quality self-hosted setup with no cloud API, the recommended embedder is Qwen3-Embedding-8B** (see the self-hosted block below). Switch by editing the `embedding:` block (then `memstem reindex` so existing vectors get redone against the new provider).\n\n```yaml\n# Default — local, no API key\nembedding:\n  provider: ollama\n  model: nomic-embed-text\n  dimensions: 768\n```\n\n```yaml\n# Google Gemini — Matryoshka shortening lets you keep any dim you want\n# (768 = same as Ollama, no reindex when switching from Ollama default).\nembedding:\n  provider: gemini\n  model: gemini-embedding-2-preview     # default; ~20% recall over -001, 8k context\n  api_key_env: GOOGLE_API_KEY\n  dimensions: 768            # 768 / 1536 / 3072 — Matryoshka truncates the native 3072d\n```\n\nPin `model: gemini-embedding-001` if you'd rather have the production-stable predecessor (the \"preview\" label means Google may change behavior; new-RAG quality vs API stability is your call).\n\n```yaml\n# OpenAI — or any OpenAI-compatible endpoint (Together, Mistral, Groq, vLLM, LM Studio)\nembedding:\n  provider: openai\n  model: text-embedding-3-small\n  api_key_env: OPENAI_API_KEY\n  dimensions: 1536\n  # base_url: https://api.together.xyz/v1   # for OpenAI-compatible providers\n```\n\n```yaml\n# Voyage — Anthropic's recommended embedding partner; tops retrieval benchmarks\nembedding:\n  provider: voyage\n  model: voyage-3\n  api_key_env: VOYAGE_API_KEY\n  dimensions: 1024\n```\n\n```yaml\n# Recommended self-hosted (no cloud API) — Qwen3-Embedding-8B on vLLM.\n# Instruction-tuned 4096-dim retriever, served over the OpenAI-compatible path\n# (point base_url at your own server). Pair with the query_instruction below and\n# a self-hosted reranker (see \"Search \u0026 reranking\") for the full no-cloud stack.\nembedding:\n  provider: openai                       # OpenAI-compatible client\n  model: qwen3-text-embed                # the name your vLLM serves\n  base_url: http://your-vllm-host:8000/v1\n  api_key_env: OPENAI_API_KEY            # any non-empty token; vLLM ignores it\n  dimensions: 4096\n  query_instruction: \"Given a search query, retrieve relevant memories, notes, and documents that answer it\"\n```\n\nAPI keys are read from environment variables named in `api_key_env` — they never land in the vault. `embedding.workers` (default 2) and `embedding.batch_size` (default 8) tune the queue throughput; CPU Ollama is happiest at 1 worker, API providers tolerate 4+.\n\n### Search \u0026 reranking (recall quality)\n\nHybrid search (BM25 + vector, merged with RRF) works out of the box. For higher\nprecision, enable the **reranker + MMR** pass: it re-orders the\ntop candidates with an LLM and diversifies near-duplicates. Off by default —\nopt in per vault:\n\n```yaml\nsearch:\n  mmr_lambda: 0.5            # 0 = max diversity, 1 = pure relevance\n  rerank_top_n: 15           # candidate pool the reranker re-scores\n  reranker:\n    enabled: true\n    provider: openai         # OpenAI-compatible — also works against a self-hosted vLLM box\n    model: gemma-4-e4b-it    # or gpt-4o-mini, qwen2.5:7b, ...\n    base_url: http://your-vllm-host:8000/v1\n    api_key_env: OPENAI_API_KEY\n```\n\nPer-query overrides: `memstem search \"q\" --rerank --mmr 0.5 --rerank-top-n 15`\n(and `--no-rerank` to skip). Together with the Qwen3 embedder + `query_instruction`\nabove, this is the validated **fully self-hosted recall stack** — no per-query cloud\ncost. For picking the reranker LLM, see\n[recall-quality model recommendations](./docs/recall-models.md).\n\n### Adapters\n\n```yaml\nembedding:\n  provider: ollama\n  model: nomic-embed-text\n  base_url: http://localhost:11434\n  dimensions: 768\n\nadapters:\n  openclaw:\n    agent_workspaces:\n      - { path: ~/assistant, tag: assistant }\n      - { path: ~/support-agent, tag: support }\n    shared_files:\n      - ~/assistant/RULES.md\n  claude_code:\n    project_roots:\n      - ~/.claude/projects\n    extra_files:\n      - ~/.claude/CLAUDE.md\n```\n\nRun `memstem doctor` after edits to verify every configured target exists and the embedder is reachable.\n\n## Distillation + project records\n\nTwo hygiene commands turn raw session transcripts and per-project\nsession sets into retrieval-shaped derived records. Both are\n**CLI-driven, idempotent, and opt-in** — NoOp is the install-time\ndefault, you opt into a real summarizer explicitly.\n\n```bash\n# One-shot backfill at cutover (or any time you want to refresh):\nmemstem auth set openai sk-...\nmemstem hygiene distill-sessions --backfill --provider openai --apply\nmemstem hygiene project-records --provider openai --apply\n\n# Routine refresh (post-backfill):\nmemstem hygiene distill-sessions --provider openai --apply\nmemstem hygiene project-records --provider openai --apply\n```\n\nWhat you get:\n\n- **Session distillations** at `vault/distillations/\u003csource\u003e/\u003csession_id\u003e.md` —\n  one paragraph + structured Key entities / Deliverables / Decisions /\n  Status sections per session. Provenance always points back to the\n  source transcript.\n- **Project records** at `vault/memories/projects/\u003cslug\u003e.md` — one\n  per Claude Code project tag with ≥2 sessions. Canonical project\n  name extracted from the work itself, accumulated decisions,\n  link map.\n\nBoth can also run with Ollama (`--provider ollama`, default model\n`qwen2.5:7b`) for local-only setups. See\n[docs/distillation-verification.md](./docs/distillation-verification.md)\nfor the full operator workflow (dry-run, quality spot-check, eval\ndiff, manual override) and\n[docs/recall-models.md](./docs/recall-models.md) for the model\nrecommendations + cost expectations.\n\n## Verifying it works\n\nTwo complementary commands cover \"is the install healthy?\" and \"is\nthe vault state right after a cleanup + backfill sweep?\".\n\n`memstem doctor` is the install-level check — Python, vault, index,\nembedder, and the configured adapter targets all reachable:\n\n```text\n$ memstem doctor\nMemstem doctor (vault=/home/ubuntu/memstem-vault):\n\n  ✓ Python 3.11\n  ✓ memstem 0.16.1\n  ✓ Vault: /home/ubuntu/memstem-vault\n  ✓ Config: /home/ubuntu/memstem-vault/_meta/config.yaml\n  ✓ Index opens cleanly\n  ✓ Ollama at http://localhost:11434 (nomic-embed-text)  (768 dims)\n  ✓ OpenClaw workspace: /home/ubuntu/assistant (tag=assistant)\n  ✓ Claude Code root: /home/ubuntu/.claude/projects\n\nAll checks passed.\n```\n\n`memstem hygiene verify` is the operator-level check — vault state\nafter `cleanup-retro` + `distill-sessions --backfill`. Read-only,\nsafe on production. Reports total memories, per-type breakdown,\ndistillation coverage, dedup / noise findings still detectable,\nopen skill review tickets, and any parser/validation skips\nencountered during the walk. `--json-out` writes the same payload as\nJSON for CI / monitoring scrapers:\n\n```text\n$ memstem hygiene verify\n============================================================\nMEMSTEM VERIFY\n============================================================\nVault:                    /home/ubuntu/memstem-vault\nTotal memories:           1722\n\nBy type:\n  type             total  deprecated  valid_to\n  --------------------------------------------------\n  session            665           1         1\n  memory             546         229         2\n  distillation       224           0         0\n  skill              193           0         0\n  daily               80           0         0\n  project             14           0         0\n\nCleanup state:\n  Deprecated records:                   230\n  Records with valid_to:                3\n  Active dedup collision groups:        6\n  Active dedup → would deprecate:       11\n  Active dedup skill groups (review):   6\n  Noise drops still detectable:         0\n  Noise transients still detectable:    1\n  Skill review tickets open:            6\n\nDerived records:\n  Sessions covered by distillation:     224\n  Undistilled eligible sessions left:   1\n\nParser/validation skips during scan: 0\n```\n\nThe full operator playbook (run cleanup, run backfill, run verify,\ninterpret findings, resolve skill review tickets, tune ranking) is\nin [docs/operations.md — Post-cleanup operator playbook](./docs/operations.md#post-cleanup-operator-playbook).\n\n## Platform support\n\n| OS | Support | Notes |\n|---|---|---|\n| Linux | ✅ Tested | Primary development platform. CI gates merges on Python 3.11 + 3.12. |\n| macOS | ⚠️ Supported, not CI-gated | `watchdog` uses FSEvents and the daemon runs. The CI runner's `actions/setup-python` ships a Python without `enable_load_extension`, which `sqlite-vec` needs, so macOS jobs run as `continue-on-error: true` for visibility. A user-installed Python (e.g. `brew install python@3.11`) has extension support enabled and works. |\n| Windows | ❌ Use WSL2 | Native Windows runs in CI for visibility (`continue-on-error: true`) but is not supported. Run Memstem inside WSL2; native PowerShell support is on the roadmap. |\n\n## Documentation\n\n- [Architecture](./ARCHITECTURE.md) — system design and rationale\n- [Roadmap](./ROADMAP.md) — release plan (Phases 1–5)\n- [Install guide](./docs/install.md) — installer flags, API keys, macOS notes, `connect-clients` details\n- [Operations](./docs/operations.md) — production smoke test, post-cleanup operator playbook, ranking-policy reference\n- [Frontmatter spec](./docs/frontmatter-spec.md) — the markdown schema\n- [MCP API](./docs/mcp-api.md) — tool definitions\n- [Decisions](./docs/decisions/) — Architecture Decision Records\n- [Distillation + project records — operator playbook](./docs/distillation-verification.md) — how to run the new derived-record commands and verify quality\n- [Recall-quality model recommendations](./docs/recall-models.md) — picking the right LLM for rerank / HyDE / dedup / summarization with cost expectations\n- [Recall eval results](./docs/recall-eval-results.md) — measured before/after data on the recall-quality features\n\n## License\n\nMIT — see [LICENSE](./LICENSE).\n\n## Acknowledgments\n\nMemstem builds on ideas from:\n\n- [basic-memory](https://github.com/basicmachines-co/basic-memory) — markdown + wikilinks pattern\n- [doobidoo/mcp-memory-service](https://github.com/doobidoo/mcp-memory-service) — sqlite-vec hybrid retrieval reference\n- [Karpathy's LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) — index/log pattern\n- [Graphiti](https://github.com/getzep/graphiti) — bi-temporal facts\n- [Anthropic memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool) — abstract memory interface\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmemstem%2Fmemstem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmemstem%2Fmemstem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmemstem%2Fmemstem/lists"}