{"id":45750549,"url":"https://github.com/shawnhack/exocortex","last_synced_at":"2026-02-25T18:15:05.153Z","repository":{"id":339625765,"uuid":"1146246075","full_name":"shawnhack/exocortex","owner":"shawnhack","description":"Personal unified memory system for AI agents. SQLite-backed, local-first, hybrid RAG retrieval with MCP integration.","archived":false,"fork":false,"pushed_at":"2026-02-20T20:14:23.000Z","size":243,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-20T20:59:06.411Z","etag":null,"topics":["ai","claude","embeddings","entity-extraction","hybrid-search","knowledge-graph","local-first","mcp","memory","model-context-protocol","personal-knowledge-management","rag","sqlite","typescript"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shawnhack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-30T20:18:21.000Z","updated_at":"2026-02-20T20:14:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/shawnhack/exocortex","commit_stats":null,"previous_names":["shawnhack/exocortex"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/shawnhack/exocortex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnhack%2Fexocortex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnhack%2Fexocortex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnhack%2Fexocortex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnhack%2Fexocortex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shawnhack","download_url":"https://codeload.github.com/shawnhack/exocortex/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnhack%2Fexocortex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29834043,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-25T17:57:15.019Z","status":"ssl_error","status_checked_at":"2026-02-25T17:56:11.472Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","claude","embeddings","entity-extraction","hybrid-search","knowledge-graph","local-first","mcp","memory","model-context-protocol","personal-knowledge-management","rag","sqlite","typescript"],"created_at":"2026-02-25T18:15:00.010Z","updated_at":"2026-02-25T18:15:05.147Z","avatar_url":"https://github.com/shawnhack.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/icon.svg\" alt=\"Exocortex\" width=\"120\" /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eExocortex\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  Personal unified memory system — SQLite-backed, local-first, hybrid RAG retrieval with MCP integration.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#quick-start\"\u003eQuick Start\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#mcp-server\"\u003eMCP Server\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#cli\"\u003eCLI\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#rest-api\"\u003eREST API\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#dashboard-1\"\u003eDashboard\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\nExocortex gives AI coding agents persistent memory across sessions. It stores memories with embeddings, scores them using Reciprocal Rank Fusion, and exposes everything through an MCP server, REST API, CLI, and React dashboard. Works with any MCP-compatible tool — Claude Code, Codex, Gemini, Copilot, and others. All data stays local — no cloud, no API keys for embeddings.\n\n## Dashboard\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/dashboard.png\" alt=\"Dashboard — memory storage overview\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/memories.png\" alt=\"Memories — search, browse, and date-grouped timeline\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eMore screenshots\u003c/summary\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/entities.png\" alt=\"Entities — extracted knowledge graph nodes\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/graph.png\" alt=\"Graph — interactive force-directed knowledge graph\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/goals.png\" alt=\"Goals — objective tracking with milestones\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/entity-detail.png\" alt=\"Entity Detail — relationships and linked memories\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n\u003c/details\u003e\n\n---\n\n## Quick Start\n\nRequires **Node.js \u003e= 20** and **pnpm**.\n\n```bash\ngit clone https://github.com/shawnhack/exocortex.git\ncd exocortex\npnpm install\npnpm build        # first build downloads the embedding model (~80MB)\n```\n\nStart the server and dashboard:\n\n```bash\npnpm exec exo serve\n# → http://localhost:3210\n```\n\nOr use the CLI directly:\n\n```bash\npnpm exec exo add \"Remember this\" --tags \"test,demo\" --importance 0.8\npnpm exec exo search \"remember\" --verbose\n```\n\n### Connect an AI agent\n\nThe MCP server works with any tool that supports the [Model Context Protocol](https://modelcontextprotocol.io):\n\n**Claude Code:**\n```bash\nclaude mcp add --scope user exocortex node /path/to/exocortex/packages/mcp/dist/index.js\n```\n\n**Codex CLI** (`~/.codex/config.json`):\n```json\n{ \"mcpServers\": { \"exocortex\": { \"command\": \"node\", \"args\": [\"/path/to/exocortex/packages/mcp/dist/index.js\"] } } }\n```\n\n**Gemini CLI** (`~/.gemini/settings.json`):\n```json\n{ \"mcpServers\": { \"exocortex\": { \"command\": \"node\", \"args\": [\"/path/to/exocortex/packages/mcp/dist/index.js\"] } } }\n```\n\n**VS Code (Copilot / Cline / etc.)** (`.vscode/mcp.json`):\n```json\n{ \"servers\": { \"exocortex\": { \"command\": \"node\", \"args\": [\"/path/to/exocortex/packages/mcp/dist/index.js\"] } } }\n```\n\n---\n\n## How It Works\n\n```\nUser prompt → Agent reads/writes memories via MCP tools\n                ↓\n            MCP Server (stdio)\n                ↓\n          MemoryStore / MemorySearch (core)\n                ↓\n    ┌───────────┼───────────┐\n    │           │           │\n SQLite     FTS5 Index   Vector Store\n (memories)  (full-text)  (384-dim embeddings)\n                ↓\n    Reciprocal Rank Fusion scoring\n    + recency/frequency/usefulness boost\n    + graph-aware expansion (1-hop links)\n                ↓\n     Ranked results + linked context\n```\n\n**Key design choices:**\n- **No external services** — embeddings run locally via HuggingFace transformers\n- **Hybrid retrieval** — vector similarity + BM25 full-text search, fused with RRF\n- **Graph-aware retrieval** — search results include 1-hop linked memories for richer context\n- **Usefulness feedback** — memories accessed after search get implicit quality signals, improving future ranking\n- **Automatic enrichment** — entity extraction, auto-tagging, and deduplication happen on every write\n- **Importance decay** — unused memories lose importance over time, frequently accessed ones gain it\n- **Privacy stripping** — `\u003cprivate\u003e` blocks are stripped before storage/embedding\n\n---\n\n## Packages\n\n| Package | Description |\n|---------|-------------|\n| `@exocortex/core` | Storage, retrieval, embedding, scoring, entity extraction, intelligence, ingestion |\n| `@exocortex/mcp` | MCP server — exposes all memory tools via stdio (works with any MCP client) |\n| `@exocortex/server` | Hono REST API on port 3210 + serves the React dashboard |\n| `@exocortex/cli` | CLI tool (`exo`) — add, search, import/export, serve, consolidate, retrieval-regression, backfill, verify-backup |\n| `@exocortex/dashboard` | React SPA with Neural Interface theme — memories, chat, graph, entities, goals, analytics, timeline, skills, trash, mobile-responsive |\n\n---\n\n## MCP Server\n\nThe MCP server exposes all Exocortex tools over stdio. See [Quick Start](#connect-an-ai-agent) for setup with your preferred tool.\n\n### Tools\n\n| Tool | Description |\n|------|-------------|\n| `memory_store` | Store a new memory with tags, importance, and content type |\n| `memory_search` | Hybrid search with RRF scoring, token budgets, and compact mode |\n| `memory_get` | Fetch full content for specific memory IDs (use after compact search) |\n| `memory_update` | Update content, tags, importance, or content type of an existing memory |\n| `memory_forget` | Delete a memory by ID |\n| `memory_context` | Load contextual memories for a topic (use at session start) |\n| `memory_browse` | Browse memories by tags, type, or date range without semantic search |\n| `memory_feedback` | Mark retrieved memories as useful to improve ranking |\n| `memory_entities` | List tracked entities with optional tag filtering |\n| `memory_timeline` | Query decision history, lineage, or topic evolution over time |\n| `memory_ingest` | Index markdown files — splits by `##` headers, deduplicates by `source_uri`, supports globs |\n| `memory_link` | Create/remove memory-to-memory links for graph-aware retrieval |\n| `memory_digest_session` | Digest a coding session transcript into a structured session summary |\n| `memory_maintenance` | Run maintenance: importance adjustment, archival, health checks, search friction, re-embedding, entity backfill, importance recalibration, graph densification, co-retrieval links, adaptive weight tuning, entity orphan pruning |\n| `memory_consolidate` | Find and merge clusters of similar memories into summaries |\n| `memory_graph` | Entity graph analysis — full graph, bridge detection, community detection |\n| `memory_contradictions` | List and resolve detected contradictions between memories |\n| `memory_tag_cleanup` | Find and merge near-duplicate tags (preview or apply) |\n| `memory_diff` | See what changed since a timestamp — new, updated, and archived memories |\n| `memory_project_snapshot` | Quick project snapshot: recent activity, goals, decisions, techniques |\n| `memory_decay_preview` | Dry-run preview of what maintenance would archive |\n| `memory_ping` | Health check — memory counts, entity/tag stats, date range, uptime |\n| `goal_create` | Create a persistent goal |\n| `goal_list` | List goals by status |\n| `goal_get` | Get goal details with milestones and progress |\n| `goal_update` | Update goal fields/status/metadata |\n| `goal_log` | Log progress on a goal |\n| `goal_add_milestone` | Add a milestone to a goal |\n| `goal_update_milestone` | Update milestone title/status/order/deadline |\n| `goal_remove_milestone` | Remove a milestone from a goal |\n\n### Search workflow\n\nToken-efficient layered retrieval:\n\n```\n1. memory_search(query, compact=true)     → IDs + previews + scores (~50 tokens/result)\n2. Review results, pick relevant IDs\n3. memory_get(ids=[...])                  → Full content for selected memories\n```\n\nOr use `max_tokens` to let the server pack results into a token budget:\n\n```\nmemory_search(query, max_tokens=2000)     → As many full results as fit in 2000 tokens\n```\n\n### Session digestion\n\nThe `memory_digest_session` tool reads a session transcript JSONL file, extracts meaningful actions (edits, writes, bash commands, web fetches), and stores a structured summary:\n\n```\nSession 2026-02-01 (project: exocortex)\n- Edit packages/core/src/memory/digest.ts\n- Bash: pnpm test\n- Bash: git commit -m \"Add session digestion\"\n- Edit README.md\n\nFiles changed: 2 | Commands: 2 | Tools used: 2\n```\n\nRead-only tools (Read, Glob, Grep) and Exocortex's own MCP calls are filtered out. Consecutive edits to the same file are deduplicated. The project is auto-detected from file paths.\n\n### Stop hook (Claude Code only, optional)\n\nAn optional stop hook can remind the agent to store a session summary before exiting substantial sessions. Not enabled by default. To enable, add to `~/.claude/settings.json`:\n\n```json\n{\n  \"hooks\": {\n    \"Stop\": [{ \"type\": \"command\", \"command\": \"node /path/to/exocortex/packages/mcp/src/hooks/stop.js\" }]\n  }\n}\n```\n\n---\n\n## CLI\n\n```bash\npnpm exec exo \u003ccommand\u003e [options]\n```\n\n| Command | Description |\n|---------|-------------|\n| `add \u003ccontent\u003e` | Add a new memory. Options: `-t/--tags`, `-i/--importance`, `--type`, `--source` |\n| `search \u003cquery\u003e` | Hybrid retrieval search. Options: `-l/--limit`, `--after`, `--before`, `-t/--tags`, `--type`, `-v/--verbose` |\n| `import \u003cfile\u003e` | Import from file. Options: `-f/--format` (json\\|markdown\\|chatexport), `--dry-run`, `-d/--decrypt`. Structured Exocortex backup JSON is auto-detected and restored directly. |\n| `stats` | Show memory statistics — counts, breakdowns by type/source, date range |\n| `serve` | Start HTTP server + dashboard. Options: `-p/--port` (default 3210), `-H/--host` (default 127.0.0.1) |\n| `mcp` | Start MCP server on stdio |\n| `consolidate` | Find and merge similar memories. Options: `--dry-run`, `--similarity`, `--min-size`, `--history` |\n| `entities` | List and manage entities. Options: `--type`, `--search`, `--memories` |\n| `contradictions` | View and manage contradictions. Options: `--status`, `--detect`, `--resolve \u003cid\u003e`, `--dismiss \u003cid\u003e` |\n| `export` | Export JSON backup (memories, entities, goals, links, settings) |\n| `obsidian-export` | Export memories to Obsidian-compatible markdown vault |\n| `retrieval-regression` | Run golden-query retrieval drift checks and baseline management |\n| `verify-backup` | Verify backup integrity — checks row counts, embeddings, entity links |\n| `backfill` | Backfill canonical memory state. Options: `--dry-run`, `--limit` |\n\n---\n\n## REST API\n\n### Health\n\n```\nGET  /health                    — DB connection status\n```\n\n### Memories\n\n```\nPOST   /api/memories            — Create memory\nGET    /api/memories/:id        — Get by ID\nPATCH  /api/memories/:id        — Update (content, tags, importance, is_active)\nDELETE /api/memories/:id        — Permanent delete\nPOST   /api/memories/search     — Hybrid search\nPOST   /api/memories/context-graph — Search + 1-hop linked context\nGET    /api/memories/recent     — Recent memories\nPOST   /api/memories/import     — Bulk import\nGET    /api/memories/archived   — List archived/trashed memories\nPOST   /api/memories/:id/restore — Restore an archived memory\nGET    /api/memories/:id/links  — List memory-to-memory links\nGET    /api/memories/namespaces — List available namespaces\nGET    /api/memories/diff       — Changes since a timestamp (new, updated, archived)\nPOST   /api/memories/bulk-tag   — Add/remove tags on multiple memories\nPOST   /api/memories/bulk-update — Bulk update memory fields\n```\n\n### Entities\n\n```\nGET    /api/entities            — List entities (filter by tags; optional type for manually classified entities)\nPOST   /api/entities            — Create entity\nGET    /api/entities/tags       — List all distinct entity tags\nGET    /api/entities/:id        — Get by ID\nPATCH  /api/entities/:id        — Update (name, aliases, tags, optional manual type)\nDELETE /api/entities/:id        — Delete\nGET    /api/entities/:id/memories       — Linked memories\nGET    /api/entities/:id/relationships  — Entity relationships\nGET    /api/entities/graph              — All entities + relationships (single query)\nGET    /api/entities/graph/analysis     — Centrality, communities, stats\n```\n\n### Chat\n\n```\nPOST   /api/chat               — RAG chat (requires ai.api_key in settings)\n```\n\nSend `{ message, history?, conversation_id? }`. The server searches memories for context, sends prior conversation history + retrieved context to the configured LLM, and returns `{ response, sources, conversation_id }`. Supports Anthropic (default) and OpenAI providers via `ai.provider` setting.\n\n### Intelligence\n\n```\nPOST   /api/consolidate         — Find and consolidate memory clusters\nGET    /api/consolidations      — Consolidation history\nPOST   /api/contradictions/detect — Scan for contradictions\nGET    /api/contradictions      — List contradictions\nGET    /api/contradictions/:id  — Get contradiction by ID\nPATCH  /api/contradictions/:id  — Update contradiction (resolve/dismiss)\nPOST   /api/archive             — Archive stale memories\nPOST   /api/importance-adjust   — Adjust importance from access patterns\nGET    /api/timeline            — Memory timeline with filters\nGET    /api/temporal-stats      — Temporal analysis (streaks, averages)\nGET    /api/hierarchy           — Temporal hierarchy (epoch → theme → episode)\nPOST   /api/reembed             — Re-embed memories with missing embeddings\nPOST   /api/auto-consolidate    — Run auto-consolidation\nGET    /api/retrieval-regression/runs   — Regression run history\nGET    /api/retrieval-regression/latest — Latest run per-query breakdown\n```\n\n### Analytics\n\n```\nGET    /api/analytics/summary             — Overview stats (counts, trends)\nGET    /api/analytics/access-distribution — Access count distribution\nGET    /api/analytics/tag-effectiveness   — Tag usage and effectiveness\nGET    /api/analytics/tag-health          — Tag quality metrics\nGET    /api/analytics/producer-quality    — Quality by provider/model/agent\nGET    /api/analytics/quality-trend       — Quality over time\nGET    /api/analytics/quality-distribution — Quality score distribution\nGET    /api/analytics/embedding-health    — Embedding coverage and gaps\nGET    /api/analytics/decay-preview       — Preview importance decay candidates\nGET    /api/analytics/search-misses       — Zero-result queries\nGET    /api/analytics/knowledge-gaps      — Detected knowledge gaps\n```\n\n### Goals\n\n```\nGET    /api/goals              — List goals (default status=active)\nGET    /api/goals/:id          — Get goal with progress + milestones\nPOST   /api/goals              — Create goal\nPATCH  /api/goals/:id          — Update goal\nDELETE /api/goals/:id          — Delete goal\n```\n\n### Data\n\n```\nGET    /api/export              — Export JSON backup (memories, entities, goals, links, settings)\nGET    /api/stats               — Memory statistics\nGET    /api/settings            — Get all settings (API keys masked)\nPATCH  /api/settings            — Update settings (masked values skipped)\n```\n\nServer security defaults:\n- Binds to `127.0.0.1` by default.\n- CORS is disabled by default. To allow browser cross-origin access, set `EXOCORTEX_CORS_ORIGINS` (comma-separated exact origins).\n\n---\n\n## Scoring\n\n### RRF mode (default)\n\nReciprocal Rank Fusion fuses two independent ranked lists — vector similarity and FTS5 BM25 — into a single ranking:\n\n```\nRRF_score(d) = 1/(k + rank_vector(d)) + 1/(k + rank_fts(d))\n```\n\nWhere `k = 60` (configurable via `scoring.rrf_k`). Recency and frequency are applied as a multiplicative boost on top. Score range: ~0.001–0.03.\n\n### Legacy mode\n\nActivate with `scoring.use_rrf = false`. Uses a weighted average:\n\n```\nscore = 0.45 * vector + 0.25 * fts + 0.20 * recency + 0.10 * frequency\n```\n\nAll weights are configurable via the `settings` table. Score range: ~0.15–0.80.\n\n---\n\n## Retrieval Intelligence\n\n### Usefulness feedback loop\n\nEvery memory has a `useful_count` column. When a memory appears in search results and is then fetched via `memory_get` within 5 minutes, the count is implicitly incremented. Explicit feedback via `memory_feedback` also increments it. `usefulnessScore()` uses this count to boost future ranking.\n\n### Adaptive weight tuning\n\nRunning `memory_maintenance` with `tune_weights: true` analyzes feedback data and nudges scoring weights ±0.02 per cycle. Over time, the retrieval system self-tunes toward the signals that correlate with usefulness.\n\n### Graph-aware retrieval\n\nMemory-to-memory links (created manually via `memory_link` or automatically) are used during search and context loading. Up to 3 linked memories (1-hop) are appended to results as a \"Linked\" section, providing richer context without additional queries.\n\n### Valence scoring\n\nMemories can carry an emotional significance field (`valence`, -1 to 1). Both breakthroughs (+1) and failures (-1) boost retrieval via `Math.abs()` — strong emotional signals surface more readily regardless of polarity. Inspired by Damasio's Somatic Marker Hypothesis.\n\n### Store-time relation discovery\n\nOn every `memory_store`, the top 200 recent memories are scanned by cosine similarity. Memories with similarity \u003e= 0.75 are automatically linked (up to 5 per write), building the knowledge graph organically.\n\n### Co-retrieval link building\n\nMemories frequently retrieved together in the same search sessions are tracked in the `co_retrievals` table. During maintenance, pairs with enough co-retrieval history are automatically linked, reinforcing natural knowledge clusters.\n\n---\n\n## Intelligence Features\n\n### Consolidation\n\nGreedy agglomerative clustering of semantically similar memories (threshold: 0.75). Clusters of 3+ memories are merged into a basic summary that extracts key facts — dates, metrics, decisions, architecture notes. Source memories are archived and linked to the summary via `parent_id`. LLM-powered synthesis can be handled externally by scheduled maintenance jobs, keeping Exocortex API-cost-free.\n\nTags like `skill`, `prompt-amendment`, and `goal-progress-implicit` are excluded from propagating to consolidated summaries to prevent semantic pollution.\n\n### Entity Graph Analysis\n\nThe entity graph supports three analysis modes via `memory_graph`:\n- **Full graph** — all entities and relationships\n- **Bridge detection** — entities connecting otherwise-separate clusters (betweenness centrality)\n- **Community detection** — label propagation algorithm discovers dense subgraphs, O(V+E) per iteration\n\n### Search Friction Tracking\n\nZero-result queries are logged to a `search_misses` table, revealing gaps in indexed knowledge. `memory_maintenance` surfaces the top missed queries as \"Search Friction Signals\", helping identify what knowledge should be stored or better tagged.\n\n### Contradiction detection\n\nFinds memory pairs with high semantic similarity (\u003e0.7) that contain conflicting statements — negations, value changes, or reversed positions. Detected contradictions can be resolved or dismissed.\n\n### Automatic maintenance\n\nImportance adjustment and memory archival run automatically — no manual intervention needed:\n\n- **On server startup** (5-second delay)\n- **After every 50 memory stores**\n- **Nightly cron jobs** (importance at 3:30 AM, archival at 4:00 AM)\n\n**Importance adjustment** tunes scores based on access patterns:\n- **Boost**: Memories accessed 5+ times get importance increased (up to 0.9)\n- **Decay**: Never-accessed memories older than 30 days get importance decreased (down to 0.1)\n- **Pinned**: Memories with importance 1.0 are never adjusted\n\n**Memory archival** soft-deletes stale memories (`is_active = 0`):\n- Low importance (\u003c0.3) + old (\u003e90 days) + rarely accessed (\u003c2 times)\n- Never accessed + very old (\u003e365 days)\n\nAdditional maintenance operations available via `memory_maintenance`:\n- **Entity orphan pruning** — entities with fewer than 2 active memory links are automatically deleted\n- **Importance recalibration** — optional percentile-rank normalization of importance distribution\n- **Graph densification** — creates co-occurrence relationships between entities sharing memories\n\nConsolidation and contradiction detection run nightly only (2:00 AM / 2:30 AM) since they may need human review. Entity extraction for unprocessed memories runs nightly at 3:00 AM.\n\n### Temporal analysis\n\nTimeline view of memories grouped by date, with statistics: total active days, average memories per day, most active day, current and longest streaks.\n\n---\n\n## Automatic Enrichment\n\nMemory writes apply three enrichment behaviors. Extraction/tagging are non-blocking (failures don't prevent storage).\n\n### Entity extraction\n\nRegex-based extraction scans memory content with keyword/context heuristics (an optional LLM-based extractor exists at `packages/core/src/entities/llm-extractor.ts`, but is not integrated by default).\n\nAutomatic category assignment is disabled: extracted entities are stored as type `concept` by default. The `type` field is manual/editable metadata (API/dashboard), not an automatic classifier at ingest time.\n\nEntities below 0.5 confidence are filtered out at extraction time, removing noisy single-mention concepts.\n\nEntities are linked to memories with relevance scores and can be queried independently. Relationships include optional context phrases extracted from memory content (e.g. \"uses -\u003e for real-time event streaming\").\n\n### Auto-tagging\n\nWhen `auto_tagging.enabled = true`, up to 5 tags are generated per memory by matching against:\n1. **Tech keywords** — languages, frameworks, databases, tools, platforms\n2. **Topic patterns** — decision, bug, architecture, lesson, config, performance, deployment, testing, refactor, security\n3. **Project names** — kebab-case identifiers (filtered against a blocklist of common compounds)\n\nGenerated tags are normalized through the alias map and merged with user-supplied tags (duplicates ignored).\n\n### Keyword generation\n\nKeywords are generated from content, tags, and entity names, then stored in the `keywords` column for FTS boosting. This gives full-text search additional high-signal terms beyond the raw content.\n\n### Relation discovery\n\nOn every `memory_store`, the top 200 recent memories are scanned by cosine similarity. Memories with similarity \u003e= 0.75 are auto-linked (up to 5 per write), building the knowledge graph incrementally without manual intervention.\n\n### Deduplication\n\nDedup uses two checks:\n1. **Hash dedup** — compares against active root memories with the same `content_type` + `content_hash`. Whitespace is normalized before hashing when `dedup.hash_normalize_whitespace` is enabled (default).\n2. **Semantic dedup** — for content \u003e= 50 chars with an embedding available, compares against the most recent `dedup.candidate_pool` active root memories (default `200`) of the same content type\n\nSemantic dedup requires cosine similarity \u003e= `dedup.similarity_threshold` (default `0.85`). If incoming tags are provided, at least one tag must overlap the candidate.\n\nAction on match depends on `dedup.skip_insert_on_match`:\n- `true` (default): skip insert, return existing memory (`dedup_action = \"skipped\"`)\n- `false`: insert new memory and supersede old one (`is_active = 0`, `superseded_by = new_id`)\n\n---\n\n## Privacy\n\nContent wrapped in `\u003cprivate\u003e...\u003c/private\u003e` tags is stripped before storage, embedding, and indexing. Use this to include context in prompts without persisting sensitive information.\n\n---\n\n## Attribution\n\nEach memory tracks its origin: `provider`, `model_id`, `model_name`, `agent`, `session_id`, `conversation_id`. These are set via MCP tool parameters or environment defaults (`EXOCORTEX_DEFAULT_PROVIDER`, etc.). Attribution enables filtering by source and auditing which model/agent produced a memory.\n\n---\n\n## Database Schema\n\n17 tables + 1 virtual FTS5 table:\n\n| Table | Purpose |\n|-------|---------|\n| `memories` | Core records — content, embeddings, importance, access tracking, parent/child links |\n| `memory_tags` | Many-to-many tag associations |\n| `memory_entities` | Junction table linking memories to entities with relevance scores |\n| `entities` | Named entities with freeform tags |\n| `entity_tags` | Many-to-many tag associations for entities |\n| `access_log` | Query access history for importance adjustment |\n| `consolidations` | Consolidation history — which memories were merged and how |\n| `contradictions` | Detected contradictions with status tracking (pending/resolved/dismissed) |\n| `entity_relationships` | Directed relationships between entities with labels and optional context phrases |\n| `search_misses` | Zero-result query log for friction analysis |\n| `observability_counters` | Lightweight operational counters |\n| `memory_links` | Memory-to-memory graph links (`related`, `supports`, etc.) |\n| `goals` | Persistent goals with status, priority, deadline, metadata |\n| `co_retrievals` | Co-retrieval history used to infer links |\n| `retrieval_regression_baselines` | Golden query baseline result IDs |\n| `retrieval_regression_runs` | Retrieval regression run history + drift metrics |\n| `settings` | Key-value configuration store |\n| `memories_fts` | FTS5 virtual table with auto-sync triggers on insert/update/delete |\n\n---\n\n## Configuration\n\nAll settings are stored in the `settings` table and can be changed via the REST API (`PATCH /api/settings`) or the dashboard.\n\n### Scoring\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `scoring.use_rrf` | `true` | Use Reciprocal Rank Fusion (false = legacy weighted average) |\n| `scoring.rrf_k` | `60` | RRF smoothing constant |\n| `scoring.rrf_min_score` | `0.001` | Minimum score threshold in RRF mode |\n| `scoring.vector_weight` | `0.45` | Vector similarity weight (legacy mode) |\n| `scoring.fts_weight` | `0.25` | Full-text search weight (legacy mode) |\n| `scoring.recency_weight` | `0.20` | Recency weight (legacy mode) |\n| `scoring.frequency_weight` | `0.10` | Frequency weight (legacy mode) |\n| `scoring.recency_decay` | `0.05` | Recency decay rate |\n| `scoring.min_score` | `0.15` | Minimum score threshold (legacy mode) |\n| `scoring.tag_boost` | `0.10` | Score boost for tag matches |\n| `scoring.graph_weight` | `0.10` | Graph proximity weight |\n| `scoring.usefulness_weight` | `0.05` | Usefulness feedback weight |\n| `scoring.valence_weight` | `0.05` | Valence (emotional significance) weight |\n\n### Embedding\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `embedding.model` | `Xenova/all-MiniLM-L6-v2` | HuggingFace model identifier |\n| `embedding.dimensions` | `384` | Embedding vector dimensions |\n\n### Importance\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `importance.auto_adjust` | `true` | Enable automatic importance adjustment |\n| `importance.boost_threshold` | `5` | Access count to trigger importance boost |\n| `importance.decay_age_days` | `30` | Days before unused memories start decaying |\n\n### Deduplication\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `dedup.enabled` | `true` | Enable semantic deduplication |\n| `dedup.hash_enabled` | `true` | Enable hash-based deduplication by `content_type` + `content_hash` |\n| `dedup.similarity_threshold` | `0.85` | Cosine similarity threshold for semantic dedup |\n| `dedup.candidate_pool` | `200` | Number of recent active root memories scanned for semantic dedup |\n| `dedup.skip_insert_on_match` | `true` | Reuse existing memory on match instead of inserting + superseding |\n| `dedup.hash_normalize_whitespace` | `true` | Normalize whitespace before computing content hash |\n\n### Chunking\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `chunking.enabled` | `true` | Split long memories into chunks |\n| `chunking.max_length` | `1500` | Character length before chunking triggers |\n| `chunking.target_size` | `500` | Target chunk size in characters |\n\n### AI / Chat\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `ai.api_key` | — | API key for chat (Anthropic or OpenAI). Masked in GET responses |\n| `ai.provider` | `anthropic` | LLM provider (`anthropic` or `openai`) |\n| `ai.model` | `claude-sonnet-4-5-20250929` / `gpt-4o-mini` | Model to use for chat |\n\n### Other\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `server.port` | `3210` | REST API / dashboard port |\n| `auto_tagging.enabled` | `true` | Auto-generate tags on memory creation |\n\n### Tag Normalization\n\nTags are normalized on read/write through an alias map stored in settings. Define aliases to merge variant spellings: `tag_alias.js = javascript`, `tag_alias.ts = typescript`. All queries and writes go through normalization.\n\n---\n\n## Data Directory\n\nAll data is stored in `~/.exocortex/`:\n\n```\n~/.exocortex/\n  exocortex.db     # SQLite database (memories, entities, settings)\n  models/          # Cached embedding model (all-MiniLM-L6-v2, ~80MB)\n```\n\nOverride the model cache location with `EXOCORTEX_MODEL_DIR` environment variable.\n\n---\n\n## System Requirements\n\n- **Node.js** \u003e= 20 (uses built-in `node:sqlite`)\n- **pnpm** (workspace package manager)\n- No external database — SQLite is built into Node\n- No API keys — embeddings run locally\n\n---\n\n## Stack\n\n| Component | Technology |\n|-----------|-----------|\n| Runtime | Node.js \u003e= 20 (built-in `node:sqlite`) |\n| Language | TypeScript |\n| Package manager | pnpm (workspaces) |\n| Server | Hono |\n| Dashboard | React 19 + Vite 7 + TanStack Query |\n| Validation | Zod 4 |\n| Testing | Vitest 4 |\n| Embeddings | @huggingface/transformers (all-MiniLM-L6-v2, 384 dims) |\n| MCP | @modelcontextprotocol/sdk |\n| IDs | ULID |\n\n---\n\n## Development\n\n```bash\n# Install dependencies\npnpm install\n\n# Build all packages\npnpm build\n\n# Run all tests\npnpm test\n\n# Run UI regression checks (layout + interaction flows on desktop/mobile)\npnpm test:ui\n\n# Type-check\npnpm lint\n\n# Dev server with watch mode\npnpm dev\n```\n\nIf this is your first Playwright run on the machine, install Chromium once:\n\n```bash\npnpm exec playwright install chromium\n```\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshawnhack%2Fexocortex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshawnhack%2Fexocortex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshawnhack%2Fexocortex/lists"}