{"id":49368035,"url":"https://github.com/ashlesh-t/cognirepo","last_synced_at":"2026-04-27T21:00:38.046Z","repository":{"id":353255923,"uuid":"1179860462","full_name":"ashlesh-t/cognirepo","owner":"ashlesh-t","description":"Local cognitive infrastructure for AI coding agents — semantic memory, repository intelligence, and MCP tools to reduce token usage.","archived":false,"fork":false,"pushed_at":"2026-04-23T04:54:48.000Z","size":2099,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-23T06:24:02.971Z","etag":null,"topics":["ai","ai-agents-automation","ai-agents-cli","ai-agents-mcp","mcp","mcp-servers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashlesh-t.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-12T13:06:23.000Z","updated_at":"2026-04-23T06:19:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ashlesh-t/cognirepo","commit_stats":null,"previous_names":["ashlesh-t/cognirepo"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/ashlesh-t/cognirepo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashlesh-t%2Fcognirepo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashlesh-t%2Fcognirepo/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/ashlesh-t%2Fcognirepo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashlesh-t%2Fcognirepo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ashlesh-t","download_url":"https://codeload.github.com/ashlesh-t/cognirepo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashlesh-t%2Fcognirepo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32354574,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"ssl_error","status_checked_at":"2026-04-27T20:07:00.910Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-agents-automation","ai-agents-cli","ai-agents-mcp","mcp","mcp-servers"],"created_at":"2026-04-27T21:00:19.737Z","updated_at":"2026-04-27T21:00:38.038Z","avatar_url":"https://github.com/ashlesh-t.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CogniRepo\n\n\u003e Persistent memory and context for any AI tool. 
Not a chatbot — infrastructure.\n\n[![CI](https://github.com/ashlesh-t/cognirepo/actions/workflows/ci.yml/badge.svg)](https://github.com/ashlesh-t/cognirepo/actions/workflows/ci.yml)\n[![Security](https://github.com/ashlesh-t/cognirepo/actions/workflows/security.yml/badge.svg)](https://github.com/ashlesh-t/cognirepo/actions/workflows/security.yml)\n[![PyPI version](https://badge.fury.io/py/cognirepo.svg)](https://badge.fury.io/py/cognirepo)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![Discord](https://img.shields.io/badge/Discord-CogniRepo-5865F2?logo=discord\u0026logoColor=white)](https://discord.com/channels/1488386981917360289/1488387271190380636)\n\n---\n\n![alt text](image.png)\n\n## What it does\n\nEvery AI conversation starts from zero. Claude, Cursor, Gemini — none of them remember\nwhat you fixed yesterday, which files relate to which features, or what decisions were made\nlast sprint. CogniRepo fixes that.\n\nIt sits between your codebase and any AI tool, providing:\n\n- **Semantic memory** — FAISS vector store with sentence-transformer embeddings. Store\n  decisions, docs, architecture notes. Retrieve them with natural language.\n- **Episodic log** — append-only event journal. Know what happened before that error.\n- **Knowledge graph** — NetworkX DiGraph linking functions, classes, files, imports,\n  inheritance chains, call relationships, and concepts. 
All queryable.\n- **AST reverse index** — O(1) symbol lookup across your entire codebase in any supported language.\n- **User behavior profiling** — tracks how you prompt so Claude adapts its response\n  style without you having to re-explain preferences every session.\n- **Error tracking** — records errors with prevention hints so Claude avoids\n  repeating the same mistake across sessions.\n- **Session history** — persists conversation exchanges so any session can resume\n  where the last one ended.\n- **Architectural summaries** — auto-generated on first init; built entirely from\n  the local AST index (no API key needed). File → directory → repo summary tree,\n  embedded into FAISS for semantic search.\n- **Multi-model orchestration** — classify query complexity → build context → route to the\n  right model. Claude for deep reasoning, Gemini Flash for quick lookups. All automatic.\n\nEvery AI tool that connects gets the same accumulated project knowledge. Memory persists\nacross sessions, across tools, across time.\n\n---\n\n## When to use CogniRepo\n\n**Most effective on codebases ≥ 15K LOC.** On small repos (\u003c 10K LOC), native file reads\nare fast enough that the MCP tool schema overhead (~3,650 tokens for 30 tools) costs more\nthan it saves. Break-even is roughly 4 tool calls on a medium-sized repo.\n\n**CogniRepo vs. 
claude-context / similar tools:**\n\n| Feature | CogniRepo | claude-context / similar |\n|---------|-----------|--------------------------|\n| Pure code retrieval | ✓ (FAISS + graph + AST) | ✓ Often faster on first use |\n| Episodic memory (what happened last sprint) | ✓ Persistent BM25 + vector | ✗ |\n| Cross-agent handoff (Claude → Gemini → Cursor) | ✓ `last_context.json` shared | ✗ |\n| User behaviour profile (adapts depth/style) | ✓ `get_user_profile()` | ✗ |\n| Error pattern avoidance (learns from past fails) | ✓ `record_error()` | ✗ |\n| Architectural decision records | ✓ `record_decision()` | ✗ |\n| Multi-repo org graph (microservices) | ✓ `CHILD_OF` / `CALLS_API` edges | ✗ |\n\n**Conclusion:** prefer CogniRepo when you value institutional memory across sessions.\nUse simpler tools when you just need one-shot code retrieval on a small codebase.\n\n---\n\n## Why it helps — measured numbers\n\nBenchmarked across 6 real open-source repos (FastAPI, Flask, Celery, Ansible, Moby/Docker, Kubernetes) using 30 structured prompts tested against Claude, Gemini, and Cursor/Codex.\n\n| Metric | Value | Notes |\n|--------|-------|-------|\n| Token reduction — Python repos | **50–84%** | FastAPI FA-2: 12 000 → 2 500 · FA-4: 2 000 → 450 · FL-4: 8 000 → 1 250 |\n| Token reduction — average (all tested) | **~60%** | Across FA/FL/CE/AN where both baselines were captured |\n| Token reduction — complex dynamic codebases | **20–35%** | Celery CE-4/CE-5; deep async/dynamic-dispatch patterns reduce gains |\n| Symbol lookup latency | **\u003c 1 ms** | vs. `grep` at 2–8 s on large repos |\n| Accuracy vs. 
baseline | **equal or better in 100% of tests** | No regression observed; FA-2 accuracy improved Moderate → High |\n| Cross-agent context handoff | **✅ validated** | CE-4: Claude primed index, Gemini CLI consumed it — 35% token saving, same accuracy |\n| Dynamic dispatch coverage | **honest gap** | CE-3 (APScheduler beat dispatch) returned NA for both; CogniRepo does not fabricate call chains |\n| Go/multi-language coverage | **partial** | Moby MO-2 showed 67% savings; MO-3-5 / K8-* incomplete pending Go grammar improvements |\n\n\u003e **Honest limits:** CogniRepo adds the most value on Python repos with clear static structure.\n\u003e Dynamic dispatch patterns (Celery beat, plugin registries), deep Go codebases, and Ansible's\n\u003e 22-level variable precedence chains reduce retrieval confidence. The tool reports uncertainty\n\u003e rather than hallucinating call chains.\n\n### Measured: precision@k and index build time (4 external repos)\n\nIndexed 4 real repos, measured with `cognirepo index-repo` + `context_pack` queries. CPU-only, no GPU.\n\n| Repo | Files | Index time | Lookup latency | precision@3 |\n|------|-------|-----------|----------------|-------------|\n| flask | 83 | 12s | 0.011 ms | **100%** |\n| fastapi | 1,122 | 34s | 0.005 ms | **89%** |\n| celery | 416 | 44s | 0.025 ms | **100%** |\n| ansible | 1,813 | 145s | 0.018 ms | **80%** |\n\nAll repos: symbol hit rate 5/5, lookup latency \u003c 0.1ms. All quality gates pass. Full numbers: [docs/METRICS.md](docs/METRICS.md).\n\nRun `cognirepo benchmark` on your own codebase to reproduce. 
See [docs/METRICS.md](docs/METRICS.md).\n\n---\n\n## How it works\n\n```\nUser / AI Tool\n    │\n    ├── MCP stdio         (Claude Desktop, Gemini CLI, Cursor)\n              │\n         tools/           ← single entry point to memory engine\n              │\n    ┌─────────┼─────────────────────────────────────┐\n    ▼         ▼                                      ▼\nmemory/    retrieval/hybrid.py               graph/knowledge_graph.py\nFAISS      3-signal merge:                   NetworkX DiGraph\nepisodic   vector + graph + behaviour        7 node types:\nembeddings                                   FILE, FUNCTION, CLASS,\n           indexer/ast_indexer.py            CONCEPT, QUERY, SESSION,\n           tree-sitter multi-language        ERROR\n           + stdlib ast fallback             9 edge types:\n                                             CALLS, CALLED_BY,\ngraph/behaviour_tracker.py                  DEFINED_IN, CO_OCCURS,\n  per-symbol hit counts                     IMPORTS, INHERITS,\n  user behavior profile                     RELATES_TO,\n  error pattern tracking                    QUERIED_WITH\n  session history\n              │\n         .cognirepo/   (Fernet encrypted if storage.encrypt: true)\n```\n\n---\n\n## Quick start\n\n### Requirements\n\n- Python 3.11+\n- API key (optional — only needed for `cognirepo ask`):\n  `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `OPENAI_API_KEY`, or `GROK_API_KEY`.\n  Indexing, memory, summarization, and all MCP tools work fully offline.\n\n### Install\n\n```bash\n# Recommended — CPU-only, no GPU required (~400 MB vs ~2 GB):\npip install 'cognirepo[cpu,languages]'\n\n# For encryption at rest:\npip install 'cognirepo[cpu,languages,security]'\n\n# With model routing (cognirepo ask — needs an API key):\npip install 'cognirepo[cpu,languages,providers]'\n\n# Full development install:\npip install -e '.[dev,security,languages]'\n```\n\n\u003e **Note:** `[cpu]` is now the default — `sentence-transformers[cpu]` ships with 
PyTorch CPU wheels only.\n\u003e Use `pip install 'cognirepo[gpu]'` if you need GPU acceleration.\n\n### Run\n\n```bash\n# One-command onboarding (init + index + auto-configure MCP for Claude/Cursor/VS Code):\ncognirepo setup\n\n# Or step by step:\ncognirepo init --no-index     # scaffold .cognirepo/\ncognirepo index-repo .        # index your codebase (required before MCP tools work)\ncognirepo index-repo . --daemon  # index and run watcher in background\n\n# Check everything is working:\ncognirepo status                        # shows symbol count, graph nodes, signal warmth\ncognirepo doctor                        # full health check\n\n# Query through multi-model orchestrator:\ncognirepo ask \"why is auth slow?\"\n\n# Manage background watchers:\ncognirepo list                          # show all running watcher daemons\ncognirepo list -n \u003cPID\u003e --view          # tail the log of a specific watcher\ncognirepo list -n \u003cPID\u003e --stop          # stop a watcher\n```\n\n\u003e **First-time setup:** `cognirepo init` + `cognirepo index-repo .` must complete before\n\u003e MCP tools (`context_pack`, `lookup_symbol`, `who_calls`, etc.) 
return data.\n\n---\n\n## Connect your AI tools\n\n### Claude Code / Claude Desktop (recommended — project-scoped)\n\nRun `cognirepo init` inside your project — it asks if you want to configure Claude and\nautomatically writes `.claude/CLAUDE.md` and `.claude/settings.json` with the correct\nproject-locked connector.\n\nEach project gets its **own isolated connector** named `cognirepo-\u003cproject\u003e`:\n\n```json\n{\n  \"mcpServers\": {\n    \"cognirepo-myproject\": {\n      \"command\": \"cognirepo\",\n      \"args\": [\"serve\", \"--project-dir\", \"/abs/path/to/myproject\"],\n      \"env\": {}\n    }\n  }\n}\n```\n\nThe `--project-dir` flag locks the MCP server to that project's `.cognirepo/` directory.\nWhen Claude has multiple projects open simultaneously, each connector reads only its own\nmemories — **never mixing data across projects or teams**.\n\n### Cursor / Copilot\n\n```bash\ncognirepo export-spec\ncp adapters/cursor_mcp_config.json .cursor/mcp.json\n# Restart Cursor — CogniRepo tools appear in the tool selector\n```\n\n### Docker\n\n```bash\ncp .env.example .env          # add your API keys\ndocker compose up mcp         # MCP stdio server\n```\n\n---\n\n## MCP Tools — complete reference\n\nAll 30 tools are available to Claude, Cursor, and any MCP-compatible client.\n\n### Core retrieval\n\n| Tool | Description | When to use |\n|------|-------------|-------------|\n| `context_pack(query, max_tokens=2000)` | Token-budget code + memory context | Every session — FIRST call before any file read |\n| `lookup_symbol(name)` | O(1) symbol lookup → file + line | Before grepping for a function |\n| `who_calls(function_name)` | Trace callers + dynamic dispatch fallback | Impact analysis, refactoring |\n| `search_token(word)` | Word-level reverse index across names, docs, comments | Finding where a concept lives |\n| `retrieve_memory(query, top_k=5)` | Semantic similarity search over stored memories | Before answering — pull past context |\n| 
`search_docs(query)` | Full-text search in all `.md` files | Documentation lookups |\n| `semantic_search_code(query, language=None)` | Vector search over code symbols only | Code-specific semantic queries |\n| `subgraph(entity, depth=2)` | Local knowledge graph neighbourhood | Understand symbol relationships |\n| `graph_stats()` | Node/edge count and graph health | Check if graph has data |\n| `episodic_search(query, limit=10)` | BM25 keyword search in event history | Find past decisions or incidents |\n| `dependency_graph(module, direction=\"both\")` | Import/dependency relationships | Module coupling analysis |\n| `explain_change(target, since=\"7d\")` | What changed in a file/function + git cross-ref | Understanding recent changes |\n| `architecture_overview(scope=\"root\")` | Pre-computed LLM architectural summaries | Big-picture questions |\n\n### User \u0026 session intelligence\n\n| Tool | Description | When to use |\n|------|-------------|-------------|\n| `get_user_profile()` | User's interaction style: depth pref, question types, vocabulary | **Call at session start** — calibrates Claude's response style |\n| `get_session_history(limit=10)` | Recent conversation exchanges across sessions | Resuming context from prior sessions |\n\n### Error tracking \u0026 prevention\n\n| Tool | Description | When to use |\n|------|-------------|-------------|\n| `get_error_patterns(min_count=1)` | Recurring errors with prevention hints | Before proposing a fix — check if it has failed before |\n| `record_error(error_type, message, file_path, query_context)` | Log an error for future avoidance | After any error Claude or user encounters |\n\n### Session start\n\n| Tool | Description | When to use |\n|------|-------------|-------------|\n| `get_session_brief()` | Architecture + hot symbols + index health | **First call every session** |\n| `get_last_context()` | Most recent context_pack snapshot from prior session | **Second call every session** — resume where previous 
agent left off |\n\n### Memory \u0026 storage\n\n| Tool | Description | When to use |\n|------|-------------|-------------|\n| `store_memory(text, source=\"\")` | Persist a memory to the FAISS index | After solving bugs, recording decisions |\n| `log_episode(event, metadata={})` | Append event to episodic journal | Track milestones, incidents, deployments |\n| `record_decision(summary, rationale=\"\")` | Record architectural decision to episodic memory | When making non-obvious design choices |\n\n### Cross-repo (organization)\n\n| Tool | Description | When to use |\n|------|-------------|-------------|\n| `org_search(query)` | Search memories across all org repos | Multi-repo context queries |\n| `org_wide_search(query)` | Search across every project in the org | Broadest cross-repo sweep |\n| `org_dependencies(depth=2)` | Bidirectional inter-repo dependency graph | \"What does this service depend on?\" |\n| `cross_repo_search(query, scope=\"project\")` | Project-scoped or org-scoped search | Finding shared components |\n| `cross_repo_traverse(symbol, direction=\"both\")` | Traverse org graph from a repo or symbol | Tracing bugs across service boundaries |\n| `list_org_context()` | Org metadata + sibling repos | Understanding repo relationships |\n| `link_repos(src_repo, dst_repo, relationship)` | Record cross-repo dependency | When you discover one repo imports another |\n\n---\n\n## Knowledge graph — what gets indexed\n\nThe knowledge graph is significantly richer than a simple call graph.\n\n### Node types\n\n| Type | Description |\n|------|-------------|\n| `FILE` | Every indexed source file |\n| `FUNCTION` | Function and method definitions with docstrings |\n| `CLASS` | Class definitions with base classes |\n| `CONCEPT` | Semantic concepts extracted from docstrings and identifiers |\n| `QUERY` | Recorded query nodes (for retrieval scoring) |\n| `SESSION` | Conversation session nodes |\n| `ERROR` | Recurring error pattern nodes |\n| `MEMORY` | Cross-agent 
memory nodes (synced from Claude/Gemini) |\n\n### Edge types\n\n| Type | Direction | Description |\n|------|-----------|-------------|\n| `DEFINED_IN` | symbol → file | Symbol lives in this file |\n| `CALLS` / `CALLED_BY` | bidirectional | Function call relationships with purpose labels |\n| `IMPORTS` | file → file | Python import dependencies |\n| `INHERITS` | class → parent | Inheritance hierarchy |\n| `CO_OCCURS` | file ↔ file | Files edited together (behavioural co-edit signal) |\n| `RELATES_TO` | concept → symbol | Semantic concept linkage |\n| `QUERIED_WITH` | query → symbol | Retrieval tracking for scoring |\n\n`IMPORTS` and `INHERITS` edges are built automatically during `index-repo` from Python AST.\nUse `subgraph(\"MyClass\", depth=2)` or `dependency_graph(\"mymodule\")` to query them.\n\n---\n\n## User behavior profiling\n\nCogniRepo tracks how you interact across sessions and builds a profile that Claude uses to\ncalibrate its responses — without you having to repeat preferences every session.\n\n### What gets tracked\n\n- **Depth preference** — inferred from average query length: `concise` / `medium` / `detailed`\n- **Question types** — distribution across: `why`, `what`, `how`, `fix`, `explain`, `where`, `refactor`, `add`\n- **Domain vocabulary** — top terms that appear frequently in your queries\n- **Code focus** — percentage of queries referencing code identifiers (symbols, functions)\n- **Sample queries** — last 3 queries for Claude to infer framing style\n\n### Accessing your profile\n\n```bash\n# MCP tool (Claude calls automatically at session start):\nget_user_profile()\n\n# CLI:\ncognirepo user-prefs\n```\n\n### Example profile output\n\n```json\n{\n  \"depth_preference\": \"detailed\",\n  \"top_question_type\": \"how\",\n  \"question_type_distribution\": {\"how\": 12, \"why\": 8, \"fix\": 5},\n  \"top_terminology\": [\"auth\", \"token\", \"session\", \"middleware\", \"validate\"],\n  \"code_focus_percent\": 73,\n  \"framing_hints\": \"prefers 
detailed responses; often asks 'how' questions; domain vocabulary: auth, token, session\",\n  \"total_queries_tracked\": 47\n}\n```\n\nClaude receives `framing_hints` at session start and adjusts response length, code density,\nand terminology accordingly. The profile accumulates over time — more accurate the more you use it.\n\n---\n\n## Error tracking \u0026 prevention\n\nCogniRepo logs every error that occurs during sessions — whether it's a Python exception,\na failed build step, or a tool call that went wrong. Errors are stored with:\n\n- **Dedup signature** — prevents the same error from inflating the count\n- **Prevention hint** — a targeted suggestion to avoid the same error class\n- **Occurrence context** — last 5 occurrences with file path and error message\n- **Query context** — the query or action that triggered the error\n\n### Logging errors\n\n```bash\n# MCP tool (Claude calls after errors):\nrecord_error(\"TypeError\", \"expected str got int\", \"config/parser.py\", \"fix config loading\")\n```\n\n### Viewing error patterns\n\n```bash\n# MCP tool:\nget_error_patterns()\n```\n\nReturns:\n```json\n[\n  {\n    \"error_type\": \"TypeError\",\n    \"count\": 7,\n    \"files\": [\"config/parser.py\", \"api/handlers.py\"],\n    \"last_seen\": \"2026-04-22T10:30:00Z\",\n    \"prevention_hint\": \"Wrong type — validate inputs at function boundary.\",\n    \"recent_context\": \"expected str got int in parse_config\"\n  }\n]\n```\n\n### Built-in prevention hints\n\n| Error class | Prevention hint |\n|-------------|-----------------|\n| `NameError` | Undefined variable — check imports and scope before use |\n| `ImportError` | Import failed — verify package is installed and module path is correct |\n| `AttributeError` | Object missing attribute — check type, None-guard, or spelling |\n| `TypeError` | Wrong type — validate inputs at function boundary |\n| `KeyError` | Missing dict key — use `.get()` with default or check existence first |\n| `IndexError` | List 
out of range — guard with `len()` check before access |\n| `OSError` | File/IO error — always guard file ops with `try/except OSError` |\n| `SyntaxError` | Syntax error — run a linter before committing |\n| `Timeout` | Timeout — add explicit timeout parameter and retry logic |\n| `AssertionError` | Assertion failed — review invariants; do not use assert in prod |\n\n---\n\n## Session history\n\nEvery `cognirepo ask` exchange is persisted to `.cognirepo/sessions/`.\nSessions are indexed by UUID and retrievable via:\n\n```bash\n# List recent sessions:\ncognirepo sessions\n\n# MCP tool — Claude calls at session start to resume context:\nget_session_history(limit=5)\n```\n\nEach entry returns: session ID, created timestamp, message count, model used, and\nthe last user/assistant exchange for quick context scan.\n\n---\n\n## Architectural summaries\n\n`cognirepo init` automatically prompts to run `cognirepo summarize` after the first index.\nThis produces a 3-level LLM summary of the entire codebase:\n\n- **Level 1** — repo-wide summary (what the project does, key modules, entry points)\n- **Level 2** — per-directory summaries (what each package is responsible for)\n- **Level 3** — per-file summaries (what each file contains, key functions/classes)\n\nSummaries are stored in `.cognirepo/index/summaries.json` and served via the\n`architecture_overview` MCP tool — zero token cost for Claude to understand the big picture.\n\n```bash\n# Auto-prompted on first init. Run manually anytime:\ncognirepo summarize\n\n# Fully local — no API key required. 
Reads from ast_index.json, runs in \u003c 1 second.\n# File summaries are also embedded into FAISS for semantic architecture queries.\n```\n\n---\n\n## Multi-model orchestration\n\n`cognirepo ask` automatically picks the right model for each query:\n\n| Tier | Score | Default model | Use case |\n|------|-------|---------------|----------|\n| **QUICK** | ≤2 | local resolver | Single-token / trivial — zero API, fastest path |\n| **STANDARD** | ≤4 | Haiku | Quick lookup, factual, single symbol |\n| **COMPLEX** | ≤9 | Sonnet | Moderate reasoning |\n| **EXPERT** | \u003e9 | Opus | Cross-file, architectural, ambiguous — full context, best model |\n\n```bash\ncognirepo ask \"where is verify_token defined?\"       # → QUICK, answered locally\ncognirepo ask \"why is auth slow?\"                    # → EXPERT, Claude with full context\ncognirepo ask --verbose \"explain the circuit breaker\"  # show tier/score/signals\n```\n\nProvider fallback chain: Grok → Gemini → Anthropic → OpenAI.\nAll errors are logged to `.cognirepo/errors/\u003cdate\u003e.log` — no raw tracebacks shown to users.\n\n---\n\n## Language support\n\n| Language | Extensions | Install |\n|----------|------------|---------|\n| Python | `.py` | built-in |\n| JavaScript / TypeScript | `.js` `.ts` `.jsx` `.tsx` | `cognirepo[languages]` |\n| Java | `.java` | `cognirepo[languages]` |\n| Go | `.go` | `cognirepo[languages]` |\n| Rust | `.rs` | `cognirepo[languages]` |\n| C / C++ | `.c` `.cpp` `.h` | `cognirepo[languages]` |\n\nFull details and roadmap: [docs/LANGUAGES.md](docs/LANGUAGES.md)\n\n---\n\n## Storage layout\n\n```\n.cognirepo/\n  config.json              ← project settings (project_id, model, retrieval weights)\n  vector_db/\n    semantic.index         ← FAISS flat index for semantic memory\n    ast.index              ← FAISS IndexIDMap2 for code symbols\n    ast_metadata.json      ← parallel metadata for ast.index rows\n  graph/\n    graph.pkl              ← NetworkX DiGraph (optionally 
Fernet-encrypted)\n    behaviour.json         ← per-symbol hit counts, user profile, error patterns\n  index/\n    ast_index.json         ← reverse symbol index + file records\n    manifest.json          ← git SHA + platform info for integrity checks\n    summaries.json         ← LLM architectural summaries (Level 1–3)\n  memory/\n    episodic.json          ← append-only event journal\n  sessions/\n    \u003cuuid\u003e.json            ← conversation session files\n    current.json           ← pointer to most-recent session\n  errors/\n    \u003cdate\u003e.log             ← daily error logs (full tracebacks, never shown to users)\n  learnings/\n    learnings.json         ← structured learnings: decisions, bugs, prod issues\n```\n\nEverything under `.cognirepo/` is `.gitignore`d by default — never committed.\nFernet encryption is opt-in at `storage.encrypt: true` in `config.json`.\n\n---\n\n## CLI reference\n\n```bash\n# Setup\ncognirepo init                  # scaffold + configure; auto-indexes + auto-summarizes\ncognirepo setup-env             # interactive API key wizard\ncognirepo test-connection       # test API key connectivity\ncognirepo migrate-config        # migrate deprecated config keys\n\n# Indexing\ncognirepo index-repo [path]     # AST-index a codebase\ncognirepo summarize             # generate LLM architectural summaries (auto-prompted on init)\ncognirepo seed --from-git       # seed behaviour weights from git history\ncognirepo verify-index          # verify AST index integrity\ncognirepo coverage              # per-directory symbol counts\n\n# Querying\ncognirepo ask \u003cquery\u003e           # route through multi-model orchestrator\ncognirepo retrieve-memory \u003cq\u003e   # similarity search\ncognirepo search-docs \u003cq\u003e       # full-text search in .md files\ncognirepo log-episode \u003cevent\u003e   # append episodic event\ncognirepo history               # print recent episodic events\ncognirepo sessions              # list recent 
conversation sessions\n\n# Memory management\ncognirepo store-memory \u003ctext\u003e   # save a semantic memory\ncognirepo user-prefs            # view/set global user preferences\ncognirepo prune [--dry-run]     # prune low-score memories\n\n# Health \u0026 monitoring\ncognirepo prime                 # generate session bootstrap brief\ncognirepo status                # live retrieval signal weights + index health\ncognirepo doctor [--fix]        # full health check; --fix auto-repairs common issues\ncognirepo benchmark             # run quantitative value benchmarks\n\n# Organization\ncognirepo org create \u003cname\u003e     # create local organization\ncognirepo org link \u003corg\u003e [path] # link repo to organization\ncognirepo org list              # list organizations\n\n# Daemon management\ncognirepo list                  # list MCP servers, running daemons\ncognirepo watch                 # manage background file-watcher daemon\n```\n\n---\n\n## Future Plans\n\nPriorities drawn from the v0.3.0 benchmark findings and community feedback.\n\n### Near-term (v0.3.0)\n- **Go call-graph indexing** — tree-sitter-go grammar is loaded but call extraction is incomplete; Moby/Kubernetes tests (MO-3-5, K8-*) could not be completed without it. Adding Go-aware `who_calls` and IMPORTS edges is the single highest-impact unblocked item.\n- **`cognirepo ask`** — multi-model orchestrator (QUICK/STANDARD/COMPLEX/EXPERT tiers). Initial implementation stubbed in v0.2.0; orchestrator logic is implemented in `orchestrator/` and being wired to a working API key flow in v0.3.0.\n- **Incremental re-index on save** — file-watcher daemon exists (`cognirepo watch`) but re-index on write is not yet debounced correctly; large repos see spurious full re-indexes.\n- **CLAUDE.md mandatory-call relaxation** — benchmark feedback (Moby tests) flagged that forcing `context_pack` before every file read adds latency under memory pressure. 
Will add a `--fast` mode that skips the tool-first gate for files under 50 lines.\n\n### Medium-term (v0.4.0)\n- **Kubernetes / 2M-LOC scale validation** — K8-1 through K8-5 test suite not yet completed. Goal: full scheduling-decision trace at \u003c 8 000 tokens with CogniRepo vs. \u003e 50 000 without.\n- **Plugin-registry pattern detection** — Ansible AN-3/AN-4 (22-level variable precedence, strategy plugins) and Celery CE-3 (dynamic dispatch) returned NA. Plan: static heuristic pass that detects `register`, `entry_points`, and `__init_subclass__` patterns and annotates them as `DYNAMIC_DISPATCH` nodes in the graph.\n- **BM25 over symbol names** — current keyword search uses exact-word reverse index; adding BM25 TF-IDF ranking over symbol names and docstrings would improve partial-match recall (e.g. `HttpClient` matching `http_client`).\n- **Cross-session memory warm-up** — Ansible benchmark noted episodic/memory retrieval is low-value on fresh sessions. `cognirepo prime` exists but is not run automatically on `init`; will make it opt-in default.\n\n### Longer-term\n- **`cognirepo ask` streaming REPL** — full interactive session with tier routing, session persistence, and sub-agent delegation.\n- **Ruby, PHP, C#, Swift grammar support** — tree-sitter grammars exist; need `_TS_FUNCTION_TYPES`/`_TS_CLASS_TYPES` mappings and call-extraction rules per language.\n- **Similarity edges in knowledge graph** — embedding-distance clustering to connect semantically related symbols across files (not yet implemented).\n- **VS Code / JetBrains extension** — surface `lookup_symbol`, `context_pack`, and `who_calls` directly in the editor sidebar without requiring an MCP-capable host.\n\n---\n\n## Documentation\n\n| Document | Description |\n|----------|-------------|\n| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | System design, component responsibilities, data flow |\n| [docs/architecture/SPECIFICATION.md](docs/architecture/SPECIFICATION.md) | Technical spec, complexity 
signals, storage layout |\n| [docs/USAGE.md](docs/USAGE.md) | Complete CLI, MCP, and Docker reference |\n| [docs/METRICS.md](docs/METRICS.md) | Quantitative benchmarks: token reduction, lookup speedup, recall |\n| [CONTRIBUTING.md](CONTRIBUTING.md) | How to add adapters, tools, and language support |\n| [SECURITY.md](SECURITY.md) | Vulnerability reporting, data handling, trust model |\n| [docs/LANGUAGES.md](docs/LANGUAGES.md) | Language support details and roadmap |\n\n---\n\n## License\n\nCogniRepo is licensed under the **MIT License**.\n\n- Free to use, study, modify, and distribute\n- Use in proprietary products and commercial services — no restrictions\n- No requirement to open-source your application\n\nSee [LICENSE](LICENSE) for full details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashlesh-t%2Fcognirepo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashlesh-t%2Fcognirepo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashlesh-t%2Fcognirepo/lists"}