{"id":49029285,"url":"https://github.com/MikeRecognex/mcp-codebase-index","last_synced_at":"2026-06-03T16:00:47.302Z","repository":{"id":338654292,"uuid":"1158555340","full_name":"MikeRecognex/mcp-codebase-index","owner":"MikeRecognex","description":"17 MCP query tools for codebase navigation — functions, classes, imports, dependency graphs, change impact. Zero dependencies. 87% token reduction.","archived":false,"fork":false,"pushed_at":"2026-02-28T00:25:59.000Z","size":175,"stargazers_count":53,"open_issues_count":6,"forks_count":9,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-30T22:10:26.673Z","etag":null,"topics":["ai-coding","ast-parser","claude-code","code-analysis","code-navigation","codebase-indexer","dependency-graph","mcp","mcp-server","model-context-protocol","python","symbol-table","typescript"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MikeRecognex.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-15T15:13:54.000Z","updated_at":"2026-05-23T14:05:21.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/MikeRecognex/mcp-codebase-index","commit_stats":null,"previous_names":["mikerecognex/mcp-codebase-index"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/MikeRecognex/mcp-codebase-index","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MikeRecognex%2Fmcp-codebase-index","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MikeRecognex%2Fmcp-codebase-index/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MikeRecognex%2Fmcp-codebase-index/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MikeRecognex%2Fmcp-codebase-index/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MikeRecognex","download_url":"https://codeload.github.com/MikeRecognex/mcp-codebase-index/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MikeRecognex%2Fmcp-codebase-index/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33872298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-03T02:00:06.370Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-coding","ast-parser","claude-code","code-analysis","code-navigation","codebase-indexer","dependency-graph","mcp","mcp-server","model-context-protocol","python","symbol-table","typescript"],"created_at":"2026-04-19T09:00:40.488Z","updated_at":"2026-06-03T16:00:47.296Z","avatar_url":"https://github.com/MikeRecognex.png","language":"Python","funding_links":[],"categories":["Repository \u0026 Code Analysis Mcp Servers","📦 Other"],"sub_categories":[],"readme":"\u003c!-- mcp-name: io.github.MikeRecognex/mcp-codebase-index --\u003e\n# mcp-codebase-index\n\n[![PyPI version](https://img.shields.io/pypi/v/mcp-codebase-index)](https://pypi.org/project/mcp-codebase-index/)\n[![CI](https://github.com/MikeRecognex/mcp-codebase-index/actions/workflows/ci.yml/badge.svg)](https://github.com/MikeRecognex/mcp-codebase-index/actions/workflows/ci.yml)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)\n[![MCP](https://img.shields.io/badge/MCP-compatible-purple.svg)](https://modelcontextprotocol.io)\n[![Zero Dependencies](https://img.shields.io/badge/dependencies-zero-brightgreen.svg)]()\n\nA structural codebase indexer with an [MCP](https://modelcontextprotocol.io) server for AI-assisted development. Zero runtime dependencies — uses Python's `ast` module for Python analysis and regex-based parsing for TypeScript/JS, Go, Rust, and C#. Requires Python 3.11+.\n\n## What It Does\n\nIndexes codebases by parsing source files into structural metadata -- functions, classes, imports, dependency graphs, and cross-file call chains -- then exposes 18 query tools via the Model Context Protocol, enabling Claude Code and other MCP clients to navigate codebases efficiently without reading entire files.\n\n**Automatic incremental re-indexing:** In git repositories, the index stays up to date automatically. Before every query, the server checks `git diff` and `git status` (~1-2ms). If files changed, only those files are re-parsed and the dependency graph is rebuilt. No need to manually call `reindex` after edits, branch switches, or pulls.\n\n**Persistent disk cache:** The index is saved to a pickle cache file (`.codebase-index-cache.pkl`) after every build. On subsequent server starts, the cache is loaded and validated against the current git HEAD — if the ref matches, startup is instant. If a small number of files changed (≤20), the cached index is loaded and incrementally updated instead of rebuilt from scratch. This eliminates the cold-start penalty when restarting Claude Code sessions, restarting the MCP server, or resuming work after context compaction.\n\n## Language Support\n\n| Language | Method | Extracts |\n|----------|--------|----------|\n| Python (`.py`) | AST parsing | Functions, classes, methods, imports, dependency graph |\n| TypeScript/JS (`.ts`, `.tsx`, `.js`, `.jsx`) | Regex-based | Functions, arrow functions, classes, interfaces, type aliases, imports |\n| Go (`.go`) | Regex-based | Functions, methods (receiver-based), structs, interfaces, type aliases, imports, doc comments |\n| Rust (`.rs`) | Regex-based | Functions (`pub`/`async`/`const`/`unsafe`), structs, enums, traits, impl blocks, use statements, attributes, doc comments, macro_rules |\n| C# (`.cs`) | Regex-based | Classes, interfaces, structs, enums, records, methods, constructors, using directives, `[Attributes]`, `///` XML doc comments |\n| Markdown/Text (`.md`, `.txt`, `.rst`) | Heading detection | Sections (# headings, underlines, numbered, ALL-CAPS) |\n| Other | Generic | Line counts only |\n\n## Installation\n\n```bash\npip install \"mcp-codebase-index[mcp]\"\n```\n\nThe `[mcp]` extra includes the MCP server dependency. Omit it if you only need the programmatic API.\n\nFor development (from a local clone):\n\n```bash\npip install -e \".[dev,mcp]\"\n```\n\n## MCP Server\n\n### Running\n\n```bash\n# As a console script\nPROJECT_ROOT=/path/to/project mcp-codebase-index\n\n# As a Python module\nPROJECT_ROOT=/path/to/project python -m mcp_codebase_index.server\n```\n\n`PROJECT_ROOT` specifies which directory to index. Defaults to the current working directory.\n\n### Persistent Cache\n\nIn git repositories, the server automatically caches the index to `.codebase-index-cache.pkl` in the project root. On startup:\n\n1. **Cache hit (exact match):** If the cached git ref matches the current HEAD, the index loads instantly from disk — no parsing, no file walking.\n2. **Cache hit (small changeset):** If ≤20 files changed since the cached ref, the cached index is loaded and incrementally updated on the first query.\n3. **Cache miss:** If the changeset is large or no cache exists, a full rebuild runs and saves a new cache.\n\nAdd `.codebase-index-cache.pkl` to your `.gitignore` — it's a local-only build artifact.\n\n### Configuring with OpenClaw\n\nInstall the package on the machine where OpenClaw is running:\n\n```bash\n# Local install\npip install \"mcp-codebase-index[mcp]\"\n\n# Or inside a Docker container / remote VPS\ndocker exec -it openclaw bash\npip install \"mcp-codebase-index[mcp]\"\n```\n\nAdd the MCP server to your OpenClaw agent config (`openclaw.json`):\n\n```json\n{\n  \"agents\": {\n    \"list\": [{\n      \"id\": \"main\",\n      \"mcp\": {\n        \"servers\": [\n          {\n            \"name\": \"codebase-index\",\n            \"command\": \"mcp-codebase-index\",\n            \"env\": {\n              \"PROJECT_ROOT\": \"/path/to/project\"\n            }\n          }\n        ]\n      }\n    }]\n  }\n}\n```\n\nRestart OpenClaw and verify the connection:\n\n```bash\nopenclaw mcp list\n```\n\nAll 18 tools will be available to your agent.\n\n**Performance note:** The server automatically detects file changes via `git diff` before every query (~1-2ms) and incrementally re-indexes only what changed. However, OpenClaw's default MCP integration via mcporter spawns a fresh server process per tool call, which discards the in-memory index and forces a full rebuild each time (~1-2s for small projects, longer for large ones). With persistent caching, these cold starts are now significantly faster — the server loads from the disk cache instead of re-parsing the entire codebase. For persistent connections (avoiding even the cache load overhead), use the [openclaw-mcp-adapter](https://github.com/androidStern-personal/openclaw-mcp-adapter) plugin, which connects once at startup and keeps the server running:\n\n```bash\npip install openclaw-mcp-adapter\n```\n\n### Configuring with Claude Code\n\nAdd to your project's `.mcp.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"codebase-index\": {\n      \"command\": \"mcp-codebase-index\",\n      \"env\": {\n        \"PROJECT_ROOT\": \"/path/to/project\"\n      }\n    }\n  }\n}\n```\n\nOr using the Python module directly (useful if installed in a virtualenv):\n\n```json\n{\n  \"mcpServers\": {\n    \"codebase-index\": {\n      \"command\": \"/path/to/.venv/bin/python3\",\n      \"args\": [\"-m\", \"mcp_codebase_index.server\"],\n      \"env\": {\n        \"PROJECT_ROOT\": \"/path/to/project\"\n      }\n    }\n  }\n}\n```\n\n#### Reinforcing Tool Usage with Hooks\n\nClaude Code tends to default to built-in Glob/Grep/Read tools even when codebase-index is available. In addition to CLAUDE.md instructions (see below), you can add hooks that fire on every prompt to reinforce the behavior. Add this to `.claude/settings.local.json`:\n\n```json\n{\n  \"hooks\": {\n    \"SessionStart\": [\n      {\n        \"hooks\": [\n          {\n            \"type\": \"command\",\n            \"command\": \"echo 'CRITICAL REMINDER: Use codebase-index MCP tools FIRST for ALL code navigation (find_symbol, get_function_source, search_codebase, get_dependencies, etc). Only fall back to Glob/Grep/Read for non-code files.'\"\n          }\n        ]\n      }\n    ],\n    \"UserPromptSubmit\": [\n      {\n        \"hooks\": [\n          {\n            \"type\": \"command\",\n            \"command\": \"echo 'Use codebase-index MCP tools first for code navigation.'\"\n          }\n        ]\n      }\n    ]\n  }\n}\n```\n\nHook stdout is injected as context Claude sees before responding. `SessionStart` fires on startup, resume, and context compaction. `UserPromptSubmit` fires on every turn.\n\n### Important: Make the AI Actually Use Indexed Tools\n\nBy default, AI assistants will ignore the indexed tools and fall back to reading entire files with Glob/Grep/Read. Soft language like \"prefer\" gets rationalized away. Add this to your project's `CLAUDE.md` (or equivalent instructions file) with **mandatory** language:\n\n```\n## Codebase Navigation — MANDATORY\n\nYou MUST use codebase-index MCP tools FIRST when exploring or navigating the codebase. This is not optional.\n\n- ALWAYS start with: get_project_summary, find_symbol, get_function_source, get_class_source,\n  get_structure_summary, get_dependencies, get_dependents, get_change_impact, get_call_chain, search_codebase\n- Only fall back to Read/Glob/Grep when codebase-index tools genuinely don't have what you need\n  (e.g. reading non-code files, config, frontmatter)\n- If you catch yourself reaching for Glob/Grep/Read to find or understand code, STOP and use\n  codebase-index instead\n```\n\nThe word \"prefer\" is too weak — models treat it as a suggestion and default to familiar tools. Mandatory language with explicit fallback criteria is what actually changes behavior.\n\n### Available Tools (18)\n\n| Tool | Description |\n|------|-------------|\n| `get_project_summary` | File count, packages, top classes/functions |\n| `list_files` | List indexed files with optional glob filter |\n| `get_structure_summary` | Structure of a file or the whole project |\n| `get_functions` | List functions with name, lines, params |\n| `get_classes` | List classes with name, lines, methods, bases |\n| `get_imports` | List imports with module, names, line |\n| `get_function_source` | Full source of a function/method |\n| `get_class_source` | Full source of a class |\n| `find_symbol` | Find where a symbol is defined (file, line, type) |\n| `get_dependencies` | What a symbol calls/uses |\n| `get_dependents` | What calls/uses a symbol |\n| `get_change_impact` | Direct + transitive dependents |\n| `get_call_chain` | Shortest dependency path (BFS) |\n| `get_file_dependencies` | Files imported by a given file |\n| `get_file_dependents` | Files that import from a given file |\n| `search_codebase` | Regex search across all files (max 100 results) |\n| `reindex` | Force full re-index (rarely needed — incremental updates happen automatically in git repos) |\n| `get_usage_stats` | Session efficiency stats: tool calls, characters returned vs total source, estimated token savings |\n\n## Benchmarks\n\nTested across four real-world projects on an M-series MacBook Pro, from a small project to CPython itself (1.1 million lines):\n\n### Index Build Performance\n\n| Project | Files | Lines | Functions | Classes | Index Time | Peak Memory |\n|---------|------:|------:|----------:|--------:|-----------:|------------:|\n| RMLPlus | 36 | 7,762 | 237 | 55 | 0.9s | 2.4 MB |\n| FastAPI | 2,556 | 332,160 | 4,139 | 617 | 5.7s | 55 MB |\n| Django | 3,714 | 707,493 | 29,995 | 7,371 | 36.2s | 126 MB |\n| **CPython** | **2,464** | **1,115,334** | **59,620** | **9,037** | **55.9s** | **197 MB** |\n\nWith persistent caching, subsequent startups bypass the full build entirely. Cache load time is negligible compared to parsing — a cache hit on CPython restores the full index in under a second instead of 56s.\n\n### Query Response Size vs Total Source\n\nQuerying CPython — 41 million characters of source code:\n\n| Query | Response | Total Source | Reduction |\n|-------|-------:|------------:|----------:|\n| `find_symbol(\"TestCase\")` | 67 chars | 41,077,561 chars | **99.9998%** |\n| `get_dependencies(\"compile\")` | 115 chars | 41,077,561 chars | **99.9997%** |\n| `get_change_impact(\"TestCase\")` | 16,812 chars | 41,077,561 chars | **99.96%** |\n| `get_function_source(\"compile\")` | 4,531 chars | 41,077,561 chars | **99.99%** |\n| `get_function_source(\"run_unittest\")` | 439 chars | 41,077,561 chars | **99.999%** |\n\n`find_symbol` returns 54-67 characters regardless of whether the project is 7K lines or 1.1M lines. Response size scales with the answer, not the codebase.\n\n`get_change_impact(\"TestCase\")` on CPython found **154 direct dependents and 492 transitive dependents** in 0.45ms — the kind of query that's impossible without a dependency graph. Use `max_direct` and `max_transitive` to cap output to your token budget.\n\n### Query Response Time\n\nAll targeted queries return in sub-millisecond time, even on CPython's 1.1M lines:\n\n| Query | RMLPlus | FastAPI | Django | CPython |\n|-------|--------:|--------:|-------:|--------:|\n| `find_symbol` | 0.01ms | 0.01ms | 0.03ms | 0.08ms |\n| `get_dependencies` | 0.00ms | 0.00ms | 0.00ms | 0.01ms |\n| `get_change_impact` | 0.02ms | 0.00ms | 2.81ms | 0.45ms |\n| `get_function_source` | 0.01ms | 0.02ms | 0.03ms | 0.10ms |\n\nRun the benchmarks yourself: `python benchmarks/benchmark.py`\n\n## How Is This Different from LSP?\n\nLSP answers \"where is this function?\" — mcp-codebase-index answers \"what happens if I change it?\" LSP is point queries: one symbol, one file, one position. It can tell you where `LLMClient` is defined and who references it. But ask \"what breaks transitively if I refactor `LLMClient`?\" and LSP has nothing. This tool returns 11 direct dependents and 31 transitive impacts in a single call — 204 characters. To get the same answer from LSP, the AI would need to chain dozens of find-reference calls recursively, reading files at every step, burning thousands of tokens to reconstruct what the dependency graph already knows.\n\nLSP also requires you to install a separate language server for every language in your project — pyright for Python, vtsls for TypeScript, gopls for Go. Each one is a heavyweight binary with its own dependencies and configuration. mcp-codebase-index is zero dependencies, handles Python + TypeScript/JS + Go + Rust + C# + Markdown out of the box, and every response has built-in token budget controls (`max_results`, `max_lines`). LSP was built for IDEs. This was built for AI.\n\n## Programmatic Usage\n\n```python\nfrom mcp_codebase_index.project_indexer import ProjectIndexer\nfrom mcp_codebase_index.query_api import create_project_query_functions\n\nindexer = ProjectIndexer(\"/path/to/project\", include_patterns=[\"**/*.py\"])\nindex = indexer.index()\nquery_funcs = create_project_query_functions(index)\n\n# Use query functions\nprint(query_funcs[\"get_project_summary\"]())\nprint(query_funcs[\"find_symbol\"](\"MyClass\"))\nprint(query_funcs[\"get_change_impact\"](\"some_function\"))\n```\n\n## Development\n\n```bash\npip install -e \".[dev,mcp]\"\npytest tests/ -v\nruff check src/ tests/\n```\n\n## References\n\nThe structural indexer was originally developed as part of the [RMLPlus](https://github.com/MikeRecognex/RMLPlus) project, an implementation of the [Recursive Language Models](https://arxiv.org/abs/2512.24601) framework.\n\n## License\n\nThis project is dual-licensed:\n\n- **AGPL-3.0** for open-source use — see [LICENSE](LICENSE)\n- **Commercial License** for proprietary use — see [COMMERCIAL-LICENSE.md](COMMERCIAL-LICENSE.md)\n\nIf you're using mcp-codebase-index as a standalone MCP server for development, the AGPL-3.0 license applies at no cost. If you're embedding it in a proprietary product or offering it as part of a hosted service, you'll need a commercial license. See [COMMERCIAL-LICENSE.md](COMMERCIAL-LICENSE.md) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMikeRecognex%2Fmcp-codebase-index","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMikeRecognex%2Fmcp-codebase-index","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMikeRecognex%2Fmcp-codebase-index/lists"}