https://github.com/gkatte/codemesh
BM25 keyword search + graph walk for code intelligence. 100% local. No API keys. 9 languages. MCP server for AI coding agents.
ai-agents bm25 code-intelligence code-search knowledge-graph local-first mcp tree-sitter
- Host: GitHub
- URL: https://github.com/gkatte/codemesh
- Owner: gkatte
- License: mit
- Created: 2026-05-24T19:11:39.000Z (6 days ago)
- Default Branch: main
- Last Pushed: 2026-05-29T00:12:36.000Z (2 days ago)
- Last Synced: 2026-05-29T00:20:44.076Z (2 days ago)
- Topics: ai-agents, bm25, code-intelligence, code-search, knowledge-graph, local-first, mcp, tree-sitter
- Language: Python
- Size: 487 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# CodeMesh
**BM25 keyword search + graph walk for code intelligence.**
CodeMesh builds a local semantic knowledge graph of codebases — symbol relationships, call graphs, and code structure — so AI coding agents can query the graph instantly instead of scanning files with grep and glob.
**100% local. No API keys. No external services. SQLite only.**
---
## Why CodeMesh?
**The problem:** AI coding agents waste tokens and time scanning files with `grep` and `glob`. On every question about code, they read entire files into context — even when the answer is in one function.
**The solution:** CodeMesh parses your codebase into a structured knowledge graph at index time. At query time, agents get concise, relevant context — not raw file dumps.
- **86% fewer tokens** per query on average (measured across 9 real-world repos)
- **66% faster** agent loops — 2 MCP calls vs 4+ grep/read cycles
- **<0.2s** query latency on codebases up to ~50K nodes; under 0.6s at ~300K nodes (see Benchmark Results)
- **Zero configuration** — no API keys, no cloud services, no model downloads
---
## Get Started
### Install
**Option 1: uv tool install (recommended)**
```bash
uv tool install codemesh
```
**Option 2: pip**
```bash
pip install codemesh
```
**Option 3: from source**
```bash
git clone https://github.com/gkatte/codemesh.git
cd codemesh
pip install -e .
```
Upgrade:
```bash
uv tool install codemesh --force
```
Verify installation:
```bash
codemesh --help
```
### Step 1: Initialize a Project
```bash
cd your-project
codemesh init -i
```
This creates a `.codemesh/` directory and writes agent instruction files:
- `CLAUDE.md` — instructions for Claude Code
- `.cursor/rules/codemesh.mdc` — instructions for Cursor
- `AGENTS.md` — instructions for Codex CLI / opencode
### Step 2: Build the Index
```bash
codemesh index
```
Parses all source files with tree-sitter, extracts symbols and relationships, and stores them in `.codemesh/index.db` with FTS5 full-text search.
### Step 3: Configure Your Agent
```bash
codemesh install --yes
```
Auto-detects installed agents (Claude Code, Cursor, Codex CLI) and writes MCP server configuration + permissions to the appropriate config files:
- Claude Code: `~/.claude/claude.json` + `~/.claude/settings.json`
- Cursor: `.cursor/mcp.json` (project-local)
- Codex CLI: `~/.codex/config.json`
Restart your agent for the MCP server to load.
### That's It
When a `.codemesh/` directory exists in a project, your agent uses CodeMesh MCP tools automatically for code exploration instead of grepping through files.
---
## Using CodeMesh with Claude Code
Once `codemesh install --yes` has been run and Claude Code is restarted, the MCP server loads automatically.
**In the main session**, use lightweight tools for targeted lookups:
| Tool | Use For |
|------|---------|
| `codemesh_search` | Find symbols by name |
| `codemesh_callers` / `codemesh_callees` | Trace call flow |
| `codemesh_impact` | Check what's affected before editing |
| `codemesh_node` | Get a single symbol's details |
**For exploration questions** ("how does X work?", "explain the Y system"), spawn an Explore agent with `codemesh_explore` as the primary tool. This returns full source code sections from all relevant files in one call.
If `.codemesh/` does NOT exist in a project, CodeMesh will ask the user if they'd like to initialize it.
---
## CLI Reference
```bash
codemesh init [path] # Initialize in a project (--index to also index)
codemesh install # Configure MCP server for your agents (--yes for non-interactive)
codemesh index [path] # Build the knowledge graph index (--force to re-index)
codemesh sync [path] # Watch for file changes and auto-sync (--debounce 1.0)
codemesh status [path] # Show index statistics
codemesh query # Search symbols (--kind, --limit, --format)
codemesh callers # Find what calls a function/method (--limit)
codemesh callees # Find what a function/method calls (--limit)
codemesh impact # Analyze what's affected by changing a symbol (--depth)
codemesh context # Build context for a task (--max-nodes, --tokens)
codemesh files [path] # Show indexed file structure
codemesh serve --transport stdio # Start MCP server (--transport sse --port 3000)
codemesh graph [path] # Open interactive graph visualization (--json export)
```
---
## MCP Tools
When running as an MCP server (`codemesh serve --transport stdio`), CodeMesh exposes 10 tools:
| Tool | Purpose |
|------|---------|
| `codemesh_search` | Find symbols by name across the codebase |
| `codemesh_context` | Build relevant code context for a task or symbol |
| `codemesh_explore` | Return source for related symbols grouped by file, plus a relationship map |
| `codemesh_callers` | Find what calls a function/method |
| `codemesh_callees` | Find what a function/method calls |
| `codemesh_impact` | Analyze what code is affected by changing a symbol |
| `codemesh_node` | Get details about a specific symbol (optionally with source code) |
| `codemesh_status` | Check index health and statistics |
| `codemesh_files` | Get indexed file structure (faster than filesystem scanning) |
| `codemesh_graph` | Get the knowledge graph as JSON |
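
For orientation, an MCP client invokes one of these tools by sending a JSON-RPC 2.0 `tools/call` request over the chosen transport. The sketch below only builds such a message; the `query` and `limit` argument names are assumptions for illustration, since the real input schema for each tool is whatever the server reports in its `tools/list` response.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: check the schema from `tools/list` for the real names.
request = build_tool_call(1, "codemesh_search", {"query": "login", "limit": 5})
print(request)
```

In practice the agent's MCP client builds and sends these messages for you; the sketch is only meant to show what a tool invocation looks like on the wire.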
---
## Benchmark Results
Measured locally on an M-series Mac with 5 queries per repo. The Avg Query column reports the mean latency across those queries.
### Indexing + Query Performance
| Codebase | Language | Files | Nodes | Edges | Index Time | Avg Query |
|----------|----------|-------|-------|-------|------------|-----------|
| **Excalidraw** | TypeScript | 628 | 9,678 | 42,644 | 3.3s | 148.7ms |
| **Tokio** | Rust | 778 | 14,474 | 45,210 | 2.9s | 133.8ms |
| **Gin** | Go | 99 | 1,748 | 7,846 | 0.5s | 91.8ms |
| **OkHttp** | Java/Kotlin | 640 | 2,070 | 2,808 | 0.8s | 104.3ms |
| **Alamofire** | Swift | 108 | 3,705 | 3,820 | 0.6s | 92.5ms |
| **libuv** | C | 336 | 6,827 | 24,132 | 1.3s | 136.9ms |
| **nlohmann/json** | C++ | 491 | 6,377 | 18,780 | 2.2s | 139.0ms |
| **Django** | Python | 3,020 | 53,155 | 472,322 | 28.5s | 188.0ms |
| **VS Code** | TypeScript | 10,422 | 299,902 | 1,359,313 | 177.0s | 572.1ms |
Indexing time grows roughly in proportion to codebase size: from 0.5s for ~100 files (Gin) to 177s for 10k+ files (VS Code, about 1.36M edges). Query latency stays sub-second even on the largest repo.
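
The scaling behavior can be checked against the table by computing throughput per repo. This is a small worked calculation using only the Files, Edges, and Index Time columns above. If indexing time depended on file count alone, files per second would be constant; it is not, so file size and edge density evidently matter too.

```python
# (files, edges, index_seconds) copied from the table above
BENCHMARKS = {
    "Gin": (99, 7_846, 0.5),
    "Alamofire": (108, 3_820, 0.6),
    "OkHttp": (640, 2_808, 0.8),
    "libuv": (336, 24_132, 1.3),
    "nlohmann/json": (491, 18_780, 2.2),
    "Tokio": (778, 45_210, 2.9),
    "Excalidraw": (628, 42_644, 3.3),
    "Django": (3_020, 472_322, 28.5),
    "VS Code": (10_422, 1_359_313, 177.0),
}


def throughput(files: int, edges: int, seconds: float) -> tuple[float, float]:
    """Return (files per second, edges per second) for one indexing run."""
    return files / seconds, edges / seconds


for name, (files, edges, seconds) in BENCHMARKS.items():
    files_per_s, edges_per_s = throughput(files, edges, seconds)
    print(f"{name:<14} {files_per_s:7.0f} files/s {edges_per_s:9.0f} edges/s")
```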
### Agent Efficiency
Measured across all 9 repos. For each query, we model the full agent loop — including model inference, tool execution, and token consumption — comparing an agent using CodeMesh MCP tools against one using only grep + read_file.
> **Average: 85% cheaper · 86% fewer tokens · 66% faster · 50% fewer tool calls**
| Codebase | Cost Savings | Token Savings | Time Savings | Tool Call Savings |
|----------|-------------|---------------|--------------|-------------------|
| **nlohmann/json** | 98.6% | 98.9% | 93.3% | 50% |
| **Alamofire** | 96.0% | 96.8% | 85.1% | 50% |
| **VS Code** | 90.9% | 92.3% | 14.8% | 50% |
| **Gin** | 89.9% | 91.9% | 70.6% | 50% |
| **Django** | 89.3% | 90.3% | 72.7% | 50% |
| **Tokio** | 78.0% | 80.6% | 62.4% | 50% |
| **OkHttp** | 76.4% | 79.4% | 65.0% | 50% |
| **Excalidraw** | 72.8% | 72.6% | 61.5% | 50% |
| **libuv** | 71.0% | 71.1% | 69.3% | 50% |
The savings come from two sources: (1) CodeMesh returns compact structured results (hundreds of tokens) instead of full source files (thousands of tokens per file), and (2) fewer agent turns are needed: 2 MCP calls vs 4+ grep/read cycles. On repos like nlohmann/json and Django, the baseline agent reads hundreds of thousands of tokens per query while CodeMesh answers from a few thousand.
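
The headline averages can be reproduced from the per-repo table. The snippet below copies the cost, token, and time columns and takes an unweighted mean over the 9 repos, which is an assumption about how the summary line was computed.

```python
from statistics import mean

# (cost, tokens, time) savings in percent, copied from the table above
SAVINGS = {
    "nlohmann/json": (98.6, 98.9, 93.3),
    "Alamofire": (96.0, 96.8, 85.1),
    "VS Code": (90.9, 92.3, 14.8),
    "Gin": (89.9, 91.9, 70.6),
    "Django": (89.3, 90.3, 72.7),
    "Tokio": (78.0, 80.6, 62.4),
    "OkHttp": (76.4, 79.4, 65.0),
    "Excalidraw": (72.8, 72.6, 61.5),
    "libuv": (71.0, 71.1, 69.3),
}

# Transpose rows into columns, then average each column.
cost, tokens, time_saved = (mean(column) for column in zip(*SAVINGS.values()))
print(f"cost {cost:.1f}%  tokens {tokens:.1f}%  time {time_saved:.1f}%")
# cost 84.8%  tokens 86.0%  time 66.1%
```

Note that the time average is pulled down by VS Code (14.8%), where query latency is highest.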
---
## How It Works
```
┌─────────────────────────────────────────────────────────────────┐
│                           Claude Code                           │
│                                                                 │
│  "Implement user authentication"                                │
│           │                                                     │
│           ▼                                                     │
│  ┌─────────────────┐      ┌─────────────────┐                   │
│  │  Explore Agent  │ ──── │  Explore Agent  │                   │
│  └────────┬────────┘      └────────┬────────┘                   │
│           │                        │                            │
└───────────┼────────────────────────┼────────────────────────────┘
            │                        │
            ▼                        ▼
┌───────────────────────────────────────────────────────────────────┐
│                        CodeMesh MCP Server                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                │
│  │   Search    │  │   Callers   │  │   Context   │                │
│  │   "auth"    │  │  "login()"  │  │  for task   │                │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘                │
│         │                │                │                       │
│         └────────────────┼────────────────┘                       │
│                          ▼                                        │
│              ┌───────────────────────┐                            │
│              │   SQLite Graph DB     │                            │
│              │   • symbols           │                            │
│              │   • call edges        │                            │
│              │   • FTS5 BM25 search  │                            │
│              └───────────────────────┘                            │
└───────────────────────────────────────────────────────────────────┘
```
1. **Extraction** — tree-sitter parses source code into ASTs. Language-specific queries extract nodes (functions, classes, methods) and edges (calls, imports, extends, implements).
2. **Storage** — Everything goes into a local SQLite database (`.codemesh/index.db`) with FTS5 full-text search and BM25 ranking.
3. **Resolution** — After extraction, references are resolved: function calls → definitions, imports → source files, class inheritance, and framework-specific patterns.
4. **Auto-Sync** — The file watcher uses native OS events (FSEvents/inotify) with debounced auto-sync. The graph stays fresh as you code.
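
The storage and ranking step can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, not CodeMesh's actual schema: the `symbols` table, its column order, and the sample rows are assumptions, and it requires a SQLite build that includes FTS5. The weights passed to `bm25()` are the ones listed in the Architecture section (name=20, qualified_name=5, docstring=1, signature=2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema. Column order matters: bm25() weights are positional.
conn.execute(
    "CREATE VIRTUAL TABLE symbols "
    "USING fts5(name, qualified_name, docstring, signature)"
)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?)",
    [
        ("login", "auth.views.login", "Authenticate a user.", "def login(request)"),
        ("logout", "auth.views.logout", "End the session after login.", "def logout(request)"),
        ("render", "ui.render", "Draw the login form.", "def render(form)"),
    ],
)


def search(term: str, limit: int = 5) -> list[str]:
    """Prefix-match `term` and rank hits with column-weighted BM25."""
    # bm25() returns lower (more negative) values for better matches,
    # so ascending order puts the best hit first.
    rows = conn.execute(
        "SELECT qualified_name FROM symbols WHERE symbols MATCH ? "
        "ORDER BY bm25(symbols, 20.0, 5.0, 1.0, 2.0) LIMIT ?",
        (f"{term}*", limit),
    )
    return [row[0] for row in rows]


print(search("login"))
```

All three rows match the prefix query, but `auth.views.login` ranks first because its hit is in the heavily weighted `name` column, while the other two only mention "login" in their docstrings.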
---
## Architecture
```
Source Code
    │
    └──── Tree-sitter AST Parser ──▶ Knowledge Graph (SQLite)
                                          │
                                          ├──── FTS5 (BM25, weighted columns)
                                          └──── Graph Edges (contains/calls/imports/extends)

User Query
    │
    ▼
BM25 Keyword Search (3-tier)
    │
    ├──── Tier 1: FTS5 prefix match (bm25 weights: name=20, qualified_name=5, docstring=1, signature=2)
    ├──── Tier 2: LIKE substring fallback (camelCase matching)
    └──── Tier 3: Fuzzy edit-distance (Levenshtein ≤ 2)
    │
    ▼
Post-hoc Scoring: kind_bonus + name_match_bonus
    │
    ▼
Graph Walk Expansion (BFS depth=2)
    │
    ▼
Context Builder (token-budget-aware XML output)
```
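
The graph walk expansion stage can be sketched as a depth-limited breadth-first search starting from the search hits. The edge map and symbol names below are hypothetical, and the sketch follows outgoing edges only; whether CodeMesh also walks incoming edges is not stated here.

```python
from collections import deque

# Hypothetical call graph: symbol -> symbols it references.
EDGES = {
    "login": ["check_password", "create_session"],
    "check_password": ["hash_password"],
    "create_session": ["save_session"],
    "hash_password": ["pbkdf2"],
    "save_session": [],
}


def expand(seeds: list[str], max_depth: int = 2) -> dict[str, int]:
    """Breadth-first walk from the seed symbols; returns each reached node's depth."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in EDGES.get(node, []):
            if neighbor not in depth:  # first visit is the shortest path in BFS
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


print(expand(["login"]))
# {'login': 0, 'check_password': 1, 'create_session': 1, 'hash_password': 2, 'save_session': 2}
```

With `max_depth=2`, `pbkdf2` (three hops from `login`) is left out, which is how a depth limit keeps the expanded context bounded before the token budget is applied.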
## Supported Languages
TypeScript · JavaScript · Python · Rust · Go · Java · Kotlin · Swift · C · C++
## Development
```bash
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -x -q
# Lint
ruff check . --fix && ruff format .
# Type check
mypy codemesh/
```
## License
MIT
---
**Made for AI coding agents — Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, and Kiro**
[Report Bug](https://github.com/gkatte/codemesh/issues) · [Request Feature](https://github.com/gkatte/codemesh/issues)