An open API service indexing awesome lists of open source software.

https://github.com/justrach/codedb

High-performance MCP server, code graph engine & evolutionary algorithm platform in Zig. 33 tools: GitHub project management, agent swarm orchestration, iterative review-fix loops, blast radius analysis, and code navigation via Model Context Protocol.
https://github.com/justrach/codedb

Last synced: about 2 months ago
JSON representation

High-performance MCP server, code graph engine & evolutionary algorithm platform in Zig. 33 tools: GitHub project management, agent swarm orchestration, iterative review-fix loops, blast radius analysis, and code navigation via Model Context Protocol.

Awesome Lists containing this project

README

          


codedb


Release
License
Zig 0.15
Alpha
Ask DeepWiki

codedb

Code intelligence server for AI agents. Zig core. MCP native. Zero dependencies.


Structural indexing · Trigram search · Word index · Dependency graph · File watching · MCP + HTTP


Status ·
Install ·
Quick Start ·
MCP Tools ·
Benchmarks ·
Architecture ·
Data & Privacy ·
Building

---

## Status

> **Alpha software — API is stabilizing but may change**
>
> codedb works and is used daily in production AI workflows, but:
> - **Parser support** — Zig, C/C++, Python, TypeScript/JavaScript, Rust, Go, PHP, Ruby, HCL, R, Dart/Flutter
> - **Lightweight outline support** — Java, Kotlin, Svelte, Vue, Astro, shell, CSS/SCSS, SQL, protobuf, Fortran, LLVM IR, MLIR, and TableGen
> - **No auth** — HTTP server binds to localhost only
> - **Snapshot format** may change between versions
> - **MCP protocol** is JSON-RPC 2.0 over stdio (stable)

| What works today | What's in progress |
|--------------------------------------------------------|------------------------------------------|
| 16 MCP tools for full codebase intelligence | Deeper parser coverage and edge-case handling |
| Trigram v2: integer doc IDs, batch-accumulate, merge intersect | Incremental segment-based indexing |
| 538x faster than ripgrep on pre-indexed queries | WASM target for Cloudflare Workers |
| O(1) inverted word index for identifier lookup | Multi-project support |
| Structural outlines (functions, structs, imports) | mmap-backed trigram index |
| Reverse dependency graph | |
| Atomic line-range edits with version tracking | |
| Auto-registration in Claude, Codex, Gemini, Cursor | |
| Polling file watcher with filtered directory walker | |
| Portable snapshot for instant MCP startup | |
| Singleton MCP with PID lock + 1h idle timeout | |
| Sensitive file blocking (.env, credentials, keys) | |
| Codesigned + notarized macOS binaries | |
| SHA256 checksum verification in installer | |
| Cross-platform: macOS (ARM/x86), Linux (ARM/x86) | |

---

## ⚡ Install

```bash
curl -fsSL https://codedb.codegraff.com/install.sh | bash
```

Downloads the binary for your platform and auto-registers codedb as an MCP server in **Claude Code**, **Codex**, **Gemini CLI**, and **Cursor**. The installer prints the exact `codedb mcp` command it registered plus hook setup pointers for Codex and Claude Code.

### Updating or repairing an older install

If `codedb update` fails on an older release, rerun the installer:

```bash
curl -fsSL https://codedb.codegraff.com/install.sh | bash
```

This replaces the `codedb` binary with the latest GitHub Release and keeps your existing MCP registrations, config, caches, and snapshots. Use this path for any release whose built-in updater cannot fetch release checksums.

### v0.2.579 MCP hotfix and release checksums

This note applies to `v0.2.579` only. Earlier `v0.2.579` binaries were rebuilt
and re-uploaded on May 2, 2026 because they passed the normal Zig test suite but
missed an MCP end-to-end regression: after `codedb_index` reported success,
follow-up MCP queries could still see an empty in-memory project (`files: 0`,
`scan: loading_snapshot`, empty `tree`/`find`/`search`, or `file not indexed`).

The fixed `v0.2.579` release assets were rebuilt from source commit
`1b634f0ba5cd1072e9ca54cabf442b573e034f53`. The values below are SHA256
checksums for the uploaded binaries, not Git commit SHAs:

| Binary | SHA256 |
|--------|--------|
| `codedb-darwin-arm64` | `b5bddba01767e38e9723f28c7b3ff55370c4eda5f9e0e84172aaec1ff5094cb2` |
| `codedb-darwin-x86_64` | `cf2a9ec511f99fd839d2349cc17e671cd9566260cf601b8b23dd649665c22999` |
| `codedb-linux-arm64` | `955b0288c5cfb5c360f7b814cd3cc288ecc42c63a569f65fac358bd9454d788b` |
| `codedb-linux-x86_64` | `201dfe26bec33b3569c44a3d4893c51822bc793e06fab69fd93e81c0354232ee` |

If you installed `v0.2.579` before this hotfix, rerun the installer above so the
binary matches the final uploaded checksum for your platform.

| Platform | Binary | Signed |
|----------|--------|--------|
| macOS ARM64 (Apple Silicon) | `codedb-darwin-arm64` | ✅ codesigned + notarized |
| macOS x86_64 (Intel) | `codedb-darwin-x86_64` | ✅ codesigned + notarized |
| Linux ARM64 | `codedb-linux-arm64` | — |
| Linux x86_64 | `codedb-linux-x86_64` | — |

Or install manually from [GitHub Releases](https://github.com/justrach/codedb/releases/latest).

---

## ⚡ Quick Start

### As an MCP server (recommended)

After installing, codedb is automatically registered. Just open a project and the 16 MCP tools are available to your AI agent.

```bash
# Manual MCP start (auto-configured by install script)
codedb mcp /path/to/your/project
```

### As an HTTP server

```bash
codedb serve /path/to/your/project
# listening on localhost:7719
```

### CLI

```bash
codedb tree /path/to/project # file tree with symbol counts
codedb outline src/main.zig # symbols in a file
codedb find AgentRegistry # find symbol definitions
codedb search "handleAuth" # full-text search (trigram-accelerated)
codedb word Store # exact word lookup (inverted index, O(1))
codedb hot # recently modified files
```

---

## 🔧 MCP Tools

16 tools over the Model Context Protocol (JSON-RPC 2.0 over stdio):

| Tool | Description |
|------|-------------|
| `codedb_tree` | Full file tree with language, line counts, symbol counts |
| `codedb_outline` | Symbols in a file: functions, structs, imports, with line numbers |
| `codedb_symbol` | Find where a symbol is defined across the codebase |
| `codedb_search` | Trigram-accelerated full-text search (supports regex, scoped results) |
| `codedb_word` | O(1) inverted index word lookup |
| `codedb_hot` | Most recently modified files |
| `codedb_deps` | Reverse dependency graph (which files import this file) |
| `codedb_read` | Read file content (supports line ranges, hash-based caching) |
| `codedb_edit` | Apply line-range edits (replace, insert, delete — atomic writes) |
| `codedb_changes` | Changed files since a sequence number |
| `codedb_status` | Index status (file count, current sequence) |
| `codedb_snapshot` | Full pre-rendered JSON snapshot of the codebase |
| `codedb_bundle` | Batch multiple read-only queries in one call (max 20 ops) |
| `codedb_remote` | Query indexed public repos via api.wiki.codes — no local clone needed |
| `codedb_projects` | List all locally indexed projects on this machine |
| `codedb_index` | Index a local folder and create a codedb.snapshot |

### `codedb_remote` — Cloud Intelligence

Query any indexed public GitHub repo without cloning it. `codedb_remote` always uses `api.wiki.codes`; the old `codegraff` backend name is no longer a supported route. Omit `backend`, or keep `backend="wiki"` only for older prompts.

```
# Check what the remote slug supports
codedb_remote repo="vercel/next.js" action="actions"

# Get a compact directory summary instead of dumping a huge file list
codedb_remote repo="vercel/next.js" action="tree" expand=false

# Page a file tree by prefix and limit
codedb_remote repo="vercel/next.js" action="tree" prefix="packages/" limit=100

# Search for code in a dependency
codedb_remote repo="justrach/merjs" action="search" query="handleRequest"

# Read a small file slice
codedb_remote repo="openai/codex" action="read" path="codex-rs/core/src/codex.rs" lines="1-80"

# Exact symbol lookup
codedb_remote repo="justrach/codedb" action="symbol" query="buildSnapshot"

# Check dependency CVE evidence; scope can be runtime or all
codedb_remote repo="axios/axios" action="cves" scope="runtime"

# Raw wiki slugs are accepted for repos that are indexed that way
codedb_remote repo="chromium" action="policy"
```

**Remote actions:** `actions`, `tree`, `outline`, `search`, `read`, `symbol`, `policy`, `deps`, `score`, `cves`, `commits`, `branches`, `dep-history`

For Codex and Claude Code hook examples around `codedb_remote`, see [`docs/hooks-labs.md`](docs/hooks-labs.md).

**Note:** This tool calls `https://api.wiki.codes`. No API key required. The repo must already be indexed by the public service.

### CLI Commands

| Command | Description |
|---------|-------------|
| `codedb tree` | Show file tree with language and symbol counts |
| `codedb outline ` | List all symbols in a file |
| `codedb find ` | Find where a symbol is defined |
| `codedb search ` | Full-text search (trigram, case-insensitive) |
| `codedb search --regex ` | Regex search |
| `codedb word ` | Exact word lookup via inverted index |
| `codedb hot` | Recently modified files |
| `codedb snapshot` | Write codedb.snapshot to project root |
| `codedb serve` | HTTP daemon on :7719 |
| `codedb mcp [path]` | JSON-RPC/MCP server over stdio |
| `codedb update` | Self-update to the latest release; if it fails on an older build, rerun the curl installer above |
| `codedb nuke` | Uninstall codedb, remove caches/snapshots, and deregister MCP integrations |
| `codedb --version` | Print version |

**Options:** `--no-telemetry` (or set `CODEDB_NO_TELEMETRY` env var)

### Example: agent explores a codebase

```bash
# 1. Get the file tree
curl localhost:7719/tree
# → src/main.zig (zig, 55L, 4 symbols)
# src/store.zig (zig, 156L, 12 symbols)
# src/agent.zig (zig, 135L, 8 symbols)

# 2. Drill into a file
curl "localhost:7719/outline?path=src/store.zig"
# → L20: struct_def Store
# L30: function init
# L55: function recordSnapshot

# 3. Find a symbol across the codebase
curl "localhost:7719/symbol?name=AgentRegistry"
# → {"path":"src/agent.zig","line":30,"kind":"struct_def"}

# 4. Full-text search
curl "localhost:7719/search?q=handleAuth&max=10"

# 5. Check what changed
curl "localhost:7719/changes?since=42"
```

---

## 📊 Benchmarks

Measured on Apple M4 Pro, 48GB RAM. MCP = pre-indexed warm queries (20 iterations avg). CLI/external tools include process startup (3 iterations avg). Ground truth verified against Python reference implementation.

### Latency — codedb MCP vs codedb CLI vs ast-grep vs ripgrep vs grep

**codedb repo** (20 files, 12.6k lines):

| Query | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | MCP speedup |
|-------|-----------|-----------|----------|---------|------|-------------|
| File tree | **0.04 ms** | 52.9 ms | — | — | — | **1,253x** vs CLI |
| Symbol search (`init`) | **0.10 ms** | 54.1 ms | 3.2 ms | 6.3 ms | 6.5 ms | **549x** vs CLI |
| Full-text search (`allocator`) | **0.05 ms** | 60.7 ms | 3.2 ms | 5.3 ms | 6.6 ms | **1,340x** vs CLI |
| Word index (`self`) | **0.04 ms** | 59.7 ms | n/a | 7.2 ms | 6.5 ms | **1,404x** vs CLI |
| Structural outline | **0.05 ms** | 53.5 ms | 3.1 ms | — | 2.4 ms | **1,143x** vs CLI |
| Dependency graph | **0.05 ms** | 2.2 ms | n/a | n/a | n/a | **45x** vs CLI |

**merjs repo** (100 files, 17.3k lines):

| Query | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | MCP speedup |
|-------|-----------|-----------|----------|---------|------|-------------|
| File tree | **0.05 ms** | 54.0 ms | — | — | — | **1,173x** vs CLI |
| Symbol search (`init`) | **0.07 ms** | 54.4 ms | 3.4 ms | 6.3 ms | 3.6 ms | **758x** vs CLI |
| Full-text search (`allocator`) | **0.03 ms** | 54.1 ms | 2.9 ms | 5.1 ms | 3.7 ms | **1,554x** vs CLI |
| Word index (`self`) | **0.04 ms** | 54.7 ms | n/a | 6.3 ms | 4.2 ms | **1,518x** vs CLI |
| Structural outline | **0.04 ms** | 54.9 ms | 3.4 ms | — | 2.5 ms | **1,243x** vs CLI |

**rtk-ai/rtk repo** (329 files) — codedb vs rtk vs ripgrep vs grep:

| Tool | Search "agent" | Speedup |
|------|---------------|---------|
| codedb (pre-indexed) | **0.065 ms** | baseline |
| rtk | 37 ms | 569x slower |
| ripgrep | 45 ms | 692x slower |
| grep | 80 ms | 1,231x slower |

### Token Efficiency

codedb returns structured, relevant results — not raw line dumps. For AI agents, this means dramatically fewer tokens per query:

| Repo | codedb MCP | ripgrep / grep | Reduction |
|------|-----------|---------------|-----------|
| codedb (search `allocator`) | ~20 tokens | ~32,564 tokens | **1,628x fewer** |
| merjs (search `allocator`) | ~20 tokens | ~4,007 tokens | **200x fewer** |

### Indexing Speed

codedb v0.2.57 uses worker-local parallel scan with deterministic merge — each worker builds its own partial index, then results are merged on the main thread:

| Repo | Files | Cold start | Per file | vs v0.2.56 |
|------|-------|-----------|----------|-----------|
| codedb | 20 | **17 ms** | 0.85 ms | — |
| merjs | 100 | **16 ms** | 0.16 ms | — |
| 5,200 mixed files | 5,200 | **310 ms** | 0.06 ms | — |
| [openclaw/openclaw](https://github.com/openclaw/openclaw) | 6,315 | **346 ms** | 0.05 ms | **10× faster** |

Indexes are built once on startup. After that, the file watcher keeps them updated incrementally (single-file re-index: **<2ms**). Queries never re-scan the filesystem. For repos >1000 files, file contents are released after indexing to save ~300-500MB.

### Background Resource Usage (`openclaw`, 6,315 files, Apple M4 Pro)

| Metric | v0.2.56 | v0.2.57 | Delta |
|--------|---------|---------|-------|
| Steady-state RSS | 1,867 MB | 1,706 MB | −161 MB |
| `git` subprocesses / min (idle) | ~30 | ~0 | **mtime-gated** |

The watcher now stats `.git/HEAD` mtime before forking `git rev-parse HEAD`. On an idle repo the subprocess never fires.
### Why codedb is fast

- **MCP server** indexes once on startup → all queries hit in-memory data structures (O(1) hash lookups)
- **CLI** pays ~55ms process startup + full filesystem scan on every invocation
- **ast-grep** re-parses all files through tree-sitter on every call (~3ms)
- **ripgrep/grep** brute-force scan every file on every call (~5-7ms)
- The MCP advantage: **index once, query thousands of times at sub-millisecond latency**

### Feature Matrix

| Feature | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | ctags |
|---------|-----------|-----------|----------|---------|------|-------|
| Structural parsing | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| Trigram search index | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Inverted word index | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Dependency graph | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Version tracking | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multi-agent locking | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Pre-indexed (warm) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| No process startup | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| MCP protocol | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Full-text search | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Atomic file edits | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| File watcher | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |

> **codedb = tree-sitter + search index + dependency graph + agent runtime.** Zero external dependencies. Pure Zig. Single binary.

---

## 🏗️ Architecture

```
┌─────────────┐ ┌─────────────┐
│ HTTP :7719 │ │ MCP stdio │
│ server.zig │ │ mcp.zig │
└──────┬──────┘ └──────┬──────┘
│ │
└───────┬───────────┘

┌──────────▼──────────┐
│ Explorer │
│ explore.zig │
│ ┌───────────────┐ │
│ │ WordIndex │ │
│ │ TrigramIndex │ │
│ │ Outlines │ │
│ │ Contents │ │
│ │ DepGraph │ │
│ └───────────────┘ │
└──────────┬──────────┘

┌──────────▼──────────┐
│ Store │──── data.log
│ store.zig │
└──────────┬──────────┘

┌──────────▼──────────┐
│ Watcher │ ← polls every 2s
│ watcher.zig │
│ (FilteredWalker) │
└─────────────────────┘
```

**No SQLite. No dependencies.** Purpose-built data model:

- **Explorer** — structural index engine. Parses Zig, Python, TypeScript/JavaScript, Rust, Go, PHP, Ruby, HCL, R, and Dart. Maintains outlines, trigram index, inverted word index, content cache, and dependency graph behind a single mutex.
- **Store** — append-only version log. Every mutation (snapshot, edit, delete) gets a monotonically increasing sequence number. Version history capped at 100 per file.
- **Watcher** — polling file watcher (2s interval). `FilteredWalker` prunes `.git`, `node_modules`, `zig-cache`, `__pycache__`, etc. before descending.
- **Agents** — first-class structs with cursors, heartbeats, and exclusive file locks. Stale agents reaped after 30s.

### Threading Model

| Thread | Role |
|--------|------|
| Main | HTTP accept loop or MCP read loop |
| Watcher | Polls filesystem every 2s via `FilteredWalker` |
| ISR | Rebuilds snapshot when stale flag is set |
| Reap | Cleans up stale agents every 5s |
| Per-connection | HTTP server spawns a thread per connection |

All threads share a `shutdown: atomic.Value(bool)` for graceful termination.

---

## 🔒 Data & Privacy

codedb collects anonymous usage telemetry to improve the tool. Telemetry is **on by default** — written to `~/.codedb/telemetry.ndjson` and periodically synced to the codedb analytics endpoint. **No source code, file contents, file paths, or search queries are collected** — only aggregate tool call counts, latency, and startup stats.

| Location | Contents | Purpose |
|----------|----------|---------|
| `~/.codedb/projects//` | Trigram index, frequency table, data log | Persistent index cache |
| `~/.codedb/telemetry.ndjson` | Aggregate tool calls and startup stats | Local telemetry log |
| `./codedb.snapshot` | File tree, outlines, content, frequency table | Portable snapshot for instant MCP startup |

**Not stored:** No source code is sent anywhere. No file contents, file paths, or search queries are collected in telemetry. Sensitive files auto-excluded (`.env*`, `credentials.json`, `secrets.*`, `.pem`, `.key`, SSH keys, AWS configs).

To disable telemetry: set `CODEDB_NO_TELEMETRY=1` or pass `--no-telemetry`.

To sync the local NDJSON file into Postgres for analysis or dashboards, use [`scripts/sync-telemetry.py`](./scripts/sync-telemetry.py) with the schema in [`docs/telemetry/postgres-schema.sql`](./docs/telemetry/postgres-schema.sql). The data flow is documented in [`docs/telemetry.md`](./docs/telemetry.md).

```bash
codedb nuke # uninstall binary, clear caches/snapshots, remove MCP registrations
rm -rf ~/.codedb/ # cache-only cleanup if you want to keep the binary installed
rm -f codedb.snapshot # remove snapshot from current project only
```

---

## 🔨 Building from Source

**Requirements:** Zig 0.15+

```bash
git clone https://github.com/justrach/codedb.git
cd codedb
zig build # debug build
zig build -Doptimize=ReleaseFast # release build
zig build test # run tests
zig build bench # run benchmarks
```

Binary: `zig-out/bin/codedb`

### Cross-compilation

```bash
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-linux
zig build -Doptimize=ReleaseFast -Dtarget=aarch64-linux
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-macos
```

### Releasing

```bash
./release.sh 0.2.0 # build, codesign, notarize, upload to GitHub Releases
./release.sh 0.2.0 --dry-run # preview without executing
```

---

## License

See [LICENSE](LICENSE) for details.