https://github.com/optave/codegraph

Always-fresh code dependency graph — sub-second incremental rebuilds, function-level tracing across 11 languages, 20-tool MCP server for AI agents, git diff impact with co-change analysis, A→B pathfinding, node role classification, local semantic search. Zero API keys required.
https://github.com/optave/codegraph
ai-agents cli code-analysis dependency-graph impact-analysis incremental-builds mcp-server semantic-search sqlite static-analysis tree-sitter
Last synced: about 2 months ago
JSON representation
Host: GitHub
URL: https://github.com/optave/codegraph
Owner: optave
License: apache-2.0
Created: 2026-02-21T09:31:55.000Z (2 months ago)
Default Branch: main
Last Pushed: 2026-03-02T02:15:14.000Z (2 months ago)
Last Synced: 2026-03-02T02:18:16.395Z (2 months ago)
Topics: ai-agents, cli, code-analysis, dependency-graph, impact-analysis, incremental-builds, mcp-server, semantic-search, sqlite, static-analysis, tree-sitter
Language: JavaScript
Size: 5.04 MB
Stars: 20
Watchers: 1
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Support: SUPPORT.md
- Roadmap: docs/roadmap/BACKLOG.md
Awesome Lists containing this project

README

          


  



codegraph




  Give your AI the map before it starts exploring.





  

  

  

  





  The Problem ·

  What It Does ·

  Quick Start ·

  Commands ·

  Languages ·

  AI Integration ·

  How It Works ·

  Practices ·

  Roadmap



---

## The Problem

AI agents face an impossible trade-off. They either spend thousands of tokens reading files to understand a codebase's structure — blowing up their context window until quality degrades — or they assume how things work, and the assumptions are often wrong. Either way, things break. The larger the codebase, the worse it gets.

An agent modifies a function without knowing 9 files import it. It misreads what a helper does and builds logic on top of that misunderstanding. It leaves dead code behind after a refactor. The PR gets opened, and your reviewer — human or automated — flags the same structural issues again and again: _"this breaks 14 callers,"_ _"that function already exists,"_ _"this export is now dead."_ If the reviewer catches it, that's multiple rounds of back-and-forth. If they don't, it can ship to production. Multiply that by every PR, every developer, every repo.

The information to prevent these issues exists — it's in the code itself. But without a structured map, agents lack the context to get it right consistently, reviewers waste cycles on preventable issues, and architecture degrades one unreviewed change at a time.

## What Codegraph Does

Codegraph builds a function-level dependency graph of your entire codebase — every function, every caller, every dependency — and keeps it current with sub-second incremental rebuilds.

It parses your code with [tree-sitter](https://tree-sitter.github.io/) (native Rust or WASM), stores the graph in SQLite, and exposes it where it matters most:

- **MCP server** — AI agents query the graph directly through 30 tools — one call instead of 30 `grep`/`find`/`cat` invocations

- **CLI** — developers and agents explore, query, and audit code from the terminal

- **CI gates** — `check` and `manifesto` commands enforce quality thresholds with exit codes

- **Programmatic API** — embed codegraph in your own tools via `npm install`

Instead of an agent editing code without structural context and letting reviewers catch the fallout, it knows _"this function has 14 callers across 9 files"_ before it touches anything. Dead exports, circular dependencies, and boundary violations surface during development — not during review. The result: PRs that need fewer review rounds.

**Free. Open source. Fully local.** Zero network calls, zero telemetry. Your code stays on your machine. When you want deeper intelligence, bring your own LLM provider — your code only goes where you choose to send it.

**Three commands to a queryable graph:**

```bash

npm install -g @optave/codegraph

cd your-project

codegraph build

```

No config files, no Docker, no JVM, no API keys, no accounts. Point your agent at the MCP server and it has structural awareness of your codebase.

### Why it matters

| | Without codegraph | With codegraph |

|---|---|---|

| **Code review** | Reviewers flag broken callers, dead code, and boundary violations round after round | Structural issues are caught during development — PRs pass review with fewer rounds |

| **AI agents** | Modify `parseConfig()` without knowing 9 files import it — reviewer catches it | `fn-impact parseConfig` shows every caller before the edit — agent fixes it proactively |

| **AI agents** | Leave dead exports and duplicate helpers behind after refactors | Dead code, cycles, and duplicates surface in real time via hooks and MCP queries |

| **AI agents** | Produce code that works but doesn't fit the codebase structure | `context  -T` returns source, deps, callers, and tests — the agent writes code that fits |

| **CI pipelines** | Catch test failures but miss structural degradation | `check --staged` fails the build when blast radius or complexity thresholds are exceeded |

| **Developers** | Inherit a codebase and grep for hours to understand what calls what | `context handleAuth -T` gives the same structured view agents use |

| **Architects** | Draw boundary rules that erode within weeks | `manifesto` and `boundaries` enforce architecture rules on every commit |

### Feature comparison

_{Comparison last verified: March 2026. Full analysis: COMPETITIVE_ANALYSIS.md}

| Capability | codegraph | [joern](https://github.com/joernio/joern) | [narsil-mcp](https://github.com/postrv/narsil-mcp) | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) | [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | [axon](https://github.com/harshkedia177/axon) |

|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|

| MCP / AI agent support | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |

| Batch querying | **Yes** | — | — | — | — | — | — | — |

| Composite audit command | **Yes** | — | — | — | — | — | — | — |

| Function-level analysis | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |

| Multi-language | **11** | **14** | **32** | **11** | **~10** | **12** | **12** | **3** |

| Semantic search | **Yes** | — | **Yes** | **Yes** | — | **Yes** | — | **Yes** |

| Hybrid BM25 + semantic | **Yes** | — | — | — | — | **Yes** | — | **Yes** |

| CODEOWNERS integration | **Yes** | — | — | — | — | — | — | — |

| Architecture boundary rules | **Yes** | — | — | — | — | — | — | — |

| CI validation predicates | **Yes** | — | — | — | — | — | — | — |

| Graph snapshots | **Yes** | — | — | — | — | — | — | — |

| Git diff impact | **Yes** | — | — | — | — | **Yes** | **Yes** | **Yes** |

| Branch structural diff | **Yes** | — | — | — | — | — | — | **Yes** |

| Git co-change analysis | **Yes** | — | — | — | — | — | — | **Yes** |

| Watch mode | **Yes** | — | **Yes** | **Yes** | — | — | **Yes** | **Yes** |

| Dead code / role classification | **Yes** | — | **Yes** | — | — | — | **Yes** | **Yes** |

| Cycle detection | **Yes** | — | — | — | — | — | — | — |

| Incremental rebuilds | **O(changed)** | — | O(n) Merkle | — | — | — | Go only | **Yes** |

| Zero config | **Yes** | — | **Yes** | — | — | **Yes** | — | **Yes** |

| Embeddable JS library (`npm install`) | **Yes** | — | — | — | — | — | — | — |

| LLM-optional (works without API keys) | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** |

| Dataflow analysis | **Yes** | **Yes** | — | — | **Yes** | — | — | — |

| Control flow graph (CFG) | **Yes** | **Yes** | — | — | **Yes** | — | — | — |

| AST node querying | **Yes** | **Yes** | — | — | **Yes** | — | — | — |

| Expanded node/edge types | **Yes** | **Yes** | — | — | **Yes** | — | — | — |

| GraphML / Neo4j export | **Yes** | **Yes** | — | — | — | — | — | — |

| Interactive graph viewer | **Yes** | — | — | — | — | — | — | — |

| Commercial use allowed | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | No | Paid | **Yes** |

| Open source | **Yes** | Yes | Yes | Yes | Yes | No | No | Yes |

### What makes codegraph different

| | Differentiator | In practice |

|---|---|---|

| **🤖** | **AI-first architecture** | 30-tool [MCP server](https://modelcontextprotocol.io/) — agents query the graph directly instead of scraping the filesystem. One call replaces 20+ grep/find/cat invocations |

| **🏷️** | **Role classification** | Every symbol auto-tagged as `entry`/`core`/`utility`/`adapter`/`dead`/`leaf` — agents understand a symbol's architectural role without reading surrounding code |

| **🔬** | **Function-level, not just files** | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows 14 callers across 9 files break if `decryptJWT` changes |

| **⚡** | **Always-fresh graph** | Three-tier change detection: journal (O(changed)) → mtime+size (O(n) stats) → hash (O(changed) reads). Sub-second rebuilds — agents work with current data |

| **💥** | **Git diff impact** | `codegraph diff-impact` shows changed functions, their callers, and full blast radius — enriched with historically coupled files from git co-change analysis. Ships with a GitHub Actions workflow |

| **🌐** | **Multi-language, one graph** | JS/TS + Python + Go + Rust + Java + C# + PHP + Ruby + HCL in a single graph — agents don't need per-language tools |

| **🧠** | **Hybrid search** | BM25 keyword + semantic embeddings fused via RRF — `hybrid` (default), `semantic`, or `keyword` mode; multi-query via `"auth; token; JWT"` |

| **🔬** | **Dataflow + CFG** | Track how data flows through functions (`flows_to`, `returns`, `mutates`) and visualize intraprocedural control flow graphs for all 11 languages |

| **🔓** | **Fully local, zero cost** | No API keys, no accounts, no network calls. Optionally bring your own LLM provider — your code only goes where you choose |

---

## 🚀 Quick Start

```bash

npm install -g @optave/codegraph

cd your-project

codegraph build        # → .codegraph/graph.db created

```

That's it. The graph is ready. Now connect your AI agent.

### For AI agents (primary use case)

Connect directly via MCP — your agent gets 30 tools to query the graph:

```bash

codegraph mcp          # 30-tool MCP server — AI queries the graph directly

```

Or add codegraph to your agent's instructions (e.g. `CLAUDE.md`):

```markdown

Before modifying code, always:

1. `codegraph where ` — find where the symbol lives

2. `codegraph context  -T` — get full context (source, deps, callers)

3. `codegraph fn-impact  -T` — check blast radius before editing

After modifying code:

4. `codegraph diff-impact --staged -T` — verify impact before committing

```

Full agent setup: [AI Agent Guide](docs/guides/ai-agent-guide.md) · [CLAUDE.md template](docs/guides/ai-agent-guide.md#claudemd-template)

### For developers

The same graph is available via CLI:

```bash

codegraph map          # see most-connected files

codegraph query myFunc # find any function, see callers & callees

codegraph deps src/index.ts  # file-level import/export map

```

Or install from source:

```bash

git clone https://github.com/optave/codegraph.git

cd codegraph && npm install && npm link

```

> **Dev builds:** Pre-release tarballs are attached to [GitHub Releases](https://github.com/optave/codegraph/releases). Install with `npm install -g `. Note that `npm install -g ` does not work because npm cannot resolve optional platform-specific dependencies from a URL — download the `.tgz` first, then install from the local file.

---

## ✨ Features

| | Feature | Description |

|---|---|---|

| 🤖 | **MCP server** | 30-tool MCP server for AI assistants; single-repo by default, opt-in multi-repo |

| 🎯 | **Deep context** | `context` gives agents source, deps, callers, signature, and tests for a function in one call; `audit --quick` gives structural summaries |

| 🏷️ | **Node role classification** | Every symbol auto-tagged as `entry`/`core`/`utility`/`adapter`/`dead`/`leaf` based on connectivity — agents instantly know architectural role |

| 📦 | **Batch querying** | Accept a list of targets and return all results in one JSON payload — enables multi-agent parallel dispatch |

| 💥 | **Impact analysis** | Trace every file affected by a change (transitive) |

| 🧬 | **Function-level tracing** | Call chains, caller trees, function-level impact, and A→B pathfinding with qualified call resolution |

| 📍 | **Fast lookup** | `where` shows exactly where a symbol is defined and used — minimal, fast |

| 🔍 | **Symbol search** | Find any function, class, or method by name — exact match priority, relevance scoring, `--file` and `--kind` filters |

| 📁 | **File dependencies** | See what a file imports and what imports it |

| 📊 | **Diff impact** | Parse `git diff`, find overlapping functions, trace their callers |

| 🔗 | **Co-change analysis** | Analyze git history for files that always change together — surfaces hidden coupling the static graph can't see; enriches `diff-impact` with historically coupled files |

| 🗺️ | **Module map** | Bird's-eye view of your most-connected files |

| 🏗️ | **Structure & hotspots** | Directory cohesion scores, fan-in/fan-out hotspot detection, module boundaries |

| 🔄 | **Cycle detection** | Find circular dependencies at file or function level |

| 📤 | **Export** | DOT, Mermaid, JSON, GraphML, GraphSON, and Neo4j CSV graph export |

| 🧠 | **Semantic search** | Embeddings-powered natural language search with multi-query RRF ranking |

| 👀 | **Watch mode** | Incrementally update the graph as files change |

| ⚡ | **Always fresh** | Three-tier incremental detection — sub-second rebuilds even on large codebases |

| 🔬 | **Data flow analysis** | Intraprocedural parameter tracking, return consumers, argument flows, and mutation detection — all 11 languages |

| 🧮 | **Complexity metrics** | Cognitive, cyclomatic, nesting depth, Halstead, and Maintainability Index per function |

| 🏘️ | **Community detection** | Louvain clustering to discover natural module boundaries and architectural drift |

| 📜 | **Manifesto rule engine** | Configurable pass/fail rules with warn/fail thresholds for CI gates via `check` (exit code 1 on fail) |

| 👥 | **CODEOWNERS integration** | Map graph nodes to CODEOWNERS entries — see who owns each function, ownership boundaries in `diff-impact` |

| 💾 | **Graph snapshots** | `snapshot save`/`restore` for instant DB backup and rollback — checkpoint before refactoring, restore without rebuilding |

| 🔎 | **Hybrid BM25 + semantic search** | FTS5 keyword search + embedding-based semantic search fused via Reciprocal Rank Fusion — `hybrid`, `semantic`, or `keyword` modes |

| 📄 | **Pagination & NDJSON streaming** | Universal `--limit`/`--offset` pagination on all MCP tools and CLI commands; `--ndjson` for newline-delimited JSON streaming |

| 🔀 | **Branch structural diff** | Compare code structure between two git refs — added/removed/changed symbols with transitive caller impact |

| 🛡️ | **Architecture boundaries** | User-defined dependency rules between modules with onion architecture preset — violations flagged in manifesto and CI |

| ✅ | **CI validation predicates** | `check` command with configurable gates: complexity, blast radius, cycles, boundary violations — exit code 0/1 for CI |

| 📋 | **Composite audit** | Single `audit` command combining explain + impact + health metrics per function — one call instead of 3-4 |

| 🚦 | **Triage queue** | `triage` merges connectivity, hotspots, roles, and complexity into a ranked audit priority queue |

| 🔬 | **Dataflow analysis** | Track how data moves through functions with `flows_to`, `returns`, and `mutates` edges — all 11 languages, included by default, skip with `--no-dataflow` |

| 🧩 | **Control flow graph** | Intraprocedural CFG construction for all 11 languages — `cfg` command with text/DOT/Mermaid output, included by default, skip with `--no-cfg` |

| 🔎 | **AST node querying** | Stored queryable AST nodes (calls, `new`, string, regex, throw, await) — `ast` command with SQL GLOB pattern matching |

| 🧬 | **Expanded node/edge types** | `parameter`, `property`, `constant` node kinds with `parent_id` for sub-declaration queries; `contains`, `parameter_of`, `receiver` edge kinds |

| 📊 | **Exports analysis** | `exports ` shows all exported symbols with per-symbol consumers, re-export detection, and counts |

| 📈 | **Interactive viewer** | `codegraph plot` generates an interactive HTML graph viewer with hierarchical/force/radial layouts, complexity overlays, and drill-down |

| 🏷️ | **Stable JSON schema** | `normalizeSymbol` utility ensures consistent 7-field output (name, kind, file, line, endLine, role, fileHash) across all commands |

See [docs/examples](docs/examples) for real-world CLI and MCP usage examples.

## 📦 Commands

### Build & Watch

```bash

codegraph build [dir]          # Parse and build the dependency graph

codegraph build --no-incremental  # Force full rebuild

codegraph build --dataflow     # Extract data flow edges (flows_to, returns, mutates)

codegraph build --engine wasm  # Force WASM engine (skip native)

codegraph watch [dir]          # Watch for changes, update graph incrementally

```

### Query & Explore

```bash

codegraph query          # Find a symbol — shows callers and callees

codegraph deps           # File imports/exports

codegraph map                  # Top 20 most-connected files

codegraph map -n 50 --no-tests # Top 50, excluding test files

codegraph where          # Where is a symbol defined and used?

codegraph where --file src/db.js  # List symbols, imports, exports for a file

codegraph stats                # Graph health: nodes, edges, languages, quality score

codegraph roles                # Node role classification (entry, core, utility, adapter, dead, leaf)

codegraph roles --role dead -T # Find dead code (unreferenced, non-exported symbols)

codegraph roles --role core --file src/  # Core symbols in src/

codegraph exports src/queries.js  # Per-symbol consumer analysis (who calls each export)

codegraph children          # List parameters, properties, constants of a symbol

```

### Deep Context (designed for AI agents)

```bash

codegraph context        # Full context: source, deps, callers, signature, tests

codegraph context  --depth 2 --no-tests  # Include callee source 2 levels deep

codegraph audit  --quick    # Structural summary: public API, internals, data flow

codegraph audit  --quick  # Function summary: signature, calls, callers, tests

```

### Impact Analysis

```bash

codegraph impact         # Transitive reverse dependency trace

codegraph query          # Function-level: callers, callees, call chain

codegraph query  --no-tests --depth 5

codegraph fn-impact      # What functions break if this one changes

codegraph path              # Shortest path between two symbols (A calls...calls B)

codegraph path   --reverse  # Follow edges backward

codegraph path   --depth 5 --kinds calls,imports

codegraph diff-impact          # Impact of unstaged git changes

codegraph diff-impact --staged # Impact of staged changes

codegraph diff-impact HEAD~3   # Impact vs a specific ref

codegraph diff-impact main --format mermaid -T  # Mermaid flowchart of blast radius

codegraph branch-compare main feature-branch    # Structural diff between two refs

codegraph branch-compare main HEAD --no-tests   # Symbols added/removed/changed vs main

codegraph branch-compare v2.4.0 v2.5.0 --json   # JSON output for programmatic use

codegraph branch-compare main HEAD --format mermaid  # Mermaid diagram of structural changes

```

### Co-Change Analysis

Analyze git history to find files that always change together — surfaces hidden coupling the static graph can't see. Requires a git repository.

```bash

codegraph co-change --analyze          # Scan git history and populate co-change data

codegraph co-change src/queries.js     # Show co-change partners for a file

codegraph co-change                    # Show top co-changing file pairs globally

codegraph co-change --since 6m         # Limit to last 6 months of history

codegraph co-change --min-jaccard 0.5  # Only show strong coupling (Jaccard >= 0.5)

codegraph co-change --min-support 5    # Minimum co-commit count

codegraph co-change --full             # Include all details

```

Co-change data also enriches `diff-impact` — historically coupled files appear in a `historicallyCoupled` section alongside the static dependency analysis.

### Structure & Hotspots

```bash

codegraph structure            # Directory overview with cohesion scores

codegraph triage --level file  # Files with extreme fan-in, fan-out, or density

codegraph triage --level directory --sort coupling --no-tests

```

### Code Health & Architecture

```bash

codegraph complexity              # Per-function cognitive, cyclomatic, nesting, MI

codegraph complexity --health -T  # Full Halstead health view (volume, effort, bugs, MI)

codegraph complexity --sort mi -T # Sort by worst maintainability index

codegraph complexity --above-threshold -T  # Only functions exceeding warn thresholds

codegraph communities             # Louvain community detection — natural module boundaries

codegraph communities --drift -T  # Drift analysis only — split/merge candidates

codegraph communities --functions # Function-level community detection

codegraph check                   # Pass/fail rule engine (exit code 1 on fail)

codegraph check -T                # Exclude test files from rule evaluation

```

### Dataflow, CFG & AST

```bash

codegraph dataflow              # Data flow edges for a function (flows_to, returns, mutates)

codegraph dataflow  --impact    # Transitive data-dependent blast radius

codegraph cfg                   # Control flow graph (text format)

codegraph cfg  --format dot     # CFG as Graphviz DOT

codegraph cfg  --format mermaid # CFG as Mermaid diagram

codegraph ast                         # List all stored AST nodes

codegraph ast "handleAuth"            # Search AST nodes by pattern (GLOB)

codegraph ast -k call                 # Filter by kind: call, new, string, regex, throw, await

codegraph ast -k throw --file src/    # Combine kind and file filters

```

> **Note:** Dataflow and CFG are included by default for all 11 languages. Use `--no-dataflow` / `--no-cfg` for faster builds.

### Audit, Triage & Batch

Composite commands for risk-driven workflows and multi-agent dispatch.

```bash

codegraph audit     # Combined structural summary + impact + health in one report

codegraph audit  --quick      # Structural summary only (skip impact and health)

codegraph audit src/queries.js -T     # Audit all functions in a file

codegraph triage                      # Ranked audit priority queue (connectivity + hotspots + roles)

codegraph triage -T --limit 20        # Top 20 riskiest functions, excluding tests

codegraph triage --level file -T      # File-level hotspot analysis

codegraph triage --level directory -T # Directory-level hotspot analysis

codegraph batch target1 target2 ...   # Batch query multiple targets in one call

codegraph batch --json targets.json   # Batch from a JSON file

```

### CI Validation

`codegraph check` provides configurable pass/fail predicates for CI gates and state machines. Exit code 0 = pass, 1 = fail.

```bash

codegraph check                             # Run manifesto rules on whole codebase

codegraph check --staged                    # Check staged changes (diff predicates)

codegraph check --staged --rules            # Run both diff predicates AND manifesto rules

codegraph check --no-new-cycles             # Fail if staged changes introduce cycles

codegraph check --max-complexity 30         # Fail if any function exceeds complexity threshold

codegraph check --max-blast-radius 50       # Fail if blast radius exceeds limit

codegraph check --no-boundary-violations    # Fail on architecture boundary violations

codegraph check main                        # Check current branch vs main

```

### CODEOWNERS

Map graph symbols to CODEOWNERS entries. Shows who owns each function and surfaces ownership boundaries.

```bash

codegraph owners                   # Show ownership for all symbols

codegraph owners src/queries.js    # Ownership for symbols in a specific file

codegraph owners --boundary        # Show ownership boundaries between modules

codegraph owners --owner @backend  # Filter by owner

```

Ownership data also enriches `diff-impact` — affected owners and suggested reviewers appear alongside the static dependency analysis.

### Snapshots

Lightweight SQLite DB backup and restore — checkpoint before refactoring, instantly rollback without rebuilding.

```bash

codegraph snapshot save before-refactor   # Save a named snapshot

codegraph snapshot list                   # List all snapshots

codegraph snapshot restore before-refactor  # Restore a snapshot

codegraph snapshot delete before-refactor   # Delete a snapshot

```

### Export & Visualization

```bash

codegraph export -f dot        # Graphviz DOT format

codegraph export -f mermaid    # Mermaid diagram

codegraph export -f json       # JSON graph

codegraph export -f graphml    # GraphML (XML standard)

codegraph export -f graphson   # GraphSON (TinkerPop v3 / Gremlin)

codegraph export -f neo4j      # Neo4j CSV (bulk import, separate nodes/relationships files)

codegraph export --functions -o graph.dot  # Function-level, write to file

codegraph plot                 # Interactive HTML viewer with force/hierarchical/radial layouts

codegraph cycles               # Detect circular dependencies

codegraph cycles --functions   # Function-level cycles

```

### Semantic Search

Local embeddings for every function, method, and class — search by natural language. Everything runs locally using [@huggingface/transformers](https://huggingface.co/docs/transformers.js) — no API keys needed.

```bash

codegraph embed                # Build embeddings (default: nomic-v1.5)

codegraph embed --model nomic  # Use a different model

codegraph search "handle authentication"

codegraph search "parse config" --min-score 0.4 -n 10

codegraph search "parseConfig" --mode keyword   # BM25 keyword-only (exact names)

codegraph search "auth flow" --mode semantic    # Embedding-only (conceptual)

codegraph search "auth flow" --mode hybrid      # BM25 + semantic RRF fusion (default)

codegraph models               # List available models

```

#### Multi-query search

Separate queries with `;` to search from multiple angles at once. Results are ranked using [Reciprocal Rank Fusion (RRF)](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — items that rank highly across multiple queries rise to the top.

```bash

codegraph search "auth middleware; JWT validation"

codegraph search "parse config; read settings; load env" -n 20

codegraph search "error handling; retry logic" --kind function

codegraph search "database connection; query builder" --rrf-k 30

```

A single trailing semicolon is ignored (falls back to single-query mode). The `--rrf-k` flag controls the RRF smoothing constant (default 60) — lower values give more weight to top-ranked results.

#### Available Models

| Flag | Model | Dimensions | Size | License | Notes |

|---|---|---|---|---|---|

| `minilm` | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |

| `jina-small` | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |

| `jina-base` | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |

| `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | Best for code search, trained on code+text (requires HF token) |

| `nomic` | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Good quality, 8192 context |

| `nomic-v1.5` (default) | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | **Improved nomic, Matryoshka dimensions** |

| `bge-large` | bge-large-en-v1.5 | 1024 | ~335 MB | MIT | Best general retrieval, top MTEB scores |

The model used during `embed` is stored in the database, so `search` auto-detects it — no need to pass `--model` when searching.

### Multi-Repo Registry

Manage a global registry of codegraph-enabled projects. The registry stores paths to your built graphs so the MCP server can query them when multi-repo mode is enabled.

```bash

codegraph registry list        # List all registered repos

codegraph registry list --json # JSON output

codegraph registry add    # Register a project directory

codegraph registry add  -n my-name  # Custom name

codegraph registry remove   # Unregister

```

`codegraph build` auto-registers the project — no manual setup needed.

### Common Flags

| Flag | Description |

|---|---|

| `-d, --db ` | Custom path to `graph.db` |

| `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on most query commands including `query`, `fn-impact`, `path`, `context`, `where`, `diff-impact`, `search`, `map`, `roles`, `co-change`, `deps`, `impact`, `complexity`, `communities`, `branch-compare`, `audit`, `triage`, `check`, `dataflow`, `cfg`, `ast`, `exports`, `children`) |

| `--depth ` | Transitive trace depth (default varies by command) |

| `-j, --json` | Output as JSON |

| `-v, --verbose` | Enable debug output |

| `--engine ` | Parser engine: `native`, `wasm`, or `auto` (default: `auto`) |

| `-k, --kind ` | Filter by kind: `function`, `method`, `class`, `interface`, `type`, `struct`, `enum`, `trait`, `record`, `module`, `parameter`, `property`, `constant` |

| `-f, --file ` | Scope to a specific file (`fn`, `context`, `where`) |

| `--mode ` | Search mode: `hybrid` (default), `semantic`, or `keyword` (`search`) |

| `--ndjson` | Output as newline-delimited JSON (one object per line) |

| `--table` | Output as auto-column aligned table |

| `--csv` | Output as CSV (RFC 4180, nested objects flattened) |

| `--limit ` | Limit number of results |

| `--offset ` | Skip first N results (pagination) |

| `--rrf-k ` | RRF smoothing constant for multi-query search (default 60) |

## 🌐 Language Support

| Language | Extensions | Coverage |

|---|---|---|

| ![JavaScript](https://img.shields.io/badge/-JavaScript-F7DF1E?style=flat-square&logo=javascript&logoColor=black) | `.js`, `.jsx`, `.mjs`, `.cjs` | Full — functions, classes, imports, call sites, dataflow |

| ![TypeScript](https://img.shields.io/badge/-TypeScript-3178C6?style=flat-square&logo=typescript&logoColor=white) | `.ts`, `.tsx` | Full — interfaces, type aliases, `.d.ts`, dataflow |

| ![Python](https://img.shields.io/badge/-Python-3776AB?style=flat-square&logo=python&logoColor=white) | `.py` | Functions, classes, methods, imports, decorators, dataflow |

| ![Go](https://img.shields.io/badge/-Go-00ADD8?style=flat-square&logo=go&logoColor=white) | `.go` | Functions, methods, structs, interfaces, imports, call sites, dataflow |

| ![Rust](https://img.shields.io/badge/-Rust-000000?style=flat-square&logo=rust&logoColor=white) | `.rs` | Functions, methods, structs, traits, `use` imports, call sites, dataflow |

| ![Java](https://img.shields.io/badge/-Java-ED8B00?style=flat-square&logo=openjdk&logoColor=white) | `.java` | Classes, methods, constructors, interfaces, imports, call sites, dataflow |

| ![C#](https://img.shields.io/badge/-C%23-512BD4?style=flat-square&logo=dotnet&logoColor=white) | `.cs` | Classes, structs, records, interfaces, enums, methods, constructors, using directives, invocations, dataflow |

| ![PHP](https://img.shields.io/badge/-PHP-777BB4?style=flat-square&logo=php&logoColor=white) | `.php` | Functions, classes, interfaces, traits, enums, methods, namespace use, calls, dataflow |

| ![Ruby](https://img.shields.io/badge/-Ruby-CC342D?style=flat-square&logo=ruby&logoColor=white) | `.rb` | Classes, modules, methods, singleton methods, require/require_relative, include/extend, dataflow |

| ![Terraform](https://img.shields.io/badge/-Terraform-844FBA?style=flat-square&logo=terraform&logoColor=white) | `.tf`, `.hcl` | Resource, data, variable, module, output blocks |

## ⚙️ How It Works

```

┌──────────┐    ┌───────────┐    ┌───────────┐    ┌──────────┐    ┌─────────┐

│  Source  │──▶│ tree-sitter│──▶│  Extract  │──▶│  Resolve │──▶│ SQLite  │

│  Files   │    │   Parse   │    │  Symbols  │    │  Imports │    │   DB    │

└──────────┘    └───────────┘    └───────────┘    └──────────┘    └─────────┘

                                                                       │

                                                                       ▼

                                                                 ┌─────────┐

                                                                 │  Query  │

                                                                 └─────────┘

```

1. **Parse** — tree-sitter parses every source file into an AST (native Rust engine or WASM fallback)

2. **Extract** — Functions, classes, methods, interfaces, imports, exports, call sites, parameters, properties, and constants are extracted

3. **Resolve** — Imports are resolved to actual files (handles ESM conventions, `tsconfig.json` path aliases, `baseUrl`)

4. **Store** — Everything goes into SQLite as nodes + edges with tree-sitter node boundaries, plus structural edges (`contains`, `parameter_of`, `receiver`)

5. **Analyze** (opt-in) — Complexity metrics, control flow graphs (`--cfg`), dataflow edges (`--dataflow`), and AST node storage

6. **Query** — All queries run locally against the SQLite DB — typically under 100ms

### Incremental Rebuilds

The graph stays current without re-parsing your entire codebase. Three-tier change detection ensures rebuilds are proportional to what changed, not the size of the project:

1. **Tier 0 — Journal (O(changed)):** If `codegraph watch` was running, a change journal records exactly which files were touched. The next build reads the journal and only processes those files — zero filesystem scanning

2. **Tier 1 — mtime+size (O(n) stats, O(changed) reads):** No journal? Codegraph stats every file and compares mtime + size against stored values. Matching files are skipped without reading a single byte

3. **Tier 2 — Hash (O(changed) reads):** Files that fail the mtime/size check are read and MD5-hashed. Only files whose hash actually changed get re-parsed and re-inserted

**Result:** change one file in a 3,000-file project and the rebuild completes in under a second. Put it in a commit hook, a file watcher, or let your AI agent trigger it.

### Dual Engine

Codegraph ships with two parsing engines:

| Engine | How it works | When it's used |

|--------|-------------|----------------|

| **Native** (Rust) | napi-rs addon built from `crates/codegraph-core/` — parallel multi-core parsing via rayon | Auto-selected when the prebuilt binary is available |

| **WASM** | `web-tree-sitter` with pre-built `.wasm` grammars in `grammars/` | Fallback when the native addon isn't installed |

Both engines produce identical output. Use `--engine native|wasm|auto` to control selection (default: `auto`).

### Call Resolution

Calls are resolved with **qualified resolution** — method calls (`obj.method()`) are distinguished from standalone function calls, and built-in receivers (`console`, `Math`, `JSON`, `Array`, `Promise`, etc.) are filtered out automatically. Import scope is respected: a call to `foo()` only resolves to functions that are actually imported or defined in the same file, eliminating false positives from name collisions.

| Priority | Source | Confidence |

|---|---|---|

| 1 | **Import-aware** — `import { foo } from './bar'` → link to `bar` | `1.0` |

| 2 | **Same-file** — definitions in the current file | `1.0` |

| 3 | **Same directory** — definitions in sibling files (standalone calls only) | `0.7` |

| 4 | **Same parent directory** — definitions in sibling dirs (standalone calls only) | `0.5` |

| 5 | **Method hierarchy** — resolved through `extends`/`implements` | varies |

Method calls on unknown receivers skip global fallback entirely — `stmt.run()` will never resolve to a standalone `run` function in another file. Duplicate caller/callee edges are deduplicated automatically. Dynamic patterns like `fn.call()`, `fn.apply()`, `fn.bind()`, and `obj["method"]()` are also detected on a best-effort basis.

Codegraph also extracts symbols from common callback patterns: Commander `.command().action()` callbacks (as `command:build`), Express route handlers (as `route:GET /api/users`), and event emitter listeners (as `event:data`).

## 📊 Performance

Self-measured on every release via CI ([build benchmarks](generated/benchmarks/BUILD-BENCHMARKS.md) | [embedding benchmarks](generated/benchmarks/EMBEDDING-BENCHMARKS.md)):

| Metric | Latest |

|---|---|

| Build speed (native) | **3.5 ms/file** |

| Build speed (WASM) | **9.6 ms/file** |

| Query time | **3ms** |

| No-op rebuild (native) | **9ms** |

| 1-file rebuild (native) | **265ms** |

| Query: fn-deps | **0.9ms** |

| Query: path | **0.9ms** |

| ~50,000 files (est.) | **~175.0s build** |

Metrics are normalized per file for cross-version comparability. Times above are for a full initial build — incremental rebuilds only re-parse changed files.

### Lightweight Footprint



Only **3 runtime dependencies** — everything else is optional or a devDependency:

| Dependency | What it does | | |

|---|---|---|---|

| [better-sqlite3](https://github.com/WiseLibs/better-sqlite3) | Fast, synchronous SQLite driver | ![GitHub stars](https://img.shields.io/github/stars/WiseLibs/better-sqlite3?style=flat-square&label=%E2%AD%90) | ![npm downloads](https://img.shields.io/npm/dw/better-sqlite3?style=flat-square&label=%F0%9F%93%A5%2Fwk) |

| [commander](https://github.com/tj/commander.js) | CLI argument parsing | ![GitHub stars](https://img.shields.io/github/stars/tj/commander.js?style=flat-square&label=%E2%AD%90) | ![npm downloads](https://img.shields.io/npm/dw/commander?style=flat-square&label=%F0%9F%93%A5%2Fwk) |

| [web-tree-sitter](https://github.com/tree-sitter/tree-sitter) | WASM tree-sitter bindings | ![GitHub stars](https://img.shields.io/github/stars/tree-sitter/tree-sitter?style=flat-square&label=%E2%AD%90) | ![npm downloads](https://img.shields.io/npm/dw/web-tree-sitter?style=flat-square&label=%F0%9F%93%A5%2Fwk) |

Optional: `@huggingface/transformers` (semantic search), `@modelcontextprotocol/sdk` (MCP server) — lazy-loaded only when needed.

## 🤖 AI Agent Integration (Core)

### MCP Server

Codegraph is built around a [Model Context Protocol](https://modelcontextprotocol.io/) server with 30 tools (31 in multi-repo mode) — the primary way agents consume the graph:

```bash

codegraph mcp                  # Single-repo mode (default) — only local project

codegraph mcp --multi-repo     # Enable access to all registered repos

codegraph mcp --repos a,b      # Restrict to specific repos (implies --multi-repo)

```

**Single-repo mode (default):** Tools operate only on the local `.codegraph/graph.db`. The `repo` parameter and `list_repos` tool are not exposed to the AI agent.

**Multi-repo mode (`--multi-repo`):** All tools gain an optional `repo` parameter to target any registered repository, and `list_repos` becomes available. Use `--repos` to restrict which repos the agent can access.

### CLAUDE.md / Agent Instructions

Add this to your project's `CLAUDE.md` to help AI agents use codegraph. Full template with all commands in the [AI Agent Guide](docs/guides/ai-agent-guide.md#claudemd-template).

```markdown

## Codegraph

This project uses codegraph for dependency analysis. The graph is at `.codegraph/graph.db`.

### Before modifying code:

1. `codegraph where ` — find where the symbol lives

2. `codegraph audit --quick ` — understand the structure

3. `codegraph context  -T` — get full context (source, deps, callers)

4. `codegraph fn-impact  -T` — check blast radius before editing

### After modifying code:

5. `codegraph diff-impact --staged -T` — verify impact before committing

### Other useful commands

- `codegraph build .` — rebuild graph (incremental by default)

- `codegraph map` — module overview · `codegraph stats` — graph health

- `codegraph query  -T` — call chain · `codegraph path   -T` — shortest path

- `codegraph deps ` — file deps · `codegraph exports  -T` — export consumers

- `codegraph audit  -T` — full risk report · `codegraph triage -T` — priority queue

- `codegraph check --staged` — CI gate · `codegraph batch t1 t2 -T --json` — batch query

- `codegraph search ""` — semantic search · `codegraph cycles` — cycle detection

- `codegraph roles --role dead -T` — dead code · `codegraph complexity -T` — metrics

- `codegraph dataflow  -T` — data flow · `codegraph cfg  -T` — control flow

### Flags

- `-T` — exclude test files (use by default) · `-j` — JSON output

- `-f, --file ` — scope to file · `-k, --kind ` — filter kind

```

## 📋 Recommended Practices

See **[docs/guides/recommended-practices.md](docs/guides/recommended-practices.md)** for integration guides:

- **Git hooks** — auto-rebuild on commit, impact checks on push, commit message enrichment

- **CI/CD** — PR impact comments, threshold gates, graph caching

- **AI agents** — MCP server, CLAUDE.md templates, Claude Code hooks

- **Developer workflow** — watch mode, explore-before-you-edit, semantic search

- **Secure credentials** — `apiKeyCommand` with 1Password, Bitwarden, Vault, macOS Keychain, `pass`

For AI-specific integration, see the **[AI Agent Guide](docs/guides/ai-agent-guide.md)** — a comprehensive reference covering the 6-step agent workflow, complete command-to-MCP mapping, Claude Code hooks, and token-saving patterns.

## 🔁 CI / GitHub Actions

Codegraph ships with a ready-to-use GitHub Actions workflow that comments impact analysis on every pull request.

Copy `.github/workflows/codegraph-impact.yml` to your repo, and every PR will get a comment like:

> **3 functions changed** → **12 callers affected** across **7 files**

## 🛠️ Configuration

Create a `.codegraphrc.json` in your project root to customize behavior:

```json

{

  "include": ["src/**", "lib/**"],

  "exclude": ["**/*.test.js", "**/__mocks__/**"],

  "ignoreDirs": ["node_modules", ".git", "dist"],

  "extensions": [".js", ".ts", ".tsx", ".py"],

  "aliases": {

    "@/": "./src/",

    "@utils/": "./src/utils/"

  },

  "build": {

    "incremental": true

  },

  "query": {

    "excludeTests": true

  }

}

```

> **Tip:** `excludeTests` can also be set at the top level as a shorthand — `{ "excludeTests": true }` is equivalent to nesting it under `query`. If both are present, the nested `query.excludeTests` takes precedence.

### Manifesto rules

Configure pass/fail thresholds for `codegraph check` (manifesto mode):

```json

{

  "manifesto": {

    "rules": {

      "cognitive_complexity": { "warn": 15, "fail": 30 },

      "cyclomatic_complexity": { "warn": 10, "fail": 20 },

      "nesting_depth": { "warn": 4, "fail": 6 },

      "maintainability_index": { "warn": 40, "fail": 20 },

      "halstead_bugs": { "warn": 0.5, "fail": 1.0 }

    }

  }

}

```

When any function exceeds a `fail` threshold, `codegraph check` exits with code 1 — perfect for CI gates.

### LLM credentials

Codegraph supports an `apiKeyCommand` field for secure credential management. Instead of storing API keys in config files or environment variables, you can shell out to a secret manager at runtime:

```json

{

  "llm": {

    "provider": "openai",

    "apiKeyCommand": "op read op://vault/openai/api-key"

  }

}

```

The command is split on whitespace and executed with `execFileSync` (no shell injection risk). Priority: **command output > `CODEGRAPH_LLM_API_KEY` env var > file config**. On failure, codegraph warns and falls back to the next source.

Works with any secret manager: 1Password CLI (`op`), Bitwarden (`bw`), `pass`, HashiCorp Vault, macOS Keychain (`security`), AWS Secrets Manager, etc.

## 📖 Programmatic API

Codegraph also exports a full API for use in your own tools:

```js

import { buildGraph, queryNameData, findCycles, exportDOT, normalizeSymbol } from '@optave/codegraph';

// Build the graph

buildGraph('/path/to/project');

// Query programmatically

const results = queryNameData('myFunction', '/path/to/.codegraph/graph.db');

// All query results use normalizeSymbol for a stable 7-field schema

```

```js

import { parseFileAuto, getActiveEngine, isNativeAvailable } from '@optave/codegraph';

// Check which engine is active

console.log(getActiveEngine());      // 'native' or 'wasm'

console.log(isNativeAvailable());    // true if Rust addon is installed

// Parse a single file (uses auto-selected engine)

const symbols = await parseFileAuto('/path/to/file.ts');

```

```js

import { searchData, multiSearchData, buildEmbeddings } from '@optave/codegraph';

// Build embeddings (one-time)

await buildEmbeddings('/path/to/project');

// Single-query search

const { results } = await searchData('handle auth', dbPath);

// Multi-query search with RRF ranking

const { results: fused } = await multiSearchData(

  ['auth middleware', 'JWT validation'],

  dbPath,

  { limit: 10, minScore: 0.3 }

);

// Each result has: { name, kind, file, line, rrf, queryScores[] }

```

## ⚠️ Limitations

- **No full type inference** — parses `.d.ts` interfaces but doesn't use TypeScript's type checker for overload resolution

- **Dynamic calls are best-effort** — complex computed property access and `eval` patterns are not resolved

- **Python imports** — resolves relative imports but doesn't follow `sys.path` or virtual environment packages

- **Dataflow analysis** — intraprocedural (single-function scope), not interprocedural

## 🗺️ Roadmap

See **[ROADMAP.md](docs/roadmap/ROADMAP.md)** for the full development roadmap and **[STABILITY.md](STABILITY.md)** for the stability policy and versioning guarantees. Current plan:

1. ~~**Rust Core**~~ — **Complete** (v1.3.0) — native tree-sitter parsing via napi-rs, parallel multi-core parsing, incremental re-parsing, import resolution & cycle detection in Rust

2. ~~**Foundation Hardening**~~ — **Complete** (v1.4.0) — parser registry, 12-tool MCP server with multi-repo support, test coverage 62%→75%, `apiKeyCommand` secret resolution, global repo registry

3. ~~**Deep Analysis**~~ — **Complete** (v3.0.0) — dataflow analysis (flows_to, returns, mutates), intraprocedural CFG for all 11 languages, stored AST nodes, expanded node/edge types (parameter, property, constant, contains, parameter_of, receiver), GraphML/GraphSON/Neo4j CSV export, interactive HTML viewer, CLI consolidation, stable JSON schema

4. ~~**Architectural Refactoring**~~ — **Complete** (v3.1.5) — unified AST analysis, composable MCP, domain errors, builder pipeline, embedder subsystem, graph model, qualified names, presentation layer, InMemoryRepository, domain directory grouping, CLI composability

5. **Natural Language Queries** — `codegraph ask` command, conversational sessions

6. **Expanded Language Support** — 8 new languages (12 → 20)

7. **GitHub Integration & CI** — reusable GitHub Action, PR review, SARIF output

8. **TypeScript Migration** — gradual migration from JS to TypeScript

## 🤝 Contributing

Contributions are welcome! See **[CONTRIBUTING.md](CONTRIBUTING.md)** for the full guide — setup, workflow, commit convention, testing, and architecture notes.

```bash

git clone https://github.com/optave/codegraph.git

cd codegraph

npm install

npm test

```

Looking to add a new language? Check out **[Adding a New Language](docs/guides/adding-a-language.md)**.

## 📄 License

[Apache-2.0](LICENSE)

---



  _{Built with tree-sitter and better-sqlite3. Your code stays on your machine.}
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/optave/codegraph

Awesome Lists containing this project

README

codegraph