https://github.com/cognitx-leyton/graphrag-code

🕸️ Code knowledge graph for Claude Code & AI coding agents: index TypeScript, NestJS, React into Neo4j and query architecture in Cypher
ai-agents ast claude claude-code code-analysis code-knowledge-graph codebase-search coding-agents cypher dependency-graph graphrag knowledge-graph mcp neo4j nestjs python react retrieval-augmented-generation static-analysis typescript


# 🕸️ graphrag-code

***A Neo4j code knowledge graph for TypeScript codebases: index NestJS and React code, then answer architecture questions with Cypher.***

![License](https://img.shields.io/badge/license-Apache%202.0-D22128?style=flat-square)
![Python](https://img.shields.io/badge/python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white)
![Neo4j](https://img.shields.io/badge/neo4j-5.24-008CC1?style=flat-square&logo=neo4j&logoColor=white)
![TypeScript](https://img.shields.io/badge/typescript-ready-3178C6?style=flat-square&logo=typescript&logoColor=white)
![NestJS](https://img.shields.io/badge/nestjs-aware-E0234E?style=flat-square&logo=nestjs&logoColor=white)
![React](https://img.shields.io/badge/react-aware-61DAFB?style=flat-square&logo=react&logoColor=black)

`graphrag-code` turns a TypeScript/TSX repository into a queryable **code knowledge graph**: a structured retrieval backend for **[Claude Code](https://www.anthropic.com/claude-code)**, **Claude**, and other **AI coding agents**. It walks the AST, recognises framework constructs (NestJS controllers, modules, DI; React components and hooks), and loads the result into Neo4j. Your agent can then ask *architectural* questions (dependency chains, endpoint inventories, component usage, DI hubs) in Cypher, instead of fuzzy-matching code chunks with embeddings.

Built at **[Leyton CognitX](https://cognitx.leyton.com/)** to make large TypeScript monorepos legible to humans, to Claude, and to LLM agents alike.

## 🚀 Quickstart: use in your repo

```bash
pipx install cognitx-codegraph
cd /path/to/your-repo
codegraph init
```

`codegraph init` asks 4-5 short questions (which packages to index, which package boundaries to enforce, whether to install the Claude Code surface + GitHub Actions gate + local Neo4j) and then:

1. Writes `.claude/commands/` (7 slash commands), `.github/workflows/arch-check.yml`, `.arch-policies.toml`, `docker-compose.yml`, and a `CLAUDE.md` snippet.
2. Starts a local Neo4j container via `docker compose up -d`.
3. Runs the first index.
4. Prints what to query next.

You're fully set up in ~2 minutes. Want everything without prompts? `codegraph init --yes`. Want just the files and no Docker? `codegraph init --yes --skip-docker --skip-index`.

Full walkthrough: [codegraph/docs/init.md](./codegraph/docs/init.md). Policy reference: [codegraph/docs/arch-policies.md](./codegraph/docs/arch-policies.md).

## ✨ Highlights

- **Framework-aware parsing**: not just imports. Controllers, injectables, modules, entities, React components and hooks are first-class nodes.
- **Neo4j-backed**: every relationship is a Cypher query away. Dependency walks, shortest paths, DI chains, orphan detection, all out of the box.
- **Claude Code & AI agent native**: the typed graph is a structured retrieval backend for Claude Code, Claude, and other coding agents that need architectural context, not just nearest-neighbour code chunks.
- **Monorepo-friendly**: scope indexing to specific packages (`twenty-server`, `twenty-front`, …) and exclude build/test artefacts by default.
- **Batteries included**: a Typer CLI (`index`, `query`, `validate`), Docker Compose for Neo4j, and a library of example Cypher queries.

## 📑 Table of Contents

- [Why a code knowledge graph?](#-why-a-code-knowledge-graph)
- [Using with Claude Code & AI agents](#-using-with-claude-code--ai-agents)
- [Architecture](#-architecture)
- [Quickstart](#-quickstart)
- [Graph schema](#-graph-schema)
- [Example queries](#-example-queries)
- [Configuration](#-configuration)
- [Roadmap](#-roadmap)
- [Contributing](#-contributing)
- [Contributors](#-contributors)
- [Star history](#-star-history)
- [License](#-license)

## 🧠 Why a code knowledge graph?

Vector search over raw code chunks is a blunt instrument. It finds lexically similar snippets, not *architecturally relevant* ones. Questions like *"which services does this controller transitively depend on?"*, *"who injects `AuthService`?"*, or *"which React components use this hook?"* are graph queries, not similarity queries.

`graphrag-code` gives an LLM (or a human) the structured backbone it needs:

- **Retrieval-augmented generation (RAG)** over a TypeScript codebase with typed traversals instead of opaque embeddings.
- **Architecture audits**: find hubs, cycles, orphans, tangled modules.
- **Safer refactors**: understand the blast radius of a change before you make it.
- **Onboarding**: let new engineers query the codebase in plain Cypher instead of reading files top-to-bottom.

## 🤖 Using with Claude Code & AI agents

`graphrag-code` is designed as a drop-in retrieval backend for agentic coding workflows. The typical pattern for [Claude Code](https://www.anthropic.com/claude-code) (and any other LLM coding agent, such as Cursor, Aider, Continue, or custom MCP clients):

1. **Index your repo once** (see [Quickstart](#-quickstart)): `codegraph.cli index` walks the AST and loads the graph into Neo4j.
2. **Expose the graph to your agent** via a thin MCP server, a CLI wrapper the agent can shell out to, or direct Bolt queries from tool-call handlers.
3. **Let the agent ask architectural questions** in Cypher *before* editing code.
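
The "CLI wrapper the agent can shell out to" option from step 2 could look roughly like the sketch below. It assumes a `codegraph` executable is on `PATH` and that `codegraph query` takes the Cypher string as its single positional argument, as the Quickstart invocation suggests. The helper names `build_query_command` and `run_query` are hypothetical, not part of codegraph.

```python
import shlex
import subprocess


def build_query_command(cypher: str, executable: str = "codegraph") -> list[str]:
    """Build the argv for a `codegraph query` call (assumed CLI shape)."""
    if not cypher.strip():
        raise ValueError("cypher must not be empty")
    # Passing a list (not a shell string) means the Cypher text is never
    # interpreted by a shell, so quotes and `$` in the query are safe.
    return [executable, "query", cypher]


def run_query(cypher: str, timeout_s: float = 30.0) -> str:
    """Run the query and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_query_command(cypher),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    )
    return result.stdout


if __name__ == "__main__":
    cmd = build_query_command("MATCH (e:Endpoint) RETURN e.method, e.path LIMIT 10")
    print(shlex.join(cmd))  # the command an agent tool handler would execute
```

Keeping the argv builder separate from the subprocess call makes the wrapper easy to unit-test without a running Neo4j instance.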

### Why this beats embedding-only RAG for coding agents

Claude Code and other coding agents work best with **structured, low-noise context**. Vector search over code chunks pulls back things that *look* similar; a typed graph answers the question the agent is *actually* asking:

| Agent question | Graph query |
| --- | --- |
| *"What would break if I rename `AuthService`?"* | Reverse `INJECTS` + `IMPORTS*` traversal |
| *"What endpoints does `UserController` expose?"* | `EXPOSES` direct lookup |
| *"Which React components call `useAuth`?"* | `USES_HOOK` lookup |
| *"How is this file reached from the auth entrypoint?"* | `shortestPath` on `IMPORTS` |
| *"Which services are DI hubs I should treat as core?"* | `INJECTS` aggregation |

All answered in single-digit milliseconds, with zero tokens spent on retrieving irrelevant snippets.
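
One way to wire the table above into a tool-call handler is a small lookup from question type to parameterised Cypher. The sketch below is illustrative only: the template keys and the helper `cypher_for` are hypothetical, and the Cypher strings are assumptions derived from the node labels and edge types listed under Graph schema (the rename-impact template covers only the reverse `INJECTS` part of that traversal).

```python
import re

# Hypothetical question-kind -> Cypher templates, using labels and edge
# types from this README's schema section.
QUERY_TEMPLATES: dict[str, str] = {
    "rename_impact": (
        "MATCH (svc:Class {name: $name})<-[:INJECTS]-(caller:Class) "
        "RETURN DISTINCT caller.name AS caller"
    ),
    "endpoints_of_controller": (
        "MATCH (c:Class {name: $name})-[:EXPOSES]->(e:Endpoint) "
        "RETURN e.method, e.path, e.handler"
    ),
    "hook_users": (
        "MATCH (:Hook {name: $name})<-[:USES_HOOK]-(f:Function) "
        "RETURN f.name, f.file"
    ),
    "import_path": (
        "MATCH p = shortestPath((a:File {path: $start})-[:IMPORTS*]->"
        "(b:File {path: $end})) RETURN [n IN nodes(p) | n.path] AS hops"
    ),
    "di_hubs": (
        "MATCH (svc:Class {is_injectable: true})<-[:INJECTS]-(caller:Class) "
        "RETURN svc.name, count(DISTINCT caller) AS injections "
        "ORDER BY injections DESC LIMIT $limit"
    ),
}

_PARAM = re.compile(r"\$(\w+)")


def cypher_for(kind: str, **params: object) -> tuple[str, dict[str, object]]:
    """Return (query, params) for a question kind, checking required params."""
    query = QUERY_TEMPLATES[kind]
    required = set(_PARAM.findall(query))
    missing = sorted(required - set(params))
    if missing:
        raise ValueError(f"missing parameters for {kind!r}: {missing}")
    # Only forward the parameters the template actually references.
    return query, {name: params[name] for name in required}


if __name__ == "__main__":
    query, bound = cypher_for("endpoints_of_controller", name="UserController")
    print(query)
    print(bound)
```

Values are passed as query parameters rather than interpolated into the string, so an agent-supplied class name cannot change the shape of the query.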

### Exposing the graph to Claude via MCP

codegraph ships a first-class **[Model Context Protocol](https://modelcontextprotocol.io/)** stdio server. Install the optional extra, add one block to Claude Code's config, and ten typed tools appear in the agent's tool menu, with no more shelling out to `codegraph query`.

```bash
pip install "codegraph[mcp]"
```

In `~/.claude.json` (or your Claude Desktop config):

```json
{
  "mcpServers": {
    "codegraph": {
      "command": "codegraph-mcp",
      "type": "stdio",
      "env": {
        "CODEGRAPH_NEO4J_URI": "bolt://localhost:7688",
        "CODEGRAPH_NEO4J_USER": "neo4j",
        "CODEGRAPH_NEO4J_PASS": "codegraph123"
      }
    }
  }
}
```

Restart Claude Code. Ten tools become available:

| Tool | Purpose |
| --- | --- |
| `query_graph(cypher, limit)` | Read-only Cypher escape hatch. Writes are rejected at the session level, so an LLM-generated `DROP`/`DELETE` can't mutate the graph. |
| `describe_schema()` | Labels, relationship types, and per-label node counts: a cheap way for an agent to learn what's in the graph at session start. |
| `list_packages()` | Every indexed monorepo package with its detected framework, version, TypeScript flag, package manager, and detection confidence. |
| `callers_of_class(class_name, max_depth)` | Blast-radius traversal over `INJECTS` / `EXTENDS` / `IMPLEMENTS`. The canonical "what breaks if I rename X" query. |
| `endpoints_for_controller(controller_name)` | HTTP routes exposed by a NestJS controller class (method + path + handler). |
| `files_in_package(name, limit)` | List files belonging to a `:Package` by name. |
| `hook_usage(hook_name, limit)` | Which components / functions use a given React hook. |
| `gql_operation_callers(op_name, op_type, limit)` | Who calls a GraphQL query / mutation / subscription, optionally narrowed by type. |
| `most_injected_services(limit)` | Rank `@Injectable` classes by number of unique callers, the classic "DI hub detection" query. |
| `find_class(name_pattern, limit)` | Case-sensitive substring search over class names, backed by the `class_name` index. |

All ten tools share a single long-lived Neo4j driver and open sessions in `READ_ACCESS` mode. Configuration is env-var only (the same `CODEGRAPH_NEO4J_*` vars the CLI uses). The server is stdio-only, with no network exposure.

## 🏗️ Architecture

```
 TypeScript repo               Parser               Graph loader          Neo4j
┌────────────────┐      ┌──────────────────┐      ┌──────────────┐     ┌──────────┐
│  *.ts / *.tsx  │ ───► │  AST walk        │ ───► │  Typed nodes │───► │ Property │
│ packages/*/src │      │  + framework     │      │  + edges     │     │  graph   │
└────────────────┘      │  detection       │      └──────────────┘     └────┬─────┘
                        │ (NestJS / React) │                                │
                        └──────────────────┘                                ▼
                                                                      Cypher / RAG
```

All indexing is local: your code never leaves the machine, and Neo4j runs in a Docker container alongside the CLI.

## 🚀 Quickstart

```bash
cd codegraph

# 1. Python environment
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

# 2. Neo4j (Docker)
docker compose up -d
# Browser UI: http://localhost:7475 (neo4j / codegraph123)
# Bolt: bolt://localhost:7688

# 3. Tell codegraph which packages in your monorepo to index.
#    Either drop a codegraph.toml at the repo root (see Configuration below)
#    or pass --package flags:
.venv/bin/python -m codegraph.cli index /path/to/your-monorepo \
  --package packages/server --package packages/web

# 4. Sanity-check the load
.venv/bin/python -m codegraph.cli validate /path/to/your-monorepo

# 5. Ask a question
.venv/bin/python -m codegraph.cli query \
  "MATCH (e:Endpoint) RETURN e.method, e.path LIMIT 10"
```

## 🧩 Graph schema

**Nodes**

| Kind | Examples / notes |
| --- | --- |
| `Package` | One per configured monorepo package with detected `framework` (React / Next.js / Vue / Angular / Svelte / SvelteKit / **NestJS** / Odoo), `framework_version`, `typescript`, `styling`, `router`, `state_management`, `ui_library`, `build_tool`, `package_manager`, and a detection `confidence`. Framework detection walks up to the monorepo root for lockfiles + workspace-hoisted dependencies. |
| `File` | TS/TSX files with language, LOC, and framework flags (`is_controller`, `is_component`, …) |
| `Class` | NestJS controllers, injectables, modules, entities, resolvers |
| `Function` | Exported functions and React components |
| `Interface` | TypeScript interfaces |
| `Endpoint` | HTTP routes exposed by controllers (method + path + handler) |
| `Hook` | React hooks (custom and built-in usage sites) |
| `Decorator` | Framework decorators applied to classes/methods |
| `External` | Symbols imported from `node_modules` |

**Edges**

`IMPORTS`, `IMPORTS_EXTERNAL`, `DEFINES_CLASS`, `DEFINES_FUNC`, `DEFINES_IFACE`, `EXPOSES`, `INJECTS`, `EXTENDS`, `IMPLEMENTS`, `RENDERS`, `USES_HOOK`, `DECORATED_BY`, `BELONGS_TO` (File → Package).

## 🔎 Example queries

A handful of the queries in [`codegraph/queries.md`](codegraph/queries.md):

```cypher
// 1. Every HTTP endpoint with its controller
MATCH (c:Class {is_controller:true})-[:EXPOSES]->(e:Endpoint)
RETURN c.name, e.method, e.path, e.handler
ORDER BY c.name, e.path;

// 2. Most-injected services (DI hubs)
MATCH (svc:Class {is_injectable:true})<-[:INJECTS]-(caller:Class)
RETURN svc.name, count(caller) AS injections
ORDER BY injections DESC LIMIT 20;

// 3. Which React components use a given hook?
MATCH (:Hook {name:'useAuth'})<-[:USES_HOOK]-(c:Function)
RETURN c.name, c.file;

// 4. Transitive dependencies of a file
MATCH (:File {path:$start})-[:IMPORTS*1..3]->(d:File)
RETURN DISTINCT d.path;
```

See [`codegraph/queries.md`](codegraph/queries.md) for the full catalogue.
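
To make the variable-length pattern in query 4 concrete: `[:IMPORTS*1..3]` matches every file reachable in one to three `IMPORTS` hops. The Python sketch below reproduces that reachability on a small in-memory adjacency map. It is an illustration of the semantics, not how Neo4j evaluates the query, and the file names are made up.

```python
def reachable_within(
    imports: dict[str, list[str]], start: str, max_hops: int = 3
) -> set[str]:
    """Files reachable from `start` in 1..max_hops IMPORTS hops."""
    frontier = {start}
    found: set[str] = set()
    for _ in range(max_hops):
        # Expand one hop: everything imported by the current frontier.
        frontier = {dep for path in frontier for dep in imports.get(path, [])}
        found |= frontier
    return found


if __name__ == "__main__":
    # Hypothetical import edges: a.ts -> b.ts -> c.ts -> d.ts -> e.ts
    graph = {
        "a.ts": ["b.ts"],
        "b.ts": ["c.ts"],
        "c.ts": ["d.ts"],
        "d.ts": ["e.ts"],
    }
    print(sorted(reachable_within(graph, "a.ts")))
```

In this example `e.ts` is four hops away from `a.ts`, so it falls outside the `1..3` bound, which is why raising the upper bound in the Cypher pattern widens the result set.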

## ⚙️ Configuration

### Project config: `codegraph.toml`

`codegraph` has **no hardcoded packages**. You tell it which packages to index via a `codegraph.toml` at the repo root, a `[tool.codegraph]` block in your existing `pyproject.toml`, or `--package` flags on the CLI. Config file values are loaded first; CLI flags override them.

**`codegraph.toml`** (preferred: a standalone file, no interference with Python tooling):

```toml
# Paths are relative to the repo root. Each entry should be a TypeScript
# package directory (i.e. contain a package.json / tsconfig.json so path
# aliases can be resolved).
packages = [
  "packages/server",
  "packages/web",
]

# Optional: these extend the built-in defaults, they don't replace them.
exclude_dirs = ["custom-build", "fixtures"]
exclude_suffixes = [".gen.ts"]
```

**`pyproject.toml`** (if you already have one and want everything in one place):

```toml
[tool.codegraph]
packages = ["packages/server", "packages/web"]
```

**CLI override** (wins over either file):

```bash
codegraph index . --package packages/server --package packages/web
```

If no config file exists and no `--package` flags are passed, `index` stops with a clear error. There are no Twenty-specific or other defaults.

### Python support (`.py` indexing)

codegraph also indexes Python codebases. The detector auto-picks the language based on the package directory: if the directory contains `__init__.py`, it's parsed as Python; otherwise it's parsed as TypeScript (with `tsconfig.json` / `package.json`).
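
The detection rule amounts to a single file check. A minimal sketch, with `detect_language` as a hypothetical name rather than codegraph's own function:

```python
import tempfile
from pathlib import Path


def detect_language(package_dir: Path) -> str:
    """Return "python" if the package root holds an __init__.py, else "typescript"."""
    return "python" if (package_dir / "__init__.py").is_file() else "typescript"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "my_package").mkdir()
        (root / "my_package" / "__init__.py").touch()
        (root / "web").mkdir()
        print(detect_language(root / "my_package"))
        print(detect_language(root / "web"))
```

Under this rule a Python directory without `__init__.py` (for example a namespace package) would be treated as TypeScript, so `--package` should point at the directory that actually contains `__init__.py`.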

Install the optional `[python]` extra to enable the Python frontend:

```bash
pip install "codegraph[python]"
```

Then point `--package` at a Python package root (the directory containing `__init__.py`):

```bash
codegraph index . --package src/my_package
```

Stage 1 indexes: modules (`.py` files), classes, functions, methods, imports (relative + absolute + `import x as y`), class inheritance, and decorators. Framework detection (FastAPI / Flask / Django / Typer / pytest) and route extraction land in Stage 2.

### Neo4j connection

Controlled via environment variables (defaults match the bundled `docker-compose.yml`):

| Variable | Default |
| --- | --- |
| `CODEGRAPH_NEO4J_URI` | `bolt://localhost:7688` |
| `CODEGRAPH_NEO4J_USER` | `neo4j` |
| `CODEGRAPH_NEO4J_PASS` | `codegraph123` |

### File walk exclusions

Indexing always skips `node_modules`, `dist`, `build`, `.next`, `.turbo`, `.nuxt`, `.svelte-kit`, `.vercel`, `coverage`, `generated`, `__generated__`, `.cache`, `.parcel-cache`, plus `*.d.ts` and `*.stories.{ts,tsx}`. Add to these via `exclude_dirs` / `exclude_suffixes` in your config; those keys **extend** the defaults, they don't replace them.
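
The "extend, don't replace" behaviour can be pictured with the sketch below. It is an assumption about the matching rule (a file is skipped if any parent directory name is in the excluded set or its name ends with an excluded suffix), and `is_excluded` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Defaults copied from the list above.
DEFAULT_EXCLUDE_DIRS = frozenset({
    "node_modules", "dist", "build", ".next", ".turbo", ".nuxt", ".svelte-kit",
    ".vercel", "coverage", "generated", "__generated__", ".cache", ".parcel-cache",
})
DEFAULT_EXCLUDE_SUFFIXES = (".d.ts", ".stories.ts", ".stories.tsx")


def is_excluded(
    rel_path: str,
    extra_dirs: Iterable[str] = (),
    extra_suffixes: Iterable[str] = (),
) -> bool:
    """True if the file should be skipped; user entries extend the defaults."""
    path = PurePosixPath(rel_path)
    dirs = DEFAULT_EXCLUDE_DIRS | set(extra_dirs)
    suffixes = DEFAULT_EXCLUDE_SUFFIXES + tuple(extra_suffixes)
    # Check every directory component, but not the file name itself.
    if any(part in dirs for part in path.parts[:-1]):
        return True
    return path.name.endswith(suffixes)


if __name__ == "__main__":
    print(is_excluded("packages/web/src/Button.stories.tsx"))
    print(is_excluded("packages/web/src/api.gen.ts", extra_suffixes=[".gen.ts"]))
    print(is_excluded("packages/web/src/App.tsx"))
```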

### `.codegraphignore`

For **confidential routes, components, or files** that shouldn't reach the graph (and therefore shouldn't reach any LLM agent querying it), drop a `.codegraphignore` file at the repo root. Syntax is gitignore-style, plus two codegraph extensions:

```gitignore
# Standard gitignore: file paths
**/admin/**
**/*.secret.ts
!**/admin/public/**   # negation: re-include a subtree

# Route patterns: match RouteNode.path
@route:/admin/*
@route:/settings/system/*

# Component patterns: match React component / NestJS class names
@component:*Admin*
@component:*UserManagement*
```

Override the default location with `--ignore-file PATH` on the CLI or `ignore_file = "custom/.ignore"` in `codegraph.toml`. `.codegraphignore` is **additive** on top of `BASE_EXCLUDE_DIRS`; it doesn't replace them.
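
To illustrate how the three kinds of lines in a `.codegraphignore` file could be told apart, here is a rough sketch. It is not codegraph's parser: `IgnoreRules`, `parse_ignore`, `route_ignored`, and `component_ignored` are hypothetical names, glob matching via `fnmatchcase` is an assumption about the `@route:` / `@component:` semantics, and gitignore-style file patterns are only collected, not evaluated.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class IgnoreRules:
    paths: list[str] = field(default_factory=list)       # gitignore-style file patterns
    routes: list[str] = field(default_factory=list)      # from "@route:" lines
    components: list[str] = field(default_factory=list)  # from "@component:" lines


def parse_ignore(text: str) -> IgnoreRules:
    """Split a .codegraphignore file into path, route and component patterns."""
    rules = IgnoreRules()
    for raw in text.splitlines():
        # Assumption: " #" starts a trailing comment, as in the example above.
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@route:"):
            rules.routes.append(line[len("@route:"):])
        elif line.startswith("@component:"):
            rules.components.append(line[len("@component:"):])
        else:
            rules.paths.append(line)
    return rules


def route_ignored(rules: IgnoreRules, route_path: str) -> bool:
    return any(fnmatchcase(route_path, pattern) for pattern in rules.routes)


def component_ignored(rules: IgnoreRules, name: str) -> bool:
    return any(fnmatchcase(name, pattern) for pattern in rules.components)


if __name__ == "__main__":
    rules = parse_ignore("@route:/admin/*\n@component:*Admin*\n**/*.secret.ts\n")
    print(route_ignored(rules, "/admin/users"))
    print(component_ignored(rules, "UserList"))
```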

## 🛣️ Roadmap

- Incremental re-indexing on file changes
- Python and Go language frontends
- ~~First-class MCP server exposing the graph to LLM agents~~ (**shipped**; see [Exposing the graph to Claude via MCP](#exposing-the-graph-to-claude-via-mcp))
- Pre-built RAG retrievers for common architecture questions

## 🀝 Contributing

PRs welcome. The repository uses protected branches:

- **`main`**: production-ready code. All changes land here via PR.
- **`release`**: release-candidate branch. Stabilisation before tagging.
- **`hotfix`**: urgent fixes that need to skip the normal cycle.

Every PR into `main`, `release`, or `hotfix` requires a Code Owner review (see [`CODEOWNERS`](CODEOWNERS)). Please open an issue before a large refactor so we can align on direction.

## 👥 Contributors

Thanks to everyone who has helped shape `graphrag-code`:


Avatar grid of graphrag-code contributors

*Made with [contrib.rocks](https://contrib.rocks).*

## ⭐ Star history

If `graphrag-code` helps you make sense of a TypeScript monorepo, a star helps others find it too.





Star History Chart

## 📄 License

Licensed under the [Apache License 2.0](LICENSE). Copyright © [Leyton CognitX](https://cognitx.leyton.com/) and contributors.