https://github.com/abhigyanpatwari/GitNexus

GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration


Host: GitHub
URL: https://github.com/abhigyanpatwari/GitNexus
Owner: abhigyanpatwari
Created: 2025-08-02T23:20:31.000Z (12 months ago)
Default Branch: main
Last Pushed: 2026-01-18T00:57:04.000Z (6 months ago)
Last Synced: 2026-01-18T03:44:52.566Z (6 months ago)
Language: TypeScript
Homepage: https://gitnexus.vercel.app
Size: 10.8 MB
Stars: 319
Watchers: 4
Forks: 28
Open Issues: 2
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-ai-coding - GitNexus | Abhigyan Patwari | Client-side, zero-server repository intelligence engine. | Runs in-browser, maps code dependencies, outputs AI-friendly knowledge maps. | (🛠️ LLM DevTools & Orchestration Infrastructure)
AiTreasureBox - abhigyanpatwari/GitNexus - GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration (Repos)
awesome-agentic-knowledge-base - abhigyanpatwari/GitNexus - compiler | "Zero-Server Code Intelligence Engine"; CLI+MCP + browser zero-server from one repo ([survey](surveys/abhigyanpatwari__GitNexus.md)) | (Open-source repos)
awesome-github-projects - GitNexus - GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a git repository (Github, Gitlab, Azure, Local) or ZIP file, and get an interactive knowledge graph with a built in Graph RAG Agent. Perfect for code exploration ⭐43,639 `TypeScript` 🔥 (🛠️ Developer Tools)
awesome-ai-coding-agent-tools - GitNexus - Client-side knowledge-graph engine for repos with zero-server, in-browser indexing. (Supporting Infrastructure / Codebase Context)
my-awesome - abhigyanpatwari/GitNexus - 07 star:44.1k fork:4.9k GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a git repository (Github, Gitlab, Azure, Local) or ZIP file, and get an interactive knowledge graph with a built in Graph RAG Agent. Perfect for code exploration (TypeScript)
awesome - abhigyanpatwari/GitNexus - GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a git repository (Github, Gitlab, Azure, Local) or ZIP file, and get an interactive knowledge graph with a built in Graph RAG Agent. Perfect for code exploration (TypeScript)
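Awesome-GitHub-Repo - GitNexus - A browser-based code knowledge graph engine: drop in a GitHub project link to generate an interactive knowledge graph, with a built-in Graph RAG Agent that helps you understand code structure, trace call chains, and analyze dependencies. (Fun projects / Black tech)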
stars - GitNexus - GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a git repository (Github, Gitlab, Azure, Local) or ZIP file, and get an interactive knowledge graph with a built in Graph RAG Agent. Perfect for code exploration | abhigyanpatwari | 44017 | (TypeScript)
awesome-AI-driven-development - GitNexus V2 - a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration (Code Analysis & Search / Other IDEs)

README

# GitNexus V2

**Zero-Server, Graph-Based Code Intelligence Engine**

Runs fully in the browser through WebAssembly: the database engine, the embeddings model, and AST parsing all execute client-side.

https://github.com/user-attachments/assets/2fb7c522-20d1-48f6-9583-36c3969aa4dc

https://gitnexus.vercel.app

Because it is entirely client-side, it costs me nothing to deploy, so you can use it for free :-) (I would love a ⭐ though)

> *Like DeepWiki, but deeper.* 😉

DeepWiki helps you *understand* code. GitNexus lets you *analyze* it—because a knowledge graph tracks every dependency, call chain, and relationship. 

That's the difference between:

- "What does this function do?" → *understanding*

- "What breaks if I change this function?" → *analysis*

**Some quick tech jargon:**

- **Enhanced Search**: BM25 + Semantic + 1-hop graph expansion via Cypher

- **Full WASM Stack**: Tree-sitter parsing + KuzuDB graph database, all in-browser

- **Repo Map**: Complete code knowledge graph with CALLS, IMPORTS, EXTENDS relations

- **Vector Index**: HNSW embeddings for semantic similarity search

- **Cypher Queries**: Relational analysis for accurate context retrieval

- **Grounded AI**: Every answer cites `[[file:line]]` as proof

**What you can do:**

| Capability | Description |
|------------|-------------|
| **Codebase-wide audits** | Find layer violations, forbidden dependencies |
| **Blast radius analysis** | See every function affected by a change |
| **Dead code detection** | Identify orphaned nodes with zero incoming calls |
| **Dependency tracing** | Follow import chains across the entire codebase |
| **AI analyses with citations** | Ask questions, analyze, get answers with `[[file:line]]` proof |
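
The dead code row can be made concrete with a small sketch. It assumes a plain in-memory edge list rather than GitNexus's actual KuzuDB storage, and `CallEdge` and `findOrphans` are hypothetical names used only for illustration.

```typescript
// Hypothetical in-memory shape; GitNexus itself keeps CALLS edges in KuzuDB.
interface CallEdge {
  from: string; // id of the calling function
  to: string;   // id of the called function
}

// Return ids of functions that no other function calls.
// Entry points (main, exported handlers, ...) must be passed in explicitly,
// otherwise they would be reported as dead code as well.
function findOrphans(
  functionIds: string[],
  calls: CallEdge[],
  entryPoints: Set<string> = new Set<string>(),
): string[] {
  const called = new Set<string>();
  for (const edge of calls) {
    if (edge.from !== edge.to) called.add(edge.to); // ignore self-recursion
  }
  return functionIds.filter((id) => !called.has(id) && !entryPoints.has(id));
}

const orphans = findOrphans(
  ["main", "parse", "validate", "legacyFormat"],
  [
    { from: "main", to: "parse" },
    { from: "parse", to: "validate" },
    { from: "legacyFormat", to: "legacyFormat" },
  ],
  new Set(["main"]),
);
console.log(orphans); // ["legacyFormat"]
```

Treating a self-recursive call as "not called" is a design choice of this sketch: a function that only calls itself is still unreachable from the rest of the code.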

**100% client-side.** Your code never leaves your browser.

**Supports:** TypeScript, JavaScript, Python (Go, Java, C in progress)



---

## 🔍 The Problem with AI Coding Tools

Tools like **Cursor**, **Claude Code**, **Cline**, **Roo Code**, and **Windsurf** are powerful—but they share a fundamental limitation: **they don't truly know your codebase structure**.

| Tool | Context Strategy | The Gap |
|------|------------------|---------|
| **Cursor** | Files in tabs + embeddings | No call graph. Can't trace "what calls this?" |
| **Claude Code** | File search + grep | Text-based. Misses semantic connections |
| **Cline/Roo Code** | Repo map + tree-sitter | Static structure. No runtime dependencies tracked |
| **Windsurf** | Cascade context | Limited dependency depth |

**What happens:**

1. AI edits `UserService.validate()` 

2. Doesn't know 47 functions depend on its return type

3. **Breaking changes ship** 💥

### The Solution: Graph Coverage

A knowledge graph tracks **actual relationships**, not just file contents:

```mermaid

graph LR

    EDIT[AI wants to edit UserService.validate] --> QUERY[Graph Query: What depends on this?]

    QUERY --> DEPS["47 callers across 12 files"]

    DEPS --> SAFE[AI sees full blast radius first]

```
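
As a rough illustration of the "what depends on this?" query in the diagram, the sketch below walks CALLS edges in reverse to collect direct and transitive callers. It works on a plain array instead of the graph database, and the `Call` type and `blastRadius` function are assumptions made for this example.

```typescript
// Hypothetical edge shape: `from` calls `to`.
type Call = { from: string; to: string };

// Collect every function that directly or transitively calls `target`
// by walking CALLS edges backwards, breadth-first.
function blastRadius(target: string, calls: Call[]): string[] {
  const callersOf = new Map<string, string[]>();
  for (const { from, to } of calls) {
    const list = callersOf.get(to) ?? [];
    list.push(from);
    callersOf.set(to, list);
  }

  const seen = new Set<string>([target]);
  const queue: string[] = [target];
  const affected: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const caller of callersOf.get(current) ?? []) {
      if (seen.has(caller)) continue; // guards against call cycles
      seen.add(caller);
      affected.push(caller);
      queue.push(caller);
    }
  }
  return affected;
}

const impacted = blastRadius("UserService.validate", [
  { from: "AuthController.login", to: "UserService.validate" },
  { from: "SignupForm.submit", to: "UserService.validate" },
  { from: "Router.handle", to: "AuthController.login" },
  { from: "UserService.validate", to: "Router.handle" }, // cycle back to the target
]);
console.log(impacted);
// ["AuthController.login", "SignupForm.submit", "Router.handle"]
```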

**Current state:** GitNexus is a standalone tool—a better DeepWiki that's 100% client-side with graph-powered analysis.

**Future goal (MCP):** Expose GitNexus as an MCP server so tools like Cursor and Claude Code can query it for accurate context. They ask "what calls X?", GitNexus returns the actual call graph. No more guessing.

---

## 🚀 Quick Start

```bash

git clone 

cd gitnexus

npm install

npm run dev

```

Open http://localhost:5173, drag & drop a ZIP of your codebase, and start exploring.

---

## 🏗️ Indexing Architecture

Indexing runs in two stages: the **Knowledge Graph** is built first (blocking, Phases 1-5), then **Embeddings** are computed in the background (Phase 6 onward).

### Phase 1-5: Knowledge Graph Creation

```mermaid

flowchart TD

    subgraph P1["Phase 1: Extract (0-15%)"]

        E1[Decompress ZIP] --> E2[Collect file paths]

    end

    

    subgraph P2["Phase 2: Structure (15-30%)"]

        S1[Build folder tree] --> S2[Create CONTAINS edges]

    end

    

    subgraph P3["Phase 3: Parse (30-70%)"]

        PA1[Load Tree-sitter WASM] --> PA2[Generate ASTs]

        PA2 --> PA3[Extract symbols]

        PA3 --> PA4[Populate Symbol Table]

    end

    

    subgraph P4["Phase 4: Imports (70-82%)"]

        I1[Find import statements] --> I2[Resolve paths]

        I2 --> I3[Create IMPORTS edges]

    end

    

    subgraph P5["Phase 5: Calls + Heritage (82-100%)"]

        C1[Find function calls] --> C2[Resolve via Symbol Table]

        C2 --> C3[Create CALLS edges]

        C3 --> H1[Find extends/implements]

        H1 --> H2[Create EXTENDS/IMPLEMENTS edges]

    end

    

    P1 --> P2 --> P3 --> P4 --> P5

    P5 --> DB[(KuzuDB WASM)]

    DB --> READY[Graph Ready!]

```
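
The percentage windows in the diagram suggest that each phase reports local progress which is then mapped onto one overall progress bar. A minimal sketch of that mapping follows; the phase names and the `overallProgress` helper are hypothetical, and only the percentage ranges come from the diagram.

```typescript
// Progress windows copied from the diagram; the names are hypothetical.
const PHASES = [
  { name: "extract", start: 0, end: 15 },
  { name: "structure", start: 15, end: 30 },
  { name: "parse", start: 30, end: 70 },
  { name: "imports", start: 70, end: 82 },
  { name: "calls+heritage", start: 82, end: 100 },
] as const;

type PhaseName = (typeof PHASES)[number]["name"];

// Map a phase-local fraction (0..1) onto the overall 0..100 progress value.
function overallProgress(phase: PhaseName, fraction: number): number {
  const range = PHASES.filter((p) => p.name === phase)[0];
  if (range === undefined) throw new Error(`unknown phase: ${phase}`);
  const clamped = Math.min(1, Math.max(0, fraction));
  return range.start + (range.end - range.start) * clamped;
}

console.log(overallProgress("parse", 0.5)); // 50
console.log(overallProgress("imports", 1)); // 82
```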

### Symbol Table: Dual HashMap

Resolution strategy for function calls:

```mermaid

flowchart TD

    CALL[Found call: validateUser] --> CHECK1{In Import Map?}

    CHECK1 -->|Yes| FOUND1[Use imported definition]

    CHECK1 -->|No| CHECK2{In Current File?}

    CHECK2 -->|Yes| FOUND2[Use local definition]

    CHECK2 -->|No| CHECK3{Global Search}

    CHECK3 -->|Found| FOUND3[Use first match]

    CHECK3 -->|Not Found| SKIP[Skip - unresolved]

    

    FOUND1 --> EDGE[Create CALLS edge]

    FOUND2 --> EDGE

    FOUND3 --> EDGE

```

**Data structure:**

```

File-Scoped: Map>

Global:      Map

```
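
One way to read the two maps and the resolution flowchart is sketched below. The key and value types, the `SymbolTable` class, and the shape of the import map are all assumptions; the README only names a file-scoped map and a global map.

```typescript
// Hypothetical shapes: key and value types are assumptions.
type NodeId = string;

class SymbolTable {
  // filePath -> (symbol name -> node id)
  private fileScoped = new Map<string, Map<string, NodeId>>();
  // symbol name -> every node id that defines it, in insertion order
  private global = new Map<string, NodeId[]>();

  add(filePath: string, name: string, id: NodeId): void {
    let local = this.fileScoped.get(filePath);
    if (local === undefined) {
      local = new Map<string, NodeId>();
      this.fileScoped.set(filePath, local);
    }
    local.set(name, id);
    const all = this.global.get(name) ?? [];
    all.push(id);
    this.global.set(name, all);
  }

  // Resolution order from the flowchart: import map, current file, global.
  // If an import points at a file that does not define the name, this sketch
  // falls through to the next step instead of failing.
  resolve(
    name: string,
    currentFile: string,
    importMap: Map<string, string>, // imported name -> source file path
  ): NodeId | undefined {
    const importedFrom = importMap.get(name);
    if (importedFrom !== undefined) {
      const hit = this.fileScoped.get(importedFrom)?.get(name);
      if (hit !== undefined) return hit;
    }
    const local = this.fileScoped.get(currentFile)?.get(name);
    if (local !== undefined) return local;
    return this.global.get(name)?.[0]; // first match, or undefined (unresolved)
  }
}

const table = new SymbolTable();
table.add("src/auth.ts", "validateUser", "fn:auth:validateUser");
table.add("src/legacy.ts", "validateUser", "fn:legacy:validateUser");
table.add("src/app.ts", "start", "fn:app:start");

const imports = new Map<string, string>([["validateUser", "src/legacy.ts"]]);
const viaImport = table.resolve("validateUser", "src/app.ts", imports);
const viaGlobal = table.resolve("validateUser", "src/app.ts", new Map<string, string>());
const unresolved = table.resolve("missing", "src/app.ts", new Map<string, string>());
console.log(viaImport, viaGlobal, unresolved);
// fn:legacy:validateUser fn:auth:validateUser undefined
```

Both lookups are constant-time map reads, which is consistent with the speedup over a trie that the README reports further down.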

### Phase 6+: Background Embeddings

```mermaid

flowchart LR

    subgraph BG["Background (Non-blocking)"]

        M1[Load snowflake-arctic-embed-xs] --> M2[Initialize WebGPU/WASM]

        M2 --> E1[Batch embed nodes]

        E1 --> E2[INSERT into CodeEmbedding table]

        E2 --> V1[Create HNSW Vector Index]

        V1 --> B1[Build BM25 Index]

    end

    

    BG --> AI[AI Search Ready!]

```

You can explore the graph while embeddings are being computed. AI features unlock once embedding completes.

---

## 📊 Graph Schema

### Node Types

| Label | Description | Properties |
|-------|-------------|------------|
| `Folder` | Directory | `name`, `filePath` |
| `File` | Source file | `name`, `filePath`, `language` |
| `Function` | Function def | `name`, `filePath`, `startLine`, `endLine`, `isExported` |
| `Class` | Class def | `name`, `filePath`, `startLine`, `endLine` |
| `Interface` | Interface def | `name`, `filePath`, `startLine`, `endLine` |
| `Method` | Class method | `name`, `filePath`, `startLine`, `endLine` |
| `CodeElement` | Generic symbol | `name`, `filePath` |

### Relationship Table: `CodeRelation`

Single edge table with `type` property:

| Type | From | To | Description |
|------|------|-----|-------------|
| `CONTAINS` | Folder | File/Folder | Directory structure |
| `DEFINES` | File | Function/Class/etc | Code definitions |
| `IMPORTS` | File | File | Module dependencies |
| `CALLS` | Function/Method | Function/Method | Call graph |
| `EXTENDS` | Class | Class | Inheritance |
| `IMPLEMENTS` | Class | Interface | Interface implementation |
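
For readers who think in types, the schema can be mirrored as TypeScript unions plus a small validity check. The labels and relation types are copied from the tables; the object shapes, the `ALLOWED` table, and the reading of "Function/Class/etc" as every definition label are assumptions of this sketch.

```typescript
type NodeLabel =
  | "Folder" | "File" | "Function" | "Class"
  | "Interface" | "Method" | "CodeElement";

type RelationType =
  | "CONTAINS" | "DEFINES" | "IMPORTS"
  | "CALLS" | "EXTENDS" | "IMPLEMENTS";

// Hypothetical in-memory form of one row in the CodeRelation edge table.
interface CodeRelation {
  type: RelationType;
  from: { label: NodeLabel; id: string };
  to: { label: NodeLabel; id: string };
}

// Allowed (from -> to) label pairs per relation type, following the table.
const ALLOWED: Record<RelationType, { from: NodeLabel[]; to: NodeLabel[] }> = {
  CONTAINS: { from: ["Folder"], to: ["File", "Folder"] },
  DEFINES: { from: ["File"], to: ["Function", "Class", "Interface", "Method", "CodeElement"] },
  IMPORTS: { from: ["File"], to: ["File"] },
  CALLS: { from: ["Function", "Method"], to: ["Function", "Method"] },
  EXTENDS: { from: ["Class"], to: ["Class"] },
  IMPLEMENTS: { from: ["Class"], to: ["Interface"] },
};

function isValidRelation(rel: CodeRelation): boolean {
  const rule = ALLOWED[rel.type];
  return rule.from.indexOf(rel.from.label) !== -1 && rule.to.indexOf(rel.to.label) !== -1;
}

const validCall = isValidRelation({
  type: "CALLS",
  from: { label: "Method", id: "UserService.validate" },
  to: { label: "Function", id: "hashPassword" },
});
const invalidExtends = isValidRelation({
  type: "EXTENDS",
  from: { label: "Class", id: "UserService" },
  to: { label: "Interface", id: "Validator" },
});
console.log(validCall, invalidExtends); // true false
```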

---

## 🛠️ Agent Tools Architecture

The LangChain ReAct agent has **5 tools** for code exploration. These tools **use the graph** built during indexing.

### Tool 1: `search` — Hybrid Search with Graph Context

Combines **BM25** (keyword) + **Semantic** (vector) + **1-hop expansion**:

```mermaid

flowchart TD

    Q[Query: auth middleware] --> BM25[BM25 Keyword Search]

    Q --> SEM[Semantic Vector Search]

    

    BM25 --> RRF[Reciprocal Rank Fusion]

    SEM --> RRF

    

    RRF --> TOP[Top K Results]

    TOP --> HOP[1-Hop Graph Expansion]

    

    HOP --> OUT["Each result includes:<br/>• ID, file, score<br/>• Incoming connections (who calls this)<br/>• Outgoing connections (what this calls)"]

```

**How 1-hop works:**

```cypher

MATCH (n {id: $nodeId})

OPTIONAL MATCH (n)-[r1:CodeRelation]->(dst)

OPTIONAL MATCH (src)-[r2:CodeRelation]->(n)

RETURN collect(dst.name), collect(src.name)

```

The agent sees not just *what matches*, but *what connects to it*.
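
The fusion step in the diagram, Reciprocal Rank Fusion, is simple enough to sketch in a few lines. The constant `k = 60` is the commonly used default for RRF and is an assumption here, not a value taken from GitNexus; `reciprocalRankFusion` is likewise a hypothetical name.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank),
// where rank starts at 1 within each ranking.
function reciprocalRankFusion(rankings: string[][], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

const bm25Hits = ["authMiddleware", "login", "parseToken"];
const semanticHits = ["verifySession", "authMiddleware", "login"];
const fused = reciprocalRankFusion([bm25Hits, semanticHits]);
console.log(fused.map(([id]) => id));
// ["authMiddleware", "login", "verifySession", "parseToken"]
```

RRF only needs ranks, not raw scores, which is why it can merge BM25 scores and vector distances without normalizing them onto a common scale.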

---

### Tool 2: `cypher` — Raw Graph Queries with Auto-Embedding

Execute Cypher directly. If the query contains the `{{QUERY_VECTOR}}` placeholder, the tool embeds the accompanying query text and substitutes the resulting vector:

```mermaid

flowchart LR

    CQ[Cypher with placeholder] --> CHECK{Contains QUERY_VECTOR?}

    CHECK -->|Yes| EMBED[Embed query text]

    EMBED --> REPLACE[Replace placeholder with vector]

    CHECK -->|No| EXEC

    REPLACE --> EXEC[Execute Cypher]

    EXEC --> RES[Return Results]

```

**Example with auto-embedding:**

```cypher

CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'idx', {{QUERY_VECTOR}}, 10)

YIELD node, distance

WHERE distance < 0.4

MATCH (caller:Function)-[:CodeRelation {type: 'CALLS'}]->(n:Function {id: node.nodeId})

RETURN caller.name, n.name

```

The agent provides `query: "authentication"` → system embeds it → injects the vector.
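
The substitution step might look like the sketch below. The function names are hypothetical, `embed` is a stand-in for whatever embedding call is available, and rendering the vector as a plain list literal is an assumption about what the database accepts.

```typescript
const PLACEHOLDER = "{{QUERY_VECTOR}}";

function needsEmbedding(cypher: string): boolean {
  return cypher.indexOf(PLACEHOLDER) !== -1;
}

// Replace every placeholder occurrence with a Cypher list literal.
function injectVector(cypher: string, vector: number[]): string {
  const literal = "[" + vector.join(", ") + "]";
  return cypher.split(PLACEHOLDER).join(literal);
}

// Hypothetical wrapper: only embeds when the placeholder is present.
async function prepareCypher(
  cypher: string,
  queryText: string,
  embed: (text: string) => Promise<number[]>,
): Promise<string> {
  if (!needsEmbedding(cypher)) return cypher;
  return injectVector(cypher, await embed(queryText));
}

const prepared = injectVector(
  "CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'idx', {{QUERY_VECTOR}}, 10)",
  [0.1, -0.2, 0.3],
);
console.log(prepared);
// CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'idx', [0.1, -0.2, 0.3], 10)
```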

---

### Tool 3: `grep` — Regex Pattern Matching

For exact strings, error codes, TODOs:

```mermaid

flowchart LR

    PAT["Pattern: TODO|FIXME"] --> REGEX[Compile Regex]

    REGEX --> SCAN[Scan all files]

    SCAN --> MATCH[Match per line]

    MATCH --> RES["file:line: content"]

```
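
A minimal version of this scan, assuming the files are already held in memory as a map from path to content, could look as follows. `grepFiles` is a hypothetical name, and the output format follows the `file:line: content` shape shown in the diagram.

```typescript
// Scan every file line by line and report matches as "file:line: content".
function grepFiles(files: Map<string, string>, pattern: string, flags = ""): string[] {
  // Drop the global and sticky flags: with them, RegExp.test() keeps state
  // between calls and would skip matches on later lines.
  const regex = new RegExp(pattern, flags.replace(/[gy]/g, ""));
  const hits: string[] = [];
  files.forEach((content, path) => {
    content.split(/\r?\n/).forEach((line, index) => {
      if (regex.test(line)) hits.push(`${path}:${index + 1}: ${line.trim()}`);
    });
  });
  return hits;
}

const sampleFiles = new Map<string, string>([
  ["src/a.ts", "const x = 1;\n// TODO: remove\n"],
  ["src/b.ts", "// FIXME later"],
]);
const grepHits = grepFiles(sampleFiles, "TODO|FIXME", "g");
console.log(grepHits);
// ["src/a.ts:2: // TODO: remove", "src/b.ts:1: // FIXME later"]
```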

---

### Tool 4: `read` — Smart File Reader

Fuzzy path matching with suggestions:

```mermaid

flowchart TD

    REQ[Request: src/utils.ts] --> EXACT{Exact match?}

    EXACT -->|Yes| RET[Return content]

    EXACT -->|No| FUZZY[Fuzzy match by segments]

    FUZZY --> FOUND{Found?}

    FOUND -->|Yes| RET

    FOUND -->|No| SUGGEST[Suggest similar files]

```
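
The README does not spell out the matching rule, so the sketch below picks one plausible interpretation of "fuzzy match by segments": score each known path by how many trailing path segments it shares with the request, return a unique best match, and otherwise suggest the top candidates. All names are hypothetical.

```typescript
interface ReadResolution {
  match?: string;        // resolved path, if unambiguous
  suggestions: string[]; // candidates to show otherwise
}

function normalize(path: string): string {
  return path.replace(/\\/g, "/").replace(/^\.\//, "").toLowerCase();
}

// Count how many trailing path segments two paths share.
function sharedSuffix(a: string[], b: string[]): number {
  let n = 0;
  while (n < a.length && n < b.length && a[a.length - 1 - n] === b[b.length - 1 - n]) n++;
  return n;
}

function resolvePath(requested: string, known: string[]): ReadResolution {
  const want = normalize(requested);
  const exact = known.filter((p) => normalize(p) === want)[0];
  if (exact !== undefined) return { match: exact, suggestions: [] };

  const wantSegments = want.split("/");
  const scored = known
    .map((path) => ({ path, score: sharedSuffix(normalize(path).split("/"), wantSegments) }))
    .filter((candidate) => candidate.score > 0)
    .sort((x, y) => y.score - x.score);

  const best = scored[0];
  const runnerUp = scored[1];
  if (best !== undefined && (runnerUp === undefined || runnerUp.score < best.score)) {
    return { match: best.path, suggestions: [] }; // unique best candidate
  }
  return { suggestions: scored.slice(0, 3).map((candidate) => candidate.path) };
}

const knownPaths = ["src/lib/utils.ts", "src/lib/utils.test.ts", "src/auth/index.ts", "src/user/index.ts"];
const fuzzyHit = resolvePath("src/utils.ts", knownPaths);
const ambiguous = resolvePath("index.ts", knownPaths);
console.log(fuzzyHit.match);        // src/lib/utils.ts
console.log(ambiguous.suggestions); // the two index.ts files
```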

---

### Tool 5: `highlight` — Visual Graph Feedback

Emits a marker that the UI parses to highlight nodes:

```

[HIGHLIGHT_NODES:Function:src/auth.ts:validate,Class:src/user.ts:UserService]

```
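
On the UI side, parsing such a marker could be done roughly as below. The entry layout `Label:filePath:name` is inferred from the single example above, and `parseHighlightMarker` and `HighlightTarget` are hypothetical names.

```typescript
interface HighlightTarget {
  label: string;
  filePath: string;
  name: string;
}

// Extract the first [HIGHLIGHT_NODES:...] marker from agent output.
function parseHighlightMarker(text: string): HighlightTarget[] {
  const match = /\[HIGHLIGHT_NODES:([^\]]*)\]/.exec(text);
  if (match === null) return [];
  const body = match[1] ?? "";
  const targets: HighlightTarget[] = [];
  for (const entry of body.split(",")) {
    // The label ends at the first colon and the name starts after the last
    // one, so a file path that itself contains a colon stays intact.
    const first = entry.indexOf(":");
    const last = entry.lastIndexOf(":");
    if (first === -1 || first === last) continue; // malformed entry
    targets.push({
      label: entry.slice(0, first).trim(),
      filePath: entry.slice(first + 1, last).trim(),
      name: entry.slice(last + 1).trim(),
    });
  }
  return targets;
}

const targets = parseHighlightMarker(
  "See [HIGHLIGHT_NODES:Function:src/auth.ts:validate,Class:src/user.ts:UserService] for details.",
);
console.log(targets);
```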

---

## 💡 Key Discovery: Unified Vector + Graph

Most Graph RAG systems use **separate databases**—vector DB for semantic search, graph DB for traversal.

KuzuDB supports **native vector indexing (HNSW)**, so we do both in **one Cypher query**:

```cypher

// Semantic search + graph traversal in ONE query

CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'code_embedding_idx', $queryVector, 20)

YIELD node AS emb, distance

WITH emb, distance WHERE distance < 0.4

MATCH (n:Function {id: emb.nodeId})<-[:CodeRelation {type: 'CALLS'}]-(caller:Function)

RETURN n.name, caller.name, distance

ORDER BY distance

```

**Why this matters:**

- 🎯 **Single query execution** — No round-trips between systems

- 📊 **Built-in relevance ranking** — Distance IS the score

- ⚡ **No separate vector DB** — One database, one query language

- 🌳 **LLM-friendly** — Agent writes one Cypher, gets semantic + structural results

---

## 🔬 Deep Dive: Copy-on-Write Memory Issue

I hit an interesting problem while storing embeddings that is worth documenting.

**Setup:** Store 384-dim embeddings alongside code nodes.

```cypher

MATCH (n:CodeNode {id: $id}) SET n.embedding = $vec

```

**Problem:** Worked for ~20 nodes, exploded at ~1000:

```

Buffer manager exception: Unable to allocate memory!

```

**Root cause: Copy-on-Write.** Each `SET` on an existing node copies the entire record (~2KB of code content), so 1000 updates duplicate a large amount of memory inside WASM.

```mermaid

flowchart LR

    subgraph COW["Copy-on-Write Effect"]

        OLD[Old: 2KB] --> NEW[New: 3.5KB]

    end

    COW -->|"× 1000 nodes"| BOOM[💥 Buffer Exhausted]

```

**Fix:** Separate `CodeEmbedding` table with `INSERT` only:

```mermaid

flowchart TD

    subgraph Old["❌ Single Table"]

        CN1["CodeNode with embedding<br/>UPDATE triggers COW"]

    end

    

    subgraph New["✅ Separate Table"]

        CN2["CodeNode<br/>id, name, content"]

        CE["CodeEmbedding<br/>nodeId, embedding<br/>INSERT only"]

    end

    

    Old -->|"Memory explosion"| FAIL

    New -->|"Works at scale"| WIN

```

**Lesson:** In-memory WASM databases have hard memory limits. Profile at realistic scale, not only on the happy path.
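
The record sizes in the copy-on-write diagram can be reproduced with a little arithmetic, assuming the embedding is stored as 32-bit floats (the README does not state the element type). This only accounts for the payload of a single record, not for how the buffer manager allocates pages.

```typescript
const DIMENSIONS = 384;        // embedding width from the setup above
const BYTES_PER_FLOAT32 = 4;   // assumption: float32 storage
const RECORD_BYTES = 2 * 1024; // "~2KB of code content"

const embeddingBytes = DIMENSIONS * BYTES_PER_FLOAT32;    // 1536 bytes = 1.5 KB
const updatedRecordBytes = RECORD_BYTES + embeddingBytes; // 3584 bytes = 3.5 KB

console.log(embeddingBytes / 1024);     // 1.5
console.log(updatedRecordBytes / 1024); // 3.5
```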

---

## ⚡ V2 Technical Improvements

### Sigma.js + WebGL

- V1: D3.js, choked at ~3k nodes

- V2: Sigma.js + GPU rendering, smooth at 10k+

### Dual HashMap Symbol Table

- V1: Trie (prefix tree) - clever but slow

- V2: File-scoped + Global hashmaps - **~2x speedup**

### LRU AST Cache

- Tree-sitter ASTs live in WASM memory

- LRU cache (50 slots) with `tree.delete()` for cleanup

- Memory stays bounded even for huge codebases
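
A bounded cache with explicit cleanup can be sketched as below. It relies on `Map` preserving insertion order and on cached values exposing a `delete()` method, as tree-sitter trees do; the `LruCache` class and `Deletable` interface are hypothetical, and the example uses a capacity of 2 instead of 50 to keep it short.

```typescript
// Anything that owns WASM memory and frees it through delete().
interface Deletable {
  delete(): void;
}

class LruCache<V extends Deletable> {
  private entries = new Map<string, V>(); // oldest entry first

  constructor(private readonly capacity: number) {}

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.entries.delete(key);
      if (existing !== value) existing.delete(); // free the replaced tree
    }
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldestKey = this.entries.keys().next().value as string;
      this.entries.get(oldestKey)?.delete(); // free WASM memory on eviction
      this.entries.delete(oldestKey);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const freed: string[] = [];
const fakeTree = (name: string): Deletable => ({ delete: () => { freed.push(name); } });

const astCache = new LruCache<Deletable>(2);
astCache.set("a.ts", fakeTree("a.ts"));
astCache.set("b.ts", fakeTree("b.ts"));
astCache.get("a.ts");                   // a.ts becomes most recently used
astCache.set("c.ts", fakeTree("c.ts")); // evicts b.ts and frees it
console.log(freed); // ["b.ts"]
```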

### ForceAtlas2 in Web Worker

- Layout algorithm runs off main thread

- UI stays responsive during graph positioning

---

## 🚧 Roadmap

### Actively Building

- [ ] **MCP Support** - Model Context Protocol for tool extensibility

- [ ] **External DB Support** - Connect to Neo4j (hosted or Docker)

- [ ] **Blast Radius Analysis Tool** - Dedicated UI for impact analysis

- [ ] **Multi-Worker Pool** - Parallel parsing across Web Workers

- [ ] **Ollama Support** - Local LLM integration

- [ ] **CSV Export** - Export node/relationship tables

### 🎯 The Vision: Browser-Based MCP Server

**Goal:** Expose GitNexus as a local MCP server directly from the browser.

This would let AI coding tools like **Cursor**, **Claude Code**, **Windsurf**, etc. connect to your running GitNexus instance and use its knowledge graph for:

- 🔍 **Reliable context gathering** — AI gets actual dependencies, not grep guesses

- 💥 **Blast radius detection** — Before making changes, query what would break

- 🔐 **Codebase-wide audits** — Find violations, dead code, circular dependencies

- 🧠 **Grounded answers** — Every response backed by graph traversal, not hallucination

```mermaid

graph LR

    subgraph Browser["GitNexus (Browser)"]

        KG[Knowledge Graph]

        MCP[MCP Server]

    end

    

    subgraph Tools["AI Coding Tools"]

        CURSOR[Cursor]

        CLAUDE[Claude Code]

        WIND[Windsurf]

    end

    

    KG --> MCP

    MCP <-->|localhost| CURSOR

    MCP <-->|localhost| CLAUDE

    MCP <-->|localhost| WIND

```

**Why this matters:** Current AI coding tools are blind to real dependencies. They use grep or embeddings—better than nothing, but not enough to prevent breaking changes. A knowledge graph MCP would give them the accurate, structural context they need.

### Recently Completed ✅

- [x] Graph RAG Agent with 5 tools (search, cypher, grep, read, highlight)

- [x] Browser embeddings (snowflake-arctic-embed-xs, 22M params)

- [x] Vector index with HNSW in KuzuDB

- [x] Hybrid search (BM25 + semantic + RRF)

- [x] Streaming AI chat with tool visibility

- [x] Grounded citations (`[[file:line]]` format)

- [x] Multiple LLM providers (OpenAI, Azure, Gemini, Anthropic)

---

## 🛠 Tech Stack

| Layer | Technology |
|-------|------------|
| **Frontend** | React 18, TypeScript, Vite, Tailwind v4 |
| **Visualization** | Sigma.js, Graphology, ForceAtlas2 (WebGL) |
| **Parsing** | Tree-sitter WASM (TS, JS, Python) |
| **Database** | KuzuDB WASM (graph + vector HNSW) |
| **Embeddings** | transformers.js, snowflake-arctic-embed-xs (22M) |
| **AI** | LangChain ReAct agent, streaming |
| **Concurrency** | Web Workers + Comlink |

---

## 🔐 Security & Privacy

- All processing happens in your browser

- No code uploaded to any server

- API keys stored in localStorage only

- Open source—audit the code yourself

---

## 📝 License

MIT License

---

## 🙏 Acknowledgments

- [Tree-sitter](https://tree-sitter.github.io/) - AST parsing

- [KuzuDB](https://kuzudb.com/) - Embedded graph database with vector support

- [Sigma.js](https://www.sigmajs.org/) - WebGL graph rendering

- [transformers.js](https://huggingface.co/docs/transformers.js) - Browser ML

- [LangChain](https://langchain.com/) - Agent orchestration
