https://github.com/intentweave/intentweave
Semantic knowledge extraction for code and docs — CARI local index + LLM knowledge graph
- Host: GitHub
- URL: https://github.com/intentweave/intentweave
- Owner: intentweave
- License: apache-2.0
- Created: 2026-03-07T15:51:23.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-04-08T09:01:58.000Z (3 months ago)
- Last Synced: 2026-04-08T11:04:16.170Z (3 months ago)
- Topics: ast, cari, code-analysis, copilot, developer-tools, knowledge-graph, mcp, neo4j, sqlite, typescript
- Language: TypeScript
- Homepage: https://intentweave.org
- Size: 2.21 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Notice: NOTICE
- Cla: CLA.md
# IntentWeave
**Semantic knowledge extraction platform** — build queryable knowledge graphs from documents and code,
with a zero-cost code-aware retrieval index for everyday use.
IntentWeave provides two complementary systems:
1. **CARI (Code-Aware Retrieval Index)** — Builds a lightweight SQLite index from your code's AST,
document keywords, and git history. No LLM calls, no external services, no cost. Produces ranked
file retrieval, cross-layer connection discovery, CI drift detection, and **interactive architecture
visualization** with automatically inferred layers, communities, and dependency analysis.
2. **Knowledge Graph (KG)** — Uses LLMs to extract entities, decisions, and relationships from
natural-language documents. Persists to Neo4j for rich semantic queries, impact analysis, and
documentation health checks.
Both are available through CLI, MCP tools (GitHub Copilot), REST API, and a React UI.
---
## Quick Start
### CARI — Zero-Cost Index (no LLM, no Neo4j)
```bash
npm install -g @intentweave/cli
cd /path/to/your/project
iw init
iw index build # < 3 seconds for most projects
iw index retrieve "authentication" # ranked file retrieval
iw index connections "AuthService" # cross-layer connection discovery
iw index check --changed src/auth.ts # CI drift detection
iw index report # coverage, staleness, hidden couplings
```
### Architecture Analysis & Visualization
```bash
# Auto-infer architectural layers from your import graph
iw index layers-infer
# Validate imports against inferred layer boundaries
iw index layers-check
# Generate a standalone interactive HTML architecture report
iw index export --html
# With LLM-generated layer and directory names (optional)
iw index export --html --provider openai --model gpt-4o-mini
```
The HTML report renders a **layered, spatial architecture view**:
- Files positioned in their inferred architectural tier (foundation at bottom, entry points at top)
- Node size proportional to transitive dependents — bigger = higher impact
- Colour-coded community clusters via label-propagation detection, with **three switchable modes**:
structural (imports + co-changes), semantic (full co-occurrence), temporal (git co-changes only)
- Import edges with layer violations drawn as red reverse-arrows
- Three views: **Layers** (tiered layout), **Communities** (force-directed), **Dependencies** (root-focused)
- Vertical slice detection — click a community to highlight its cross-layer feature slice
- Hierarchical sub-layering within architectural tiers
- Optional LLM pass names layers ("HTTP Layer", "Data Access") and directories ("CLI Subcommands",
"Pipeline Stages") with architectural descriptions
- Zero server dependency — shareable as a single self-contained HTML file
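The "node size proportional to transitive dependents" idea above can be sketched as a walk over the reversed import graph. This is a minimal illustration, not IntentWeave's implementation: the `countTransitiveDependents` function and the shape of the `imports` map (file path to the files it imports) are assumptions made for the example.

```typescript
// Hypothetical helper: for each file, count the files that depend on it
// directly or indirectly. `imports` maps a file to the files it imports.
function countTransitiveDependents(imports: Record<string, string[]>): Map<string, number> {
  // Build the reverse graph: file -> files that import it directly.
  const importedBy = new Map<string, string[]>();
  for (const file of Object.keys(imports)) {
    if (!importedBy.has(file)) importedBy.set(file, []);
    for (const dep of imports[file]) {
      const dependents = importedBy.get(dep) ?? [];
      dependents.push(file);
      importedBy.set(dep, dependents);
    }
  }

  // Walk the reverse graph from each file, counting every distinct file reached.
  const counts = new Map<string, number>();
  importedBy.forEach((_direct, file) => {
    const seen = new Set<string>();
    const stack = [...(importedBy.get(file) ?? [])];
    while (stack.length > 0) {
      const current = stack.pop() as string;
      // Skip the file itself (import cycles) and anything already counted.
      if (current === file || seen.has(current)) continue;
      seen.add(current);
      stack.push(...(importedBy.get(current) ?? []));
    }
    counts.set(file, seen.size);
  });
  return counts;
}

const dependentCounts = countTransitiveDependents({
  "src/cli.ts": ["src/service.ts"],
  "src/service.ts": ["src/db.ts"],
  "src/db.ts": [],
});
console.log(dependentCounts.get("src/db.ts")); // 2: service.ts and cli.ts both reach it
```

In this toy graph the foundation file `src/db.ts` has the highest count, so it would be drawn as the largest node.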
#### Layers View — Auto-Inferred Architectural Tiers

Files arranged into automatically inferred layers with LLM-generated names and descriptions.
Node size reflects transitive dependents; colours indicate community clusters.
#### Communities View — Force-Directed Graph

Force-directed layout revealing community clusters, doc-code links, and import relationships.
#### Dependencies View — Root-Focused Dependency Tree

Explore the full dependency tree from any root file, colour-coded by risk level.
`iw index build` supports two depth modes:
- `--depth structured` (default): scans headings, bold text, and code spans only. Fast and precise.
- `--depth full`: also scans body text with IDF noise filtering, yielding 72% more annotations.
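To give a feel for IDF noise filtering in general, the sketch below computes inverse document frequency with the textbook formula `idf(term) = ln(N / df(term))` and drops terms that appear in too many documents. The function names, the whitespace-free token arrays, and the `minIdf` threshold are assumptions for illustration; the scoring IntentWeave actually uses may differ.

```typescript
// Computes idf(term) = ln(N / df(term)), where df is the number of documents containing the term.
function computeIdf(docs: string[][]): Map<string, number> {
  const df = new Map<string, number>();
  for (const tokens of docs) {
    // Count each term once per document.
    new Set(tokens).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  }
  const idf = new Map<string, number>();
  df.forEach((count, term) => idf.set(term, Math.log(docs.length / count)));
  return idf;
}

// Keeps only the terms whose IDF reaches a minimum value (hypothetical threshold).
function filterNoise(tokens: string[], idf: Map<string, number>, minIdf: number): string[] {
  return tokens.filter((term) => (idf.get(term) ?? 0) >= minIdf);
}

const idfDocs = [
  ["the", "auth", "service"],
  ["the", "index", "build"],
  ["the", "auth", "token"],
];
const idfScores = computeIdf(idfDocs);
const keptTerms = filterNoise(idfDocs[0], idfScores, 0.3);
console.log(keptTerms); // "the" appears in every document, so its IDF is 0 and it is dropped
```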
### Knowledge Graph — LLM Semantic Extraction
### Install from npm
```bash
npm install -g @intentweave/cli
iw --help
```
Or use `npx` without installing:
```bash
npx @intentweave/cli run docs/*.md --track open -i -v
```
### First project setup
```bash
cd /path/to/your/project
# Initialize workspace
iw init
# Start Neo4j (requires Docker)
docker run -d --name neo4j \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/intentweave \
neo4j:5
# Run the extraction pipeline on your docs
export NEO4J_PASSWORD=intentweave
export OPENAI_API_KEY=sk-...
iw run docs/*.md --track open --provider openai -i --persist -v
# Query the knowledge graph
iw query "What are the main components?"
```
> **Full CLI documentation:** [docs/CLI-USAGE.md](docs/CLI-USAGE.md)
### From source (development)
```bash
git clone https://github.com/intentweave/intentweave.git
cd intentweave
pnpm install && pnpm build
# Use the dev wrapper (no build needed for changes)
./iw.sh run docs/*.md --track open -i -v
```
### Start the Server
```bash
cd apps/server
cp .env.example .env # edit NEO4J_PASSWORD and OPENAI_API_KEY
pnpm dev
# → 🧠 IntentWeave server listening on http://0.0.0.0:3000
# → 📖 API docs: http://localhost:3000/docs
# → ❤️ Health: http://localhost:3000/health
```
---
## REST API
All API endpoints live under `/api/`; the health check and API docs are served at `/health` and `/docs`. The server runs on port 3000 by default.
### Query the Knowledge Graph
**Natural language** (requires `OPENAI_API_KEY`):
```bash
curl -X POST http://localhost:3000/api/query \
-H 'Content-Type: application/json' \
-H 'x-session-id: my-project' \
-d '{"question": "What decisions were made about the database?"}'
```
```json
{
"results": [
{
"decision": "Neo4j",
"type": "decision",
"predicate": "DECIDED_FOR",
"target": "graph database"
}
],
"cypher": "MATCH (a:Canon)-[r:CANON_REL {predicate: \"DECIDED_FOR\"}]->(b:Canon) WHERE ...",
"summary": "- **Neo4j** was decided for as the graph database\n- ...",
"count": 3
}
```
**Raw Cypher** (no LLM needed):
```bash
curl -X POST http://localhost:3000/api/query \
-H 'Content-Type: application/json' \
-d '{"cypher": "MATCH (n:Canon:Entity) RETURN n.name, n.type LIMIT 10"}'
```
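The same two request shapes can be built from TypeScript. The sketch below only assembles the request; `buildQueryRequest`, `QueryBody`, and `BuiltRequest` are hypothetical names for this example, while the path, the `x-session-id` header, and the `question` / `cypher` fields come from the curl calls above.

```typescript
// Either a natural-language question or a raw Cypher statement.
type QueryBody = { question: string } | { cypher: string };

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Assembles a POST /api/query request without sending it.
function buildQueryRequest(baseUrl: string, body: QueryBody, sessionId?: string): BuiltRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (sessionId) headers["x-session-id"] = sessionId;
  return {
    url: `${baseUrl}/api/query`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}

const nlRequest = buildQueryRequest(
  "http://localhost:3000",
  { question: "What decisions were made about the database?" },
  "my-project",
);
const cypherRequest = buildQueryRequest("http://localhost:3000", {
  cypher: "MATCH (n:Canon:Entity) RETURN n.name, n.type LIMIT 10",
});
console.log(nlRequest.url, cypherRequest.init.headers);
```

In a runtime with a global `fetch`, the result can be passed along as `fetch(request.url, request.init)`.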
### Build RAG Context
**Topic-based** (requires `OPENAI_API_KEY`):
```bash
curl -X POST http://localhost:3000/api/context \
-H 'Content-Type: application/json' \
-H 'x-session-id: my-project' \
-d '{"topic": "authentication architecture"}'
```
**Entity-seeded** (no LLM needed):
```bash
curl -X POST http://localhost:3000/api/context \
-H 'Content-Type: application/json' \
-H 'x-session-id: my-project' \
-d '{"entity": "React", "hops": 3}'
```
**Dump all** entities:
```bash
curl -X POST http://localhost:3000/api/context \
-H 'x-session-id: my-project' \
-H 'Content-Type: application/json' \
-d '{"all": true}'
```
### List Entities
```bash
# All entities in a session
curl 'http://localhost:3000/api/entities?session=my-project'
# Filter by type
curl 'http://localhost:3000/api/entities?session=my-project&type=decision&limit=20'
# Search by name
curl 'http://localhost:3000/api/entities?session=my-project&search=auth'
```
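For callers that assemble these URLs in code, a small helper can keep the query string encoded correctly. This is a sketch: `EntityFilter` and `buildEntitiesUrl` are hypothetical names, and only the `session`, `type`, `search`, and `limit` parameters shown in the curl examples are assumed to exist.

```typescript
interface EntityFilter {
  session: string;
  type?: string;
  search?: string;
  limit?: number;
}

// Builds a GET /api/entities URL, skipping filters that are not set.
function buildEntitiesUrl(baseUrl: string, filter: EntityFilter): string {
  const params: string[] = [];
  for (const [key, value] of Object.entries(filter)) {
    if (value === undefined) continue;
    params.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  return `${baseUrl}/api/entities?${params.join("&")}`;
}

const entitiesUrl = buildEntitiesUrl("http://localhost:3000", {
  session: "my-project",
  type: "decision",
  limit: 20,
});
console.log(entitiesUrl);
```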
### Run Extraction Pipeline
```bash
curl -X POST http://localhost:3000/api/run \
-H 'Content-Type: application/json' \
-d '{
"files": ["docs/*.md"],
"track": "open",
"provider": "openai",
"incremental": true,
"persist": true,
"verbose": true
}'
```
Returns HTTP 202 with a run summary that includes `runId`, the artifact count, entity and relationship totals, and the run duration.
### Persist to Neo4j
```bash
# Persist latest run
curl -X POST http://localhost:3000/api/persist \
-H 'Content-Type: application/json' \
-d '{"latest": true}'
# Persist specific run
curl -X POST http://localhost:3000/api/persist \
-H 'Content-Type: application/json' \
-d '{"runId": "run-2026-03-08-abc12345"}'
```
### Impact Analysis
```bash
curl -X POST http://localhost:3000/api/impact \
-H 'Content-Type: application/json' \
-H 'x-session-id: my-project' \
-d '{"files": ["src/auth.ts"], "hops": 2}'
```
### Documentation Health
```bash
curl -X POST http://localhost:3000/api/doc-health \
-H 'Content-Type: application/json' \
-H 'x-session-id: my-project' \
-d '{"files": ["docs/ARCHITECTURE.md"]}'
```
### Graph Schema
```bash
curl http://localhost:3000/api/schema
```
Returns canonical predicates, entity types, and relationship documentation.
---
## CLI
```bash
# Run extraction pipeline
iw run docs/*.md --track open --provider openai -i -v
# Query the knowledge graph (natural language)
iw query "What are the main components?"
# Query with raw Cypher
iw query --cypher "MATCH (n:Canon:Entity) RETURN n.name, n.type LIMIT 20"
# Build RAG context
iw context "authentication architecture" -s my-project
# Entity-seeded context
iw context -e "React" --hops 3 -s my-project
# Impact analysis
iw impact src/auth.ts -s my-project
# Documentation health check (CARI default — no Neo4j needed)
iw doc-health
iw doc-health --neo4j -s my-project # full KG mode
# Cross-layer code linking
iw xlink . --session my-project --persist
# Persist to Neo4j
iw persist --latest -v
# --- CARI (no LLM, no Neo4j) ---
# Build the lightweight index
iw index build
iw index build --depth full # include body text with IDF filtering
# Query the index
iw index retrieve "authentication" # ranked file retrieval
iw index connections "AuthService" # cross-layer connections + gaps
iw index check --changed src/auth.ts # CI drift detection
iw index report # corpus-wide health dashboard
# Incremental update (only changed files)
iw index update
```
Additional CARI queries are available as CLI subcommands, MCP tools, and via the programmatic API:
| CLI Command | MCP Tool | What It Does |
| ------------------------------------------ | -------------------------- | -------------------------------------------------------------- |
| `iw index clones` | `cari_clones` | Exact code clone detection (identical body hash) |
| `iw index structural-clones` | `cari_structural_clones` | Type 2 clones (same control flow, different identifiers) |
| `iw index circular-imports` | `cari_circular_imports` | Detect import cycles (A → B → C → A) |
| `iw index unused-exports` | `cari_unused_exports` | Exported symbols never imported anywhere |
| `iw index hotspot-priority` | `cari_hotspot_priority` | High-churn + low-doc files ranked by documentation urgency |
| `iw index todos` | `cari_todos` | TODO/FIXME/HACK/XXX inventory with file, line, and kind |
| `iw index module-coverage` | `cari_module_coverage` | Documentation coverage % per directory |
| `iw index orphaned-sections` | `cari_orphaned_sections` | Doc sections where all mentions are unresolved |
| `iw index doc-completeness` | `cari_doc_completeness` | Per-doc score: covered vs. total exports from referenced files |
| `iw index cross-group-drift` | `cari_cross_group_drift` | Entity coverage conflicts between doc groups |
| `iw index mentions-of <entityId>` | `cari_mentions_of` | Find doc mentions of a code or external entity |
| `iw index annotations-for <filePath>` | `cari_annotations_for` | List all annotations for a documentation file |
| `iw index test-coverage` | `cari_test_coverage` | Map test files to source files, find untested exports |
| `iw index hubs` | `cari_hubs` | God-node / hub analysis (degree centrality) |
| `iw index communities` | `cari_communities` | Community detection (structural / semantic / temporal modes) |
| `iw index surprises` | `cari_surprises` | Surprising connection ranking (composite score) |
| `iw index rationale` | `cari_rationale` | WHY/NOTE/IMPORTANT/DESIGN rationale inventory |
| `iw index terminology` | `cari_terminology` | Terminology inconsistency detection |
| `iw index dep-depth` | `cari_dep_depth` | Transitive import depth + fan-in/fan-out risk |
| `iw index boundary-violations` | `cari_boundary_violations` | Cross-package internal import detection |
| `iw index layers-infer` | `cari_layers_infer` | Auto-infer architectural layers from import graph |
| `iw index layers-check` | `cari_layers_check` | Validate imports against layer configuration |
| `iw index export --html` | — | Generate standalone interactive architecture report |
| `iw index export --html --provider openai` | `cari_layers_name` | LLM-generated layer & directory names for the report |
> See [docs/CLI-USAGE.md](docs/CLI-USAGE.md) for the full command reference, workflows, and troubleshooting.
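To illustrate what an import-cycle check such as `circular-imports` has to do, here is a minimal depth-first search over an import graph. It is a generic sketch under stated assumptions: `findImportCycle` and the `imports` map are hypothetical, and the real command may use a different algorithm and report every cycle, not just the first.

```typescript
// Returns the first import cycle found as a path such as ["a", "b", "c", "a"], or null.
function findImportCycle(imports: Record<string, string[]>): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  const visit = (file: string): string[] | null => {
    state.set(file, "visiting");
    path.push(file);
    for (const dep of imports[file] ?? []) {
      if (state.get(dep) === "visiting") {
        // Back edge: the cycle is the current path from the first occurrence of dep.
        return [...path.slice(path.indexOf(dep)), dep];
      }
      if (!state.has(dep)) {
        const cycle = visit(dep);
        if (cycle) return cycle;
      }
    }
    path.pop();
    state.set(file, "done");
    return null;
  };

  for (const file of Object.keys(imports)) {
    if (!state.has(file)) {
      const cycle = visit(file);
      if (cycle) return cycle;
    }
  }
  return null;
}

const cyclicGraph = { a: ["b"], b: ["c"], c: ["a"], d: ["a"] };
const acyclicGraph = { a: ["b"], b: [] as string[] };
console.log(findImportCycle(cyclicGraph)); // [ 'a', 'b', 'c', 'a' ]
console.log(findImportCycle(acyclicGraph)); // null
```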
### MCP (GitHub Copilot Integration)
IntentWeave exposes MCP tools for use with GitHub Copilot in VS Code:
| Tool | Purpose | Key Parameters |
| --------------- | -------------------------------- | ------------------------------- |
| `kg_query` | Natural language or Cypher query | `question`, `cypher?`, `limit?` |
| `kg_context` | Build RAG context from graph | `topic?`, `entity?`, `hops?` |
| `kg_entities` | List/search entities | `type?`, `search?`, `limit?` |
| `kg_impact` | Semantic impact analysis | `files`, `hops?` |
| `kg_doc_health` | Documentation freshness | `files?` |
| `kg_schema` | Graph schema description | _(none)_ |
**CARI tools** (no Neo4j or LLM needed):
| Tool | Purpose | Key Parameters |
| -------------------------- | ------------------------------------------- | -------------------------------- |
| `cari_retrieve` | Ranked file retrieval by topic or symbol | `query`, `scope?`, `limit?` |
| `cari_connections` | Cross-layer connection discovery + gaps | `entity`, `include?`, `limit?` |
| `cari_check` | CI drift detection for changed files | `changed`, `severity?` |
| `cari_clones` | Exact code clone detection | _(none)_ |
| `cari_structural_clones` | Type 2 clone detection | _(none)_ |
| `cari_circular_imports` | Import cycle detection | _(none)_ |
| `cari_unused_exports` | Unused exported symbols | `limit?` |
| `cari_hotspot_priority` | High-churn low-doc file ranking | `limit?` |
| `cari_todos` | TODO/FIXME/HACK/XXX inventory | `kind?`, `limit?` |
| `cari_module_coverage` | Documentation coverage % per directory | _(none)_ |
| `cari_orphaned_sections` | Doc sections with all-ungrounded mentions | _(none)_ |
| `cari_doc_completeness` | Per-doc completeness vs. referenced exports | _(none)_ |
| `cari_cross_group_drift` | Cross-group entity coverage conflicts | _(none)_ |
| `cari_mentions_of` | Entity → doc mentions | `entityId`, `minConfidence?` |
| `cari_annotations_for` | File → all annotations | `filePath`, `minConfidence?` |
| `cari_test_coverage` | Test→source mapping + gaps | `limit?` |
| `cari_hubs` | God-node / hub analysis | `limit?` |
| `cari_communities` | Community detection (3 modes) | `mode?`, `resolution?`, `limit?` |
| `cari_surprises` | Surprising connection ranking | `limit?` |
| `cari_rationale` | WHY/NOTE/IMPORTANT/DESIGN inventory | `kind?`, `limit?` |
| `cari_terminology` | Terminology inconsistency detection | `limit?` |
| `cari_dep_depth` | Transitive import depth analysis | `limit?` |
| `cari_boundary_violations` | Package boundary violation detection | _(none)_ |
| `cari_layers_infer` | Auto-infer architectural layers | _(none)_ |
| `cari_layers_check` | Validate imports against layer config | `allowSkipLayer?` |
| `cari_layers_name` | LLM-generated layer & directory names | `provider`, `model?`, `api_key?` |
Start the MCP server:
```bash
iw mcp --session my-project -v
```
VS Code auto-discovers the server from `.vscode/mcp.json`:
```json
{
"servers": {
"intentweave-kg": {
"command": "npx",
"args": ["@intentweave/cli", "mcp", "--session", "my-project", "-v"]
}
}
}
```
---
## Architecture
```
apps/
server/ → Runnable server (composes core + open)
packages/
core/ → @intentweave/core — types, predicates, interfaces
analyzer/ → @intentweave/analyzer — pipeline engine (IN→FX→KX→GX)
index/ → @intentweave/index — CARI SQLite index (annotator, IDF, queries)
cli/ → @intentweave/cli — `iw` commands + MCP server
server-core/ → @intentweave/server-core — Fastify + Neo4j + middleware
server-open/ → @intentweave/server-open — open track API routes
profiles/ → @intentweave/profiles — extraction profile packs
ast-extractor/ → @intentweave/ast-extractor — tree-sitter TS/JS extraction
swift-parser/ → @intentweave/swift-parser — tree-sitter Swift extraction
python-parser/ → @intentweave/python-parser — tree-sitter Python extraction
```
### Server Plugin Architecture
The server is built on a layered plugin model:
```
┌──────────────────────────────────────────┐
│ @intentweave/server-core │
│ Fastify 5 + Neo4j pool + context MW │
│ Health + SSE + OpenAPI (Swagger) │
└──────────┬───────────────────────────────┘
│
┌──────────▼───────────────────────────────┐
│ @intentweave/server-open │
│ POST /api/query — KG query (NL+Cypher)│
│ POST /api/context — RAG context │
│ POST /api/run — pipeline execution │
│ POST /api/persist — Neo4j persistence │
│ POST /api/impact — impact analysis │
│ POST /api/doc-health — doc freshness │
│ GET /api/entities — entity listing │
│ GET /api/schema — graph schema │
│ POST /api/xlink — code linking │
└──────────────────────────────────────────┘
```
---
## Pipeline
### Open Track (IN → FX → KX → GX)
Schema-free knowledge extraction:
1. **IN** — Chunk documents (semantic markdown splitting, ~16k chars/chunk)
2. **FX** — Free extraction (LLM extracts raw triples per chunk, parallel)
3. **KX** — Canonicalization (normalize entities + predicates, batch of 40)
4. **GX** — Global merge (cross-document entity deduplication)
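The chunk size and batch size in the steps above can be pictured with two small helpers. This is a simplified sketch: the real IN stage uses semantic markdown splitting, whereas `chunkByParagraph` here only splits on blank lines, and both function names are hypothetical. Only the ~16k character limit and the batch size of 40 come from the list above.

```typescript
// Splits text on blank lines and packs paragraphs into chunks of at most maxChars.
// A single paragraph longer than maxChars is kept whole as an oversized chunk.
function chunkByParagraph(text: string, maxChars = 16_000): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n{2,}/)) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Groups items into fixed-size batches, for example entities awaiting canonicalization.
function toBatches<T>(items: T[], size = 40): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const demoChunks = chunkByParagraph("aaaa\n\nbbbb\n\ncccc", 10);
const demoBatches = toBatches(Array.from({ length: 85 }, (_, i) => i));
console.log(demoChunks.length, demoBatches.map((b) => b.length));
```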
### Features
- **Incremental caching** — SHA-256 content-addressed, skip unchanged files
- **Fast keyword scanning** — parallel file I/O (64 concurrent reads), combined regex pre-filter, single-pass `indexOf` matching, early termination. Scans 3500+ files in seconds, not minutes
- **Batch failure detection** — 3 consecutive failures = abort
- **Network resilience** — two-phase retry, batch cooldown
- **Token/cost estimation** — before committing to LLM calls
- **Delta persistence** — only write changes to Neo4j
- **Profile packs** — domain-specific extraction rules
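Content-addressed caching, as in the first feature above, keys each result by a hash of the input, so unchanged content is never reprocessed. The sketch below shows the general technique with Node's built-in `crypto` module; the `ContentCache` class is a hypothetical in-memory stand-in, not the cache IntentWeave ships.

```typescript
import { createHash } from "node:crypto";

// SHA-256 digest of the content, used as the cache key.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical in-memory cache keyed by content hash.
class ContentCache<T> {
  private readonly entries = new Map<string, T>();

  // Returns the cached result for identical content, computing it only on a miss.
  getOrCompute(content: string, compute: (content: string) => T): T {
    const key = contentHash(content);
    if (this.entries.has(key)) return this.entries.get(key) as T;
    const result = compute(content);
    this.entries.set(key, result);
    return result;
  }
}

let extractionCalls = 0;
const countWords = (text: string): number => {
  extractionCalls += 1; // stands in for an expensive LLM extraction call
  return text.split(/\s+/).filter(Boolean).length;
};

const cache = new ContentCache<number>();
cache.getOrCompute("# Auth\nUses tokens", countWords);
cache.getOrCompute("# Auth\nUses tokens", countWords); // unchanged content: served from cache
cache.getOrCompute("# Auth\nUses sessions now", countWords); // changed content: recomputed
console.log(extractionCalls);
```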
---
## Configuration
### Environment Variables
| Variable | Default | Description |
| ------------------- | ----------------------- | ------------------------------------- |
| `NEO4J_URI` | `bolt://localhost:7687` | Neo4j bolt URI |
| `NEO4J_USERNAME` | `neo4j` | Neo4j username |
| `NEO4J_PASSWORD` | _(required)_ | Neo4j password |
| `NEO4J_DATABASE` | `neo4j` | Neo4j database name |
| `IW_SESSION` | `default` | Default session ID |
| `IW_WORKSPACE_ROOT` | _(optional)_ | Workspace root (enables run/persist) |
| `OPENAI_API_KEY` | _(optional)_ | OpenAI key (enables NL query + topic) |
| `IW_LLM_MODEL` | `gpt-4o-mini` | LLM model for NL queries |
| `PORT` | `3000` | Server port |
| `HOST` | `0.0.0.0` | Server host |
| `LOG_LEVEL` | `info` | Log level |
| `CORS_ORIGIN` | `*` | CORS origin(s), comma-separated |
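As an illustration of how these variables and defaults fit together, the sketch below resolves a configuration object from an environment map. `ServerConfig` and `loadConfig` are hypothetical names for this example; the variable names and default values are taken from the table above.

```typescript
interface ServerConfig {
  neo4jUri: string;
  neo4jUsername: string;
  neo4jPassword: string;
  neo4jDatabase: string;
  session: string;
  workspaceRoot?: string;
  openaiApiKey?: string;
  llmModel: string;
  port: number;
  host: string;
  logLevel: string;
  corsOrigins: string[];
}

// Applies the documented defaults and fails fast when the required password is missing.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const neo4jPassword = env.NEO4J_PASSWORD;
  if (!neo4jPassword) throw new Error("NEO4J_PASSWORD is required");
  return {
    neo4jUri: env.NEO4J_URI ?? "bolt://localhost:7687",
    neo4jUsername: env.NEO4J_USERNAME ?? "neo4j",
    neo4jPassword,
    neo4jDatabase: env.NEO4J_DATABASE ?? "neo4j",
    session: env.IW_SESSION ?? "default",
    workspaceRoot: env.IW_WORKSPACE_ROOT,
    openaiApiKey: env.OPENAI_API_KEY,
    llmModel: env.IW_LLM_MODEL ?? "gpt-4o-mini",
    port: Number(env.PORT ?? "3000"),
    host: env.HOST ?? "0.0.0.0",
    logLevel: env.LOG_LEVEL ?? "info",
    // CORS_ORIGIN is comma-separated, so split it into individual origins.
    corsOrigins: (env.CORS_ORIGIN ?? "*").split(",").map((origin) => origin.trim()),
  };
}

const exampleConfig = loadConfig({
  NEO4J_PASSWORD: "intentweave",
  PORT: "8080",
  CORS_ORIGIN: "https://a.example, https://b.example",
});
console.log(exampleConfig.port, exampleConfig.corsOrigins);
```

In a Node process the same function could be called as `loadConfig(process.env)`.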
---
## Development
```bash
pnpm install # Install all packages
pnpm build # Build all (uses Turbo)
pnpm test # Run all tests (1200+ tests)
pnpm dev # Dev mode with hot reload
pnpm typecheck # Type check all packages
pnpm format # Format with Prettier
pnpm format:check # Verify formatting
```
### Publishing
All `@intentweave/*` packages are publishable to npm:
```bash
# Build everything first
pnpm build
# Publish all packages (pnpm resolves workspace:* → real versions)
pnpm -r publish --access public
# Or publish individual packages
pnpm --filter @intentweave/cli publish --access public
```
### Project Stats
- **11 packages** + 1 app
- **1200+ tests**, all passing
- **TypeScript 5.6**, ESM, strict mode
- **Fastify 5**, Neo4j 5, SQLite (better-sqlite3), Turbo, pnpm workspaces
- **27 CARI query modes** + interactive HTML architecture report with multi-view community modes
- **33 MCP tools** for GitHub Copilot integration
---
## License
Apache-2.0 — see [LICENSE](LICENSE)
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md). All contributions require signing the [CLA](CLA.md).