https://github.com/harf-promo/indexa
The local context engine for AI. Indexa reads your code or your disk once, on your machine, and serves any AI tool — Claude Code, Cursor, or a local Ollama model — the relevant slice on demand. Saves cloud tokens; lets local models punch above their context window. Local-first, MCP-native. Apache-2.0.
claude-code context-engine developer-tools embeddings local-first mcp ollama rag retrieval rust
- Host: GitHub
- URL: https://github.com/harf-promo/indexa
- Owner: harf-promo
- License: other
- Created: 2026-05-27T18:24:23.000Z (27 days ago)
- Default Branch: main
- Last Pushed: 2026-06-13T19:06:09.000Z (10 days ago)
- Last Synced: 2026-06-13T20:00:28.182Z (10 days ago)
- Topics: claude-code, context-engine, developer-tools, embeddings, local-first, mcp, ollama, rag, retrieval, rust
- Language: Rust
- Size: 1.96 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Roadmap: ROADMAP.md
- Notice: NOTICE
# Indexa
**The local context engine for AI.**
Your AI meets your codebase cold on every session — burning paid tokens to relearn what it knew yesterday, or choking on a context window that can't hold your repo at all. Indexa reads your code or your entire disk **once, on your machine**, builds a persistent hierarchical context store, and serves any AI tool the precise, ranked slice it needs — instantly, every time.
```bash
indexa index ~/code/my-repo # scan + embed + summarize in one command
indexa ask "where is auth handled?" # grounded answer with sources
indexa export ~/code/my-repo --format xml > .context.xml
claude "given @.context.xml, find the auth flow and add MFA"
# the model spends its budget on the work you asked for — not on re-reading your tree
```
*The index is the substrate; context is the product. Local-first · model-agnostic · Apache-2.0.*
> Indexa is production-ready for daily use on one or more repos. Whole-disk indexing is fast, but the on-disk storage format may still change until it stabilises ahead of 1.0. New here? Start with the **[Usage Guide](USAGE.md)** or **[Quickstart](docs/quickstart.md)**. Something broken? **[Troubleshooting](docs/TROUBLESHOOTING.md)**.
---
## See it work
Four commands. One context. Every session.
```console
$ indexa index ~/code/my-repo
Scanning ~/code/my-repo
1,284 entries indexed.
Embedded 6,470 new chunks.
318 summaries generated.
Context is ready.
$ indexa ask "where is auth handled?"
Searching 6,470 indexed chunks...
Answer:
Authentication is handled in src/auth/middleware.rs (the `require_auth` route
guard) and src/auth/login.rs (the `login` entry point). Session tokens are
minted and validated in src/auth/session.rs. [1, 2, 3]
Sources:
[1] src/auth/middleware.rs — require_auth
[2] src/auth/login.rs — login
[3] src/auth/session.rs — mint_token
```
Then hand the context to your AI tool — or let your agent pull it live over MCP:
```bash
indexa export ~/code/my-repo --format xml > .context.xml # the artifact, built on your machine
indexa serve # open the web workspace at :7620
indexa mcp # expose the live index to any agent
```
---
## Why Indexa
**Stop paying to re-teach your AI your own codebase.** Every coding assistant wakes up amnesiac. Before it helps, it reads its way back to orientation — burning context window, paid tokens, and your patience on a lesson it learned five minutes ago. Indexa teaches it *once*: it builds a persistent, hierarchical context store on your machine and serves a small ranked slice on demand, so the model spends its budget on the work you asked for.
> **What that's worth — a worked example** *(illustrative, ~4 chars/token — run `indexa status`
> after a few sessions to see your own measured usage and estimated savings)*: an agent orienting itself in a medium repo
> typically reads ~40 files at ~2,000 tokens each ≈ **80K tokens — every session**. The same
> orientation through Indexa: one `search` (~300 tokens) + ten L0 one-line abstracts (~30 each) +
> the two files it actually needs in full (~4K) ≈ **5K tokens**. That's roughly **94% less**,
> before the session's real work starts — and the index was built locally, for zero tokens.
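
The arithmetic in the worked example can be checked with a few lines of Rust. This is a sketch of the estimate only, using the illustrative figures quoted above; the function names are hypothetical and are not part of Indexa. The retrieval path sums to 4,600 tokens, which the text rounds up to 5K.

```rust
/// Tokens an agent spends orienting itself by reading files directly.
fn cold_start_tokens(files: u64, tokens_per_file: u64) -> u64 {
    files * tokens_per_file
}

/// Tokens spent orienting through retrieval: one search call, a batch of
/// one-line abstracts, and the few files that are read in full.
fn retrieval_tokens(search: u64, abstracts: u64, tokens_per_abstract: u64, full_reads: u64) -> u64 {
    search + abstracts * tokens_per_abstract + full_reads
}

/// Percentage saved, rounded to the nearest whole percent.
fn percent_saved(before: u64, after: u64) -> u64 {
    if before == 0 {
        return 0; // nothing was spent, so nothing can be saved
    }
    let saved = before.saturating_sub(after);
    (saved * 100 + before / 2) / before
}

fn main() {
    let before = cold_start_tokens(40, 2_000); // ~40 files at ~2,000 tokens each
    let after = retrieval_tokens(300, 10, 30, 4_000); // search + ten L0 abstracts + two full files
    println!("cold start:    {before} tokens"); // 80000
    println!("via retrieval: {after} tokens"); // 4600
    println!("saved:         {}%", percent_saved(before, after)); // 94
}
```

Swap in your own file counts and sizes to get a rough estimate for a different repo; `indexa status` remains the source for measured numbers.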
**There are two kinds of context, and almost everyone conflates them.** *Working context* is what's in the model's window right now — scarce, paid, gone when the session ends. *Searchable context* is everything your AI could know: the store on disk. Indexa separates them. The model never holds your repo; it holds the ~2–4K characters that actually matter, retrieved from a store that can be gigabytes.
**Local isn't the compromise — it's the unlock.** Your data never leaves the machine unless *you* point it at a cloud model, and zero tokens leave while Indexa builds context. A small local model stops being small: feed it a retrieved slice instead of the whole project and a 4K-window model reasons over a 100 MB codebase — fast, even on CPU, with a KV-cache sized by your choice, not your repo. One engine, two wins: it saves cloud tools their tokens **and** gives local models the context they can't hold.
*(How retrieval keeps that slice relevant — hybrid search, the ANN index, the honest trade-offs — is in [docs/methodology.md](docs/methodology.md).)*
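
To make the working-context idea concrete, here is a minimal sketch of how a ranked result list could be trimmed to a fixed character budget before it reaches the model. It is an illustration under stated assumptions, not Indexa's actual selection logic: the `Chunk` type, the `pack_slice` function, and the greedy strategy are all hypothetical.

```rust
/// A retrieved chunk with its relevance score (higher is better).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    source: String,
    text: String,
    score: f64,
}

/// Greedily pack the highest-scoring chunks into a character budget.
/// A chunk that does not fit is skipped so a smaller one further down
/// the ranking can still be used.
fn pack_slice(mut chunks: Vec<Chunk>, budget_chars: usize) -> Vec<Chunk> {
    // Highest score first.
    chunks.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut used = 0;
    let mut slice = Vec::new();
    for chunk in chunks {
        let len = chunk.text.chars().count();
        if used + len <= budget_chars {
            used += len;
            slice.push(chunk);
        }
    }
    slice
}

fn main() {
    let chunks = vec![
        Chunk { source: "src/auth/middleware.rs".into(), text: "x".repeat(1_500), score: 0.92 },
        Chunk { source: "src/auth/session.rs".into(), text: "x".repeat(3_000), score: 0.85 },
        Chunk { source: "src/auth/login.rs".into(), text: "x".repeat(1_000), score: 0.60 },
    ];
    // With a 4,000-character budget the 3,000-character chunk no longer fits
    // after the first pick, so the smaller third chunk is taken instead.
    for chunk in pack_slice(chunks, 4_000) {
        println!("{} ({} chars)", chunk.source, chunk.text.chars().count());
    }
}
```

Greedy packing is not an optimal knapsack solution, but it is predictable and keeps the top-ranked chunk in the slice whenever that chunk fits at all.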
---
## The only tool of its kind
**No other tool does this.** Indexa is the first local context engine that builds a persistent, hierarchically-summarised, locally-queryable index over your entire disk — and exposes it simultaneously through a CLI, a live web workspace, a native desktop app, and an MCP server that any AI agent can call in real time.
This is not a repo-to-prompt converter. It is not a document chat app. It is not an IDE extension. It is a persistent, on-device intelligence layer that any AI tool can reach — without re-reading your files, without sending your data to a server, and without starting from zero every session.
**Indexa ends the cold-start problem, permanently.**
---
## What Indexa does
- **One-command context** — `indexa index <path>` scans, embeds, and summarises in a single step. Context is ready immediately; no pipeline to manage.
- **Two-phase context** — an instant surface scan (zero AI, classifies code vs media vs build artifacts) then deep context: parse → chunk → embed → LLM file summaries rolled up bottom-up into a hierarchical context graph, with **L0 / L1 / L2 progressive disclosure** (one-line abstract → full summary → raw content).
- **Hybrid retrieval** — keyword (BM25) + semantic (vector) fused with RRF, plus an **opt-in ANN index** that keeps dense search fast on 50K-plus-chunk corpora.
- **Local multimodal** *(opt-in, on-device)* — caption images with a local vision model and transcribe audio with a local whisper CLI, so you can find media by what's *in* it, not just its filename.
- **Code intelligence** — a code-relationship graph (imports + defined symbols + call edges) across Rust, Python, JS/TS, Go, and Java, plus `who_calls`, `blast_radius`, and an interactive **call-graph visualization** in the web Map tab — all queryable over MCP.
- **Four ways to reach the index** — a CLI, a live web workspace with an Engine status bar, a native macOS desktop app (menu-bar, auto-updating), and an MCP server (42 tools) that AI agents call directly.
- **Resource-aware** — a memory watchdog that won't freeze your machine, and a hardware-aware model picker that annotates every model with its memory footprint, fit against your live RAM, and a per-job ETA.
- **Use your Claude subscription** — the `claude-code` provider runs summaries and answers on your Claude Pro/Max plan (no per-token billing); embeddings always stay local.
- **Export** — XML (Anthropic's documentation recommends XML tags for structuring long prompts), Markdown, or JSON. **Watch** keeps the context current as files change.
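
The hybrid retrieval bullet names Reciprocal Rank Fusion (RRF). As a rough sketch of that technique: each ranked list contributes `1 / (k + rank)` to a document's fused score, with ranks starting at 1. The function name `rrf_fuse` and the constant `k = 60` (a common default in the RRF literature) are assumptions here; the README does not say which value Indexa uses.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over any number of ranked lists.
/// Each list adds 1 / (k + rank) to a document's score, rank starting at 1.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical result lists: one from keyword search, one from vector search.
    let bm25 = vec!["login.rs", "session.rs", "middleware.rs"];
    let vector = vec!["session.rs", "middleware.rs", "README.md"];
    for (doc, score) in rrf_fuse(&[bm25, vector], 60.0) {
        println!("{doc}: {score:.4}");
    }
}
```

Documents that appear in both lists (`session.rs`, `middleware.rs`) outrank a document that tops only one list, which is the property that makes RRF useful when keyword and semantic scores are not directly comparable.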
---
## Four ways to use it
- **CLI** — `index · ask · export · scan · deep · summarize · watch · serve · mcp · doctor · classify · completion · update`. Scriptable, pipeable, zero services required (`indexa completion <shell>` emits tab-completions).
- **Web workspace** — `indexa serve` → `http://localhost:7620`. Responsive (collapses to a drawer on a phone), keyboard-navigable, and honest about impact — a **token-savings panel** shows what retrieval saved your AI tools this week. A live Engine status bar shows what the machine is doing while it builds:
```
Engine Building · 42 files/s · ETA 1m12s · gemma3:4b CPU 38% RAM 9.1 / 16 GB pressure: ok
```
- **Desktop app** — a native macOS app that lives in the menu bar. Auto-updates silently. Bundles the web workspace — no separate `indexa serve` needed.
- **MCP server** — `indexa mcp` exposes the live index to any [MCP](https://modelcontextprotocol.io) client (Claude Desktop, Cursor, Claude Code) over **42 tools** — retrieval (`search · browse_tree · get_summary · read_file · get_chunk_context · ask`), code graph (`dependencies · who_imports · who_calls · blast_radius · code_graph`), Context Packs, importance weights, smart classification, insights, decision review, config introspection, and indexing triggers.
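
For readers new to MCP: clients speak JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool `name` and an `arguments` object. The sketch below builds such a request for the `search` tool by hand. The `query` argument name is an assumption (the README does not document the tool's input schema), and the helper functions are hypothetical; a real client would normally use an MCP SDK and a JSON library.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 `tools/call` request.
/// ASSUMPTION: the tool accepts a single string argument named `query`.
fn tools_call_request(id: u64, tool: &str, query: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"query":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(query)
    )
}

fn main() {
    println!("{}", tools_call_request(1, "search", "where is auth handled?"));
}
```

Check the tool's published input schema (MCP clients can list it with `tools/list`) before relying on any argument name.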
---
## Code intelligence
Deep-indexing records each code file's graph edges — what it **imports**, the symbols it **defines**, and what it **calls** — across Rust, Python, JS/TS, Go, and Java. Query it over MCP, so your agent reasons about structure without reading every file:
```text
dependencies("src/auth/session.rs")
→ imports: crate::store::Db
→ defines: Session, mint_token, validate
who_imports("crate::store::Db")
→ src/auth/session.rs
blast_radius("mint_token")
→ src/auth/session.rs, src/auth/login.rs, src/auth/middleware.rs
```
Or **see** the whole call graph: `indexa graph ` on the CLI, or the **Graph** sub-view in the
web Map tab — an interactive force-directed view of which files call into which.
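
One plausible reading of `blast_radius` is "everything that transitively calls this symbol". Under that assumption, the query reduces to a breadth-first walk over reversed call edges. The sketch below works at symbol level on a tiny hypothetical graph; the edge list, the `refresh` and `handler` symbols, and the function itself are illustrative and do not reflect Indexa's internal representation, which reports files.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Return every symbol that directly or transitively calls `target`.
/// `edges` holds (caller, callee) pairs.
fn blast_radius(edges: &[(&str, &str)], target: &str) -> BTreeSet<String> {
    // Reverse index: callee -> direct callers.
    let mut callers: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(caller, callee) in edges {
        callers.entry(callee).or_default().push(caller);
    }

    let mut seen: BTreeSet<String> = BTreeSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    queue.push_back(target);
    while let Some(symbol) = queue.pop_front() {
        if let Some(direct) = callers.get(symbol) {
            for &caller in direct {
                // `insert` returns false for already-visited symbols,
                // which also keeps the walk finite on cyclic graphs.
                if seen.insert(caller.to_string()) {
                    queue.push_back(caller);
                }
            }
        }
    }
    seen
}

fn main() {
    // Hypothetical call edges: (caller, callee).
    let edges = [
        ("login", "mint_token"),
        ("refresh", "mint_token"),
        ("handler", "login"),
        ("require_auth", "validate"),
    ];
    for symbol in blast_radius(&edges, "mint_token") {
        println!("{symbol}");
    }
}
```

Mapping each affected symbol back to its defining file would give a file-level result like the one shown in the MCP example above.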
---
## Install
Download a pre-built binary from [Releases](../../releases):
```bash
# macOS (Apple Silicon); prefix the commands with sudo if /usr/local/bin is not writable
# -f makes curl fail on an HTTP error instead of saving the error page as the binary
curl -fL -o /usr/local/bin/indexa \
  https://github.com/harf-promo/indexa/releases/latest/download/indexa-aarch64-apple-darwin
chmod +x /usr/local/bin/indexa
xattr -d com.apple.quarantine /usr/local/bin/indexa # bypass Gatekeeper if prompted
# Other platforms: swap the asset name at the end of the URL
# macOS (Intel): indexa-x86_64-apple-darwin
# Linux x86_64: indexa-x86_64-linux-gnu · Linux arm64: indexa-aarch64-linux-gnu
# Windows x64: indexa-x86_64-windows.exe
```
Pull the local models (one-time, ~11 GB total; everything runs offline after this):
```bash
ollama pull nomic-embed-text # embeddings (~270 MB)
ollama pull gemma3:4b # file summaries (~2.5 GB)
ollama pull gemma3:12b # answers + directory roll-ups (~8 GB)
```
Or build from source (Rust ≥ 1.82): `git clone … && cargo build --release` → `target/release/indexa`.
---
## Bring your own model
No model is bundled — Indexa works with whatever you run. Configure in `~/.indexa/config.toml`:
| Adapter | How it runs |
|---|---|
| **Ollama** | Local, fully offline (default). Point elsewhere with `OLLAMA_HOST`. |
| **Claude subscription** | `provider = "claude-code"` — synthesis on your Claude Pro/Max plan, no per-token billing. Embeddings stay local. |
| **llama.cpp** | Local via its HTTP server. |
| **Google Gemini · OpenAI · Anthropic** | Cloud — data leaves your device; API key required. |
Defaults: `nomic-embed-text` (embeddings) · `gemma3:4b` (file context) · `gemma3:12b` (answers + roll-ups). Optional cross-encoder reranking *fails open* — a model hiccup falls back to the original order, so it can never make `ask` worse.
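
The fail-open behaviour described above can be sketched as a small wrapper: try to score the candidates, and if anything about the result is unusable, return them in their original order. This is a minimal illustration under stated assumptions, not Indexa's reranker; `rerank_fail_open` and the closure-based scorer are hypothetical.

```rust
/// Rerank candidates with a fallible scorer. If scoring fails, or returns
/// an unusable result, the candidates come back in their original order.
fn rerank_fail_open<T, E>(
    candidates: Vec<T>,
    score_all: impl FnOnce(&[T]) -> Result<Vec<f64>, E>,
) -> Vec<T> {
    let scores = match score_all(&candidates) {
        // Usable means one finite score per candidate.
        Ok(scores)
            if scores.len() == candidates.len() && scores.iter().all(|s| s.is_finite()) =>
        {
            scores
        }
        // Error, wrong length, or NaN/inf: keep the original order.
        _ => return candidates,
    };
    let mut paired: Vec<(f64, T)> = scores.into_iter().zip(candidates).collect();
    // Stable sort, highest score first; ties keep their original order.
    paired.sort_by(|a, b| b.0.total_cmp(&a.0));
    paired.into_iter().map(|(_, item)| item).collect()
}

fn main() {
    let hits = vec!["middleware.rs", "login.rs", "session.rs"];

    // A healthy scorer reorders the hits.
    let reranked = rerank_fail_open(hits.clone(), |_| Ok::<_, ()>(vec![0.2, 0.9, 0.5]));
    println!("{reranked:?}"); // ["login.rs", "session.rs", "middleware.rs"]

    // A failing scorer leaves the order untouched.
    let fallback = rerank_fail_open(hits, |_| Err::<Vec<f64>, &str>("model timeout"));
    println!("{fallback:?}"); // ["middleware.rs", "login.rs", "session.rs"]
}
```

Because the fallback is the unmodified input, a reranker outage costs only the potential improvement and never degrades the first-stage ranking.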
---
## What's coming
- **Mobile companion** — browse your index from a phone on the same network. The read-only API is already LAN-ready (`indexa serve --host 0.0.0.0`); a native client is next.
- **Plugin marketplace** — the parser SDK is shipped (`indexa_parsers::Registry`); next is discovery and distribution of third-party parsers.
Recently shipped: **agentic `ask`** (multi-hop plan→search→refine), **PageRank centrality** (hub files in the code graph), **signature graph visualization**, **universal macOS desktop build**, **Importance weighting**, **Insights** (duplicates / stale / weekly diff), **video captioning**, **Plugin SDK**, **LAN serve**.
Ideas and votes in [Discussions](../../discussions/categories/ideas). Full detail in [ROADMAP.md](ROADMAP.md).
---
## Contributing
Indexa is built in the open and welcomes contributors of every level. Read [CONTRIBUTING.md](CONTRIBUTING.md), browse [`good first issue`](../../issues?q=label%3A%22good+first+issue%22), and join [Discussions](../../discussions). Commits sign off with the [DCO](https://developercertificate.org/) (`git commit -s`); no CLA.
## License
Apache License 2.0 — see [LICENSE](LICENSE). Copyright 2025 Harf Promo.