{"id":50610629,"url":"https://github.com/provasign/grove","last_synced_at":"2026-06-14T04:01:06.455Z","repository":{"id":361962195,"uuid":"1256636360","full_name":"provasign/grove","owner":"provasign","description":"Code knowledge graph — Tree-sitter parsing, SQLite storage, BFS traversal. Powers Prism; embeddable Go API, CLI, and MCP. MIT licensed.","archived":false,"fork":false,"pushed_at":"2026-06-13T02:35:21.000Z","size":28404,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-13T03:25:28.821Z","etag":null,"topics":["ai-agents","code-graph","golang","sqlite","tree-sitter"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/provasign.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-02T00:53:05.000Z","updated_at":"2026-06-13T02:35:24.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/provasign/grove","commit_stats":null,"previous_names":["provasign/grove"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/provasign/grove","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/provasign%2Fgrove","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/provasign%2Fgrove/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/provasign%2Fgrove/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/provasign%2Fgrove/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/provasign","download_url":"https://codeload.github.com/provasign/grove/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/provasign%2Fgrove/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34308622,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","code-graph","golang","sqlite","tree-sitter"],"created_at":"2026-06-06T03:06:22.623Z","updated_at":"2026-06-14T04:01:06.435Z","avatar_url":"https://github.com/provasign.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Grove\n\n\u003e **Your codebase's persistent long-term memory — queryable by any AI agent.**\n\n\u003e **Embedded mode (current):** Grove is a Go library at `github.com/provasign/grove/pkg/grove`. Prism and Fuse link it directly and open the on-disk index in-process. There is no `grove serve` daemon, no port (7777/7778), and no `.grove/.token`. 
The CLI is available for explicit indexing plus read queries (`grove index .`, `grove symbols main`) and stdio MCP (`grove mcp`).\n\n---\n\nGrep answers \"does this string appear somewhere?\" A language server answers \"where is this symbol defined?\" Grove answers the harder questions AI agents actually need:\n\n- *What does changing this function break — across the entire codebase?*\n- *Which tests cover this method, directly or transitively?*\n- *What is the full dependency chain from this file?*\n- *What symbols are semantically related to this task description?*\n\nThe difference is a graph. Grove indexes your source files into a persistent SQLite graph — 11 languages, 9 edge types, BFS traversal — and keeps it live with delta indexing (files whose content hash hasn't changed are never re-parsed). The graph is queryable through the embedded Go API, CLI, and MCP stdio.\n\nGrove is the foundation the rest of the toolchain is built on. Prism uses it to focus context. Fuse uses it to resolve conflicts. Shale will use it for intent-to-diff conformance. Without Grove, they fall back to line-level operations.\n\nGrove also exposes a conservative certification report for unified diffs. This mode is additive: consumers see no change unless they explicitly opt into the report. 
The report labels heuristic evidence, returns `manual_review` for unsupported or unmapped changes, and only returns `allow` when changed code maps cleanly to indexed symbols with required test evidence.\n\n---\n\n## Architecture\n\n```\nSource files\n     │\n     ▼\n┌─────────────────────────────────────────────────────────┐\n│  internal/parser/                                       │\n│  Tree-sitter AST walkers (11 languages)                 │\n│  Regex fallback for syntax-error recovery               │\n│  Native semantic analyzers for Go, Python, Java,       │\n│  Rust, C, C++, C#, PHP, JS, and TS                      │\n│  All CGO is isolated to this package                    │\n└────────────────────────┬────────────────────────────────┘\n                         │ []SymbolRecord\n                         ▼\n┌─────────────────────────────────────────────────────────┐\n│  internal/store/                                        │\n│  SQLite WAL                                             │\n│  Delta indexing by content SHA                          │\n│  Stale-file pruning                                     │\n└────────────────────────┬────────────────────────────────┘\n                         │\n                         ▼\n┌─────────────────────────────────────────────────────────┐\n│  internal/graph/                                        │\n│  In-memory CodeGraph                                    │\n│  9 edge types                                           │\n│  BFS traversal                                          │\n└──────┬─────────────────┬───────────────────────────────┘\n       │\n       ├───────────────────────┐\n       ▼                       ▼\n┌────────────┐        ┌────────────────────┐\n│ internal/  │        │ pkg/grove          │\n│ mcp/       │        │ Embedded Go API    │\n│ 9 tools    │        │ Query / impact /   │\n│ JSON-RPC   │        │ deps / tests / ICR │\n│ stdio      │        │ / certify / diff   │\n└────────────┘        
└────────────────────┘\n```\n\n---\n\n## Design Decisions\n\n**Single binary, zero runtime dependencies.** SQLite is embedded via `modernc.org/sqlite` — a pure-Go port — which avoids a CGO linker conflict with tree-sitter. Tree-sitter itself (in `internal/parser/`) is the only CGO dependency.\n\n**Delta indexing by content hash.** Grove hashes each file before parsing. If the stored hash matches, the file is skipped entirely. Indexing a 5000-file repo after a one-line change touches one file, not 5000.\n\n**AST-first with native enrichment.** Tree-sitter produces a complete AST even for files with syntax errors, but marks broken subtrees as `ERROR` nodes. When `root.HasError()` is true, Grove runs both the AST extractor and the regex fallback, then merges the results with AST taking precedence. On top of that, language-native analyzers add type-use, inheritance, and import edges when the local toolchain is available. (Call edges are owned by the call-site resolver in `internal/graph/`, which narrows by inferred receiver type — text-matching call edges in the native pass exploded fan-out on overload- and same-name-heavy code, so they were retired in favor of the narrowed graph-layer resolution.) Files that are actively being edited mid-keystroke are still indexed usefully, and the graph gets richer when the repository can be resolved with native tooling.\n\n**Scoped edges prevent false positives.** `calls` and `uses-type` edges are only created between symbols in the same file or in files connected by an `imports` edge. Without this constraint, a function named `parse` in one package would appear to call a `parse` function in an unrelated package, producing roughly 5× the false-positive edges.\n\n**Symbol ID format.** Every symbol has a canonical ID: `{filePath}::{qualifiedName}@{contentSHA}` (SHA-1 of the file content). 
Qualified names include the parent — `Service.Login`, `User.__init__` — so same-named members on different receivers or classes in one file stay distinct; any residual collision is disambiguated deterministically. The content-SHA component means that if you rename a function, the old symbol ID disappears and a new one is created — stale references in the graph don't survive a reindex.\n\n---\n\n## Language Support\n\n| Language | Extension(s) | Extraction |\n|----------|-------------|-----------|\n| Go | `.go` | AST walker + native semantic enrichment |\n| TypeScript | `.ts` | AST walker + native semantic enrichment |\n| TSX | `.tsx` | AST walker + native semantic enrichment |\n| JavaScript | `.js .jsx .mjs .cjs` | AST walker + native semantic enrichment |\n| Python | `.py` | AST walker + native semantic enrichment |\n| Java | `.java` | AST walker + native semantic enrichment |\n| Rust | `.rs` | AST walker + native semantic enrichment |\n| C | `.c .h` | AST walker + native semantic enrichment |\n| C++ | `.cc .cpp .cxx .hh .hpp` | AST walker + native semantic enrichment |\n| C# | `.cs` | AST walker + native semantic enrichment |\n| PHP | `.php .phtml` | AST walker + native semantic enrichment |\n\nNon-code files (`.md`, `.yaml`, `.json`, `.xml`, `.sh`, `.toml`, `.proto`, `.sql`, `Makefile`, `Dockerfile`, and more) are indexed as `document` symbols whose content feeds the semantic and lexical search indexes. Agents can query them alongside code symbols.\n\n---\n\n## Measured Accuracy\n\nGrove's edges are scored against typed-toolchain ground truth on pinned\nreal-world repos, and CI fails any change that regresses below the recorded\nfloors (see [eval/README.md](eval/README.md) for methodology, oracles, and\nthe full progression history). 
Calls-edge accuracy across **ten languages**,\n2026-06-13:\n\n| Repo (pin) | Language | Oracle | Precision | Recall | F1 |\n|---|---|---|---|---|---|\n| gin | Go | go/ssa + VTA callgraph | 0.93 | 0.95 | **0.94** |\n| socket.io | TypeScript | TypeScript compiler API | 0.85 | 0.96 | **0.90** |\n| commons-lang | Java | javac + javap bytecode | 0.69 | 0.84 | **0.76** |\n| express | JavaScript (CJS) | TypeScript compiler API (checkJs) | 0.75 | 0.71 | **0.73** |\n| flask | Python | dynamic (pytest runtime trace) | 0.85 | 0.61 | **0.71** |\n| ripgrep | Rust | rust-analyzer SCIP index | 0.85 | 0.60 | **0.70** |\n| jansson | C / C++ | scip-clang SCIP index | 0.88 | 0.56 | **0.69** |\n| Newtonsoft.Json | C# | Roslyn semantic model | 0.66 | 0.70 | **0.68** |\n| PHP-Parser | PHP | dynamic (xdebug runtime trace) | 0.77 | 0.54 | **0.63** |\n\nNotes on reading these honestly:\n\n- Oracles match each ecosystem: typed static analysis where the language has\n  it (Go SSA/VTA, the TypeScript checker, Roslyn, rust-analyzer and\n  scip-clang SCIP, Java bytecode), and **runtime execution** for the\n  dynamically-dispatched languages (Python and PHP test-suite traces).\n- A runtime oracle records edges no static tool can see (registry dispatch,\n  dunder/magic protocols, proxies, virtual dispatch), so for Python and PHP\n  recall against it is a conservative lower bound and precision is the lever\n  that matters. The static-oracle languages are the inverse — precision is\n  the lower bound where a true edge runs through an untested or dynamically\n  dispatched path the structural graph can't pin to one target.\n- Test-relatedness edges are scored against per-test runtime coverage:\n  flask edge precision 0.74, with a truly-covering test suggested for 36%\n  of covered functions. 
Blast radius (depth-2 reverse reachability on gin)\n  scores F1 0.88.\n- Every number above is a CI gate (`.github/workflows/eval.yml`), including\n  a universe-match floor that catches tree-sitter grammar drift when new\n  language syntax ships.\n\nGround truths are tool-agnostic JSONL — score your own code graph against\nthem.\n\n---\n\n## Graph Edge Types\n\n| Edge | Meaning |\n|------|---------|\n| `defines` | File defines this symbol |\n| `contains` | Class/namespace contains this member |\n| `imports` | File imports another file |\n| `extends` | Class extends/embeds another |\n| `implements` | Class implements an interface |\n| `calls` | Function calls another function (scoped) |\n| `uses-type` | Function/field uses a type (scoped) |\n| `tests` | Test function covers a named symbol |\n| `overrides` | Concrete method implements an interface/abstract declaration |\n\n---\n\n## Performance\n\nMeasured on real repositories (macOS, Apple Silicon, 2026-06-12), parallel\nparsing and native analyzers enabled:\n\n| Repo | Files indexed | Symbols | Edges | Cold index | One-file change | No-change reindex (CLI) |\n|------|--------------:|--------:|------:|-----------:|----------------:|------------------------:|\n| [prometheus](https://github.com/prometheus/prometheus) | 1,476 | 14k | 259k | 12.2 s | — | 0.7 s |\n| [django](https://github.com/django/django) | 3,792 | 39.5k | 425k | 18.5 s | — | 1.5 s |\n| [grafana](https://github.com/grafana/grafana) | 18,979 | 98.5k | 1.16M | 56.6 s | 18.7 s | 9.5 s |\n\nHow to read these:\n\n- **Cold index** includes tree-sitter parsing of everything, native\n  analyzers (the TypeScript program check dominates polyglot repos), full\n  edge construction, and persistence.\n- **One-file change** re-parses one file, runs native analyzers only for\n  the changed file's language (untouched languages' edges are carried\n  forward), rebuilds the graph, and diff-syncs the edge table (only changed\n  rows are written). 
The remaining cost on huge repos is the full\n  in-memory edge rebuild — per-file incremental edge maintenance is the\n  known next step.\n- **No-change reindex (CLI)** is dominated by per-process graph\n  rehydration from SQLite — a long-lived embedded engine (Prism MCP, Fuse)\n  pays it once at open, after which a no-change `Index()` is milliseconds\n  and queries run against the in-memory graph (BFS depth-3 over 50k nodes\n  ≈ 4.5 ms, ranked search ≈ 15 ms at 50k symbols).\n- Native analyzer timeouts default to 5 s per analyzer; very large repos\n  may need `GROVE_NATIVE_TIMEOUT=60s` for `go list`/`tsc` to finish (skips\n  degrade gracefully and are reported in the index diagnostics).\n\n---\n\n## Tool and IDE Integration\n\nGrove is the graph backend for the toolchain. Prism and Fuse consume the embedded Go API directly. Direct AI agent integration is available through MCP stdio.\n\n| Integration | How | Use case |\n|-------------|-----|---------|\n| Claude Code CLI | `grove mcp .` → MCP stdio | Direct agent integration without Prism |\n| Cursor, Windsurf, Zed | `grove mcp .` → MCP stdio | Same |\n| VS Code (Copilot Agent) | Prism extension → embedded Grove | Grove-backed context through Prism |\n| Prism (all IDEs) | Embedded Go API | Token-optimized context delivery |\n| Fuse (git merge) | Embedded Go API | Blast radius + conflict hints |\n| Shale (planned) | Embedded Go API | Intent-to-diff conformance |\n| Custom automation | `pkg/grove` | In-process Go integration |\n\nFor most AI agent use cases, running Grove directly is only necessary for custom integrations. 
The normal path is `prism init` in your project — Prism embeds Grove in-process and opens the index itself; nothing is started, installed, or configured separately.\n\n---\n\n## Installation\n\n**Binary install (fastest):**\n\n```bash\n# macOS / Linux\ncurl -fsSL https://raw.githubusercontent.com/provasign/grove/main/install.sh | bash\n\n# Windows (PowerShell)\nirm https://raw.githubusercontent.com/provasign/grove/main/install.ps1 | iex\n\n# Pin a specific version (the variable must be set for bash, which runs the script, not for curl)\ncurl -fsSL https://raw.githubusercontent.com/provasign/grove/main/install.sh | VERSION=v0.5.0 bash\n```\n\nInstalls to `~/bin` by default. Set `INSTALL_DIR=/usr/local/bin` to override.\n\n**Build from source:**\n\n```bash\nmake build    # compile ./bin/grove\nmake install  # install to $GOPATH/bin\nmake test     # run all tests\n```\n\n---\n\n## CLI Reference\n\n```bash\n# Set up a project (creates .grove/ directory and config)\ngrove init [dir]\n\n# Index or reindex (skips unchanged files via delta SHA; reuses the stored\n# graph outright when nothing changed — --force re-runs analyzers anyway)\ngrove index [dir] [--force]\n\n# Show persisted index status without refreshing\ngrove status [dir] [--refresh]\n\n# Symbol search\ngrove symbols \u003cquery\u003e [dir] [--refresh]\n\n# Intent-based semantic query (Model2Vec embeddings + BFS graph ranking)\ngrove query \u003cintent\u003e [dir] [--refresh]\n\n# Blast radius: what would break if this symbol changed?\ngrove impact \u003csymbol\u003e [dir] [--refresh]\n\n# Which tests cover a symbol?\ngrove tests \u003csymbol\u003e [dir] [--refresh]\n\n# Conservative structural certification for a unified diff.\n# Exit codes: 0 allow, 2 manual_review, 3 block, 1 runtime error.\ngrove certify \u003cdiff-file-or-\u003e [dir]\n\n# Start MCP stdio server (primary AI agent integration)\ngrove mcp [dir]\n\n```\n\n## Graph Diff\n\n`pkg/grove` exposes the primitive behind cross-agent drift detection:\n\n```go\nbefore := eng.SnapshotSymbols(ctx)   // capture\n// ... 
merge lands / files change ...\neng.Index(ctx, \"\")                   // reindex\ndiff := eng.DiffSince(ctx, before)   // structural delta\n```\n\n`GraphDiff` reports added, removed, and changed symbols plus\n`BreakingChanges` (exported symbols removed or with a changed signature).\nSymbols are matched by stable identity — file path + qualified name + kind —\nso line shifts and content-SHA churn don't register; only symbols whose\nsignature or body actually changed appear. Fuse can diff the graph across a\nmerge and intersect the result with another agent's working set to deliver\n\"the ground shifted under you\" notifications with a minimal context patch.\n\n## Certification Mode\n\n`grove certify` and `pkg/grove.Engine.CertifyDiff` map unified diff hunks to indexed symbols and emit a JSON report containing changed files, changed symbols, impacted symbols, related tests, unknowns, findings, and a verdict.\n\nVerdicts are intentionally conservative:\n\n| Verdict | Meaning |\n|---------|---------|\n| `allow` | Grove mapped the diff to indexed symbols and found required evidence. |\n| `manual_review` | Grove could not prove enough structurally, for example unsupported files, ignored/sensitive paths, deleted/binary files, unmapped hunks, a stale index (file on disk no longer matches the indexed content), or missing test evidence. |\n| `block` | Grove could not process the diff deterministically, for example malformed diff input. |\n\nCertification mode is not a compiler or language-server resolver. Tree-sitter, astkit, and the native analyzers provide structural facts; the report still stays conservative and falls back to `manual_review` whenever evidence is incomplete.\n\n---\n\n## HTTP API\n\nThere is no HTTP or gRPC daemon in the current embedded mode. 
Use `pkg/grove` for in-process integration, the CLI for local commands, or `grove mcp` for stdio MCP.\n\n---\n\n## MCP Tools\n\nGrove exposes nine tools over JSON-RPC 2.0 stdio, accessible to any MCP-capable AI agent. Every tool publishes a full JSON schema with per-parameter descriptions, so agents can discover arguments without guessing:\n\n| Tool | Purpose |\n|------|---------|\n| `grove_index` | Index or reindex a directory (`force` re-runs analyzers) |\n| `grove_symbols` | Lexical symbol search, ranked by match quality |\n| `grove_query` | Semantic search: ranked context for a free-text intent |\n| `grove_impact` | Blast radius for a symbol or file |\n| `grove_deps` | Dependency edges for a file |\n| `grove_tests` | Tests that cover a symbol |\n| `grove_icr` | Isolated Change Region for an intent |\n| `grove_conflicts` | Overlap check between two ICRs |\n| `grove_certify` | Conservative certification report for a unified diff |\n\nStart the MCP server:\n\n```bash\ngrove mcp .\n```\n\n## Storage\n\nGrove stores everything in `.grove/grove.db` (SQLite, WAL mode). The database is a single file — back it up, copy it, or delete it to force a full reindex. Schema migrations are applied when the store opens.\n\nKey SQLite settings:\n- WAL mode for concurrent reads during indexing\n- `busy_timeout = 30s` to handle contention without immediate errors\n\n---\n\n## Security\n\nGrove does not expose a network listener in embedded mode. Indexing skips dependency/build/cache directories, honors `.groveignore` and `.gitignore`, and avoids common secret-bearing filenames and credential/key extensions.\n\n---\n\n## Testing\n\n```bash\nmake test                                          # all packages\ngo test ./internal/parser/... -run TestGoExtractor # single extractor\ngo test ./internal/parser/... 
-v                   # verbose parser tests\n```\n\nKey test areas: language extractors (fixture-based), BFS traversal on known graph topologies, delta indexing, ignore/secret-safe indexing, MCP stdio framing, and FTS5 query ranking.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprovasign%2Fgrove","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprovasign%2Fgrove","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprovasign%2Fgrove/lists"}