{"id":47737487,"url":"https://github.com/marjoballabani/hypergrep","last_synced_at":"2026-04-05T00:01:48.338Z","repository":{"id":347837067,"uuid":"1195369581","full_name":"marjoballabani/hypergrep","owner":"marjoballabani","description":"A better grep for AI agents. Structural search, call graphs, impact analysis, semantic compression. 87% fewer tokens. 16 languages. Built in Rust.","archived":false,"fork":false,"pushed_at":"2026-03-29T21:27:24.000Z","size":193,"stargazers_count":3,"open_issues_count":1,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-03T23:03:07.829Z","etag":null,"topics":["ai-agents","ast","call-graph","claude-code","cli","code-intelligence","code-search","copilot","cursor","developer-tools","grep","llm","ripgrep","rust","static-analysis","token-optimization","tree-sitter"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marjoballabani.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-29T15:36:35.000Z","updated_at":"2026-04-03T19:39:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"ea4f6692-b3b6-4e68-85d0-b1e553651f03","html_url":"https://github.com/marjoballabani/hypergrep","commit_stats":null,"previous_names":["marjoballabani/hypergrep"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/marjoballabani/hypergrep","repository_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/marjoballabani%2Fhypergrep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marjoballabani%2Fhypergrep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marjoballabani%2Fhypergrep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marjoballabani%2Fhypergrep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marjoballabani","download_url":"https://codeload.github.com/marjoballabani/hypergrep/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marjoballabani%2Fhypergrep/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31419549,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T20:09:54.854Z","status":"ssl_error","status_checked_at":"2026-04-04T20:09:44.350Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","ast","call-graph","claude-code","cli","code-intelligence","code-search","copilot","cursor","developer-tools","grep","llm","ripgrep","rust","static-analysis","token-optimization","tree-sitter"],"created_at":"2026-04-02T22:58:36.723Z","updated_at":"2026-04-05T00:01:48.004Z","avatar_url":"https://github.com/marjoballabani.png","language":"Rust","readme":"# 
Hypergrep\n\n[![CI](https://github.com/marjoballabani/hypergrep/actions/workflows/ci.yml/badge.svg)](https://github.com/marjoballabani/hypergrep/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n[![Tests](https://img.shields.io/badge/tests-120%20passing-brightgreen.svg)]()\n\n**A codebase intelligence engine for AI coding agents.**\n\nAI agents waste 60-80% of their tokens on navigation -- grep returns raw lines, the agent reads files to understand context, and the cycle repeats 50+ times per session. Hypergrep returns structural answers: function bodies, call graphs, impact analysis, and codebase summaries in 87% fewer tokens.\n\n### Key numbers (measured, not projected)\n\n| Metric | ripgrep | Hypergrep | Advantage |\n|--------|---------|-----------|-----------|\n| Warm search latency | 31ms | **4.4ms** | 7x faster |\n| 50-query session | 1,550ms | **220ms** | 7x faster |\n| Tokens per 3-query task | 20,580 | **2,814** | 87% less |\n| \"Who calls this?\" | impossible | **2.5us** | new capability |\n| \"Does this use Redis?\" | 31ms (full scan) | **291ns** | 100,000x faster |\n| Codebase summary | N/A | **699 tokens** | loaded once |\n\n\u003e Benchmarked on ripgrep's own source (208 files, 52K lines). See [BENCHMARKS.md](BENCHMARKS.md) for full methodology.\n\n### Why not just ripgrep?\n\nripgrep is the best text search tool. Use it for one-off greps. 
But AI agents don't do one-off greps -- they do 50-200 searches per session, and every result is raw text that needs follow-up file reads to understand.\n\nHypergrep answers the questions agents actually ask:\n\n| Agent needs | ripgrep gives | Hypergrep gives |\n|-------------|---------------|-----------------|\n| \"Find the auth handler\" | 47 matching lines | The function body + signature + call graph |\n| \"What calls this?\" | nothing | `--callers`: reverse call graph in 2.5us |\n| \"What breaks if I change this?\" | nothing | `--impact`: blast radius with severity |\n| \"Does this project use Redis?\" | Full scan, 0 results | `--exists`: YES/NO in 291ns |\n| \"How is this codebase structured?\" | nothing | `--model`: structural summary in 699 tokens |\n| \"Give me the best results in 500 tokens\" | not possible | `--budget 500`: budget-fitted results |\n\n### Status\n\n**v0.1.0** -- Production-ready for small/medium codebases (\u003c1K files). 120 tests. 8 languages. Disk-cached index. 
Zero false negatives guaranteed.\n\n| Component | Status |\n|-----------|--------|\n| Text search (trigram index) | Stable |\n| Structural search (tree-sitter, 8 langs) | Stable |\n| Call graph + impact analysis | Stable |\n| Semantic compression (L0/L1/L2 + budget) | Stable |\n| Bloom filter (existence checks) | Stable |\n| Mental model (codebase summary) | Stable |\n| Disk persistence (.hypergrep/index.bin) | Stable |\n| Daemon mode (persistent index + fs watcher) | Beta |\n| Predictive query prefetch | Experimental |\n\n## Install\n\n### Pre-built binary (fastest)\n\nmacOS and Linux -- downloads the right binary for your platform:\n\n```bash\ncurl -sSfL https://github.com/marjoballabani/hypergrep/releases/latest/download/hypergrep-installer.sh | sh\n```\n\nOr download manually from the [Releases page](https://github.com/marjoballabani/hypergrep/releases).\n\n| Platform | Binary |\n|----------|--------|\n| macOS Apple Silicon (M1/M2/M3/M4) | `hypergrep-aarch64-apple-darwin.tar.gz` |\n| macOS Intel | `hypergrep-x86_64-apple-darwin.tar.gz` |\n| Linux x86_64 | `hypergrep-x86_64-unknown-linux-gnu.tar.gz` |\n| Linux ARM64 | `hypergrep-aarch64-unknown-linux-gnu.tar.gz` |\n\n### From source\n\n```bash\ngit clone https://github.com/marjoballabani/hypergrep.git\ncd hypergrep\n./install.sh\n```\n\nOr manually:\n\n```bash\ncargo build --release\ncp target/release/hypergrep ~/.cargo/bin/   # or /usr/local/bin/\n```\n\nRequires Rust 1.75+ and a C compiler (for tree-sitter grammars).\n\n### Update\n\nSame command as install -- always gets the latest release:\n\n```bash\ncurl -sSfL https://github.com/marjoballabani/hypergrep/releases/latest/download/hypergrep-installer.sh | sh\n```\n\n### Uninstall\n\n```bash\n# Stop any running daemon\nhypergrep-daemon --stop . 
2\u003e/dev/null\n\n# Remove binaries\nrm -f $(which hypergrep) $(which hypergrep-daemon)\n\n# Remove index caches from your projects (optional)\nfind ~ -name \".hypergrep\" -type d -exec rm -rf {} + 2\u003e/dev/null\n```\n\n### Verify\n\n```bash\nhypergrep --version\nhypergrep --help\n```\n\n## AI agent setup\n\nTell your AI tools to use hypergrep. Run this in any project:\n\n```bash\nhypergrep-setup.sh /path/to/your/project\n```\n\nThis creates config files for Claude Code, Cursor, Copilot, and Windsurf. Your agents will automatically use hypergrep instead of grep.\n\n**Manual setup** -- copy one file for your tool:\n\n| Tool | File to create | Template |\n|------|---------------|----------|\n| Claude Code | `CLAUDE.md` | [agent-config/CLAUDE.md](agent-config/CLAUDE.md) |\n| Cursor | `.cursorrules` | [agent-config/.cursorrules](agent-config/.cursorrules) |\n| GitHub Copilot | `.github/copilot-instructions.md` | [agent-config/.github/copilot-instructions.md](agent-config/.github/copilot-instructions.md) |\n| Windsurf | `.windsurfrules` | [agent-config/.windsurfrules](agent-config/.windsurfrules) |\n\n## Quick start\n\n```bash\n# Search (ripgrep-compatible)\nhypergrep \"authenticate\" src/\n\n# Structural search (return full function bodies)\nhypergrep -s \"authenticate\" src/\n\n# Semantic compression (signatures + call graph, 500 token budget)\nhypergrep --layer 1 --budget 500 \"authenticate\" src/\n\n# JSON output for agent consumption\nhypergrep --layer 1 --json \"authenticate\" src/\n\n# Impact analysis (what breaks if this changes?)\nhypergrep --impact \"authenticate\" src/\n\n# Codebase mental model (load once, skip orientation)\nhypergrep --model \"\" src/\n\n# Existence check (O(1) bloom filter)\nhypergrep --exists \"redis\" src/\n```\n\n## Search modes\n\n### Text search (default)\n\nRipgrep-compatible output. 
Builds a trigram index internally for fast repeated searches.\n\n```\nhypergrep \"pattern\" dir\nhypergrep -c \"pattern\" dir            # count only\nhypergrep -l \"pattern\" dir            # file names only\n```\n\n### Structural search (`-s`)\n\nReturns complete enclosing functions/classes instead of raw lines. If a pattern matches 5 lines inside one function, the function is returned once (deduplicated).\n\n```\nhypergrep -s \"authenticate\" src/\n```\n\nOutput:\n```\nsrc/auth.rs:1-8 function authenticate\nfn authenticate(user: \u0026str, pass: \u0026str) -\u003e bool {\n    let hashed = hash_password(pass);\n    check_db(user, hashed)\n}\n---\n```\n\n### Semantic compression (`--layer`)\n\nThree levels of detail, each using fewer tokens:\n\n| Layer | Content | Tokens/result |\n|-------|---------|---------------|\n| `--layer 0` | File path + symbol name + kind | ~15 |\n| `--layer 1` | Signature + calls + called_by | ~80-120 |\n| `--layer 2` | Full source code of enclosing function | ~200-800 |\n\n```bash\n# Layer 1: signatures + call graph context\nhypergrep --layer 1 \"search\" src/\n```\n\nOutput:\n```\nsrc/index.rs:function search (~65 tokens)\n  sig: pub fn search(\u0026self, pattern: \u0026str) -\u003e Result\u003cVec\u003cSearchMatch\u003e\u003e\n  calls: trigrams_from_regex, resolve_query\n  called_by: search_structural, search_semantic, test_search_literal\n```\n\n### Token budget (`--budget`)\n\nTell Hypergrep how many tokens you can afford. It selects the best results that fit.\n\n```bash\n# Best results in 500 tokens\nhypergrep --layer 1 --budget 500 \"authenticate\" src/\n```\n\n### JSON output (`--json`)\n\nFor programmatic agent consumption. 
Works with `--layer`, `--model`, and `--exists`.\n\n```bash\nhypergrep --layer 1 --json \"authenticate\" src/\n```\n\n```json\n[\n  {\n    \"file\": \"src/auth.rs\",\n    \"name\": \"authenticate\",\n    \"kind\": \"function\",\n    \"line_range\": [1, 8],\n    \"signature\": \"fn authenticate(user: \u0026str, pass: \u0026str) -\u003e bool\",\n    \"calls\": [\"hash_password\", \"check_db\"],\n    \"called_by\": [\"login_handler\", \"api_key_verify\"],\n    \"tokens\": 85\n  }\n]\n```\n\n## Graph queries\n\n### Callers (`--callers`)\n\nReverse call graph: who calls this symbol?\n\n```bash\nhypergrep --callers \"authenticate\" src/\n```\n\n### Callees (`--callees`)\n\nForward call graph: what does this symbol call?\n\n```bash\nhypergrep --callees \"authenticate\" src/\n```\n\n### Impact analysis (`--impact`)\n\nWhat breaks if you change this symbol? BFS upstream through the call graph with severity classification:\n\n```bash\nhypergrep --impact \"hash_password\" src/\n```\n\nOutput:\n```\nImpact analysis for 'hash_password' (depth 3):\n\n  [depth 1] WILL BREAK   src/auth.rs:authenticate\n  [depth 2] MAY BREAK    src/api.rs:login_handler\n  [depth 3] REVIEW        src/main.rs:setup_routes\n```\n\nSeverity levels:\n- **WILL BREAK** (depth 1) -- direct callers\n- **MAY BREAK** (depth 2) -- callers of callers\n- **REVIEW** (depth 3+) -- transitive dependents\n\n## Codebase intelligence\n\n### Mental model (`--model`)\n\nA compressed structural summary (~300-500 tokens) of the entire codebase. 
Load this once at agent session start to skip 80% of exploratory searches.\n\n```bash\nhypergrep --model \"\" src/\n```\n\nOutput:\n```\n# Codebase Mental Model\n\n## Languages\n- Rust: 14 files\n- TypeScript: 8 files\n\n## Structure\n- src/auth/ (3 files) -- 5 functions, 2 structs\n- src/api/ (6 files) -- 12 functions, 3 structs\n- src/db/ (4 files) -- 8 functions, 1 struct\n\n## Key Abstractions\n- function authenticate (src/auth/handler.rs) -- 8 callers, 3 callees\n- struct UserService (src/auth/service.rs) -- 5 callers, 4 callees\n\n## Entry Points\n- src/main.rs\n\n## Hot Spots (most complex)\n- src/api/handlers.rs (15 symbols, 340 lines)\n- src/auth/handler.rs (8 symbols, 180 lines)\n```\n\n### Existence check (`--exists`)\n\nDoes this codebase use a specific technology? Answered in microseconds via bloom filter.\n\n```bash\nhypergrep --exists \"redis\" src/        # YES or NO\nhypergrep --exists \"graphql\" src/\nhypergrep --exists \"kubernetes\" src/\n```\n\n- **NO** = definitely not present (zero false negatives, guaranteed)\n- **YES** = likely present (~1% false positive rate)\n\n### Stats (`--stats`)\n\n```bash\nhypergrep --stats \"\" src/\n```\n\n```\nFiles indexed: 17\nUnique trigrams: 8113\nSymbols parsed: 214\nGraph edges: 305\nBloom filter: 173 concepts, 11984 bytes\nMental model: 474 tokens\nIndex build time: 94ms\n```\n\n## Supported languages\n\nTree-sitter grammars for structural parsing and call graph extraction:\n\n| Language | Structural search | Call graph | Import tracking |\n|----------|------------------|------------|-----------------|\n| Rust | Functions, structs, enums, traits, impls, modules | Yes | Yes |\n| Python | Functions, classes | Yes | Yes |\n| JavaScript | Functions, classes, methods, arrow functions | Yes | Yes |\n| TypeScript | Functions, classes, methods, arrow functions | Yes | Yes |\n| Go | Functions, methods, type declarations | Yes | Yes |\n| Java | Methods, classes, interfaces, enums | Yes | Partial |\n| C | Functions, 
structs, enums | Yes | No |\n| C++ | Functions, classes, structs, enums | Yes | No |\n\nUnsupported languages fall back to line-level text search (same as ripgrep).\n\n## Daemon mode\n\nFor agent sessions with 50+ queries, the daemon keeps the index in memory for sub-millisecond searches:\n\n```bash\n# Start in background (auto-stops after 30 min idle)\nhypergrep-daemon --background /path/to/project\n\n# Check status (shows PID, memory, socket path)\nhypergrep-daemon --status /path/to/project\n\n# Stop manually\nhypergrep-daemon --stop /path/to/project\n```\n\n**Safety features:**\n- **Auto-stop**: Shuts down after 30 minutes of no queries (configurable: `--idle-timeout 3600`)\n- **Memory limit**: Hard cap at 500 MB -- shuts down with a warning if exceeded\n- **Memory reporting**: `--status` shows live RSS so you always know what it's using\n- **PID file**: Prevents duplicate daemons for the same project\n- **Clean shutdown**: Ctrl+C or `--stop` removes socket + PID file\n- **Socket permissions**: Owner-only (0600) -- other users can't query your code\n\n```\n$ hypergrep-daemon --status .\nRunning\n  PID:    18067\n  Socket: /tmp/hypergrep-f983e88f.sock\n  Memory: 8.5 MB\n  Root:   /Users/you/project\n```\n\n**When to use the daemon vs CLI:**\n\n| Scenario | Use |\n|----------|-----|\n| Quick one-off search | `hypergrep \"pattern\" src/` (CLI) |\n| AI agent session (50+ queries) | `hypergrep-daemon --background src/` |\n| CI/CD pipeline | `hypergrep \"pattern\" src/` (CLI, no daemon) |\n| Long coding session | `hypergrep-daemon --background --idle-timeout 3600 src/` |\n\n## Architecture\n\n```\nAgent (Claude Code, Cursor, etc.)\n  |\n  v\nHypergrep Daemon\n  |\n  +-- Query Router (text / structural / graph / existence)\n  +-- Prefetch Engine (predict next 3-5 queries, cache speculatively)\n  +-- Result Compiler (layer selection, budget fitting, dedup)\n  |\n  +-- Unified Index\n  |     +-- Text Index (trigram posting lists, galloping intersection)\n  |     +-- Code 
Graph (call/import/type edges, BFS impact analysis)\n  |     +-- AST Cache (tree-sitter symbol boundaries per file)\n  |     +-- Bloom Filter (concept vocabulary, ~12KB)\n  |     +-- Mental Model (derived structural summary)\n  |\n  +-- Index Manager (fs watcher, incremental re-index, git state tracking)\n```\n\n## How it works\n\n1. **Index build** (~100ms for medium codebases): Walk directory, extract trigrams from every file, parse ASTs with tree-sitter, build call graph from call expressions, populate bloom filter from imports/patterns.\n\n2. **Text search**: Decompose regex into required trigrams. Intersect posting lists (galloping merge). Run regex verification only on candidate files. Zero false negatives guaranteed.\n\n3. **Structural search**: After text match, look up the enclosing AST node (function, class, method). Return the complete symbol body. Deduplicate: multiple matches in one symbol return it once.\n\n4. **Graph queries**: BFS traversal of the call graph. Callers = reverse edges. Impact = multi-depth BFS with severity classification.\n\n5. **Semantic compression**: Convert symbols to compact JSON representations. Layer 0 = name. Layer 1 = signature + call graph. Layer 2 = full code. Budget fitting = greedy selection of top results within token limit.\n\n## Performance\n\n| Scenario | Latency | Notes |\n|----------|---------|-------|\n| Cold start (no cache) | ~800ms | Builds trigram index + saves to disk |\n| Cached start | **40ms** | Loads from `.hypergrep/index.bin` |\n| Warm text search | **3-7ms** | Daemon mode, index in memory |\n| Warm structural search | **5-17ms** | Lazy tree-sitter, parses only matched files |\n| Graph queries | **2-7us** | In-memory adjacency list traversal |\n| Bloom filter | **291ns** | Single hash lookup |\n| 50-query agent session | **220ms** | 4.4ms/query average |\n\nTested on 208 files / 52K lines (ripgrep source). 
See [BENCHMARKS.md](BENCHMARKS.md) for full numbers with methodology.\n\n## Limitations\n\n- **Cold start is slower than ripgrep** (800ms vs 31ms). The index pays for itself after ~40 queries. Use the daemon for agent workloads.\n- **Call graph is static analysis only.** Dynamic dispatch, reflection, callbacks, and macros are not resolved. Impact results may be incomplete.\n- **Bloom filter has ~1% false positives.** \"YES\" means \"probably\" -- confirm with a real search. \"NO\" is always correct.\n- **Large codebases (\u003e10K files)** need daemon mode. CLI cold start is too slow.\n- **Memory**: ~17 MB for text index, ~54 MB with full structural pass (208 files). Scales linearly.\n\n## Research\n\nSee [RESEARCH.md](RESEARCH.md) for the full theoretical foundations, prior art analysis (42 references), and quantitative projections behind Hypergrep.\n\n## License\n\n[MIT](LICENSE)\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, project structure, and how to add new languages.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarjoballabani%2Fhypergrep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarjoballabani%2Fhypergrep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarjoballabani%2Fhypergrep/lists"}