{"id":49505580,"url":"https://github.com/ugurcan-aytar/recall","last_synced_at":"2026-05-14T15:01:03.963Z","repository":{"id":351296227,"uuid":"1210095115","full_name":"ugurcan-aytar/recall","owner":"ugurcan-aytar","description":"Hybrid BM25 + vector search over SQLite. A single-binary CLI for local-first semantic recall across notes, docs, and code: no servers, no cloud.","archived":false,"fork":false,"pushed_at":"2026-05-13T19:21:28.000Z","size":3023,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-13T21:24:34.571Z","etag":null,"topics":["bm25","cli","embeddings","golang","hybrid-search","knowledge-base","local-first","personal-knowledge-management","rag","search","semantic-search","sqlite","vector-search"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ugurcan-aytar.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-14T04:44:38.000Z","updated_at":"2026-05-13T19:21:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ugurcan-aytar/recall","commit_stats":null,"previous_names":["ugurcan-aytar/recall"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/ugurcan-aytar/recall","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ugurcan-aytar%2Frecall","tags_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ugurcan-aytar%2Frecall/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ugurcan-aytar%2Frecall/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ugurcan-aytar%2Frecall/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ugurcan-aytar","download_url":"https://codeload.github.com/ugurcan-aytar/recall/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ugurcan-aytar%2Frecall/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33030380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bm25","cli","embeddings","golang","hybrid-search","knowledge-base","local-first","personal-knowledge-management","rag","search","semantic-search","sqlite","vector-search"],"created_at":"2026-05-01T15:40:04.222Z","updated_at":"2026-05-14T15:01:03.944Z","avatar_url":"https://github.com/ugurcan-aytar.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/logo.svg\" alt=\"recall\" width=\"420\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eLocal search engine 
for your notes and documents.\u003c/strong\u003e\u003cbr\u003e\n  Markdown, plain text, meeting transcripts, knowledge bases.\u003cbr\u003e\n  BM25 + vector + hybrid fusion in a single Go binary.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://go.dev\"\u003e\u003cimg src=\"https://img.shields.io/badge/go-1.24%2B-00ADD8?logo=go\" alt=\"Go 1.24+\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-green.svg\" alt=\"MIT License\"\u003e\u003c/a\u003e\n  \u003ca href=\"#installation\"\u003e\u003cimg src=\"https://img.shields.io/badge/platform-macOS%20%7C%20Linux-blue.svg\" alt=\"Platforms\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ugurcan-aytar/recall/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/ugurcan-aytar/recall/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"CONTRIBUTING.md\"\u003e\u003cimg src=\"https://img.shields.io/badge/PRs-welcome-brightgreen.svg\" alt=\"PRs welcome\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/quickstart.gif\" alt=\"recall in under a minute: collection add → index → search 'circuit breaker' (bolded matches) → get a single doc → recall doctor → recall status\" width=\"900\"\u003e\n\u003c/p\u003e\n\n---\n\n## What is recall?\n\nrecall is an on-device search engine for your personal knowledge base — markdown notes, plain-text files, meeting transcripts, journals, docs. 
Point it at a folder of writing and it gives you:\n\n- **BM25 full-text search** via SQLite FTS5 — fast, deterministic keyword matching.\n- **Vector semantic search** via sqlite-vec — find notes by meaning, not just exact words.\n- **Hybrid fusion** — BM25 and vector results combined with Reciprocal Rank Fusion and an adaptive score floor, so vague queries still surface their best match instead of returning nothing.\n- **Markdown-aware chunking** — 900-token chunks with 15% overlap, scored at natural break-points so headings stay with their bodies.\n- **Local embedding** — a ~146 MB GGUF model (nomic-embed-text-v1.5, Apache 2.0, ungated) runs via a llama.cpp `llama-server` subprocess on a local Unix socket. No API calls, no cloud — the first `recall embed` auto-fetches the ~40 MB llama.cpp prebuilt into `~/.recall/bin/llamacpp/\u003cversion\u003e/`, then everything is offline.\n\nEverything lives in one SQLite file and one Go binary. No servers, no Docker, no Node runtime, no Python runtime.\n\n\u003e **Also works with source code.** If you keep READMEs, design docs, and codebases in the same tree, recall indexes them alongside your notes and uses tree-sitter for AST-aware chunking on Go / Python / TypeScript / JavaScript / Java / Rust. Notes are the main use case; code is a natural extension.\n\n## Who is this for?\n\n- You've built up a `~/notes` folder over years and can't find things in it.\n- Your team's knowledge base is a pile of markdown across a few repos.\n- You run meeting transcripts through a transcription tool and now have hundreds of `.md` files you'd like to query.\n- You want \"grep for ideas, not just strings\" without handing your notes to a cloud service.\n\n## Why another search engine?\n\nKnowledge-base search usually pushes you toward a hosted vector DB, a Node or Python stack, and an OpenAI key. recall takes the opposite position:\n\n- **Local-first, always.** The default path makes zero HTTP requests. 
Your notes never leave your machine.\n- **One binary.** `make build` and you're done — no external services to keep alive.\n- **Incremental.** Edit one paragraph; recall re-embeds one chunk, not the whole file.\n- **Adaptive scoring.** Vague queries still surface their best result instead of returning an empty list.\n- **Library-first.** The CLI is a thin wrapper over `pkg/recall`, so other Go programs can embed the same engine.\n\nrecall is designed to be boring infrastructure for your notes. Index once, search forever.\n\n## Quick start\n\n```bash\n# 1. Install\nbrew install ugurcan-aytar/recall/recall    # or grab a binary from releases\n\n# 2. Point recall at a folder of notes\nrecall collection add ~/notes --name notes  # or any folder of .md / .txt files\nrecall index\n\n# 3. Full-text search (works immediately, no model needed)\nrecall search \"rate limiter\"\n\n# 4. Search syntax: quoted phrases and exclusion\nrecall search '\"circuit breaker\" timeout -redis'\n#      └── matches docs with the exact phrase \"circuit breaker\"\n#          and the word \"timeout\", excluding anything that mentions redis.\n\n# 5. (Optional) hybrid BM25 + vector — needs an embedding backend\nexport RECALL_EMBED_PROVIDER=openai\nexport OPENAI_API_KEY=sk-…\nrecall embed\nrecall query \"what did we decide about the launch date\" --explain\n```\n\nWant to try recall without pointing it at your own data? Clone the repo and use the bundled `examples/` folder of fictional meeting notes, runbook, journal, and incident report:\n\n```bash\ngit clone https://github.com/ugurcan-aytar/recall.git\nrecall collection add ./recall/examples --name demo\nrecall index\nrecall search \"circuit breaker\"\n```\n\nTo use recall on your own notes, replace `./recall/examples` with `~/notes` (or wherever your markdown lives).\n\n## How retrieval works\n\nrecall implements the retrieval pipeline most modern hybrid search systems use, in Go, on top of SQLite. 
Retrieval (BM25, vector KNN, RRF fusion) runs in-process. Embedding and generation shell out to the official llama.cpp `llama-server` over a local Unix socket — auto-downloaded on first use, no CGo on the inference path.\n\nThe default path (`recall query \"\u003cq\u003e\"`) is the lean BM25 + vector + RRF flow along the solid arrows of the diagram below. The dashed boxes (query expansion, HyDE, and the cross-encoder reranker) are opt-in via `--expand`, `--hyde`, and `--rerank`; the position-aware blend only runs when `--rerank` is set. Each opt-in stage gracefully no-ops when its required model isn't available, so the dashed path is never load-bearing.\n\n```mermaid\nflowchart TD\n    Q[\"\u003cb\u003equery\u003c/b\u003e\u003cbr\u003e\u003ci\u003erate limiter exhaustion\u003c/i\u003e\"]\n\n    Q -. \"\u003cb\u003e--expand\u003c/b\u003e\" .-\u003e EXP\n    Q -. \"\u003cb\u003e--hyde\u003c/b\u003e\" .-\u003e HYD\n    EXP[\"\u003cb\u003eexpansion LLM\u003c/b\u003e\u003cbr\u003e\u003ci\u003eqmd-query-expansion-1.7B\u003c/i\u003e\u003cbr\u003eemits lex / vec / hyde lines\"]\n    HYD[\"\u003cb\u003ehypothetical doc\u003c/b\u003e\u003cbr\u003eembedded as a passage\"]\n    EXP -. lex variants .-\u003e BM25\n    EXP -. vec variants .-\u003e VEC\n    HYD -. extra vector query .-\u003e VEC\n\n    Q --\u003e BM25\n    Q --\u003e VEC\n    subgraph parallel [\"parallel goroutines · per-variant fan-out\"]\n        direction LR\n        BM25[\"\u003cb\u003eFTS5 BM25\u003c/b\u003e\u003cbr\u003equoted phrases\u003cbr\u003e+ -term negation\"]\n        VEC[\"\u003cb\u003evec0 KNN\u003c/b\u003e\u003cbr\u003ecosine distance\"]\n    end\n\n    BM25 -. \"merge same-side\u003cbr\u003e(2× weight on original)\" .-\u003e SBM\n    VEC -. 
\"merge same-side\u003cbr\u003e(2× weight on original)\" .-\u003e SVE\n    SBM[\"\u003cb\u003eBM25 lists fused\u003c/b\u003e\"]\n    SVE[\"\u003cb\u003evector lists fused\u003c/b\u003e\"]\n\n    BM25 --\u003e RRF\n    VEC --\u003e RRF\n    SBM --\u003e RRF\n    SVE --\u003e RRF\n\n    RRF[\"\u003cb\u003eRRF fusion (k=60)\u003c/b\u003e\u003cbr\u003e+ top-rank bonus\u003cbr\u003e+ adaptive min-score floor\"]\n\n    RRF --\u003e BLEND\n    RRF -. \"\u003cb\u003e--rerank\u003c/b\u003e\" .-\u003e RER\n    RER[\"\u003cb\u003ecross-encoder reranker\u003c/b\u003e\u003cbr\u003e\u003ci\u003ebge-reranker-v2-m3\u003c/i\u003e\u003cbr\u003ebatched /v1/rerank · top-30 candidates\"]\n    RER -. continuous logits\u003cbr\u003e(min-max normalised) .-\u003e BLEND\n\n    BLEND{\"\u003cb\u003eposition-aware blend\u003c/b\u003e\u003cbr\u003etop 1-3: 75/25 RRF/rerank\u003cbr\u003etop 4-10: 60/40\u003cbr\u003etop 11+: 40/60\u003cbr\u003e\u003ci\u003e(skipped without --rerank)\u003c/i\u003e\"}\n    BLEND --\u003e R[\"\u003cb\u003eresults, ranked\u003c/b\u003e\u003cbr\u003ewith \u003ccode\u003e--explain\u003c/code\u003e trace\"]\n\n    classDef step fill:#fef2f2,stroke:#dc2626,stroke-width:2px,color:#1f2937\n    classDef result fill:#dcfce7,stroke:#15803d,stroke-width:2px,color:#1f2937\n    classDef query fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1f2937\n    classDef llm fill:#fef3c7,stroke:#b45309,stroke-width:2px,color:#1f2937,stroke-dasharray:5 3\n    classDef merge fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#1f2937\n    class Q query\n    class BM25,VEC,RRF step\n    class EXP,HYD,RER llm\n    class SBM,SVE,BLEND merge\n    class R result\n```\n\n### BM25\n\n`recall search` runs SQLite FTS5's `bm25()` ranking function over every indexed document. FTS5 returns negative scores (lower = better); recall flips them positive for display. 
Snippets come from FTS5's built-in `snippet()` with bold ANSI markers around matched terms (suppressed when `NO_COLOR` is set).\n\nPerformance: ~5 ms wall time per query on a 79-document / 189-chunk corpus including process startup. The actual SQL runs in microseconds.\n\n### Vector\n\n`recall vsearch` embeds the query (`search_query: …` prefix per nomic's required format), then runs a KNN cosine-distance query against the `chunk_embeddings` `vec0` virtual table. Hits are collapsed to **one row per document** (the best-matching chunk wins), so results are document-level, not chunk-level.\n\nSimilarity is reported as `1 / (1 + distance)` so closer = higher score, easy to compare against BM25's positive-flipped values.\n\n### RRF Fusion\n\n`recall query` runs BM25 and vector concurrently in two goroutines, then fuses the two ranked lists with **Reciprocal Rank Fusion** (Cormack et al. 2009). The classic RRF score for a document `d`, summed over every ranked list `L` that contains it, is:\n\n```math\n\\mathrm{score}(d) = \\sum_{L \\in \\mathrm{lists}} \\frac{1}{k + \\mathrm{rank}(d, L)}\n```\n\nwith `k = 60` (the canonical TREC value). recall layers two practical refinements on top:\n\n| Refinement | Value | Purpose |\n|---|---|---|\n| Top-rank bonus | +0.05 if doc is rank 1 in either list, +0.02 for ranks 2–3 | Boost results that one of the rankers considers obviously best |\n| Adaptive min-score floor | drop results below `0.4 × top.Score` | Trim long-tail noise on weak queries — but the floor is *relative*, so weak-but-best results survive instead of returning empty |\n\n`recall query --explain` prints the per-result trace so you can see exactly why each document landed where it did:\n\n```\nnotes/incident-2026-03-22.md  #07d4c5  (score 0.08)\n[explain 1] notes/incident-2026-03-22.md  rrf=0.0328 bonus=0.0500 floor=0.0331 bm25_rank=1 vec_rank=1\n```\n\n### Chunking\n\nDocuments bigger than the target are split with break-point scoring: H1 = 100, H2 = 90, code fence = 80, blank line = 20, list item = 5, regular line break = 1. 
The chunker hunts back ≤200 tokens for the highest-scoring break point near the cut point, then echoes the last 15% of each chunk into the next so context spans the seam. Code fences are never split mid-block.\n\nFor source files (`.go`, `.py`, `.ts`, `.js`, `.java`, `.rs`) the chunker uses `tree-sitter` to cut at function / method / class / impl / import boundaries instead. Languages without a tree-sitter grammar fall back to the markdown chunker silently.\n\nDefault target is **900 estimated tokens**. nomic-embed-text-v1.5's training context is 2048, and the words×1.3 estimator under-counts BERT WordPieces by at most ~2×, so even a worst-case 900-token chunk (~1800 real tokens) fits with room to spare. Before v0.2.7 the target was 384, pinned by the in-process backend's 512-token `n_ubatch` cap; the subprocess pattern removed that cap. To pick a different chunker, pass `--chunk-strategy auto|regex|ast` to `recall embed`.\n\n### Incremental re-embedding\n\nAfter you edit a file, `recall index` picks up the change and the next `recall embed` re-embeds only the chunks whose `content_hash` changed. Editing one paragraph of a 50-page note costs ~1 chunk re-embed, not a re-embed of the whole note.\n\n## Commands\n\n| Command | What it does |\n|---|---|\n| `recall collection add \u003cpath\u003e` | Register a folder as a collection |\n| `recall collection remove \u003cname\u003e` | Remove a collection |\n| `recall collection list` | List registered collections |\n| `recall collection rename \u003cold\u003e \u003cnew\u003e` | Rename a collection |\n| `recall ls [collection[/path]]` | List files in a collection |\n| `recall index` | Re-scan and index all collections |\n| `recall index --pull` | `git pull` each collection before re-indexing |\n| `recall embed` | Generate vector embeddings for chunks |\n| `recall embed -f` | Force re-embed everything |\n| `recall search \"\u003cquery\u003e\"` | BM25 full-text search |\n| `recall vsearch \"\u003cquery\u003e\"` | Vector semantic search |\n| `recall query \"\u003cquery\u003e\"` | Hybrid: BM25 + vector + RRF fusion |\n| `recall query \"\u003cquery\u003e\" --explain` | Hybrid 
+ per-result RRF / bonus / floor / rank trace |\n| `recall query \"\u003cquery\u003e\" --expand` | Query expansion via qmd-query-expansion-1.7B — lex + vec variants searched in parallel, merged |\n| `recall query \"\u003cquery\u003e\" --hyde` | Hypothetical Document Embedding — LLM writes an \"ideal answer,\" recall embeds it as an extra vector probe |\n| `recall query \"\u003cquery\u003e\" --rerank` | Cross-encoder rerank top-30 via bge-reranker-v2-m3, position-aware blend with RRF |\n| `recall --index \u003cname\u003e ...` | Use a named isolated index at `~/.recall/indexes/\u003cname\u003e.db` instead of the default DB |\n| `recall bench \u003cfile.jsonl\u003e` | Retrieval quality benchmark — precision@k, recall@k, MRR across `bm25 \\| vector \\| hybrid` modes |\n| `recall get \u003cpath \\| #docid\u003e` | Retrieve a single document |\n| `recall multi-get \u003cpattern\u003e` | Batch retrieve by glob or list |\n| `recall context add [path] \"text\"` | Add descriptive context for a path |\n| `recall context list` / `rm` / `check` | Manage path contexts |\n| `recall status` | Index health, collection sizes, doc / chunk / embedding counts |\n| `recall doctor` | Verify database, schema, embedding backend |\n| `recall models` / `models download` / `models path` | List, fetch, or locate GGUF models |\n| `recall cleanup` | Drop orphan chunks + stale embeddings, run SQLite VACUUM |\n| `recall version` | Print version, build date, commit, Go version |\n\nShared search flags: `-n`, `-c/--collection` (comma-separated for multi-collection), `--all`, `--min-score`, `--full`, `--line-numbers`, `--explain`. 
Output formats: `--json`, `--csv`, `--md`, `--xml`, `--files`.\n\n### Multi-collection cross-search\n\nIndex two repos as separate collections, then search either one, the other, or both at once with `-c repo1,repo2`:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/multi-collection.gif\" alt=\"multi-collection: notes alone, src alone, then both with -c notes,src\" width=\"900\"\u003e\n\u003c/p\u003e\n\n## Architecture\n\n```\nrecall\n├── cmd/recall/        # CLI entry point (thin main.go)\n├── internal/\n│   ├── commands/      # One Cobra command per file\n│   ├── store/         # SQLite + FTS5 + sqlite-vec + RRF fusion\n│   ├── chunk/         # Markdown chunker + tree-sitter AST chunker for code\n│   └── embed/         # Local GGUF backend + optional API fallback\n└── pkg/recall/        # Public Go API (for library consumers)\n```\n\nThe pieces:\n\n- **store** — `mattn/go-sqlite3` (with the `sqlite_fts5` build tag) plus `asg017/sqlite-vec-go-bindings/cgo`. WAL mode, 64 MB cache, prepared statements cached for the BM25 hot path.\n- **chunk** — markdown break-point scoring is the default. Code files route through `smacker/go-tree-sitter` for AST-aware cuts. Strategy is overridable via `--chunk-strategy auto|regex|ast`.\n- **embed** — local GGUF via the official llama.cpp prebuilt `llama-server` binary running as a subprocess on a Unix socket. recall downloads the platform-appropriate llama.cpp release on first `recall embed` (~30-40 MB, one-time, into `~/.recall/bin/llamacpp/\u003cversion\u003e/`), spawns it with `--embedding`, and talks to `/v1/embeddings` (OpenAI-compatible batch input) over the socket. The model loads once for the lifetime of the embed run; HTTP round-trips on a local socket are negligible. BM25-only commands never start a subprocess. 
Optional API fallback (`RECALL_EMBED_PROVIDER=openai|voyage`) is opt-in only and never default.\n- **pkg/recall** — the public facade (`NewEngine`, `SearchBM25`, `SearchVector`, `SearchHybrid`, `Index`, `Embed`, `Get`, …) that external Go consumers import.\n\n## Installation\n\n### Homebrew (macOS, Linux)\n\n```bash\nbrew install ugurcan-aytar/recall/recall\n```\n\n### One-line install script\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/ugurcan-aytar/recall/main/install.sh | bash\n```\n\nThe script picks the right pre-built tarball for your OS / arch from the [latest release](https://github.com/ugurcan-aytar/recall/releases/latest), verifies its SHA-256, and installs to `/usr/local/bin` (root) or `~/.local/bin` (user).\n\n### Pre-built binary\n\nGrab a tarball directly from the [releases page](https://github.com/ugurcan-aytar/recall/releases/latest), extract it, and drop the `recall` binary anywhere on your `$PATH`. Currently shipped: `darwin_arm64`, `linux_amd64`. SHA-256 sums are in `checksums.txt`.\n\n**Linux runtime dep:** `recall embed` spawns llama.cpp's prebuilt `llama-server`, whose CPU backend plugins (`libggml-cpu-*.so`) link against the OpenMP runtime (`libgomp`). On a normal workstation it is already installed alongside `gcc`/`g++`; minimal container bases (`ubuntu:24.04` without build-essential, Alpine, etc.) need it added explicitly:\n\n```bash\n# Debian/Ubuntu\napt-get install -y libgomp1\n\n# Fedora/RHEL\ndnf install -y libgomp\n\n# Alpine\napk add libgomp\n```\n\nWithout it, the llama.cpp backend loader falls through with little more than \"no CPU backend found\", and `recall embed` times out after 60 s.\n\n### From source\n\nFor contributors and anyone on a platform without a pre-built binary:\n\n```bash\ngit clone https://github.com/ugurcan-aytar/recall.git\ncd recall\nmake build         # → ./recall\n```\n\nRequires Go 1.24+ with CGo enabled (the default). 
If you `go install` instead of cloning, you need the `sqlite_fts5` tag — recall hard-fails with an actionable error otherwise:\n\n```bash\ngo install -tags sqlite_fts5 github.com/ugurcan-aytar/recall/cmd/recall@latest\n```\n\n## Vector / hybrid search (optional)\n\n`recall search` (BM25 full-text) works the moment you install — no model, no API. `recall vsearch` and `recall query` need an **embedding backend**. Pick whichever fits how you installed:\n\n**Brew / pre-built users**: the bottle ships **without** the local GGUF model (~146 MB on disk, too large to bundle in a tap). Either fetch it once with `recall models download` for fully local embedding, or use an API:\n\n```bash\nexport RECALL_EMBED_PROVIDER=openai     # or voyage\nexport OPENAI_API_KEY=sk-…              # or VOYAGE_API_KEY\nrecall embed\nrecall query \"your question\"\n```\n\n`RECALL_EMBED_PROVIDER` defaults to `local`, so the API path is opt-in only — recall never sends data anywhere unless you explicitly set the env var.\n\n**Pre-built binaries include local embedding AND local generation.** No source build needed. On the first `recall embed`, recall downloads the official llama.cpp prebuilt `llama-server` (~30-40 MB) into `~/.recall/bin/llamacpp/\u003cversion\u003e/`. The embedding model itself is a separate one-time download via `recall models download` (~146 MB for nomic-embed). Generation features (`--expand` / `--rerank` / `--hyde`) talk to the same binary via a separate subprocess per model: `recall models download --expansion` (~1.3 GB) and `recall models download --reranker` (~418 MB). 
After the first run and the model downloads, everything is offline.\n\nRun `recall doctor` any time to see which backend the current binary will use.\n\n## Configuration\n\n| Environment variable | Default | Purpose |\n|---|---|---|\n| `RECALL_DB_PATH` | `~/.recall/index.db` | SQLite database location |\n| `RECALL_MODELS_DIR` | `~/.recall/models/` | GGUF model storage |\n| `RECALL_EMBED_PROVIDER` | `local` | `local` (default), `openai`, or `voyage` |\n| `RECALL_EMBED_MODEL` | `nomic-embed-text-v1.5.Q8_0.gguf` | Override the local GGUF — bare filename joined with `RECALL_MODELS_DIR`, or absolute path |\n| `RECALL_EMBED_PROMPT_FORMAT` | _detected from filename_ | Force a prompt family — `nomic`, `gemma` / `embeddinggemma`, `qwen` / `qwen3`, or `generic` / `raw` / `none` |\n| `RECALL_EMBED_WORKERS` | `1` | Parallel embedder workers. Local backend loads N model instances (~146 MB each); API backend fires N concurrent HTTP requests. Capped at 8. |\n| `RECALL_EXPAND_MODEL` | `qmd-query-expansion-1.7B-q4_k_m.gguf` | Override the GGUF used by `--expand` and `--hyde`. Bare filename joins with `RECALL_MODELS_DIR`; absolute path passes through. |\n| `RECALL_RERANK_MODEL` | `qwen2.5-1.5b-instruct-q4_k_m.gguf` | Override the GGUF used by `--rerank`. Same path-resolution rules as `RECALL_EXPAND_MODEL`. 
|\n| `OPENAI_API_KEY` | — | Only read when `RECALL_EMBED_PROVIDER=openai` |\n| `VOYAGE_API_KEY` | — | Only read when `RECALL_EMBED_PROVIDER=voyage` |\n| `NO_COLOR` | — | Set to any value to disable ANSI colors |\n\n### Bringing your own embedding model\n\nThe default `nomic-embed-text-v1.5` covers most use cases, but you can\npoint recall at a different GGUF without rebuilding:\n\n```sh\n# Drop a model into RECALL_MODELS_DIR (default ~/.recall/models/)\nmv ~/Downloads/embeddinggemma-300m.Q8_0.gguf ~/.recall/models/\n\n# Tell recall to use it\nexport RECALL_EMBED_MODEL=embeddinggemma-300m.Q8_0.gguf\nrecall embed -f          # -f drops old vectors that were embedded with nomic\nrecall query \"...\"\n```\n\nrecall auto-detects the prompt family from the filename: `nomic-`,\n`embeddinggemma-`, and `Qwen3-Embedding-` patterns are recognised and\nget the matching `search_query: …`, `task: …`, or `Instruct: …` prefix\napplied, respectively. Set `RECALL_EMBED_PROMPT_FORMAT` if you have a model whose\nfilename doesn't hint at its family. The model dimension must match\nrecall's vec0 schema (768d); embedders that return a different width\nwill fail at `recall embed` with a clear error.\n\n### Query expansion (`--expand`)\n\n`recall query` runs BM25 + vector + RRF on the user's literal query.\nFor natural-language questions where the query and the doc use\ndifferent vocabulary (\"decide\" vs \"decisions\"), even hybrid search can\nmiss obvious matches. `--expand` asks a small local LLM to rewrite the\nquery into a few BM25-friendly keyword variants and a few semantic\nphrasings, then fuses the extra retrieval lists with a 2× weight on\nthe user's original phrasing so aggressive variants can't out-vote it.\n\n```sh\n# One-time: download the expansion model (qmd-query-expansion-1.7B,\n# MIT-licensed, ungated, ~1.3 GB GGUF). 
The same model backs both\n# --expand and --hyde.\nrecall models download --expansion\n\n# Use the flag on any hybrid query.\nrecall query --expand \"what did the team decide about authentication\"\n\n# Optional intent line to disambiguate ambiguous terms (\"performance\"\n# could mean web latency, financial returns, athletic ability, …).\nrecall query --expand --intent \"web page latency\" \"performance\"\n```\n\nThe flag is opt-in for two reasons: the expansion model is an extra\n~1.3 GB download, and each `--expand` query pays one LLM-inference\nround-trip (~1-3s on a modern laptop). Without the flag — or when the\nmodel isn't downloaded — `recall query` runs the original\nsingle-query hybrid path with zero LLM cost.\n\nTo swap the model out for a different generation GGUF (e.g. you want\nto A/B against Qwen2.5-1.5B-Instruct), drop it under\n`RECALL_MODELS_DIR` and set `RECALL_EXPAND_MODEL=my-llm.gguf`. Output\nparsing expects `lex: …` / `vec: …` / `hyde: …` lines (the qmd\nexpansion-model format); a model that emits anything else won't\ncrash but won't produce useful variants either.\n\n### HyDE — embedding hypothetical answers (`--hyde`)\n\n`--hyde` (Hypothetical Document Embedding) asks the same expansion\nLLM to write a short imagined answer to the query, then embeds that\nhypothetical passage in the same vector space as your real\ndocuments. Vector search runs against both the real query embedding\nAND the hypothetical-answer embedding; results are RRF-fused.\n\nThe intuition: if the user types \"what's the recovery timeout for\nthe circuit breaker pattern?\", a real document about circuit\nbreakers is more likely to land near a *hypothetical answer* about\ncircuit breakers than near the bare *question*. The hypothetical
The hypothetical\nacts as a richer query.\n\n```sh\n# Same model + download as --expand:\nrecall models download --expansion\n\nrecall query --hyde \"what is the recovery timeout for the circuit breaker\"\nrecall query --expand --hyde \"what did the team decide about authentication\"\n```\n\n`--expand` and `--hyde` share one LLM call when both flags are on\n— the qmd-format model emits `lex / vec / hyde` in a single\nresponse, so there's no extra cost for combining them.\n\n**Auto-intent fix**: when you target a single collection (`-c`)\nand that collection has a context blurb (`recall collection add\n--context \"…\"`), recall passes the blurb as `Query intent: …` to\nthe LLM. qmd skipped this for HyDE generation and the resulting\nhypothetical passages were noticeably less on-topic; recall fixes\nthat. An explicit `--intent` flag still wins over the auto-fill.\n\n### Reranking (`--rerank`)\n\n`--rerank` sends the top-N RRF results through a cross-encoder\nreranker that scores every (query, passage) pair directly and\nreturns a continuous relevance logit per candidate. The scores\nare min-max normalised into [0, 1] within the candidate set,\nthen fused with RRF rank via the position-aware blender below.\n\n```sh\n# One-time: download the reranker model (bge-reranker-v2-m3,\n# Apache 2.0, ungated, ~418 MB at Q4_K_M).\nrecall models download --reranker\n\n# Use the flag on any hybrid query.\nrecall query --rerank \"circuit breaker recovery timeout\"\n\n# Combine with --expand for the full pipeline.\nrecall query --expand --rerank \"what did the team decide about authentication\"\n\n# Tune how many RRF hits get reranked (default 30).\nrecall query --rerank --rerank-top-n 50 \"...\"\n```\n\nThe flag is opt-in because of the extra ~418 MB download. At\nruntime the reranker is one batched HTTP call against\n`llama-server`'s `/v1/rerank` endpoint — typical cost is 1-2 s\ntotal for 30 candidates on Apple Silicon (model load + single\nround-trip). 
Without the flag the query path stays untouched.\n\n**Why bge-reranker-v2-m3?** It's the reference model llama.cpp\nPR #9510 used when it added reranking support to\n`libllama`/`llama-server`/`llama-embedding`. Apache 2.0, 568 M\nparams, multilingual, explicitly supported via the\n`--reranking` flag. Its continuous logits span roughly [-12, +8] in\npractice, which is real gradient signal, unlike the binary yes/no\nprompt fallback recall shipped before v0.2.4.\n\n**Position-aware blending**: raw cross-encoder logits alone\nwould throw away the RRF signal. Instead, recall normalises\nthe logits into [0,1] and combines them with RRF rank using\nweights that depend on the candidate's RRF rank — top-3 hits\nget a 75/25 RRF/reranker mix, ranks 4-10 get 60/40, ranks 11+\nget 40/60. A strong RRF hit the reranker disagrees with still\nscores ~0.75 (RRF protects high-confidence retrieval), while a\ndeep-tail candidate the reranker confidently approves can land\nnear the top (reranker rescues good recall-misses). Library\nconsumers such as brain can retune the bands via\n`recall.DefaultRerankBlendBands`.\n\n### Speeding up `recall embed` with parallel workers\n\nBy default `recall embed` runs one chunk through the model at a time —\nsafe everywhere, no extra RAM. For larger corpora (1k+ chunks) you can\nopt into a worker pool:\n\n```sh\n# Local GGUF backend: each worker mmaps its own model instance\n# (~146 MB for nomic Q8). 4 workers ≈ 600 MB of model RAM in total, ~3-4× speedup\n# on a multi-core machine.\nRECALL_EMBED_WORKERS=4 recall embed\n# or per-invocation:\nrecall embed --workers 4\n\n# API backend (OpenAI / Voyage): each worker fires one in-flight HTTP\n# request. 
Network round-trip dominates so 4-8 workers usually saturates\n# the bottleneck without tripping provider rate limits.\nRECALL_EMBED_PROVIDER=openai RECALL_EMBED_WORKERS=8 recall embed\n```\n\nBoth backends cap at 8 workers internally so a typo'd\n`RECALL_EMBED_WORKERS=64` won't OOM your laptop or get you rate-limited.\nThe single-worker default (`workers=0` or `1`) is identical to the v0.1\nbehaviour — no goroutines, no extra model loads.\n\n## Using recall as a Go library\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"log\"\n\n    \"github.com/ugurcan-aytar/recall/internal/embed\"\n    \"github.com/ugurcan-aytar/recall/pkg/recall\"\n)\n\nfunc main() {\n    eng, err := recall.NewEngine(recall.WithDBPath(\"./index.db\"))\n    if err != nil {\n        log.Fatal(err)\n    }\n    defer eng.Close()\n\n    if _, err := eng.AddCollection(\"notes\", \"/path/to/notes\", \"\", \"team notes\"); err != nil {\n        log.Fatal(err)\n    }\n    if _, err := eng.Index(); err != nil {\n        log.Fatal(err)\n    }\n\n    // For vector search, supply any Embedder. Production code uses\n    // embed.NewLocalEmbedder (GGUF) or embed.NewAPIEmbedder. Tests use\n    // embed.NewMockEmbedder.\n    emb := embed.NewMockEmbedder(0)\n    defer emb.Close()\n    if _, err := eng.Embed(emb, false); err != nil {\n        log.Fatal(err)\n    }\n\n    results, err := eng.SearchHybrid(emb, \"Q3 launch decisions\", recall.WithLimit(10))\n    if err != nil {\n        log.Fatal(err)\n    }\n    for _, r := range results {\n        fmt.Printf(\"%s/%s  score=%.4f\\n\", r.CollectionName, r.Path, r.FusedScore)\n    }\n}\n```\n\nThe public API lives in `pkg/recall`. `internal/` is off-limits for external consumers.\n\n## Status\n\nrecall is pre-1.0. CLI flags, the public Go API, and the SQLite schema may shift between minor versions; semver patches stay backwards-compatible.\n\n## Contributing\n\nBug reports, feature requests, and PRs are welcome. 
See [CONTRIBUTING.md](CONTRIBUTING.md) for the dev setup, CGo build notes, and commit conventions. For security issues, see [SECURITY.md](SECURITY.md).

## Credits

recall's architecture is inspired by [qmd](https://github.com/tobi/qmd) by Tobi Lütke. The chunking strategy (900-token target with break-point scoring), RRF fusion with a top-rank bonus and adaptive min-score, and the overall shape of the CLI all trace back to that project.

recall diverges in a few places:

- **Go, not TypeScript/Bun.** Single static binary, no Node runtime, importable as a library via `pkg/recall` (brain is the primary library consumer).
- **SQLite-vec for vector search** (`asg017/sqlite-vec-go-bindings`) in the same `mattn/go-sqlite3` database as FTS5 BM25: one file, one connection, one transaction boundary across hybrid queries.
- **AST-aware code chunking from day one** via `smacker/go-tree-sitter`: Go, Python, TypeScript, Java, and Rust files parse into syntax-respecting chunks instead of regex-sliced windows.
- **Incremental embedding**: `chunks.content_hash` gates re-embedding, so `recall embed` only re-embeds modified chunks. Full-corpus re-embeds happen only with `-f`.
- **Subprocess-based inference**: local embedding and generation both shell out to llama.cpp's official prebuilt `llama-server`, reached over a Unix socket (auto-downloaded on first use, ~21 MB extracted, no CGo on the inference path). This gives recall real cross-encoder reranking via `/v1/rerank` and chat-template-aware generation via `/v1/chat/completions` without vendoring any llama.cpp binding.
- **Cross-encoder reranker**: [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) at Q4_K_M (~418 MB, Apache 2.0, multilingual, 568M params). Continuous relevance scores, min-max normalised to [0, 1] and position-aware blended with RRF rank (top-3: 75/25, ranks 4-10: 60/40, ranks 11+: 40/60).
- **Query expansion + HyDE**: [qmd-query-expansion-1.7B](https://huggingface.co/tobil/qmd-query-expansion-1.7B-gguf) from Tobi. qmd's expansion model powers `recall query --expand` and generates the hypothetical passages for `--hyde`.
- **Named indexes via `--index`**: isolated SQLite DBs at `~/.recall/indexes/<name>.db`, so one binary can keep `work` / `personal` / `code` knowledge bases cleanly separated.
- **`recall bench`**: a JSONL-driven IR quality benchmark reporting Precision@k, Recall@k, and MRR across `bm25 | vector | hybrid` modes.

## License

MIT. See [LICENSE](LICENSE).
","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fugurcan-aytar%2Frecall","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fugurcan-aytar%2Frecall","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fugurcan-aytar%2Frecall/lists"}