{"id":47898096,"url":"https://github.com/roomi-fields/rtfm","last_synced_at":"2026-05-24T09:01:53.516Z","repository":{"id":339670212,"uuid":"1162926520","full_name":"roomi-fields/rtfm","owner":"roomi-fields","description":"The open retrieval layer for AI coding agents. Indexes code, docs, legal, research, data — 22 parsers (incl. EPUB, DOCX, ODT), FTS5 + semantic search, knowledge graph. Serves surgical context via MCP. Open source, local, free.","archived":false,"fork":false,"pushed_at":"2026-05-21T14:02:57.000Z","size":2826,"stargazers_count":11,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-21T22:46:17.729Z","etag":null,"topics":["ai-agents","claude","claude-code","code-search","context-engineering","developer-tools","embeddings","fts","json-schema","knowledge-base","knowledge-graph","mcp","mcp-server","notebooklm","obsidian","python","rag","retrieval","semantic-search","sqlite"],"latest_commit_sha":null,"homepage":"https://roomi-fields.github.io/rtfm/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roomi-fields.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-20T21:37:42.000Z","updated_at":"2026-05-21T14:08:45.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/roomi-fields/rtfm","commit_stats":null,"previous_names":["roomi-fields/rtfm"],"tags_count":34,"template":false,"template_full_name":null,"purl":"pkg:github/roomi-fields/rtfm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roomi-fields%2Frtfm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roomi-fields%2Frtfm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roomi-fields%2Frtfm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roomi-fields%2Frtfm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roomi-fields","download_url":"https://codeload.github.com/roomi-fields/rtfm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roomi-fields%2Frtfm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33427584,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T22:14:44.296Z","status":"online","status_checked_at":"2026-05-24T02:00:06.296Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","claude","claude-code","code-search","context-engineering","developer-tools","embeddings","fts","json-schema","knowledge-base","knowledge-graph","mcp","mcp-server","notebooklm","obsidian","python","rag","retrieval","semantic-search","sqlite"],"created_at":"2026-04-04T03:56:15.921Z","updated_at":"2026-05-24T09:01:53.510Z","avatar_url":"https://github.com/roomi-fields.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- mcp-name: io.github.roomi-fields/rtfm --\u003e\n\u003cdiv align=\"center\"\u003e\n\n# RTFM\n\n***Retrieve The Forgotten Memory***\n\n### The open retrieval layer your AI agent was missing\n\nIndex everything in your project — code, docs, PDFs, legal texts, research, data — and your agent finds the right context instantly. No hallucinations. No cloud. No API costs.\n\n**`Free · Local · Open Source · MIT`**\n\n\u003cbr\u003e\n\n![RTFM vs vanilla Claude Code — same task, same model, who pays the bill?](docs/demo/rtfm-split.png)\n\n\u003cbr\u003e\n\n[![PyPI version](https://badge.fury.io/py/rtfm-ai.svg)](https://pypi.org/project/rtfm-ai/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/) [![MCP](https://img.shields.io/badge/MCP-2026-green.svg)](https://modelcontextprotocol.io/) [![Claude Code](https://img.shields.io/badge/Claude_Code-MCP-8A2BE2)](https://claude.ai/claude-code) [![GitHub stars](https://img.shields.io/github/stars/roomi-fields/rtfm?style=social)](https://github.com/roomi-fields/rtfm)\n\n\u003c/div\u003e\n\n---\n\n\u003c!-- ─────────── TIER 1 — Pain \u0026 promise ─────────── --\u003e\n\n## The problem\n\nYour AI agent is flying blind.\n\nIt greps through thousands of files, misses the doc that answers the question, invents modules that don't exist, forgets what you decided last session. The bigger the project, the worse it gets. You've added a smarter model. It didn't help. Because the bottleneck isn't intelligence — it's **retrieval**.\n\nCode indexers (Augment, Sourcegraph, Cursor) only see code. But your project isn't just code. It's specs, PRs, architecture decisions, research papers, PDFs, regulations, vault notes — the context your agent needs to stop guessing.\n\n### Why I built this\n\nI was writing a French tax article (~50 pages of regulatory text, cross-references between code articles, case law, administrative doctrine). Claude Code kept grep-ing the same directories in loops, running out of context, and producing confidently wrong citations. I'd added more memory, better prompts, a smarter model. None of it worked, because the agent wasn't reasoning badly — it just couldn't *find* the right paragraph in a 2,000-file legal corpus. So I stopped trying to make the model smarter and built the layer it was missing. That's RTFM.\n\n## The solution\n\n**RTFM indexes everything.** One command, one SQLite file, one retrieval layer your agent queries before grepping.\n\n```bash\npip install rtfm-ai \u0026\u0026 cd your-project \u0026\u0026 rtfm init\n```\n\n30 seconds. Claude Code now searches your indexed knowledge base — code *and* docs *and* PDFs *and* whatever else you drop in — with full-text, semantic, or hybrid search. The agent sees 300 tokens of metadata first, then expands only what's relevant. Progressive disclosure instead of context dumps.\n\n\u003e **Free. Runs locally. No API keys. No cloud. Your data stays yours.**\n\n### What it looks like\n\n```text\n$ rtfm search \"authentication flow\" --limit 3\n[1] src/auth/handlers.py \u003e authenticate_user (p.2)    score 9.12\n    src/auth/handlers.py:147  42 lines\n[2] docs/architecture/auth.md \u003e SSO flow (p.1)        score 7.84\n    docs/architecture/auth.md:1   23 lines\n[3] docs/ADR/0007-oauth.md \u003e Decision (p.1)           score 6.90\n    docs/ADR/0007-oauth.md:12  18 lines\n```\n\nThree results, ~300 tokens. The agent decides what to read next with `rtfm_expand(source, target_section)` — not a context dump, a conversation.\n\n---\n\n## Quick start\n\n### Recommended — Claude Code plugin\n\nIn Claude Code (CLI or Desktop **Code** tab) :\n\n```\n/plugin marketplace add roomi-fields/claude-plugins\n/plugin install rtfm@roomi-fields\n```\n\nRTFM is distributed via the [`roomi-fields/claude-plugins`](https://github.com/roomi-fields/claude-plugins) marketplace, which also ships [`notebooklm-mcp`](https://github.com/roomi-fields/notebooklm-mcp) for citation-backed Q\u0026A. To grab both at once:\n\n```\n/plugin install notebooklm@roomi-fields\n```\n\nThat's it. The plugin auto-initializes each project on first use:\n- Creates `.rtfm/library.db` (one SQLite file)\n- Injects search instructions into `CLAUDE.md`\n- Pre-grants permission for the MCP tools (no prompt every search)\n- Indexes the project on the first prompt, re-indexes incrementally on every prompt\n\n**No `pip install` required.** Pure Python, runs on Linux / macOS / Windows / WSL with Python 3.10+ already on PATH. The plugin bundles its own MCP server (no `mcp` SDK dep) and resolves `python3` / `python` / `py` automatically.\n\nThen say to Claude: *\"Find the authentication flow\"* — it uses `rtfm_search` instead of grepping.\n\n### Optional extras (semantic search, PDF parsing)\n\nThe core plugin is dependency-free. Heavier optional extras (embedding model, PDF parsers) install on demand into an isolated venv inside the plugin's data directory — no pollution of your system Python, no PEP 668 conflicts:\n\n```\n/rtfm:install-embeddings    # FastEmbed ONNX (~85 MB), semantic + hybrid search\n/rtfm:install-pdf           # pdftext only (~50 MB), fast text extraction\n/rtfm:install-pdf-full      # + marker-pdf + CPU-only torch (~1.5 GB), complex layouts\n```\n\nThe `pdf-full` install uses PyTorch's CPU-only index (no CUDA, no GPU needed) to stay around 1.5 GB instead of 5 GB.\n\nRestart Claude Code after install for the extras to be picked up.\n\n### Manual install (Cursor, Codex, Claude Desktop chat, other MCP clients)\n\nFor clients without Claude Code's plugin system :\n\n```bash\npip install rtfm-ai\ncd /path/to/your-project\nrtfm init\n```\n\nThen point your MCP client at `rtfm-serve` (the entry exposed by the pip package). Optional extras via `pip install rtfm-ai[embeddings,pdf]`.\n\n---\n\n\u003c!-- ─────────── TIER 2 — Positioning \u0026 buzz ─────────── --\u003e\n\n## How it compares\n\n|                       | **RTFM**              | Augment CE    | Sourcegraph       | Code-Index-MCP | MemPalace                |\n| --------------------- | --------------------- | ------------- | ----------------- | -------------- | ------------------------ |\n| Code indexing         | ✅ (AST-aware)        | ✅            | ✅                | ✅             | Shallow (char-chunk)     |\n| Docs, specs, markdown | ✅ (header-parsed)    | Partial       | ❌                | Limited        | Verbatim chunks          |\n| Legal / regulatory    | ✅ (XML, BOFiP)       | ❌            | ❌                | ❌             | ❌                       |\n| Research (LaTeX, PDF) | ✅                    | ❌            | ❌                | ❌             | ❌                       |\n| Custom parsers        | ✅ (~50 lines)        | ❌            | ❌                | ❌             | ❌                       |\n| Knowledge graph       | ✅ (file/code links)  | ❌            | Partial           | ❌             | Entity graph (people)    |\n| File version history  | ✅ (unlimited)        | ❌            | ❌                | ❌             | ❌ (purge-and-replace)   |\n| MCP native            | ✅                    | ✅            | ✅                | ✅             | ✅                       |\n| Runs locally          | ✅                    | Cloud         | Enterprise        | ✅             | ✅                       |\n| Open source           | MIT                   | ❌            | Partial           | ✅             | MIT                      |\n| Price                 | **Free**              | $20-200/mo    | $$$/mo            | Free           | Free                     |\n\n**RTFM is the only open-source option that indexes multi-domain content with structural parsing, a code-level knowledge graph, and unlimited per-file history.** That's the niche.\n\nDifferent from MemPalace specifically: MemPalace is an entity-level memory for conversations (who/project/decision triples in SQLite, plus verbatim chunks in ChromaDB). RTFM is a retrieval layer for *artefacts* — parsed by format, linked at the file level, versioned over time. The two are stackable, not competing.\n\n\u003e For a deeper breakdown of the design choices behind any RAG (chunking, retrieval, augmentation, integration, freshness, storage), see **[RAG Fundamentals — the 6 axes →](docs/rag-fundamentals.md)**\n\n---\n\n## Memory that survives sessions\n\nBetween sessions, most agents forget. RTFM indexes Claude Code's own memory files across every project on your machine, with full version history.\n\n```bash\nrtfm memory                    # Manual snapshot\nrtfm memory --install-hook     # Auto-snapshot on every SessionEnd\n```\n\n- **Cross-project index** — one DB at `~/.rtfm/memory.db` sees every `~/.claude/projects/*/memory/` directory on your machine. Ask `rtfm_search(\"OAuth auth decisions\")` and get hits from all 18 of your projects.\n- **Unlimited version history** — every change to a memory file is snapshotted (no prune). `rtfm_history \u003cslug\u003e` returns the full evolution.\n- **Auto-snapshot on `SessionEnd`** — one command installs a global Claude Code hook. Every session you close captures a new snapshot.\n- **Curated, not verbatim** — RTFM indexes the notes the agent already curated itself during the session (small, structured, signal-dense). Different philosophy from MemPalace, which indexes the full conversation transcripts in ChromaDB (large, noisy, needs aggressive semantic filtering).\n\n---\n\n## Obsidian vault mode\n\nRTFM is the retrieval layer for the [Karpathy LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) pattern. Karpathy himself wrote: *\"at small scale the index file is enough, but as the wiki grows you want proper search.\"* This is proper search.\n\n```bash\ncd /path/to/your-obsidian-vault\nrtfm vault\n```\n\n- Detects `.obsidian/`, proposes a folder → corpus mapping\n- Resolves `[[wikilinks]]` following Obsidian rules → stored as graph edges\n- Generates `_rtfm/` with Obsidian-native navigation (index, graph with Mermaid, hubs, orphans, Dataview frontmatter)\n- Tested on a 1,700-note research vault\n\n```\n_rtfm/\n├── index.md      # Hub: corpus list, top connected documents\n├── graph.md      # Hub documents, orphans, broken links, Mermaid\n├── recent.md     # Recently modified files\n└── corpus/       # Per-corpus indexes\n```\n\nThe LLM still writes your wiki. RTFM handles the retrieval that `index.md` can't scale to.\n\n**[Full Obsidian guide →](docs/obsidian-vault-guide.md)**\n\n---\n\n## NotebookLM integration\n\nRTFM pairs naturally with [`notebooklm-mcp`](https://github.com/roomi-fields/notebooklm-mcp). NotebookLM caps you at 50 queries/day per notebook; RTFM removes that ceiling by indexing answers locally — ask once, retrieve forever, offline, in milliseconds.\n\n`notebooklm-mcp`'s `/batch-to-vault` endpoint writes citation-backed Q\u0026A as `{slug}.md` (markdown with frontmatter) plus `{slug}.json` (structured `nblm-answer-v1` sidecar). Both are guaranteed to coexist. Two integration paths, both ship today:\n\n- **Path A — Markdown (zero config)**: drop the vault into RTFM and `rtfm sync`. The default markdown parser slices each answer into question / answer / per-citation chunks automatically. No mapping, no schema, no code.\n- **Path B — JSON sidecar (typed metadata)**: drop a [`nblm-answer.yaml` mapping](docs/notebooklm-integration.md#path-b--json-sidecar-typed-metadata--edges) into `.rtfm/mappings/`. Each `.json` answer file produces typed chunks with `notebook_id`, `source_name`, `citation_marker` queryable via SQL, plus `cites` edge candidates between answers and sources.\n\nUse Path A unless you specifically need to filter or graph by structured citation fields.\n\n**[Full NotebookLM recipe →](docs/notebooklm-integration.md)**\n\n---\n\n## What I measured\n\nI ran two kinds of benchmarks. The honest picture is nuanced — retrieval helps most on tasks that are actually solvable and where the agent is spending time *looking for things*.\n\n### Document-heavy task: French tax article generation (B10)\n\nWriting a ~50-page regulated article from a corpus of legal code, case law, and administrative doctrine. Same agent (Claude Code + Sonnet 4), same prompt, eight configurations tested.\n\n| Configuration               | Duration  | Cost    | Tokens |\n| --------------------------- | --------- | ------- | ------ |\n| Baseline (no RTFM)          | 8m 16s    | $22.61  | 8.21 M |\n| **With RTFM (FTS default)** | **6m 58s**| **$11.14** | **3.22 M** |\n\n**Δ : −51 % cost, −61 % tokens, −16 % duration — with better factual accuracy.**\nThis is the use case RTFM was built for: navigating a large multi-domain corpus where grep misses the right paragraph.\n\n### Code task: [FeatureBench](https://huggingface.co/datasets/LiberCoders/FeatureBench) (LiberCoders dataset)\n\n11 tasks, 3 repos of varying size, 4 conditions (A = standard prompt with file paths; B = discovery, no paths; C = RTFM FTS; D = RTFM hybrid), 3 runs each.\n\n| Repo     | Size        | Where RTFM helps                                     |\n| -------- | ----------- | ---------------------------------------------------- |\n| metaflow | 620 files   | Everyone resolves — RTFM adds no measurable gain     |\n| astropy  | 1,119 files | All conditions 25–30 % F2P pass; none fully resolve  |\n| mlflow   | 8,255 files | All conditions 0–5 % F2P pass; none fully resolve    |\n\nOn a single smaller-scope run (`test_stub_generator` on metaflow), RTFM cut agent time by **−37 %** vs the no-paths baseline. On the larger repos, the tasks themselves were too hard for Sonnet 4 to resolve inside a 20-minute timeout regardless of retrieval.\n\n### The honest caveats\n\n- Single model (Sonnet 4), single agent (Claude Code). Not statistically bullet-proof.\n- On small repos (\u003c 1k files), `grep` is enough and RTFM adds overhead.\n- FeatureBench measures *code modification*, not *information retrieval*. It's the wrong benchmark for a retrieval tool — I'm running against it because it's what exists. Better-suited benchmarks (RepoQA, SWE-QA, LocAgent) are on the roadmap.\n\n### What this says\n\nRTFM measurably wins when the bottleneck is **\"find the right paragraph in a 2,000-file corpus\"**. It doesn't magically make unsolvable tasks solvable. The model still has to do the work — RTFM just makes sure it has the right context to do it with.\n\n---\n\n## Who it's for\n\nRTFM works anywhere your project isn't just code:\n\n- **LegalTech** — Code + tax law + regulatory specs. Ships with Legifrance XML and BOFiP parsers.\n- **Research** — Code + LaTeX papers + datasets. Ships with LaTeX and PDF parsers.\n- **FinTech** — Code + financial regulations + XBRL reports. Write an XBRL parser in 50 lines.\n- **HealthTech** — Code + medical records (HL7/FHIR) + clinical guidelines.\n- **Solo devs with big projects** — Stop watching your agent grep the same 8,000 files every session.\n- **Obsidian / PKM users** — Make your vault actually searchable by your AI.\n- **Any regulated industry** — If your project mixes code with domain documents, RTFM is for you.\n\n---\n\n\u003c!-- ─────────── TIER 3 — Technical depth ─────────── --\u003e\n\n## Full feature list\n\n### Search \u0026 retrieval\n- **FTS5 full-text search** — instant, zero-config, works out of the box\n- **Semantic search** — optional embeddings (FastEmbed/ONNX, no GPU needed)\n- **Hybrid mode** — combine both, rank by relevance score\n- **Metadata-first** — results return file paths + scores (~300 tokens), not content dumps\n- **Progressive disclosure** — agent expands only the chunks it actually needs\n- **Knowledge graph** — wikilinks + Python imports resolved as graph edges, hub detection, centrality ranking\n\n### Multi-format indexing\n- **22 parsers built-in** — Markdown, Python (AST), LaTeX, YAML, JSON, TOML, Shell, PDF, XML, HTML, SQLite, Jupyter, CSV/TSV, XLSX, EPUB, MOBI/AZW, FB2, DJVU, DOCX, ODT, RTF, plain text\n- **Extensible** — add any format in ~50 lines of Python\n- **Auto-sync hooks** — index stays fresh every prompt, zero manual work\n- **Incremental** — only re-indexes what changed\n\n### Integration\n- **Native Claude Code plugin** — `/plugin install rtfm@roomi-fields/rtfm`, auto-init per project\n- **Pure-Python MCP server** — 0 external deps, no `mcp` SDK / `pydantic` / native binaries\n- **Cross-platform** — Linux, macOS, Windows, WSL (only requires Python ≥ 3.10 on PATH)\n- **13 MCP tools** — search, context, expand, graph, history, sync, tags, ...\n- **Manual install fallback** — `pip install rtfm-ai` for Cursor, Codex, Claude Desktop chat, any other MCP client\n- **CLI + Python API** — scriptable for pipelines\n- **Non-invasive** — doesn't touch your code, doesn't replace your editor\n\n---\n\n## The parser architecture\n\nNeed to index a format nobody supports? Write a parser in ~50 lines.\n\n```python\nfrom rtfm.parsers.base import BaseParser, ParserRegistry\nfrom rtfm.core.models import Chunk\nimport json\nfrom uuid import uuid4\n\n@ParserRegistry.register\nclass FHIRParser(BaseParser):\n    \"\"\"Parse HL7 FHIR medical records.\"\"\"\n    extensions = ['.fhir.json']\n    name = \"fhir\"\n\n    def parse(self, path, metadata=None):\n        data = json.loads(path.read_text())\n        for entry in data.get('entry', []):\n            resource = entry.get('resource', {})\n            yield Chunk(\n                id=resource.get('id', str(uuid4())),\n                content=json.dumps(resource, indent=2),\n                book_title=f\"FHIR {resource.get('resourceType', 'Unknown')}\",\n                book_slug=resource.get('id', 'unknown'),\n                page_start=1,\n                page_end=1,\n            )\n```\n\nDrop it in your project, restart Claude Code, your medical AI agent now understands FHIR records.\n\n### Two levels of JSON integration\n\nFor JSON-based formats specifically, RTFM offers a second extensibility path that doesn't need any Python:\n\n| Level | What you do | What you get |\n| --- | --- | --- |\n| **1. Generic** | Nothing. Just index the file. | Each top-level key becomes a chunk. Full-text search works on values. |\n| **2. Mapped** | Drop a YAML mapping in `.rtfm/mappings/` (~30 lines). | Typed chunks with declared metadata, custom titles, foreach extraction over arrays, edge candidates. The producing project (NotebookLM, Linear, Notion, OpenAPI…) ships the mapping; RTFM stays generic. |\n\nSee [JSON schema mappings](docs/json-mappings.md) for the full reference, and [RTFM × NotebookLM](docs/notebooklm-integration.md) for a concrete recipe.\n\n### Built-in parsers\n\n| Parser         | Extensions                      | Strategy                                              |\n| -------------- | ------------------------------- | ----------------------------------------------------- |\n| Markdown       | `.md`                           | Split by headers, YAML frontmatter extraction         |\n| Python         | `.py`                           | AST-based: each class/function = 1 chunk              |\n| LaTeX          | `.tex`                          | Split by `\\section`, `\\chapter`, etc.                 |\n| YAML           | `.yaml`, `.yml`                 | Split by top-level keys                               |\n| JSON           | `.json`                         | Split by top-level keys or array elements             |\n| TOML           | `.toml`                         | Top-level tables; emits `depends_on` edges (PEP 621, Cargo, Poetry) |\n| Shell          | `.sh`, `.bash`, `.zsh`          | Function-aware chunking                               |\n| PDF            | `.pdf`                          | Page-based (`pip install rtfm-ai[pdf]`)               |\n| Legifrance XML | `.xml`                          | French legal codes (LEGI format)                      |\n| BOFiP HTML     | `.html`                         | French tax doctrine                                   |\n| SQLite         | `.sqlite`, `.sqlite3`, `.db`    | Schema + sample rows per table; FK edges (read-only)  |\n| Jupyter        | `.ipynb`                        | Group cells by markdown heading; outputs dropped      |\n| CSV / TSV      | `.csv`, `.tsv`                  | Header + sample rows + lightweight type inference     |\n| XLSX           | `.xlsx`                         | Per-sheet schema + sample (`pip install rtfm-ai[xlsx]`) |\n| Plain text     | `.js`, `.ts`, `.rs`, `.go`, ... | Line-boundary chunks (~500 chars)                     |\n\n---\n\n## MCP tools\n\n| Tool              | What it does                                          |\n| ----------------- | ----------------------------------------------------- |\n| `rtfm_search`     | Search the index (FTS, semantic, or hybrid)           |\n| `rtfm_context`    | Get relevant context for a subject (metadata-only)    |\n| `rtfm_expand`     | Show all chunks of a source with full content         |\n| `rtfm_discover`   | Fast project structure scan (~1s, no indexing needed) |\n| `rtfm_books`      | List indexed documents                                |\n| `rtfm_stats`      | Library statistics                                    |\n| `rtfm_sync`       | Sync a directory (incremental)                        |\n| `rtfm_ingest`     | Ingest a single file                                  |\n| `rtfm_tags`       | List all tags                                         |\n| `rtfm_tag_chunks` | Add tags to specific chunks                           |\n| `rtfm_remove`     | Remove a file from the index                          |\n| `rtfm_graph`      | Show dependency graph for a source (imports, links)   |\n| `rtfm_history`    | File version history and memory snapshots             |\n\n---\n\n## CLI reference\n\n```bash\n# Search\nrtfm search \"authentication flow\"\nrtfm search \"article 39\" --corpus cgi --limit 5\n\n# Sync\nrtfm sync                              # All registered sources\nrtfm sync /path/to/docs --corpus docs  # Specific directory\nrtfm sync . --force                    # Force re-index\n\n# Source management\nrtfm add /path/to/docs --corpus docs --extensions md,pdf\nrtfm sources\n\n# Obsidian vault\nrtfm vault                             # Initialize for cwd vault\nrtfm vault /path/to/vault              # Specific vault\nrtfm vault --regenerate                # Regenerate _rtfm/ files\n\n# Cross-project Claude memory\nrtfm memory                            # Manual snapshot\nrtfm memory --install-hook             # Auto-snapshot on SessionEnd\n\n# Status \u0026 info\nrtfm status\nrtfm books\nrtfm tags\nrtfm history path/to/file.md           # Memory version history\n\n# Semantic search\nrtfm embed                             # Generate embeddings (one-time)\nrtfm semantic-search \"tax deductions\" --hybrid\n\n# MCP server\nrtfm serve\n```\n\n---\n\n## Python API\n\n```python\nfrom rtfm import Library\n\nlib = Library(\"my_library.db\")\n\n# Index\nstats = lib.ingest(\"documents/article.md\", corpus=\"docs\")\nresult = lib.sync(\".\", corpus=\"my-project\")  # SyncResult(+3 ~1 -0 =42)\n\n# Search\nresults = lib.search(\"depreciation\", limit=10, corpus=\"cgi\")\nresults = lib.hybrid_search(\"amortissement fiscal\", limit=10)\n\n# Export for LLM\nprompt_context = results.to_prompt(max_chars=8000)\n\nlib.close()\n```\n\n---\n\n## Where RTFM fits\n\nRTFM isn't a task manager. It's not an agent framework. It's the knowledge layer your agent needs underneath whatever you're already using.\n\n```\n┌─────────────────────────────────┐\n│  GSD / Taskmaster / Claude Flow │  ← Orchestration\n├─────────────────────────────────┤\n│              RTFM               │  ← Knowledge (you are here)\n├─────────────────────────────────┤\n│          Claude Code            │  ← Execution\n└─────────────────────────────────┘\n```\n\nWithout RTFM, your orchestrator drives an agent that hallucinates. With RTFM, the agent knows what it's building on.\n\n---\n\n## Contributing\n\nAdding a parser is the easiest way to contribute — and the most impactful. See [CONTRIBUTING.md](CONTRIBUTING.md).\n\nFound a bug? Have an idea? [Open an issue](https://github.com/roomi-fields/rtfm/issues).\n\n## License\n\nMIT — use it, fork it, extend it, ship it.\n\n## Author\n\n**Romain Peyrichou** — [@roomi-fields](https://github.com/roomi-fields)\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**Code indexers see your code. RTFM sees everything.**\n\n[⭐ Star on GitHub](https://github.com/roomi-fields/rtfm) if RTFM saves your agent from hallucinating.\n\nCurious how it works under the hood? See the [Architecture](docs/architecture.md) — SQLite + FTS5, the parser registry, and the priority-queue worker (ingest → embed → OCR).\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froomi-fields%2Frtfm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froomi-fields%2Frtfm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froomi-fields%2Frtfm/lists"}