{"id":47634782,"url":"https://github.com/lyonzin/knowledge-rag","last_synced_at":"2026-06-09T02:01:30.303Z","repository":{"id":341105311,"uuid":"1150645924","full_name":"lyonzin/knowledge-rag","owner":"lyonzin","description":"[knowledge-rag] - Drop docs, search instantly from Claude Code — 12 MCP tools, 20 format parsers, hybrid search + reranking. Zero servers, zero API keys, 100% local.","archived":false,"fork":false,"pushed_at":"2026-06-09T00:34:59.000Z","size":438,"stargazers_count":91,"open_issues_count":0,"forks_count":17,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-09T01:23:56.817Z","etag":null,"topics":["antigravity","claude","claude-code","claude-code-cli","codex","cursor-ai","document-search","hybrid-search","inteligencia-artificial","knowledge-base","local-ai","mcp","mcp-server","rag","rag-chatbot","rag-pipeline","reranking","retrieval-augmented-generation","semantic-search","vector-database"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/knowledge-rag/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lyonzin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"lyonzin"}},"created_at":"2026-02-05T14:23:33.000Z","updated_at":"2026-06-09T00:34:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lyonzin/knowledge-rag","commit_stats":null,"previous_names":["
lyonzin/knowledge-rag"],"tags_count":26,"template":false,"template_full_name":null,"purl":"pkg:github/lyonzin/knowledge-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lyonzin%2Fknowledge-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lyonzin%2Fknowledge-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lyonzin%2Fknowledge-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lyonzin%2Fknowledge-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lyonzin","download_url":"https://codeload.github.com/lyonzin/knowledge-rag/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lyonzin%2Fknowledge-rag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34088013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["antigravity","claude","claude-code","claude-code-cli","codex","cursor-ai","document-search","hybrid-search","inteligencia-artificial","knowledge-base","local-ai","mcp","mcp-server","rag","rag-chatbot","rag-pipeline","reranking","retrieval-augmented-generation","semantic-search","vector-database"],"created_at":"2026-04-02T00:00:5
0.833Z","updated_at":"2026-06-09T02:01:30.295Z","avatar_url":"https://github.com/lyonzin.png","language":"Python","funding_links":["https://github.com/sponsors/lyonzin"],"categories":["🧠 Knowledge Management \u0026 Memory","Docker MCP Toolkit","📦 Other"],"sub_categories":["Knowledge Management"],"readme":"# Knowledge RAG\n\n\u003cdiv align=\"center\"\u003e\n\n[![PyPI](https://img.shields.io/pypi/v/knowledge-rag)](https://pypi.org/project/knowledge-rag/)\n[![NPM](https://img.shields.io/npm/v/knowledge-rag)](https://www.npmjs.com/package/knowledge-rag)\n[![PyPI Downloads](https://static.pepy.tech/personalized-badge/knowledge-rag?period=total\u0026units=INTERNATIONAL_SYSTEM\u0026left_color=BLACK\u0026right_color=GREEN\u0026left_text=downloads)](https://pepy.tech/projects/knowledge-rag)\n![Python](https://img.shields.io/badge/python-3.11%2B-green.svg)\n![License](https://img.shields.io/badge/license-MIT-yellow.svg)\n![Platform](https://img.shields.io/badge/platform-Windows%20%7C%20Linux%20%7C%20macOS-lightgrey.svg)\n![GPU](https://img.shields.io/badge/GPU-NVIDIA%20CUDA-76B900.svg?logo=nvidia)\n[![CI](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/ci.yml)\n[![CodeQL](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/security.yml)\n[![Quality Gate](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml/badge.svg)](https://github.com/lyonzin/knowledge-rag/actions/workflows/quality-gate.yml)\n[![Glama Score](https://glama.ai/mcp/servers/lyonzin/knowledge-rag/badges/score.svg)](https://glama.ai/mcp/servers/lyonzin/knowledge-rag)\n\n### Your docs, your machine, zero cloud. 
Claude Code searches them natively.\n\nDrop your PDFs, markdown, code, notebooks — **1800+ files, 39K chunks, indexed in under 3 minutes.**\u003cbr/\u003e\nHybrid search (BM25 + semantic vectors + cross-encoder reranking) through 12 MCP tools.\u003cbr/\u003e\nEverything runs locally via ONNX. No Docker, no Ollama, no API keys, no data leaves your machine.\n\n```\npip install knowledge-rag → restart Claude Code → search_knowledge(\"your query\")\n```\n\n---\n\n**12 MCP Tools** | **Hybrid Search + Reranking** | **20 File Formats** | **Optional NVIDIA GPU** | **100% Local**\n\n[What's New](#whats-new-in-v390) | [Supported Formats](#supported-formats) | [Installation](#installation) | [Configuration](#configuration) | [API Reference](#api-reference) | [Architecture](#architecture)\n\n\u003c/div\u003e\n\n---\n\n## Star History\n\n\u003cdiv align=\"center\"\u003e\n\n\u003ca href=\"https://www.star-history.com/?repos=lyonzin%2Fknowledge-rag\u0026type=date\u0026legend=top-left\"\u003e\n \u003cpicture\u003e\n   \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://api.star-history.com/chart?repos=lyonzin/knowledge-rag\u0026type=date\u0026theme=dark\u0026legend=top-left\" /\u003e\n   \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://api.star-history.com/chart?repos=lyonzin/knowledge-rag\u0026type=date\u0026legend=top-left\" /\u003e\n   \u003cimg alt=\"Star History Chart\" src=\"https://api.star-history.com/chart?repos=lyonzin/knowledge-rag\u0026type=date\u0026legend=top-left\" /\u003e\n \u003c/picture\u003e\n\u003c/a\u003e\n\n\u003c/div\u003e\n\n---\n\n## What's New in v3.9.0\n\n### Quality Gate — 7-Pillar PR Validation\n\nEvery PR (including dependabot bumps and one-line fixes) is now evaluated against **35+ automated checks** spread across 7 pillars before any human review:\n\n| Pillar | What it enforces | Tools |\n|---|---|---|\n| **1 Security** | SAST, secrets, CVEs, supply chain | bandit, semgrep, gitleaks, pip-audit, 
dependency-review, Snyk, CodeQL, Socket |\n| **2 Stability** | Flake detection, coverage trend, test count, deterministic runs | pytest-rerunfailures, codecov ±0.5pp, test-count guard |\n| **3 Memory Leak** | RSS bounded under 1000-query load, no idle bloat | psutil-based baseline tests + nightly 50K-iteration soak |\n| **4 Versatility** | 9 OS×Python combos, 14 format parsers, 4 config presets, locale tolerance, property-based fuzzing | matrix CI on Linux+Windows+macOS × 3.11+3.12+3.13, Hypothesis |\n| **5 Scalability** | Performance regression \u003e 10% blocks merge, public bench dashboard | pytest-benchmark, GH Pages chart |\n| **6 Versioning** | Atomic version sync, API surface diff, conventional commits, CHANGELOG enforcement, backwards compat | griffe-style AST diff, custom guards |\n| **7 Quality** | Type strictness, docstring coverage, complexity, dead code | mypy strict, interrogate ≥80%, radon, vulture |\n\nPlus a **nightly resilience workflow** that runs chaos failure-injection (HF down, ChromaDB corruption, watchdog crash, ONNX zero-byte replay), determinism check (full suite × 3), and mutation testing on selected modules.\n\nRead the full philosophy in [CONTRIBUTING.md](CONTRIBUTING.md). Report bugs via [SECURITY.md](SECURITY.md) or the [issue templates](.github/ISSUE_TEMPLATE/).\n\n### Critical Hotfix — No More Silent Zero-Vector Corruption (v3.8.1)\n\n`FastEmbedEmbeddings.__call__` no longer swallows exceptions and returns `[[0.0]*dim, ...]` when the ONNX model fails to load. That bug pre-existed in master but was silent: ChromaDB happily stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped them as \"already indexed\", and queries returned garbage similarity with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError` loudly. 
**All v3.8.0 users should upgrade.** Full details in [Changelog](#v381-2026-05-10--hotfix).\n\n### Lazy-Loaded Embeddings — Cheaper Idle Processes (v3.8.0)\n\nThe FastEmbed ONNX model (~200MB resident) now loads on the **first query**, not at startup. Idle `knowledge-rag` processes are now genuinely cheap. Why this matters: MCP stdio is one-process-per-client by protocol — multiple Claude Code windows, Claude Desktop + IDE simultaneously, or review/approval flows that open extra connections all spawn their own processes. Before v3.8.0, every one of them paid the full embedding-model cost up front. Now only processes that actually serve queries load the model. Public API is unchanged.\n\n### Opt-In Single-Instance Guard (v3.8.0)\n\nFor users who measured their setup and want a hard cap of one server per `data_dir`:\n\n```bash\nexport KNOWLEDGE_RAG_SINGLE_INSTANCE=1\n```\n\nA second instance exits immediately with code 75. **OFF by default** so multi-client MCP usage continues to work unchanged. Stale-PID recovery + SIGINT/SIGTERM cleanup wired correctly. Full guide in [docs/single-instance.md](docs/single-instance.md). Sample MCP config in [examples/mcp-config-single-instance.json](examples/mcp-config-single-instance.json).\n\n### 5 Ways to Install\n\n```bash\nnpx -y knowledge-rag                    # NPM — zero setup, auto-manages Python venv\npip install knowledge-rag               # PyPI — classic Python install\ncurl -fsSL .../install.sh | bash        # One-line installer (Linux/macOS/Windows)\ndocker pull ghcr.io/lyonzin/knowledge-rag  # Docker — models pre-downloaded\ngit clone ... \u0026\u0026 pip install -r ...     # From source\n```\n\nAll methods produce the same MCP server. 
See [Installation](#installation) for full instructions.\n\n### Recent Highlights\n\n- **v3.9.0** — **Quality Gate** activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)\n- **v3.8.1** — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)\n- **v3.8.0** — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker\n- **v3.6.0** — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline\n- **v3.5.2** — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when `gpu: false`), BASE_DIR resolution fix for editable installs\n- **v3.5.1** — Remove Python `\u003c3.13` upper bound — 3.13 and 3.14 now supported\n- **v3.5.0** — Optional GPU acceleration, supported formats table, full README rewrite\n- **v3.4.3** — MCP stdout save/restore fix (v3.4.2 broke JSON-RPC responses)\n- **v3.4.0** — Persistent model cache, exclude patterns, Jupyter Notebook parser, inotify resilience, MetaTrader support\n\nSee [Changelog](#changelog) for full history.\n\n---\n\n## Supported Formats\n\n| Format | Extension | Parser | Default | Notes |\n|--------|-----------|--------|---------|-------|\n| Markdown | `.md` | Section-aware (splits at `##`) | Yes | Headers preserved as chunk boundaries |\n| Plain Text | `.txt` | Fixed-size chunking | Yes | 1000 chars + 200 overlap |\n| PDF | `.pdf` | PyMuPDF extraction | Yes | Text-based PDFs only (no OCR) |\n| Python | `.py` | Code-aware parser | Yes | Functions/classes as chunks |\n| JSON | `.json` | Structure-aware | Yes | Flattened key-value extraction |\n| CSV | `.csv` | Row-based parser | Yes | Headers + rows as text |\n| Word | `.docx` | python-docx | Yes | Headings preserved as 
markdown |\n| Excel | `.xlsx` | openpyxl | Yes | Sheet-by-sheet extraction |\n| PowerPoint | `.pptx` | python-pptx | Yes | Slide-by-slide extraction |\n| Jupyter Notebook | `.ipynb` | Cell-aware parser | Yes | Markdown + code cells only, no outputs/base64 |\n| C Source | `.c` | Code-aware parser | Yes | Functions/structs/includes extracted |\n| C/C++ Header | `.h` | Code-aware parser | Yes | Function declarations/structs extracted |\n| C++ Source | `.cpp` | Code-aware parser | Yes | Classes/structs/includes extracted |\n| JavaScript | `.js` | Code-aware parser | Yes | Functions/classes/imports (ESM + CJS) |\n| React JSX | `.jsx` | Code-aware parser | Yes | Same as JS parser |\n| TypeScript | `.ts` | Code-aware parser | Yes | Functions/classes/interfaces/enums/imports |\n| React TSX | `.tsx` | Code-aware parser | Yes | Same as TS parser |\n| XML | `.xml` | XML parser | Yes | Root element and namespace extraction |\n| MQL4 Header | `.mqh` | Code parser | No | MetaTrader — add to `supported_formats` to enable |\n| MQL4 Source | `.mq4` | Code parser | No | MetaTrader — add to `supported_formats` to enable |\n\n\u003e **Tip:** The parser dispatch is extensible. 
Any format mapped in `_parsers` can be enabled via `supported_formats` in config.yaml.\n\n---\n\n## Features\n\n| Feature | Description |\n|---------|-------------|\n| **Hybrid Search** | Semantic + BM25 keyword search with Reciprocal Rank Fusion |\n| **Cross-Encoder Reranker** | Xenova/ms-marco-MiniLM-L-6-v2 re-scores top candidates for precision |\n| **GPU Acceleration** | Optional ONNX CUDA support for 5-10x faster indexing |\n| **YAML Configuration** | Fully customizable via `config.yaml` with domain-specific presets |\n| **Query Expansion** | Configurable synonym mappings (69 security-term defaults) |\n| **Markdown-Aware Chunking** | `.md` files split by `##`/`###` sections instead of fixed windows |\n| **In-Process Embeddings** | FastEmbed ONNX Runtime (BAAI/bge-small-en-v1.5, 384D) |\n| **Keyword Routing** | Word-boundary aware routing for domain-specific queries |\n| **20 Format Parsers** | MD, TXT, PDF, PY, C, H, CPP, JS, JSX, TS, TSX, JSON, XML, CSV, DOCX, XLSX, PPTX, IPYNB + opt-in MQH/MQ4 |\n| **Category Organization** | Organize docs by folder, auto-tagged by path |\n| **Incremental Indexing** | Change detection via mtime/size — only re-indexes modified files |\n| **Chunk Deduplication** | SHA256 content hashing prevents duplicate chunks |\n| **Query Cache** | LRU cache with 5-min TTL for instant repeat queries |\n| **Document CRUD** | Add, update, remove documents via MCP tools |\n| **URL Ingestion** | Fetch URLs, strip HTML, convert to markdown, index |\n| **Similarity Search** | Find documents similar to a reference document |\n| **Retrieval Evaluation** | Built-in MRR@5 and Recall@5 metrics |\n| **File Watcher** | Auto-reindex on document changes via watchdog (5s debounce) |\n| **Exclude Patterns** | Glob-based file/directory exclusion during indexing |\n| **MMR Diversification** | Maximal Marginal Relevance reduces redundant results |\n| **Persistent Model Cache** | Embedding models cached in `models_cache/` — survives reboots |\n| 
**Auto-Migration** | Detects embedding dimension mismatch and rebuilds automatically |\n| **12 MCP Tools** | Full CRUD + search + evaluation via Claude Code |\n\n---\n\n## Architecture\n\n### System Overview\n\n```mermaid\nflowchart TB\n    subgraph MCP[\"MCP SERVER (FastMCP)\"]\n        direction TB\n        TOOLS[\"12 MCP Tools\u003cbr/\u003esearch | get | add | update | remove\u003cbr/\u003ereindex | list | stats | url | similar | evaluate\"]\n    end\n\n    subgraph SEARCH[\"HYBRID SEARCH ENGINE\"]\n        direction LR\n        ROUTER[\"Keyword Router\u003cbr/\u003e(word boundaries)\"]\n        SEMANTIC[\"Semantic Search\u003cbr/\u003e(ChromaDB)\"]\n        BM25[\"BM25 Keyword\u003cbr/\u003e(rank-bm25 + expansion)\"]\n        RRF[\"Reciprocal Rank\u003cbr/\u003eFusion (RRF)\"]\n        RERANK[\"Cross-Encoder\u003cbr/\u003eReranker\"]\n\n        ROUTER --\u003e SEMANTIC\n        ROUTER --\u003e BM25\n        SEMANTIC --\u003e RRF\n        BM25 --\u003e RRF\n        RRF --\u003e RERANK\n    end\n\n    subgraph STORAGE[\"STORAGE LAYER\"]\n        direction LR\n        CHROMA[(\"ChromaDB\u003cbr/\u003eVector Database\")]\n        COLLECTIONS[\"Collections\u003cbr/\u003esecurity | ctf\u003cbr/\u003elogscale | development\"]\n        CHROMA --- COLLECTIONS\n    end\n\n    subgraph EMBED[\"EMBEDDINGS (In-Process)\"]\n        FASTEMBED[\"FastEmbed ONNX\u003cbr/\u003eBAAI/bge-small-en-v1.5\u003cbr/\u003e(384D, CPU or GPU)\"]\n        CROSSENC[\"Cross-Encoder\u003cbr/\u003ems-marco-MiniLM-L-6-v2\"]\n        FASTEMBED --- CROSSENC\n    end\n\n    subgraph INGEST[\"DOCUMENT INGESTION\"]\n        PARSERS[\"20 Parsers\u003cbr/\u003eMD | PDF | TXT | PY | C | H | CPP | JS | JSX | TS | TSX | JSON | XML | CSV\u003cbr/\u003eDOCX | XLSX | PPTX | IPYNB | MQH | MQ4\"]\n        CHUNKER[\"Chunking\u003cbr/\u003eMD: section-aware\u003cbr/\u003eOther: 1000 chars + 200 overlap\"]\n        PARSERS --\u003e CHUNKER\n    end\n\n    CLAUDE[\"Claude Code\"] --\u003e MCP\n    MCP --\u003e 
SEARCH\n    SEARCH --\u003e STORAGE\n    STORAGE --\u003e EMBED\n    INGEST --\u003e EMBED\n    EMBED --\u003e STORAGE\n```\n\n### Query Processing Flow\n\n```mermaid\nflowchart TB\n    QUERY[\"User Query\u003cbr/\u003e'mimikatz credential dump'\"] --\u003e EXPAND\n\n    subgraph EXPANSION[\"Query Expansion\"]\n        EXPAND[\"Synonym Expansion\u003cbr/\u003emimikatz -\u003e mimikatz, sekurlsa, logonpasswords\"]\n    end\n\n    EXPAND --\u003e ROUTER\n\n    subgraph ROUTING[\"Keyword Routing\"]\n        ROUTER[\"Keyword Router\"]\n        MATCH{\"Word Boundary\u003cbr/\u003eMatch?\"}\n        CATEGORY[\"Filter: redteam\"]\n        NOFILTER[\"No Filter\"]\n\n        ROUTER --\u003e MATCH\n        MATCH --\u003e|Yes| CATEGORY\n        MATCH --\u003e|No| NOFILTER\n    end\n\n    subgraph HYBRID[\"Hybrid Search\"]\n        direction LR\n        SEMANTIC[\"Semantic Search\u003cbr/\u003e(ChromaDB embeddings)\u003cbr/\u003eConceptual similarity\"]\n        BM25[\"BM25 Search\u003cbr/\u003e(expanded query)\u003cbr/\u003eExact term matching\"]\n    end\n\n    subgraph FUSION[\"Result Fusion + Reranking\"]\n        RRF[\"Reciprocal Rank Fusion\u003cbr/\u003escore = alpha * 1/(k+rank_sem)\u003cbr/\u003e+ (1-alpha) * 1/(k+rank_bm25)\"]\n        RERANK[\"Cross-Encoder Reranker\u003cbr/\u003eRe-scores top 3x candidates\u003cbr/\u003equery+doc pair scoring\"]\n        SORT[\"Sort by Reranker Score\u003cbr/\u003eNormalize to 0-1\"]\n\n        RRF --\u003e RERANK --\u003e SORT\n    end\n\n    CATEGORY --\u003e HYBRID\n    NOFILTER --\u003e HYBRID\n    SEMANTIC --\u003e RRF\n    BM25 --\u003e RRF\n\n    SORT --\u003e RESULTS[\"Results\u003cbr/\u003esearch_method: hybrid|semantic|keyword\u003cbr/\u003escore + reranker_score + raw_rrf_score\"]\n```\n\n### Document Ingestion Flow\n\n```mermaid\nflowchart LR\n    subgraph INPUT[\"Input\"]\n        FILES[\"documents/\u003cbr/\u003e├── security/\u003cbr/\u003e├── development/\u003cbr/\u003e├── ctf/\u003cbr/\u003e└── general/\"]\n    
end\n\n    subgraph PARSE[\"Parse (20 formats)\"]\n        MD[\"Markdown\"]\n        PDF[\"PDF\u003cbr/\u003e(PyMuPDF)\"]\n        OFFICE[\"DOCX | XLSX\u003cbr/\u003ePPTX | CSV\"]\n        CODE[\"PY | C | H | CPP | JS | JSX\u003cbr/\u003eTS | TSX | JSON | XML | IPYNB\"]\n    end\n\n    subgraph CHUNK[\"Chunk\"]\n        MDSPLIT[\"MD: Section-Aware\u003cbr/\u003eSplit at ## headers\"]\n        TXTSPLIT[\"Other: Fixed-Size\u003cbr/\u003e1000 chars + 200 overlap\"]\n        DEDUP[\"SHA256 Dedup\u003cbr/\u003eSkip duplicate content\"]\n    end\n\n    subgraph EMBED[\"Embed\"]\n        FASTEMBED[\"FastEmbed ONNX\u003cbr/\u003ebge-small-en-v1.5\u003cbr/\u003e(384D, CPU or GPU)\"]\n    end\n\n    subgraph STORE[\"Store\"]\n        CHROMADB[(\"ChromaDB\")]\n        BM25IDX[\"BM25 Index\"]\n    end\n\n    FILES --\u003e MD \u0026 PDF \u0026 OFFICE \u0026 CODE\n    MD --\u003e MDSPLIT\n    PDF \u0026 OFFICE \u0026 CODE --\u003e TXTSPLIT\n    MDSPLIT --\u003e DEDUP\n    TXTSPLIT --\u003e DEDUP\n    DEDUP --\u003e EMBED\n    EMBED --\u003e STORE\n```\n\n### hybrid_alpha Parameter Effect\n\n```mermaid\nflowchart LR\n    subgraph ALPHA[\"hybrid_alpha values\"]\n        A0[\"0.0\u003cbr/\u003ePure BM25\u003cbr/\u003eInstant\"]\n        A3[\"0.3 (default)\u003cbr/\u003eKeyword-heavy\u003cbr/\u003eFast\"]\n        A5[\"0.5\u003cbr/\u003eBalanced\"]\n        A7[\"0.7\u003cbr/\u003eSemantic-heavy\"]\n        A10[\"1.0\u003cbr/\u003ePure Semantic\"]\n    end\n\n    subgraph USE[\"Best For\"]\n        U0[\"CVEs, tool names\u003cbr/\u003eexact matches\"]\n        U3[\"Technical queries\u003cbr/\u003especific terms\"]\n        U5[\"General queries\"]\n        U7[\"Conceptual queries\u003cbr/\u003erelated topics\"]\n        U10[\"'How to...' 
questions\u003cbr/\u003econceptual search\"]\n    end\n\n    A0 --- U0\n    A3 --- U3\n    A5 --- U5\n    A7 --- U7\n    A10 --- U10\n```\n\n---\n\n## Installation\n\n### Prerequisites\n\n- Python 3.11+\n- Claude Code CLI\n- *…or any other MCP client (Claude Desktop, Cursor, VS Code, Antigravity, opencode, Windsurf) — see [Use with other MCP clients](#use-with-other-mcp-clients)*\n- ~200MB disk for model cache (auto-downloaded on first run)\n- *Optional:* NVIDIA GPU + CUDA for accelerated embeddings (`pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config)\n\n### Install Methods\n\nPick one — all produce the same running server.\n\n#### Option A: NPX (fastest)\n\nRequires Node.js 16+. Handles Python venv, pip install, and version upgrades automatically.\n\n```bash\nclaude mcp add knowledge-rag -s user -- npx -y knowledge-rag\n```\n\nThat's it. On first run, `npx` creates a venv at `~/.knowledge-rag/`, installs the PyPI package, and starts the MCP server. Subsequent runs reuse the cached venv.\n\n#### Option B: One-line installer\n\n```bash\n# Linux/macOS:\ncurl -fsSL https://raw.githubusercontent.com/lyonzin/knowledge-rag/master/install.sh | bash\n\n# Windows (PowerShell):\nirm https://raw.githubusercontent.com/lyonzin/knowledge-rag/master/install.ps1 | iex\n```\n\nThen configure Claude Code:\n\n```bash\nclaude mcp add knowledge-rag -s user -- ~/knowledge-rag/venv/bin/python -m mcp_server.server\n```\n\n\u003e **Windows**: `claude mcp add knowledge-rag -s user -- %USERPROFILE%\\knowledge-rag\\venv\\Scripts\\python.exe -m mcp_server.server`\n\n#### Option C: pip install\n\n```bash\nmkdir ~/knowledge-rag \u0026\u0026 cd ~/knowledge-rag\npython3 -m venv venv \u0026\u0026 source venv/bin/activate\npip install knowledge-rag\nknowledge-rag init              # Exports config template, presets, creates documents/\n```\n\nThen configure Claude Code:\n\n```bash\nclaude mcp add knowledge-rag -s user -- ~/knowledge-rag/venv/bin/python -m 
mcp_server.server\n```\n\n\u003e **Windows users**: Use `python` instead of `python3`, `venv\\Scripts\\activate` instead of `source venv/bin/activate`.\n\u003e **Windows path**: `claude mcp add knowledge-rag -s user -- %USERPROFILE%\\knowledge-rag\\venv\\Scripts\\python.exe -m mcp_server.server`\n\n#### Option D: Clone from source\n\n```bash\ngit clone https://github.com/lyonzin/knowledge-rag.git ~/knowledge-rag\ncd ~/knowledge-rag\npython3 -m venv venv \u0026\u0026 source venv/bin/activate\npip install -r requirements.txt\n```\n\nThen configure Claude Code:\n\n```bash\nclaude mcp add knowledge-rag -s user -- ~/knowledge-rag/venv/bin/python -m mcp_server.server\n```\n\n#### Option E: Docker\n\n```bash\ndocker pull ghcr.io/lyonzin/knowledge-rag:latest\n```\n\n```bash\nclaude mcp add knowledge-rag -s user -- \\\n  docker run -i --rm \\\n  -v ~/knowledge-rag/documents:/app/documents \\\n  -v ~/knowledge-rag/data:/app/data \\\n  ghcr.io/lyonzin/knowledge-rag:latest\n```\n\nModels are pre-downloaded in the image — no first-run delay.\n\n\u003cdetails\u003e\n\u003csummary\u003eAlternative: manual JSON config\u003c/summary\u003e\n\nAdd to `~/.claude.json`:\n\n**Windows:**\n```json\n{\n  \"mcpServers\": {\n    \"knowledge-rag\": {\n      \"command\": \"C:\\\\Users\\\\YOUR_USER\\\\knowledge-rag\\\\venv\\\\Scripts\\\\python.exe\",\n      \"args\": [\"-m\", \"mcp_server.server\"]\n    }\n  }\n}\n```\n\n**Linux / macOS:**\n```json\n{\n  \"mcpServers\": {\n    \"knowledge-rag\": {\n      \"command\": \"/home/YOUR_USER/knowledge-rag/venv/bin/python\",\n      \"args\": [\"-m\", \"mcp_server.server\"]\n    }\n  }\n}\n```\n\u003e Replace `YOUR_USER` with your username, or use the full path from `echo $HOME`.\n\u003c/details\u003e\n\n### Use with other MCP clients\n\n`knowledge-rag` is a standard **stdio MCP server** — it works with any MCP-compatible client, not only Claude Code. 
The launch command is the same everywhere (the `python -m mcp_server.server` from whichever install method you picked); only the **config file location** and **JSON shape** differ per client.\n\n#### Clients using the standard `mcpServers` format\n\nFor **Claude Desktop, Cursor, Antigravity, and Windsurf**, use the same block — only the file location changes:\n\n```json\n{\n  \"mcpServers\": {\n    \"knowledge-rag\": {\n      \"command\": \"/home/YOUR_USER/knowledge-rag/venv/bin/python\",\n      \"args\": [\"-m\", \"mcp_server.server\"]\n    }\n  }\n}\n```\n\n\u003e **Windows**: set `command` to the full path of `venv\\Scripts\\python.exe`.\n\n| Client | Config file | Notes |\n|---|---|---|\n| **Claude Code** | use `claude mcp add …` (see install methods above) | The CLI writes `~/.claude.json` for you — manual edits to it aren't reliably picked up. |\n| **Claude Desktop** | macOS: `~/Library/Application Support/Claude/claude_desktop_config.json` · Windows: `%APPDATA%\\Claude\\claude_desktop_config.json` | Easiest: **Settings → Developer → Edit Config** opens the correct file (avoids the Windows Store/MSIX path quirk). |\n| **Cursor** | `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per project) | — |\n| **Antigravity** | macOS/Linux: `~/.gemini/antigravity/mcp_config.json` · Windows: `%USERPROFILE%\\.gemini\\antigravity\\mcp_config.json` | Open via Agent panel → **\"…\" → Manage MCP Servers → View raw config**. |\n| **Windsurf** | `~/.codeium/windsurf/mcp_config.json` (global only) | Easiest: Cascade panel → MCP → **View raw config**. |\n\n#### VS Code — uses a `servers` key\n\nVS Code (Copilot MCP) nests servers under **`servers`**, not `mcpServers`. 
Put this in `.vscode/mcp.json` (workspace) or the file opened by the **MCP: Open User Configuration** command:\n\n```json\n{\n  \"servers\": {\n    \"knowledge-rag\": {\n      \"type\": \"stdio\",\n      \"command\": \"/home/YOUR_USER/knowledge-rag/venv/bin/python\",\n      \"args\": [\"-m\", \"mcp_server.server\"]\n    }\n  }\n}\n```\n\n#### opencode — uses an `mcp` key\n\nopencode nests servers under **`mcp`**, takes `command` as a single **array**, and uses `environment` instead of `env`. Put this in `opencode.json` (project root) or `~/.config/opencode/opencode.json` (global):\n\n```jsonc\n{\n  \"$schema\": \"https://opencode.ai/config.json\",\n  \"mcp\": {\n    \"knowledge-rag\": {\n      \"type\": \"local\",\n      \"command\": [\"/home/YOUR_USER/knowledge-rag/venv/bin/python\", \"-m\", \"mcp_server.server\"],\n      \"enabled\": true\n    }\n  }\n}\n```\n\n\u003e **Any other MCP client**: point it at the same command + args (`…/venv/bin/python -m mcp_server.server`). If it speaks stdio MCP, knowledge-rag works — only the config file's location and key naming differ. Check your client's docs for the exact path.\n\n### Verify\n\n```bash\nclaude mcp list\n```\n\nOn first start, the server will:\n1. Download the embedding model (~50MB, cached in `models_cache/`)\n2. Auto-index any documents in the `documents/` directory\n3. 
Start watching for file changes (auto-reindex)\n\n---\n\n## Usage\n\n### Adding Documents\n\nPlace your documents in the `documents/` directory, organized by category:\n\n```\ndocuments/\n├── security/          # Pentest, exploit, vulnerability docs\n├── development/       # Code, APIs, frameworks\n├── ctf/               # CTF writeups and methodology\n├── logscale/          # LogScale/LQL documentation\n└── general/           # Everything else\n```\n\nOr add documents programmatically via MCP tools:\n\n```python\n# Add from content\nadd_document(\n    content=\"# My Document\\n\\nContent here...\",\n    filepath=\"security/my-technique.md\",\n    category=\"security\"\n)\n\n# Add from URL\nadd_from_url(\n    url=\"https://example.com/article\",\n    category=\"security\",\n    title=\"Custom Title\"\n)\n```\n\n### Searching\n\nClaude uses the RAG system automatically when configured. You can also control search behavior:\n\n```python\n# Pure keyword search — instant, no embedding needed\nsearch_knowledge(\"gtfobins suid\", hybrid_alpha=0.0)\n\n# Keyword-heavy (default) — fast, slight semantic boost\nsearch_knowledge(\"mimikatz\", hybrid_alpha=0.3)\n\n# Balanced hybrid — both engines equally weighted\nsearch_knowledge(\"SQL injection techniques\", hybrid_alpha=0.5)\n\n# Semantic-heavy — better for conceptual queries\nsearch_knowledge(\"how to escalate privileges\", hybrid_alpha=0.7)\n\n# Pure semantic — embedding similarity only\nsearch_knowledge(\"lateral movement strategies\", hybrid_alpha=1.0)\n```\n\n### Indexing\n\nDocuments are automatically indexed on first startup. 
To manage the index:\n\n```python\n# Incremental: only re-index changed files (fast)\nreindex_documents()\n\n# Smart reindex: detect changes + rebuild BM25\nreindex_documents(force=True)\n\n# Nuclear rebuild: delete everything, re-embed all (use after model change)\nreindex_documents(full_rebuild=True)\n```\n\n### Evaluating Retrieval Quality\n\n```python\nevaluate_retrieval(test_cases='''[\n    {\"query\": \"sql injection\", \"expected_filepath\": \"security/sqli-guide.md\"},\n    {\"query\": \"privilege escalation\", \"expected_filepath\": \"security/privesc.md\"}\n]''')\n# Returns: MRR@5, Recall@5, per-query results\n```\n\n---\n\n## API Reference\n\n### Search \u0026 Query\n\n#### `search_knowledge`\n\nHybrid search combining semantic search + BM25 keyword search with cross-encoder reranking.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `query` | string | required | Search query text (1-3 keywords recommended) |\n| `max_results` | int | 5 | Maximum results to return (1-20) |\n| `category` | string | null | Filter by category |\n| `hybrid_alpha` | float | 0.3 | Balance: 0.0 = keyword only, 1.0 = semantic only |\n\n**Returns:**\n\n```json\n{\n  \"status\": \"success\",\n  \"query\": \"mimikatz credential dump\",\n  \"hybrid_alpha\": 0.5,\n  \"result_count\": 3,\n  \"cache_hit_rate\": \"0.0%\",\n  \"results\": [\n    {\n      \"content\": \"Mimikatz can extract credentials from memory...\",\n      \"source\": \"documents/security/credential-attacks.md\",\n      \"filename\": \"credential-attacks.md\",\n      \"category\": \"security\",\n      \"score\": 0.9823,\n      \"raw_rrf_score\": 0.016393,\n      \"reranker_score\": 0.987654,\n      \"semantic_rank\": 2,\n      \"bm25_rank\": 1,\n      \"search_method\": \"hybrid\",\n      \"keywords\": [\"mimikatz\", \"credential\", \"lsass\"],\n      \"routed_by\": \"redteam\"\n    }\n  ]\n}\n```\n\n**Search Method Values:**\n- `hybrid`: Found by both semantic and BM25 
search (highest confidence)\n- `semantic`: Found only by semantic search\n- `keyword`: Found only by BM25 keyword search\n\n---\n\n#### `get_document`\n\nRetrieve the full content of a specific document.\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| `filepath` | string | Path to the document file |\n\n**Returns:** JSON with document content, metadata, keywords, and chunk count.\n\n---\n\n#### `reindex_documents`\n\nIndex or reindex all documents in the knowledge base.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `force` | bool | false | Smart reindex: detects changes, rebuilds BM25. Fast. |\n| `full_rebuild` | bool | false | Nuclear rebuild: deletes everything, re-embeds all documents. Use after model change. |\n\n**Returns:** JSON with indexing statistics (indexed, updated, skipped, deleted, chunks_added, chunks_removed, dedup_skipped, elapsed_seconds).\n\n---\n\n#### `list_categories`\n\nList all document categories with their document counts.\n\n**Returns:**\n\n```json\n{\n  \"status\": \"success\",\n  \"categories\": {\n    \"security\": 52,\n    \"development\": 8,\n    \"ctf\": 12,\n    \"general\": 3\n  },\n  \"total_documents\": 75\n}\n```\n\n---\n\n#### `list_documents`\n\nList all indexed documents, optionally filtered by category.\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| `category` | string | Optional category filter |\n\n**Returns:** JSON array of documents with id, source, category, format, chunks, and keywords.\n\n---\n\n#### `get_index_stats`\n\nGet statistics about the knowledge base index.\n\n**Returns:**\n\n```json\n{\n  \"status\": \"success\",\n  \"stats\": {\n    \"total_documents\": 75,\n    \"total_chunks\": 9256,\n    \"unique_content_hashes\": 9100,\n    \"categories\": {\"security\": 52, \"development\": 8},\n    \"supported_formats\": [\".md\", \".txt\", \".pdf\", \".py\", \".json\", \".docx\", \".xlsx\", \".pptx\", 
\".csv\", \".ipynb\"],\n    \"embedding_model\": \"BAAI/bge-small-en-v1.5\",\n    \"embedding_dim\": 384,\n    \"reranker_model\": \"Xenova/ms-marco-MiniLM-L-6-v2\",\n    \"chunk_size\": 1000,\n    \"chunk_overlap\": 200,\n    \"query_cache\": {\n      \"size\": 12,\n      \"max_size\": 100,\n      \"ttl_seconds\": 300,\n      \"hits\": 45,\n      \"misses\": 23,\n      \"hit_rate\": \"66.2%\"\n    }\n  }\n}\n```\n\n---\n\n### Document Management\n\n#### `add_document`\n\nAdd a new document to the knowledge base from raw content. Saves the file to the documents directory and indexes it immediately.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `content` | string | required | Full text content of the document |\n| `filepath` | string | required | Relative path within documents dir (e.g., `security/new-technique.md`) |\n| `category` | string | \"general\" | Document category |\n\n---\n\n#### `update_document`\n\nUpdate an existing document. Removes old chunks from the index and re-indexes with new content.\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| `filepath` | string | Full path to the document file |\n| `content` | string | New content for the document |\n\n---\n\n#### `remove_document`\n\nRemove a document from the knowledge base index. 
Optionally deletes the file from disk.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `filepath` | string | required | Path to the document file |\n| `delete_file` | bool | false | If true, also delete the file from disk |\n\n---\n\n#### `add_from_url`\n\nFetch content from a URL, strip HTML (scripts, styles, nav, footer, header), convert to markdown, and add to the knowledge base.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `url` | string | required | URL to fetch content from |\n| `category` | string | \"general\" | Document category |\n| `title` | string | null | Custom title (auto-detected from `\u003ctitle\u003e` tag if not provided) |\n\n---\n\n#### `search_similar`\n\nFind documents similar to a given document using embedding similarity.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `filepath` | string | required | Path to the reference document |\n| `max_results` | int | 5 | Number of similar documents to return (1-20) |\n\n---\n\n#### `evaluate_retrieval`\n\nEvaluate retrieval quality with test queries. Useful for tuning `hybrid_alpha`, testing query expansion effectiveness, or validating after reindexing.\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| `test_cases` | string (JSON) | Array of test cases: `[{\"query\": \"...\", \"expected_filepath\": \"...\"}, ...]` |\n\n**Metrics:**\n- **MRR@5** (Mean Reciprocal Rank): Average of 1/rank for expected documents. 1.0 = always first result.\n- **Recall@5**: Fraction of expected documents found in top 5 results. 1.0 = all found.\n\n---\n\n## Configuration\n\nKnowledge RAG is fully configurable via a `config.yaml` file in the project root. 
If no `config.yaml` exists, sensible defaults are used — the system works out of the box with zero configuration.\n\n### Quick Start\n\n```bash\n# Option 1: Use a preset (pick one; each command overwrites config.yaml)\ncp presets/cybersecurity.yaml config.yaml    # Offensive/defensive security, CTFs\ncp presets/developer.yaml config.yaml        # Software engineering, APIs, DevOps\ncp presets/research.yaml config.yaml         # Academic research, papers, studies\ncp presets/general.yaml config.yaml          # Blank slate, pure semantic search\n\n# Option 2: Start from the documented template\ncp config.example.yaml config.yaml\n# Then edit config.yaml to suit your needs\n```\n\nRestart Claude Code after changing `config.yaml`.\n\n### config.yaml Structure\n\n```yaml\n# Paths — where your documents live\npaths:\n  documents_dir: \"./documents\"    # Scanned recursively\n  data_dir: \"./data\"              # Index storage\n  models_cache_dir: \"./models_cache\"  # Persistent embedding model cache\n\n# Documents — what gets indexed and how\ndocuments:\n  supported_formats:              # File types to index\n    - .md\n    - .txt\n    - .pdf\n    - .docx\n    - .ipynb\n    # - .py                       # Uncomment to index code\n  exclude_patterns:               # Glob patterns to skip\n    - \"node_modules\"\n    - \".venv\"\n    - \"__pycache__\"\n  chunking:\n    chunk_size: 1000              # Max chars per chunk\n    chunk_overlap: 200            # Shared chars between chunks\n\n# Models — AI models for search (all run locally, no API keys)\nmodels:\n  embedding:\n    model: \"BAAI/bge-small-en-v1.5\"   # ONNX, ~33MB, auto-downloaded\n    dimensions: 384\n    gpu: false                         # Set true + pip install knowledge-rag[gpu]\n  reranker:\n    enabled: true                      # Falls back to RRF if model is unavailable\n    model: \"Xenova/ms-marco-MiniLM-L-6-v2\"\n    top_k_multiplier: 3               # Candidates fetched before reranking\n\n# Search — result limits and collection name\nsearch:\n  
default_results: 5\n  max_results: 20\n  collection_name: \"knowledge_base\"   # Change for separate knowledge bases\n\n# Categories — auto-tag documents by folder path\n# Set to {} to disable categorization entirely\ncategory_mappings:\n  \"security/redteam\": \"redteam\"\n  \"security/blueteam\": \"blueteam\"\n  \"notes\": \"notes\"\n\n# Keyword routing — prioritize categories based on query keywords\n# Set to {} for pure semantic search with no routing bias\nkeyword_routes:\n  redteam:\n    - pentest\n    - exploit\n    - privilege escalation\n\n# Query expansion — expand abbreviations for better BM25 recall\n# Set to {} for no expansion (search terms used as-is)\nquery_expansions:\n  sqli:\n    - sql injection\n    - sqli\n  privesc:\n    - privilege escalation\n    - privesc\n```\n\n\u003e See `config.example.yaml` for the fully documented template with explanations for every field.\n\n### Presets\n\nPre-built configurations for common use cases:\n\n| Preset | File | Categories | Keywords | Expansions | Best For |\n|--------|------|-----------|----------|-----------|----------|\n| **Cybersecurity** | `presets/cybersecurity.yaml` | 8 | 200+ | 69 | Red/Blue Team, CTFs, threat hunting, exploit dev |\n| **Developer** | `presets/developer.yaml` | 9 | 150+ | 50+ | Full-stack dev, APIs, DevOps, cloud, databases |\n| **Research** | `presets/research.yaml` | 9 | 100+ | 40+ | Academic papers, thesis, lab notebooks, datasets |\n| **General** | `presets/general.yaml` | 0 | 0 | 0 | Blank slate — pure semantic search, no domain logic |\n\n**Creating your own preset**: Copy `config.example.yaml`, fill in your categories/keywords/expansions, save to `presets/your-domain.yaml`.\n\n### Configuration Reference\n\n#### Paths\n\n| Field | Default | Description |\n|-------|---------|-------------|\n| `paths.documents_dir` | `./documents` | Root folder scanned recursively for documents |\n| `paths.data_dir` | `./data` | Internal storage for ChromaDB and index metadata |\n| 
`paths.models_cache_dir` | `./models_cache` | Persistent cache for embedding models (~250MB). Survives reboots |\n\nRelative paths resolve from the project root. Absolute paths work too.\n\n#### Documents\n\n| Field | Default | Description |\n|-------|---------|-------------|\n| `documents.supported_formats` | .md .txt .pdf .py .json .docx .xlsx .pptx .csv .ipynb | File extensions to index |\n| `documents.exclude_patterns` | `[]` (empty) | Glob patterns for files/dirs to skip during indexing |\n| `documents.chunking.chunk_size` | 1000 | Max characters per chunk |\n| `documents.chunking.chunk_overlap` | 200 | Characters shared between consecutive chunks |\n\n**Chunking guidelines**: Short notes → 500/100. General use → 1000/200. Long technical docs → 1500/300.\n\nFor `.md` files, chunking splits at `##` and `###` header boundaries first. Sections larger than `chunk_size` are sub-chunked with overlap. Non-markdown files use fixed-size chunking.\n\n#### Models\n\n| Field | Default | Description |\n|-------|---------|-------------|\n| `models.embedding.model` | `BAAI/bge-small-en-v1.5` | Embedding model (ONNX, runs locally) |\n| `models.embedding.dimensions` | 384 | Vector dimensions (must match model) |\n| `models.embedding.gpu` | false | Enable CUDA GPU acceleration. Requires `pip install knowledge-rag[gpu]` |\n| `models.reranker.enabled` | true | Enable cross-encoder reranking |\n| `models.reranker.model` | `Xenova/ms-marco-MiniLM-L-6-v2` | Reranker model |\n| `models.reranker.top_k_multiplier` | 3 | Fetch N*multiplier candidates for reranking |\n\nIf the reranker model is not available locally and the machine cannot download it, search now falls back to the RRF order from hybrid semantic+BM25 retrieval. 
This keeps `search_knowledge` available offline, but result ordering may be less precise for ambiguous queries until the reranker model is cached.\n\n**Embedding model options** (fastest → most accurate):\n- `BAAI/bge-small-en-v1.5` — 384D, ~33MB (default)\n- `BAAI/bge-base-en-v1.5` — 768D, ~130MB\n- `BAAI/bge-large-en-v1.5` — 1024D, ~335MB\n- `intfloat/multilingual-e5-small` — 384D, 100+ languages\n\n\u003e **Warning**: Changing the embedding model after indexing requires `reindex_documents(full_rebuild=True)`.\n\n#### Search\n\n| Field | Default | Description |\n|-------|---------|-------------|\n| `search.default_results` | 5 | Results returned when no limit specified |\n| `search.max_results` | 20 | Hard cap even if client requests more |\n| `search.collection_name` | `knowledge_base` | ChromaDB collection — change for separate KBs |\n\n#### Categories\n\nMap folder paths to category names. Documents in matching folders get auto-tagged, enabling filtered searches.\n\n```yaml\ncategory_mappings:\n  \"security/redteam\": \"redteam\"\n  \"security\": \"security\"\n```\n\nSet `category_mappings: {}` to disable — documents are still searchable, just without category filters.\n\n#### Keyword Routing\n\nRoute queries to categories based on keywords. When a query contains listed keywords, results from that category are prioritized (not filtered — other categories still appear, ranked lower).\n\n```yaml\nkeyword_routes:\n  redteam:\n    - pentest\n    - exploit\n    - sqli\n```\n\nSingle-word keywords use regex word boundaries (`\\b`) — \"api\" won't match \"RAPID\". Multi-word keywords use substring matching.\n\nSet `keyword_routes: {}` for pure semantic search.\n\n#### Query Expansion\n\nExpand search terms with synonyms before BM25 search. 
Supports single tokens, bigrams, and full query matches.\n\n```yaml\nquery_expansions:\n  sqli:\n    - sql injection\n    - sqli\n  k8s:\n    - kubernetes\n    - k8s\n```\n\nSet `query_expansions: {}` for no expansion.\n\n### Hybrid Search Tuning\n\n| hybrid_alpha | Behavior | Best For |\n|--------------|----------|----------|\n| 0.0 | Pure BM25 keyword | Exact terms, CVEs, tool names |\n| 0.3 | Keyword-heavy **(default)** | Technical queries with specific terms |\n| 0.5 | Balanced | General queries |\n| 0.7 | Semantic-heavy | Conceptual queries, related topics |\n| 1.0 | Pure semantic | \"How to...\" questions, abstract concepts |\n\n---\n\n## Project Structure\n\n```\nknowledge-rag/\n├── mcp_server/\n│   ├── __init__.py          # Stdout protection + version\n│   ├── config.py            # YAML config loader + defaults\n│   ├── ingestion.py         # 20 parsers, chunking, metadata extraction\n│   └── server.py            # MCP server, ChromaDB, BM25, reranker, 12 tools\n├── config.example.yaml      # Documented config template (copy to config.yaml)\n├── config.yaml              # Your active configuration (git-ignored)\n├── presets/                  # Ready-to-use domain configurations\n│   ├── cybersecurity.yaml\n│   ├── developer.yaml\n│   ├── research.yaml\n│   └── general.yaml\n├── documents/               # Your documents (scanned recursively)\n├── data/\n│   ├── chroma_db/           # ChromaDB vector database\n│   └── index_metadata.json  # Incremental indexing state\n├── models_cache/            # Persistent embedding model cache\n├── tests/                   # Test suite (82 tests)\n├── install.sh               # Linux/macOS installer\n├── install.ps1              # Windows installer\n├── venv/                    # Python virtual environment\n├── requirements.txt\n├── pyproject.toml\n├── LICENSE\n└── README.md\n```\n\n---\n\n## Troubleshooting\n\n### Python version mismatch\n\nRequires Python 3.11 or newer.\n\n```bash\npython --version    # Must be 
3.11+\n```\n\n### FastEmbed model download fails\n\nOn first run, FastEmbed downloads models to `models_cache/`. If the download fails:\n\n```bash\n# Clear cache and retry\n# Windows:\nrmdir /s /q models_cache\n\n# Linux/macOS:\nrm -rf models_cache\n\n# Then restart the MCP server\n```\n\n### Reranker model download fails\n\nThe reranker is lazy-loaded on the first query. If the model is not cached and the machine is offline, search continues without reranking and uses the RRF order from hybrid retrieval. To keep reranking enabled offline, run one query while online or pre-populate `models_cache/` on the target machine.\n\nYou can still disable reranking explicitly in `config.yaml`:\n\n```yaml\nmodels:\n  reranker:\n    enabled: false\n```\n\nDisabling reranking reduces memory use and avoids first-query model loading. The tradeoff is lower ranking precision, especially when several chunks match the same terms but only one is the best answer.\n\n### ChromaDB index crashes on startup\n\nNative ChromaDB failures can terminate Python before normal exception handling runs. Startup now probes ChromaDB in a child process before initializing the MCP server. If the probe crashes, the active `chroma_db/` and `index_metadata.json` are moved to `data/backups/auto-repair-*`, and the next startup can rebuild a clean index.\n\nThe same guarded behavior is available through either console script:\n\n```bash\nknowledge-rag\nknowledge-rag-guarded\n```\n\n### Index is empty\n\n```bash\n# Check documents directory has files\nls documents/\n\n# Force reindex via Claude Code:\n# reindex_documents(force=True)\n\n# Or nuclear rebuild if model changed:\n# reindex_documents(full_rebuild=True)\n```\n\n### MCP server not loading\n\n1. Check `~/.claude.json` exists and has valid JSON in the `mcpServers` section\n2. Verify paths use double backslashes (`\\\\`) on Windows\n3. Restart Claude Code completely\n4. 
Run `claude mcp list` to check connection status\n\n### \"Failed to connect\" error\n\nThe MCP server uses stdout for JSON-RPC communication. If a library prints to stdout during init, the stream gets corrupted. v3.4.3+ includes stdout protection that prevents this. If you're on an older version, upgrade:\n\n```bash\npip install --upgrade knowledge-rag\n```\n\n### Slow first query\n\nThe cross-encoder reranker model is lazy-loaded on the first query. This adds a one-time ~2-3 second delay for model download and loading. Subsequent queries are fast. If the model cannot be loaded, search falls back to RRF ordering and does not retry loading the reranker until the server restarts.\n\n### Memory usage\n\nWith ~200 documents, expect ~300-500MB RAM. The embedding model (~200MB ONNX runtime resident, lazy-loaded on first query since v3.8.0) and reranker (~25MB, lazy-loaded) are loaded into memory only when actually used. For very large knowledge bases (1000+ documents), consider enabling GPU acceleration and using exclude patterns to limit index scope.\n\n### Multiple MCP clients spawn duplicate servers\n\nMCP stdio is one process per client by protocol — multiple Claude Code windows, Claude Desktop + IDE, etc. each spawn their own `knowledge-rag` process. Since v3.8.0 idle processes are cheap (no embedding model loaded until first query). If you've measured and want a hard cap of one server per data directory, opt in:\n\n```bash\nexport KNOWLEDGE_RAG_SINGLE_INSTANCE=1\n```\n\nA second instance exits immediately with code 75. Default is OFF (multi-client friendly). Full guide: [docs/single-instance.md](docs/single-instance.md). Sample MCP config: [examples/mcp-config-single-instance.json](examples/mcp-config-single-instance.json).\n\n---\n\n## Changelog\n\n### v3.9.0 (2026-05-10) — Quality Gate\n\n**Major governance + CI hardening release. No runtime behavior change in `mcp_server/`. 
Public API surface unchanged from v3.8.1.**\n\n- **NEW** Quality Gate workflow (`.github/workflows/quality-gate.yml`) enforcing the 7 pillars on every PR: Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality. 35+ status checks total.\n- **NEW** Nightly resilience workflow (`.github/workflows/nightly.yml`): chaos suite (failure injection), 1h soak test (50K-iteration loop), determinism check (full suite × 3), mutation testing (mutmut). Auto-opens GitHub issue on any nightly failure.\n- **NEW** Performance benchmark suite under `bench/` (12 microbenchmarks, pytest-benchmark) with 10% regression gate on every PR.\n- **NEW** Public performance dashboard via GitHub Pages (`.github/workflows/bench-pages.yml`) — chart of latency/throughput per commit. Dormant until repo Pages is enabled.\n- **NEW** Property-based fuzzing of all parsers via Hypothesis (`tests/test_ingestion_property.py`) — 200 random examples per CI run.\n- **NEW** Memory baseline regression tests (`tests/test_memory_baseline.py`, cross-platform via psutil) — RSS bounded under 1000 queries; nightly soak amplifies to 50K iterations.\n- **NEW** Property/locale/format/preset matrices (`tests/test_presets.py`, `tests/test_locale.py`, `tests/test_format_smoke.py`).\n- **NEW** Backwards-compatibility regression tests (`tests/test_backwards_compat.py`) — legacy YAML configs from v3.6.0 / v3.7.0 still parse; all 12 MCP tool parameter names frozen.\n- **NEW** AST-based public API surface diff (`scripts/check_api_surface.py`) — any breaking change blocks merge, baseline at `.github/api-surface-baseline.json`.\n- **NEW** CHANGELOG enforcement (`scripts/check_changelog.py`) — user-facing PRs must add a bullet under `## Unreleased`; bypass via `skip-changelog` label.\n- **NEW** Test count anti-regression (`scripts/check_test_count.py`) — guards against silent test deletion.\n- **NEW** Conventional commits required on every PR title (commitlint via `amannn/action-semantic-pull-request`).\n- 
**NEW** mypy `--strict` rolling out per-module (currently `instance_lock.py` + `preflight.py` + `scripts/`); interrogate docstring coverage ≥ 80%; radon, vulture, PR-size guard report-only.\n- **NEW** CI matrix expanded to 9 cells: Linux + Windows + **macOS** × 3.11 + 3.12 + **3.13** (all required at v3.9.0; macOS / 3.13 promoted from experimental after two clean cycles).\n- **NEW** Governance docs: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `.github/PULL_REQUEST_TEMPLATE.md`, 3 issue templates, expanded `CODEOWNERS`.\n- **NEW** Pre-commit hooks: ruff, gitleaks, version-sync, conventional commits.\n- **CHORE** `.github/codecov.yml` enforcing coverage trend gate (-0.5pp blocks; new code ≥ 70%).\n\n### v3.8.1 (2026-05-10) — hotfix\n\n- **FIX (critical)**: `FastEmbedEmbeddings.__call__` no longer returns vectors of zeros when the ONNX model fails to load or `embed()` raises. The previous behavior silently corrupted the index — ChromaDB stored zero embeddings, `count()` reported normal numbers, smart-reindex skipped the bad chunks, and queries returned garbage scores with no error visible. Now raises `EmbeddingModelLoadError` / `EmbeddingError`. (#36)\n- **FIX**: Sticky `_load_failed` flag — after a load failure, subsequent calls re-raise immediately instead of looping through HuggingFace download attempts (was the \"frozen query\" UX in v3.8.0).\n- **NEW**: Sanity checks in `__call__` — embed count and dim mismatches raise `EmbeddingError` instead of silently returning malformed vectors.\n- **TEST**: 7 new regression cases in `tests/test_lazy_embeddings.py`, including `test_does_not_return_zero_vectors_silently` as a guard for the whole class of bug.\n- **NOTE**: This is a pre-existing bug in master, not introduced by v3.8.0. v3.8.0 lazy-load expanded the impact (failures moved to query time). All v3.8.0 users should upgrade.\n\n### v3.8.0 (2026-05-10)\n\n- **NEW**: Lazy-load FastEmbed embedding model (~200MB ONNX runtime). 
Loads on first query instead of startup — idle `knowledge-rag` processes are now cheap, which matters when MCP stdio clients spawn parallel server processes (multiple Claude Code windows, Claude Desktop + IDE, etc.). Public API unchanged. (#32)\n- **NEW**: Opt-in single-instance guard via `KNOWLEDGE_RAG_SINGLE_INSTANCE=1` env var. **OFF by default** — multi-client MCP usage continues to work unchanged. When enabled, a second server process for the same `data_dir` exits with code 75 (`EX_TEMPFAIL`). Includes stale-PID recovery and SIGINT/SIGTERM handlers. See [docs/single-instance.md](docs/single-instance.md). (#33, original concept by @Hohlas in #31)\n- **NEW**: `examples/mcp-config-single-instance.json` — sample MCP client config for the opt-in guard.\n- **DOCS**: New `docs/single-instance.md` — when to use, when NOT to use, troubleshooting, full activation reference.\n- **DOCS**: README troubleshooting section for \"Multiple MCP clients spawn duplicate servers\" + memory-usage note for lazy embeddings.\n- **CHORE**: Sync version across `pyproject.toml`, `mcp_server/__init__.py`, and `npm/package.json` (was drifting since v3.5.x).\n- **CHORE**: pytest `tmp_path_retention_count=1` to avoid Windows atexit cleanup race in CI.\n- **ROADMAP**: Tracked v4.0 shared-service architecture (one daemon, many thin MCP clients) as the long-term fix for multi-process resource duplication. 
(#34)\n\n### v3.9.1 (2026-06-08)\n\n- **FIX**: Expand `~` in `config.yaml` path values (`documents_dir`, `data_dir`, `models_cache_dir`) via `expanduser()` on all platforms (#86).\n- **FIX**: Warn when `documents_dir` resolves to a non-existent path instead of silently indexing zero files.\n- **FIX**: File watcher now uses accumulate-mode debounce — bulk file copies no longer starve the reindex trigger.\n- **FIX**: Concurrent `index_all()` calls are serialized via `_index_lock` to prevent ChromaDB SQLite corruption.\n- **FIX**: `collection.add()` is batched (500 chunks/call) to cap memory usage during large reindex operations.\n- **NEW**: `KNOWLEDGE_RAG_WATCHER_DISABLED=1` env var to disable the file watcher for troubleshooting.\n- **NEW**: Progress logging every 10% for reindex operations with \u003e100 documents.\n\n### Unreleased\n\n- **FIX**: Startup preflight probes ChromaDB in a child process and moves crashing persistent indexes to `data/backups/auto-repair-*` before MCP initialization.\n- **FIX**: Reranker load failures now fall back to RRF ordering instead of failing `search_knowledge` on offline machines.\n- **FIX**: Virtualenv project-root detection now handles Python symlinks that resolve to the system interpreter.\n- **NEW**: `knowledge-rag-guarded` console script kept as an explicit guarded startup alias.\n\n### v3.6.2 (2026-04-23)\n\n- **INFRA**: NPM provenance attestation (SLSA supply chain security), full README on npm page\n- **DOCS**: Reorganize Installation section — add NPX and Docker install methods, update What's New to v3.6.0\n\n### v3.6.0 (2026-04-23)\n\n- **NEW**: Multi-language code parsing — C (`.c`), C++ (`.cpp`/`.h`), JavaScript (`.js`/`.jsx`), TypeScript (`.ts`/`.tsx`) with per-language function/class/import extraction\n- **NEW**: XML parser (`.xml`) — root element and namespace metadata extraction\n- **NEW**: All 8 new formats default enabled — no config change needed\n- **NEW**: NPM wrapper (`npx knowledge-rag`) + Docker image 
(`ghcr.io/lyonzin/knowledge-rag`)\n- **NEW**: Automated release pipeline — PyPI (Trusted Publishing), NPM, Docker GHCR\n- **IMPROVED**: Code parser reports correct `language` metadata per file type (was hardcoded to `\"python\"` for all code files)\n\n### v3.5.2 (2026-04-16)\n\n- **NEW**: Auto-discovery of CUDA 12 DLLs from pip-installed NVIDIA packages — no manual PATH configuration needed\n- **NEW**: Graceful GPU→CPU fallback with `[WARN]` log when CUDA init fails (missing drivers, wrong version, etc.)\n- **FIX**: Explicit `CPUExecutionProvider` when `gpu: false` — eliminates noisy CUDA probe errors in logs\n- **FIX**: BASE_DIR resolution now correctly prefers directories with `config.yaml` over those with only `config.example.yaml` (fixes editable installs)\n\n### v3.5.1 (2026-04-16)\n\n- **FIX**: Removed Python upper bound constraint (`\u003c3.13` → `\u003e=3.11`). Python 3.13 and 3.14 now supported — onnxruntime ships wheels for both.\n\n### v3.5.0 (2026-04-16)\n\n- **NEW**: Optional GPU acceleration for ONNX embeddings — `pip install knowledge-rag[gpu]` + `models.embedding.gpu: true` in config. 5-10x faster indexing on NVIDIA GPUs with automatic CPU fallback.\n- **DOCS**: Supported formats table added to README (20 formats)\n\n### v3.4.3 (2026-04-16)\n\n- **FIX**: Correct stdout protection via save/restore pattern — `__init__.py` saves original stdout and redirects to stderr during init, `server.py main()` restores it before `mcp.run()`. 
v3.4.2's global redirect broke MCP JSON-RPC response channel.\n\n### v3.4.1 (2026-04-16)\n\n- **FIX**: `pip install knowledge-rag` now auto-detects project directory from venv location\n- **NEW**: `install.sh` — Linux/macOS installer with pip and from-source modes\n- **IMPROVED**: BASE_DIR resolution chain: env var → source dir → venv parent → CWD → fallback\n\n### v3.4.0 (2026-04-16)\n\n- **NEW**: `models_cache_dir` — persistent embedding model cache, prevents re-download after reboots\n- **NEW**: `exclude_patterns` — glob-based file/directory exclusion during indexing\n- **NEW**: Jupyter Notebook (.ipynb) parser — extracts markdown and code cell sources only\n- **NEW**: MCP stdout protection — redirects stdout to stderr before server start\n- **NEW**: File watcher resilience — graceful fallback when Linux inotify limits are reached\n- **NEW**: MetaTrader (.mq4, .mqh) support — opt-in code parsing\n- **NEW**: 23 new tests (exclude patterns, ipynb parser, stdout protection)\n- Community credit: [@Hohlas](https://github.com/Hohlas) ([PR #18](https://github.com/lyonzin/knowledge-rag/pull/18))\n\n### v3.3.x\n\n- **v3.3.2**: Full type validation on YAML config, bounds checking, version sync\n- **v3.3.1**: YAML null value crash fix, presets bundled in pip wheel, `knowledge-rag init` CLI\n- **v3.3.0**: YAML configuration system, 4 domain presets, generic use support\n\n### v3.2.x\n\n- **v3.2.4**: Symlink support with circular loop protection\n- **v3.2.3**: BASE_DIR smart detection for pip installs\n- **v3.2.2**: Plug-and-play pip install, `KNOWLEDGE_RAG_DIR` env var\n- **v3.2.1**: Auto-recovery from corrupted ChromaDB\n- **v3.2.0**: Parallel BM25 + Semantic search, adjacent chunk retrieval\n\n### v3.1.x\n\n- **v3.1.1**: Code block protection in markdown chunker, AAR category, 14 CVE aliases\n- **v3.1.0**: DOCX/XLSX/PPTX/CSV support, file watcher, MMR diversification, PyPI publish\n\n### v3.0.0 (2026-03-19)\n\n- Replaced Ollama with FastEmbed (ONNX in-process)\n- 
Cross-encoder reranking, markdown-aware chunking, query expansion\n- 6 new MCP tools (12 total), auto-migration from v2.x\n\n\u003cdetails\u003e\n\u003csummary\u003ev2.x and earlier\u003c/summary\u003e\n\n- **v2.2.0**: `hybrid_alpha=0` skips Ollama, default changed from 0.5 to 0.3\n- **v2.1.0**: Mermaid architecture diagrams\n- **v2.0.0**: Hybrid search, RRF fusion, `hybrid_alpha` parameter\n- **v1.1.0**: Incremental indexing, query cache, chunk deduplication\n- **v1.0.1**: Auto-cleanup orphan folders, removed hardcoded paths\n- **v1.0.0**: Initial release\n\u003c/details\u003e\n\n---\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\n---\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## Acknowledgments\n\n- [ChromaDB](https://www.trychroma.com/) — Vector database\n- [FastEmbed](https://qdrant.github.io/fastembed/) — ONNX Runtime embeddings\n- [FastMCP](https://github.com/anthropics/mcp) — Model Context Protocol framework\n- [PyMuPDF](https://pymupdf.readthedocs.io/) — PDF parsing\n- [rank-bm25](https://github.com/dorianbrown/rank_bm25) — BM25 Okapi implementation\n- [Watchdog](https://github.com/gorakhargosh/watchdog) — File system monitoring\n- [python-docx](https://python-docx.readthedocs.io/) / [openpyxl](https://openpyxl.readthedocs.io/) / [python-pptx](https://python-pptx.readthedocs.io/) — Office document parsing\n- [PyYAML](https://pyyaml.org/) — YAML configuration parsing\n- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) — HTML parsing for URL ingestion\n\n---\n\n## Author\n\n**Lyon.**\n\nSecurity Researcher | Developer\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**[Back to 
Top](#knowledge-rag)**\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flyonzin%2Fknowledge-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flyonzin%2Fknowledge-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flyonzin%2Fknowledge-rag/lists"}