{"id":50772893,"url":"https://github.com/gvtret/doc-rag-mcp-server","last_synced_at":"2026-06-11T21:00:15.840Z","repository":{"id":362517638,"uuid":"1259192680","full_name":"gvtret/doc-rag-mcp-server","owner":"gvtret","description":"Local-first RAG over engineering documents (PDF/DOCX/DOC/MD/TXT), exposed via MCP to Cursor/Claude. AGPL-3.0.","archived":false,"fork":false,"pushed_at":"2026-06-04T14:19:38.000Z","size":574,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-04T16:10:55.608Z","etag":null,"topics":["agpl","cursor","embeddings","faiss","fastapi","local-rag","mcp","rag","semantic-search"],"latest_commit_sha":null,"homepage":"https://habr.com/ru/articles/1043346/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gvtret.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-04T09:20:23.000Z","updated_at":"2026-06-04T14:20:00.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gvtret/doc-rag-mcp-server","commit_stats":null,"previous_names":["gvtret/doc-rag-mcp-server"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/gvtret/doc-rag-mcp-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gvtret%2Fdoc-rag-mcp-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/gvtret%2Fdoc-rag-mcp-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gvtret%2Fdoc-rag-mcp-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gvtret%2Fdoc-rag-mcp-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gvtret","download_url":"https://codeload.github.com/gvtret/doc-rag-mcp-server/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gvtret%2Fdoc-rag-mcp-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34217312,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agpl","cursor","embeddings","faiss","fastapi","local-rag","mcp","rag","semantic-search"],"created_at":"2026-06-11T21:00:14.903Z","updated_at":"2026-06-11T21:00:15.834Z","avatar_url":"https://github.com/gvtret.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# doc-rag — local RAG over your 
documents\n\n[![tests](https://github.com/gvtret/doc-rag-mcp-server/actions/workflows/tests.yml/badge.svg)](https://github.com/gvtret/doc-rag-mcp-server/actions/workflows/tests.yml)\n[![lint](https://github.com/gvtret/doc-rag-mcp-server/actions/workflows/lint.yml/badge.svg)](https://github.com/gvtret/doc-rag-mcp-server/actions/workflows/lint.yml)\n[![build](https://github.com/gvtret/doc-rag-mcp-server/actions/workflows/build.yml/badge.svg)](https://github.com/gvtret/doc-rag-mcp-server/actions/workflows/build.yml)\n[![license: MIT](https://img.shields.io/badge/license-MIT-blue)](LICENSE)\n\n`doc-rag` is a local, offline-first knowledge base for engineering documentation:\n\n```\nPDF / DOCX / DOC / MD / TXT  →  Markdown  →  Chunks  →  FAISS  →  MCP / Cursor / Web UI\n```\n\n- **Offline after first model download.** No data leaves your machine.\n- **Cursor / Claude MCP integration** via Streamable HTTP on a single endpoint.\n- **Web UI** for upload, ingest, delete, and live status — no terminal required.\n- **Graceful degradation**: if FAISS isn't ready, lexical search keeps working and the user is told.\n\n## Requirements\n\n- Linux or WSL2 (Python ≥ 3.10)\n- ~2 GB RAM for embeddings; ~1 GB extra for Docling models on first parse\n- `antiword` is optional (only needed for legacy `.doc` files)\n- OCR for scanned PDFs is built into Docling (RapidOCR); no separate Tesseract install required\n- Node ≥ 20 (optional, v2.2+ — only needed to build the new Svelte `/ui-next/` page; the legacy inline `/ui` works without Node)\n\n## Quickstart\n\n```bash\n# uv is the official installer since v2.1; the bootstrap script installs it if missing.\ngit clone https://github.com/gvtret/doc-rag-mcp-server\ncd doc-rag-mcp-server\nbash scripts/bootstrap.sh        # installs uv, runs `uv sync --frozen`, creates .venv\ncp YOUR_FILES.pdf sources/incoming/\nuv run doc-rag ingest\nbash scripts/run_mcp_http.sh     # MCP/UI on http://127.0.0.1:3333\n```\n\nThen open `http://127.0.0.1:3333/ui` or point 
Cursor at `http://127.0.0.1:3333/mcp`.\n\n## Documentation\n\n| Guide | What's inside |\n| --- | --- |\n| [docs/install.md](docs/install.md) | venv + torch (CPU/GPU), system packages, OCR, config reference |\n| [docs/cli.md](docs/cli.md) | `doc-rag ingest / rebuild / delete / wipe / clean-orphans / clear-incoming` |\n| [docs/mcp.md](docs/mcp.md) | Cursor / Claude integration, Streamable HTTP + SSE, auth, rate limit |\n| [docs/ui.md](docs/ui.md) | Web UI: upload, dedup, ingest, delete, danger zone, degraded-mode banner |\n| [docs/deploy.md](docs/deploy.md) | Docker Compose, native systemd, deploy archive, remote MCP |\n| [docs/troubleshooting.md](docs/troubleshooting.md) | Torch/CUDA, MCP not visible, FAISS rebuild, PEP 668, OCR |\n| [docs/roadmap.md](docs/roadmap.md) | Versioning policy and the path to public v1.x.y |\n\nSee [CHANGELOG.md](CHANGELOG.md) for notable changes between releases.\n\n## Project layout\n\n```\ndoc-rag/\n├── sources/\n│   ├── incoming/      # drop new documents here\n│   └── archived/      # processed files (moved automatically after ingest)\n├── build/             # generated: docs_md/, chunks_jsonl/, embeddings/, index/, manifest.json\n├── config/config.yaml # main config\n├── src/doc_rag/       # Python package\n├── scripts/           # bootstrap, run_mcp_http, install_server_native, ...\n├── docker/            # Dockerfile\n├── systemd/           # service unit template\n├── .github/workflows/ # CI (tests, lint, Docker build)\n└── docs/              # see the table above\n```\n\n## Philosophy\n\nOffline-first · reproducible · vendor-independent · long-term maintainable.\nDesigned for standards, specs, manuals, and research documents.\n\n## Talks and articles\n\n- *Russian*, June 2026 — [«Как я научил оракула читать ГОСТы: история doc-rag, рассказанная по-старорусски»](https://habr.com/ru/articles/1043346/) on Habr. 
A pet-project narrative covering the same architecture that ships in this repository.\n\n## License\n\n`doc-rag` is licensed under the **MIT License** — see [LICENSE](LICENSE).\n\nThird-party dependency licenses are summarised in [NOTICE](NOTICE).\nReleases before v2.0.0 were AGPL-3.0-or-later (because the default PDF\nbackend was PyMuPDF); v2.0 switched to Docling, which is MIT, and\nrelicensed the project to match.\n\n## Contributing\n\nIssues and pull requests are welcome. Please read\n[CONTRIBUTING.md](CONTRIBUTING.md) for development setup, test\ninstructions, commit conventions, and the SemVer policy for the public\nsurface ([docs/roadmap.md § 1](docs/roadmap.md)).\n\nTo report a security issue, see [SECURITY.md](SECURITY.md). Please do\nnot open a public issue for vulnerabilities.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgvtret%2Fdoc-rag-mcp-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgvtret%2Fdoc-rag-mcp-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgvtret%2Fdoc-rag-mcp-server/lists"}