{"id":49840165,"url":"https://github.com/fangyuan025/hushdoc","last_synced_at":"2026-05-17T21:09:42.782Z","repository":{"id":354525483,"uuid":"1222105125","full_name":"Fangyuan025/hushdoc","owner":"Fangyuan025","description":"Chat with your documents — privately, offline, on your own machine. Local-first RAG over PDFs/DOCX/images with GPU-accelerated streaming, optional voice mode, multi-conversation history, and citation-anchored sources. Bilingual (中/EN). FastAPI + React + llama.cpp.","archived":false,"fork":false,"pushed_at":"2026-05-13T23:04:21.000Z","size":1690,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-05-14T00:58:03.503Z","etag":null,"topics":["bilingual","chromadb","document-ai","fastapi","llama-cpp","llm","local-llm","offline-first","pdf-chat","privacy","rag","react","typescript","voice-assistant","whisper"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Fangyuan025.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-27T03:38:22.000Z","updated_at":"2026-05-13T23:04:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Fangyuan025/hushdoc","commit_stats":null,"previous_names":["fangyuan025/local-rag-pdf-assistant","fangyuan025/hushdoc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Fangyuan025/hushdoc","reposi
tory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fangyuan025%2Fhushdoc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fangyuan025%2Fhushdoc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fangyuan025%2Fhushdoc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fangyuan025%2Fhushdoc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Fangyuan025","download_url":"https://codeload.github.com/Fangyuan025/hushdoc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fangyuan025%2Fhushdoc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33011340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bilingual","chromadb","document-ai","fastapi","llama-cpp","llm","local-llm","offline-first","pdf-chat","privacy","rag","react","typescript","voice-assistant","whisper"],"created_at":"2026-05-14T05:01:02.205Z","updated_at":"2026-05-17T21:09:42.775Z","avatar_url":"https://github.com/Fangyuan025.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🤫 Hushdoc\n\n\u003cp align=\"center\"\u003e\n  \u003ca 
href=\"https://github.com/Fangyuan025/hushdoc/releases\"\u003e\u003cimg alt=\"Release\" src=\"https://img.shields.io/github/v/release/Fangyuan025/hushdoc?style=for-the-badge\u0026color=2ea44f\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg alt=\"MIT\" src=\"https://img.shields.io/badge/license-MIT-yellow.svg?style=for-the-badge\"\u003e\u003c/a\u003e\n  \u003ca href=\"#why\"\u003e\u003cimg alt=\"Local-only\" src=\"https://img.shields.io/badge/local--only-1f6feb.svg?style=for-the-badge\u0026logo=ghostery\u0026logoColor=white\"\u003e\u003c/a\u003e\n  \u003ca href=\"README.zh-CN.md\"\u003e\u003cimg alt=\"Bilingual\" src=\"https://img.shields.io/badge/中文-7c3aed.svg?style=for-the-badge\u0026logo=googletranslate\u0026logoColor=white\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.python.org/\"\u003e\u003cimg alt=\"Python 3.12\" src=\"https://img.shields.io/badge/python-3.12-3776AB.svg?style=for-the-badge\u0026logo=python\u0026logoColor=white\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://fastapi.tiangolo.com/\"\u003e\u003cimg alt=\"FastAPI\" src=\"https://img.shields.io/badge/FastAPI-009688.svg?style=for-the-badge\u0026logo=fastapi\u0026logoColor=white\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://react.dev/\"\u003e\u003cimg alt=\"React 19\" src=\"https://img.shields.io/badge/React-19-61DAFB.svg?style=for-the-badge\u0026logo=react\u0026logoColor=000\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ggml-org/llama.cpp\"\u003e\u003cimg alt=\"llama.cpp\" src=\"https://img.shields.io/badge/llama-cpp-FF6B6B.svg?style=for-the-badge\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.trychroma.com/\"\u003e\u003cimg alt=\"ChromaDB\" src=\"https://img.shields.io/badge/ChromaDB-FFCD42.svg?style=for-the-badge\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cb\u003eEnglish\u003c/b\u003e · \u003ca href=\"README.zh-CN.md\"\u003e中文\u003c/a\u003e\n  \u0026nbsp;|\u0026nbsp;\n 
 \u003ca href=\"https://github.com/Fangyuan025/hushdoc/releases\"\u003eReleases\u003c/a\u003e ·\n  \u003ca href=\"CHANGELOG.md\"\u003eChangelog\u003c/a\u003e\n\u003c/p\u003e\n\n\u003e **Chat with your documents — privately, offline, on your own machine.**\n\nDrop in a PDF, DOCX, EPUB, or even a phone photo of a page. Ask anything\nin English or Chinese. Answers stream in with inline citations and an\nin-app PDF viewer that highlights the exact source passage in yellow.\n**Nothing leaves your machine.**\n\n`🛡️ Local-first` · `🚀 GPU-accelerated` · `🌍 中 / EN` · `🎙️ Voice (en)`\n\n---\n\n## Why \u003ca id=\"why\"\u003e\u003c/a\u003e\n\nMost AI document tools ship your files to someone else's cloud. That's\nfine for a public PDF — not fine for a contract, an unpublished\nmanuscript, or anything covered by NDA. Hushdoc was built so you never\nmake that trade-off.\n\n| | Cloud RAG | Hushdoc |\n|---|---|---|\n| Documents stored on | Their servers | Your disk |\n| Inference runs on | Their GPUs | Your GPU / CPU |\n| Works air-gapped? | ❌ | ✅ |\n| You own the chat history? | ❌ | ✅ |\n\nThe only network calls are one-time HuggingFace downloads of the\nembedding / ASR / TTS models. After that you can pull the ethernet.\n\n---\n\n## Features\n\n**Documents** — PDF · DOCX · EPUB · images (OCR). Drag-and-drop,\nmulti-file, replace-or-append. Per-file `Search scope` toggle.\n\n**Chat** — Streaming markdown answers with code, tables, and LaTeX.\nBilingual (中/EN) — answers in the language you asked in. Multi-thread\nsidebar with auto-titled conversations.\n\n**Inline `[N]` citations** — Every fact-bearing sentence ends in a\nsmall numeric chip. Hover lifts a popover showing the exact paragraph\nfrom the cited chunk; click *View source* to open the PDF page with\nthe paragraph marked. The sources panel is exactly what the answer\nreferenced — no irrelevant chunks padding the list. 
Ungrounded\nsentences (pure synthesis / low confidence) get a soft wavy\nunderline so you know what to double-check.\n\n**Multi-variant regenerate** — Regenerate appends a new answer as a\nvariant on the same bubble; flip between versions with a\nChatGPT-style `\u003c N/M \u003e` pager. The active variant is what the next\nfollow-up sees as the prior reply.\n\n**Voice (opt-in)** — Push-to-talk mic (~1.5 s silence auto-stop) +\nstreaming TTS that reads each sentence as it's generated. English only.\n\n**Settings** — Live model swap by typing a new `.gguf` path; auto-clean\nlocal data on browser close (opt-in checkbox). Persists to\n`hushdoc_config.json`.\n\n---\n\n## Quick start\n\nRequirements: **Windows 10/11, Linux, or macOS** · Python 3.12 ·\nNode 20+ · ~10 GB free disk. NVIDIA GPU optional (auto-detected).\n\n```powershell\n# Windows -- double-click these in order\n.\\setup.bat        # one-time: venv, npm install, llama-server, default model\n.\\hushdoc.bat      # every time after\n```\n\n```bash\n# macOS / Linux\nchmod +x setup.sh dev.sh\n./setup.sh         # one-time\n./dev.sh           # every time after\n```\n\n`setup` is idempotent — re-run after `git pull` and only the dirty\nsteps re-execute. It auto-picks CUDA or CPU build of `llama-server`\nbased on `nvidia-smi`; override with `-Cpu` / `-GpuBuild` / `-Force`\n(Windows) or `--cpu` / `--gpu-build` / `--force` (Unix). Default model\nis Qwen3-1.7B Q4_K_M (~1.2 GB).\n\nThe app opens at \u003chttp://localhost:5173\u003e. **First answer takes\n~15 s** (model warmup); subsequent ones stream in within a second.\n\n### Use a different model\n\nThree equivalent paths:\n\n1. Settings ⚙ → paste any `.gguf` path → *Save*. Hushdoc hot-swaps\n   `llama-server` with no restart.\n2. Drop a `.gguf` at `./models/model.gguf` and re-launch.\n3. 
`LLAMA_MODEL_PATH=/path/to/your.gguf` before launching.\n\nHushdoc speaks the OpenAI-compatible llama.cpp API, so anything llama.cpp\nloads works: Qwen3-4B, Mistral-7B, Llama-3.1-8B, DeepSeek-R1, etc.\nReasoning-model `\u003cthink\u003e` blocks are stripped automatically.\n\n---\n\n## Under the hood\n\nA few engineering choices that take Hushdoc past \"embed-and-pray\":\n\n- **Hybrid retrieval.** BM25 + dense embedding fuse via Reciprocal Rank\n  Fusion. Catches exact filenames / model versions / error codes the\n  bi-encoder flattens. Mode via `HUSHDOC_RETRIEVAL_MODE=hybrid|dense|bm25`.\n- **Cross-encoder reranker.** Wider bi-encoder recall, then cross-encoder\n  rescore — latency where it matters.\n- **Per-document summary cache.** Each file gets one LLM summary at\n  ingest, fed into every prompt so \"which of these is about X?\" works.\n- **Session chunk memory.** Chunks from earlier turns get mixed back into\n  the candidate pool on follow-ups, persisted across backend restarts.\n- **GPU auto-detect** for the embedding + reranker; override via\n  `HUSHDOC_EMBED_DEVICE=cpu|cuda`.\n- **Streaming `\u003cthink\u003e` stripper** for reasoning models (state machine\n  survives split tokens).\n- **Heartbeat shutdown** — close the browser, the server self-exits and\n  the launcher offers to wipe local data.\n\n**Stack:** FastAPI + React 19 + Vite + Tailwind/shadcn ·\nllama.cpp (`llama-server`) · ChromaDB + sentence-transformers\n(`all-MiniLM-L6-v2`) · IBM Docling · Whisper-base.en + Kokoro-82M\nfor voice.\n\n---\n\n## Quality\n\nNumbers, not vibes. 
Hushdoc ships an offline [Ragas](https://github.com/explodinggradients/ragas)\nharness that scores the full RAG pipeline against a labelled question\nset — using the same local llama.cpp as the judge LLM, so the whole\nevaluation is air-gapped.\n\n**Run setup** — v0.6.4 RAG pipeline · indexed corpus is the original\n[*Attention Is All You Need*](https://arxiv.org/abs/1706.03762) paper\n(42 chunks after Docling ingest) · bundled **Qwen3-1.7B-Q4_K_M**\nserves as both the generator AND the Ragas judge LLM (no external API\nin the loop). Three columns below — **CP** = Context Precision\n(fraction of top-k chunks on-topic), **F** = Faithfulness (every\nanswer claim traces back to a retrieved chunk, i.e. no hallucination),\n**AR** = Answer Relevancy (answer actually addresses the question):\n\n| # | Question | CP | F | AR |\n|---|---|---:|---:|---:|\n| 1 | What dataset was used for the English-German translation experiments? | 0.967 | — | 0.996 |\n| 2 | What is the dimensionality of the model (`d_model`) in the base Transformer? | 0.750 | 1.000 | 1.000 |\n| 3 | How many encoder and decoder layers does the base Transformer have? | 1.000 | — | 0.927 |\n| | **Mean** | **0.906** | **1.000** | **0.974** |\n\nA `—` means Ragas's claim-extractor couldn't pull a checkable claim\nout of that answer (common when the answer is a single short\nfactoid), so the question is skipped for that metric — it does NOT\nmean \"failed\".\n\nReproduce against your own corpus + question set:\n\n```bash\n# 1. One-time: the eval-only extras (ragas, datasets, pyarrow).\n#    Kept out of the main requirements so the chat path stays slim.\npip install -r requirements-eval.txt\n\n# 2. Score against your own labelled test set. 
The file is a JSON\n#    list of {question, ground_truth} objects; index the relevant\n#    documents into Hushdoc first, then point evaluate.py at it.\npython evaluate.py \\\n  --test-set my_questions.json \\\n  --include-context-precision \\\n  --include-faithfulness\n```\n\nMinimal `my_questions.json`:\n\n```json\n[\n  {\"question\": \"What is X?\", \"ground_truth\": \"X is ...\"},\n  {\"question\": \"How does Y work?\", \"ground_truth\": \"Y works by ...\"}\n]\n```\n\nResults land under `eval_results/` as paired JSON + CSV (per-question\nbreakdown in the CSV).\n\n---\n\n## Notes\n\n- **Air-gapped install:** copy `~/.cache/huggingface` from a connected\n  machine, drop a `.gguf` at `./models/model.gguf`, and you're set.\n- **Auto-cleanup on exit** currently lives in `hushdoc.bat` / `.ps1`\n  only; `dev.sh` users press Ctrl+C and clean up by hand.\n- **Voice is English-only** (Whisper-base.en + Kokoro-82M). Text chat\n  is fully bilingual.\n- Full release notes in [CHANGELOG.md](CHANGELOG.md).\n\n---\n\n## License\n\nMIT — see [`LICENSE`](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffangyuan025%2Fhushdoc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffangyuan025%2Fhushdoc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffangyuan025%2Fhushdoc/lists"}