{"id":50840005,"url":"https://github.com/vimalyad/doc-buddy","last_synced_at":"2026-06-14T06:06:21.292Z","repository":{"id":356129421,"uuid":"1231010431","full_name":"vimalyad/doc-buddy","owner":"vimalyad","description":"DocBuddy is a full-stack, production-ready Retrieval-Augmented Generation (RAG) application. Designed as a personalized version of Google's NotebookLM, it allows users to upload their own documents (PDFs, TXTs, CSVs) and have highly grounded, context-aware conversations with them.","archived":false,"fork":false,"pushed_at":"2026-05-06T19:40:29.000Z","size":224,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-06T19:40:34.475Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vimalyad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-06T14:35:51.000Z","updated_at":"2026-05-06T19:40:33.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/vimalyad/doc-buddy","commit_stats":null,"previous_names":["vimalyad/doc-buddy"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/vimalyad/doc-buddy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fdoc-buddy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fdoc-buddy/tags","releases
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fdoc-buddy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fdoc-buddy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vimalyad","download_url":"https://codeload.github.com/vimalyad/doc-buddy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Fdoc-buddy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34310809,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-14T06:06:20.438Z","updated_at":"2026-06-14T06:06:21.279Z","avatar_url":"https://github.com/vimalyad.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DocBuddy\n\nDocBuddy is a full-stack RAG (Retrieval-Augmented Generation) application designed to mirror the Google NotebookLM experience. It allows users to upload documents and have grounded, citation-rich conversations with their data.\n\n## ✨ Key Features\n- **Intelligent RAG Pipeline**: End-to-end document processing from ingestion to grounded answer generation.\n- **Interactive Citations**: Hover over AI-generated citations (e.g., `[1]`, `p. 
3`) to see the exact text snippet retrieved from your document.\n- **Multi-Format Support**: Seamlessly parses and indexes **PDF, TXT, and CSV** files.\n- **Smart Query Rewriting**: Automatically optimizes user questions into descriptive semantic search queries for higher retrieval accuracy.\n- **Premium UI/UX**: A responsive, dark-mode dual-pane interface with auto-focusing chat and floating toast notifications.\n- **Zero-Persistence Option**: Configure the backend to wipe state on restart or persist data for production.\n\n## 🧠 The RAG Pipeline\n\n### 1. Ingestion \u0026 Chunking\nDocBuddy uses a **Recursive Character Splitting** strategy:\n- **Chunk Size**: 2,000 characters\n- **Overlap**: 400 characters\n- **Strategy**: The splitter tries, in order, to break text at double newlines, then single newlines, then spaces. This keeps paragraphs and sentences together where possible, so each chunk gives the LLM coherent context.\n\n### 2. Embedding \u0026 Storage\n- **Model**: `sentence-transformers/all-MiniLM-L6-v2` (via Hugging Face Inference API).\n- **Vector Store**: **Qdrant Cloud**. We use 384-dimensional vectors with cosine similarity to find the most relevant document chunks.\n\n### 3. Retrieval \u0026 Generation\n- **Retrieval**: The system fetches the top 5 most relevant chunks for every query.\n- **LLM**: Powered by **Llama 3** (via Groq API) for low-latency answer generation.\n- **Groundedness**: A strict system prompt instructs the model to answer only from the provided context and to cite its sources using bracketed markers.\n\n## 🛠 Tech Stack\n- **Frontend**: React, TypeScript, Vite, Tailwind CSS, Lucide React.\n- **Backend**: Node.js, Express, TypeScript, LangChain.js.\n- **Database**: Qdrant Cloud (Vector), Local JSON (Metadata).\n\n## ⚙️ Setup \u0026 Installation\n\n### Backend\n1. `cd backend`\n2. `npm install`\n3. 
Create a `.env` file with:\n   ```env\n   GROQ_API_KEY=your_key\n   HUGGINGFACEHUB_API_TOKEN=your_token\n   QDRANT_URL=your_qdrant_url\n   QDRANT_API_KEY=your_qdrant_key\n   PORT=5000\n   ```\n4. `npm run build \u0026\u0026 npm start`\n\n### Frontend\n1. `cd frontend`\n2. `npm install`\n3. Create a `.env` file with:\n   ```env\n   VITE_API_URL=http://localhost:5000\n   ```\n4. `npm run dev`\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvimalyad%2Fdoc-buddy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvimalyad%2Fdoc-buddy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvimalyad%2Fdoc-buddy/lists"}