{"id":29092205,"url":"https://github.com/deepak4siriboyina/smartdoc-assistant","last_synced_at":"2026-04-28T09:36:38.828Z","repository":{"id":301561827,"uuid":"1009549002","full_name":"Deepak4Siriboyina/smartdoc-assistant","owner":"Deepak4Siriboyina","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-27T13:34:34.000Z","size":10,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-27T14:38:37.098Z","etag":null,"topics":["ai","document-qa","google-gemini-ai","langgraph","lanngchain","llms","pdf-chatbot","rag-chatbot","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Deepak4Siriboyina.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-27T10:05:24.000Z","updated_at":"2025-06-27T13:36:42.000Z","dependencies_parsed_at":"2025-06-27T14:38:43.682Z","dependency_job_id":null,"html_url":"https://github.com/Deepak4Siriboyina/smartdoc-assistant","commit_stats":null,"previous_names":["deepak4siriboyina/smartdoc-assistant"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Deepak4Siriboyina/smartdoc-assistant","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepak4Siriboyina%2Fsmartdoc-assistant","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepak4Siriboyina%2Fsmartdoc-assistant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepak4Siriboyina%2Fsmartdoc-assistant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepak4Siriboyina%2Fsmartdoc-assistant/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Deepak4Siriboyina","download_url":"https://codeload.github.com/Deepak4Siriboyina/smartdoc-assistant/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepak4Siriboyina%2Fsmartdoc-assistant/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262389457,"owners_count":23303341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","document-qa","google-gemini-ai","langgraph","lanngchain","llms","pdf-chatbot","rag-chatbot","streamlit"],"created_at":"2025-06-28T07:03:58.034Z","updated_at":"2026-04-28T09:36:38.822Z","avatar_url":"https://github.com/Deepak4Siriboyina.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📄 SmartDoc Assistant – RAG-based PDF QA Chatbot\r\n\r\nSmartDoc Assistant is an end-to-end intelligent document Q\u0026A assistant powered by **LangGraph**, **LangChain**, and **Gemini (Google Generative AI)**. It allows users to upload any PDF file and instantly ask questions about its contents using Retrieval-Augmented Generation (RAG).\r\n\r\n✅ Uses Google Embeddings + Gemini LLM  \r\n✅ Summarizes \u0026 answers questions based on document context  \r\n✅ Fully in-memory (privacy-friendly: no permanent file storage)  \r\n✅ Deployed via Streamlit Cloud (Free Tier)\r\n\r\n---\r\n\r\n## 🚀 Live Demo\r\n\r\n- **SmartDoc Assistant** 👉 [Try it on Streamlit](https://smartdoc-assistant-aepsdqriept5vuzsbcdu7v.streamlit.app/)\r\n\r\n---\r\n\r\n## 🛠 Tech Stack\r\n\r\n| Layer       | Tech                        |\r\n|-------------|-----------------------------|\r\n| UI          | Streamlit                   |\r\n| LLM         | Google Generative AI (Gemini 2.5 Flash) |\r\n| Embeddings  | Google Generative AI Embeddings (`embedding-001`) |\r\n| Vector DB   | FAISS (In-memory)           |\r\n| Graph Flow  | LangGraph                   |\r\n| Framework   | LangChain                   |\r\n| PDF Parser  | PyMuPDF                     |\r\n| Language    | Python 3.11                 |\r\n| Hosting     | Streamlit Cloud (free tier) |\r\n\r\n---\r\n\r\n## 📦 Project Structure\r\n\r\n```bash\r\nsmartdoc-assistant/\r\n├── backend/\r\n│   ├── chains.py              # LLMs and chain setup\r\n│   ├── config.py              # Constants and config\r\n│   ├── rag_utils.py           # PDF loading, embedding, vectorstore utils\r\n├── frontend/\r\n│   └── streamlit_app.py       # Main Streamlit UI\r\n├── langgraph_app.py           # LangGraph workflow (input -\u003e retrieval -\u003e output)\r\n├── test_embed_and_store.py    # Simple CLI test for embedding logic\r\n├── temp/                      # Temporary file store (auto-cleared)\r\n├── .env                       # Google API Key \u0026 Config\r\n├── requirements.txt           # Project dependencies\r\n└── README.md                  # You're reading it!\r\n\r\n```\r\n\r\n## 📁 How to Run Locally\r\n\r\n```bash\r\n# Clone repo\r\ngit clone https://github.com/deepak4siriboyina/smartdoc-assistant.git\r\ncd smartdoc-assistant\r\n\r\n# Create virtual environment\r\npython -m venv virtenvt\r\nvirtenvt\\Scripts\\activate  # (Use PowerShell)\r\n\r\n# Install dependencies\r\npip install -r requirements.txt\r\n\r\n# Set your API Key\r\necho GOOGLE_API_KEY=your-api-key \u003e .env\r\n\r\n# Run the app\r\nstreamlit run frontend/streamlit_app.py\r\n```\r\n\r\n## 🧠 How It Works\r\n- **User uploads a PDF** file through the Streamlit UI.\r\n- The PDF is parsed, chunked, and embedded using **Google's** `embedding-001` model.\r\n- The chunks are stored in a temporary **in-memory FAISS vector store**.\r\n- When the user asks a question:\r\n  - The **LangGraph** flow is triggered:\r\n  - → `input` → `retrieve` → `answer`\r\n  - A retriever fetches relevant chunks, and **Gemini 2.5 Flash** answers using `RetrievalQA`.\r\n- All Q\u0026A pairs are saved in the session, can be viewed via dropdown, and downloaded as `.txt` or `.csv`\r\n\r\n## ✨ Features\r\n- 📄 Upload any PDF and ask questions interactively.\r\n- ⚙️ Temporary in-memory processing – no persistent storage or data leakage.\r\n- 🧠 Uses Google's latest Gemini Flash model for fast responses.\r\n- 🗂️ Expandable chat history with full Q\u0026A transcripts.\r\n- ⏬ One-click download of chat history.\r\n- ✅ Lightweight, free to run, and private by design.\r\n\r\n## 📤 Deployment\r\n- Streamlit Frontend → [Streamlit Cloud](https://streamlit.io/cloud)\r\n\r\n## 🔐 Data Privacy\r\n- All uploaded PDFs are processed in-memory and deleted after embedding.\r\n- No document data is permanently stored.\r\n\r\n## 🙌 Credits\r\n- [LangChain](https://www.langchain.com/)\r\n- [LangGraph](https://langchain-ai.github.io/langgraph/concepts/why-langgraph/)\r\n- [Google Generative AI](https://ai.google.dev/)\r\n- [FAISS](https://github.com/facebookresearch/faiss)\r\n- [Streamlit](https://streamlit.io/)\r\n- [PyMuPDF](https://pymupdf.readthedocs.io/en/latest/)\r\n\r\n## 🧑‍💻 Author\r\n- Deepak Siriboyina – [LinkedIn](https://www.linkedin.com/in/deepak-siriboyina/)\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepak4siriboyina%2Fsmartdoc-assistant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeepak4siriboyina%2Fsmartdoc-assistant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepak4siriboyina%2Fsmartdoc-assistant/lists"}