{"id":26337602,"url":"https://github.com/anupam0202/contextual-rag-chatbot","last_synced_at":"2026-05-05T06:39:13.530Z","repository":{"id":279031295,"uuid":"937526267","full_name":"Anupam0202/Contextual-RAG-Chatbot","owner":"Anupam0202","description":"Contextual RAG Chatbot that processes PDF documents using the Google Gemini API","archived":false,"fork":false,"pushed_at":"2025-08-10T14:22:20.000Z","size":43,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-10T16:23:08.526Z","etag":null,"topics":["google-generativeai","numpy","pypdf2","scikit-learn","streamlit"],"latest_commit_sha":null,"homepage":"https://contextual-rag-chatbot.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Anupam0202.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-23T09:21:10.000Z","updated_at":"2025-08-10T14:22:23.000Z","dependencies_parsed_at":"2025-02-23T10:25:31.427Z","dependency_job_id":"89c8c2f6-e739-481e-96fe-17aea403a552","html_url":"https://github.com/Anupam0202/Contextual-RAG-Chatbot","commit_stats":null,"previous_names":["anupam0202/contextual-rag-chatbot"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Anupam0202/Contextual-RAG-Chatbot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anupam0202%2FContextual-RAG-Chatbot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anupam0202%2FContextual-RAG-Chatbot/tags","releases_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/Anupam0202%2FContextual-RAG-Chatbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anupam0202%2FContextual-RAG-Chatbot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Anupam0202","download_url":"https://codeload.github.com/Anupam0202/Contextual-RAG-Chatbot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anupam0202%2FContextual-RAG-Chatbot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271128250,"owners_count":24703873,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["google-generativeai","numpy","pypdf2","scikit-learn","streamlit"],"created_at":"2025-03-16T02:19:29.700Z","updated_at":"2026-05-05T06:39:13.523Z","avatar_url":"https://github.com/Anupam0202.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🤖 Contextual RAG Chatbot\r\n\r\n\u003e An intelligent document assistant powered by Google Gemini AI, featuring advanced RAG capabilities, contextual conversations, and comprehensive analytics.\r\n\r\n![RAG Chatbot Banner](https://github.com/user-attachments/assets/6dc6c9db-e419-4e44-9ae2-bacd4ade3b2c)\r\n\r\n## ✨ Key Features\r\n\r\n- **🧠 Advanced AI**: 
Powered by Google Gemini with thinking \u0026 reflection capabilities\r\n- **📚 Smart Document Processing**: PDF processing with multiple extraction methods and OCR fallback\r\n- **🔍 Hybrid Search**: Combines semantic and keyword search for optimal retrieval\r\n- **💬 Contextual Conversations**: Maintains conversation context with adjustable window (1-20 messages)\r\n- **📊 Analytics Dashboard**: Real-time metrics, interactive charts, and Excel/HTML export\r\n- **🎨 Modern UI**: 4 themes, responsive design, accessibility features\r\n- **🔒 Enterprise Security**: Privacy mode, input validation, session isolation\r\n- **⚡ High Performance**: Async processing, intelligent caching, circuit breakers\r\n\r\n## 🚀 Quick Start\r\n\r\n### Prerequisites\r\n\r\n- Python 3.12+\r\n- 8GB RAM (16GB recommended)\r\n- Google Gemini API key\r\n\r\n### Installation\r\n\r\n```bash\r\n# Clone repository\r\ngit clone https://github.com/Anupam0202/Contextual-RAG-Chatbot.git\r\ncd Contextual-RAG-Chatbot\r\n\r\n# Create virtual environment\r\npython -m venv venv\r\n\r\n# Activate (Windows)\r\nvenv\\Scripts\\activate\r\n\r\n# Activate (macOS/Linux)\r\nsource venv/bin/activate\r\n\r\n# Install dependencies\r\npip install --upgrade pip\r\npip install -r requirements.txt\r\n```\r\n\r\n### Configuration\r\n\r\n1. **Create `.env` file:**\r\n```bash\r\ncp .env.example .env\r\n```\r\n\r\n2. **Add your Gemini API key:**\r\n```env\r\nGEMINI_API_KEY=your_api_key_here\r\n```\r\n\r\nGet your API key from [Google AI Studio](https://makersuite.google.com/app/apikey)\r\n\r\n### Run\r\n\r\n```bash\r\nstreamlit run app.py\r\n```\r\n\r\nApplication opens at `http://localhost:8501`\r\n\r\n## 📖 How to Use\r\n\r\n### 1. Upload Documents\r\n- Navigate to **📚 Documents** page\r\n- Upload PDF files (drag \u0026 drop or browse)\r\n- Wait for processing to complete\r\n\r\n### 2. 
Chat with Your Documents\r\n- Go to **💬 Chat** page\r\n- Ask questions about your documents\r\n- Get AI responses with source citations\r\n\r\n**Example queries:**\r\n- \"What are the main topics in this document?\"\r\n- \"Summarize the key findings\"\r\n- \"Compare section 2 and section 5\"\r\n\r\n### 3. View Analytics\r\n- Visit **📊 Analytics** page\r\n- Review performance metrics\r\n- Export reports (Excel/HTML)\r\n\r\n### 4. Adjust Settings\r\n- Open **⚙️ Settings** page\r\n- Configure model parameters\r\n- Adjust context window\r\n- Set UI preferences\r\n\r\n## 🏗️ Architecture\r\n\r\n```\r\n┌─────────────────────────────────────┐\r\n│         Streamlit UI (app.py)       │\r\n│   [Chat | Documents | Analytics]    │\r\n├─────────────────────────────────────┤\r\n│      RAG Engine (rag_core.py)       │\r\n│ [Planning→Retrieval→Generation]     │\r\n├──────────────┬──────────────────────┤\r\n│ Vector Store │   PDF Processor      │\r\n│ • Hybrid     │   • Chunking         │\r\n│ • FAISS      │   • OCR Fallback     │\r\n├──────────────┴──────────────────────┤\r\n│   Infrastructure (utils, config)    │\r\n│     [Cache | Sessions | Security]   │\r\n└─────────────────────────────────────┘\r\n```\r\n\r\n### Query Processing Flow\r\n\r\n```\r\nUser Query → Planning → Vector Search → Reranking → \r\nGeneration → Reflection → Response\r\n```\r\n\r\n## ⚙️ Configuration Options\r\n\r\n### Model Settings\r\n```env\r\nRAG_MODEL_NAME=gemini-1.5-flash      # AI model\r\nRAG_TEMPERATURE=0.7                   # Creativity (0-1)\r\nRAG_MAX_OUTPUT_TOKENS=2048           # Max response length\r\n```\r\n\r\n### Retrieval Settings\r\n```env\r\nRAG_RETRIEVAL_TOP_K=5                # Results per query\r\nRAG_CHUNK_SIZE=1000                  # Text chunk size\r\nRAG_CHUNK_OVERLAP=200                # Chunk overlap\r\nRAG_HYBRID_SEARCH_ALPHA=0.5          # Search balance\r\n```\r\n\r\n### Performance Settings\r\n```env\r\nRAG_ENABLE_CACHING=true              # Enable 
caching\r\nRAG_CACHE_TTL=3600                   # Cache lifetime (seconds)\r\nRAG_MAX_WORKERS=4                    # Parallel workers\r\n```\r\n\r\n## 🔧 Troubleshooting\r\n\r\n### Common Issues\r\n\r\n**Application won't start**\r\n```bash\r\n# Ensure venv is activated\r\nsource venv/bin/activate  # macOS/Linux\r\nvenv\\Scripts\\activate     # Windows\r\n\r\n# Reinstall dependencies\r\npip install -r requirements.txt\r\n```\r\n\r\n**API Key Error**\r\n```bash\r\n# Verify .env file exists with correct key\r\ncat .env  # macOS/Linux\r\ntype .env # Windows\r\n```\r\n\r\n**PDF Processing Fails**\r\n- Ensure PDF isn't corrupted\r\n- Check file size (\\u003c50MB)\r\n- For scanned PDFs, install Tesseract OCR\r\n\r\n**Memory Issues**\r\n- Reduce chunk size in settings\r\n- Limit context window to 3-5 messages\r\n- Clear cache regularly\r\n\r\n### Debug Mode\r\n\r\n```bash\r\n# Enable debug logging\r\nstreamlit run app.py --logger.level=debug\r\n```\r\n\r\n## 📚 API Usage\r\n\r\n### Basic Example\r\n\r\n```python\r\nimport asyncio\r\n\r\nfrom rag_core import getRAGEngine\r\nfrom pdf_processor import createPDFProcessor\r\n\r\n# Process PDF\r\nprocessor = createPDFProcessor()\r\ndoc = processor.processPDF(\"document.pdf\")\r\n\r\n# Query documents (async for must run inside a coroutine)\r\nasync def main():\r\n    rag = getRAGEngine()\r\n    async for chunk in rag.processQuery(\"What is the summary?\"):\r\n        print(chunk, end=\"\")\r\n\r\nasyncio.run(main())\r\n```\r\n\r\n### With Conversation History\r\n\r\n```python\r\n# Continues from the basic example (same imports)\r\nconversation = [\r\n    {\"role\": \"user\", \"content\": \"What is this about?\"},\r\n    {\"role\": \"assistant\", \"content\": \"It's about...\"}\r\n]\r\n\r\nasync def follow_up():\r\n    rag = getRAGEngine()\r\n    async for response in rag.processQuery(\r\n        \"Tell me more\",\r\n        conversation_history=conversation\r\n    ):\r\n        print(response, end=\"\")\r\n\r\nasyncio.run(follow_up())\r\n```\r\n\r\n## 🎯 Advanced Features\r\n\r\n### Thinking \u0026 Reflection\r\nMulti-step reasoning process for complex queries:\r\n- **Planning**: Intent classification \u0026 query decomposition\r\n- **Retrieval**: Hybrid search with reranking\r\n- **Generation**: 
Context-aware responses\r\n- **Reflection**: Self-evaluation \u0026 improvement\r\n\r\n### Circuit Breaker\r\nAutomatic failure recovery prevents cascade failures:\r\n- `CLOSED` → normal operation\r\n- `OPEN` → failures detected, requests blocked\r\n- `HALF_OPEN` → testing recovery\r\n- Back to `CLOSED` → recovered\r\n\r\n### Privacy Features\r\n- PII sanitization\r\n- Sensitive data redaction\r\n- Session timeout management\r\n- Secure API key storage\r\n\r\n## ❓ FAQ\r\n\r\n**Q: What file formats are supported?**  \r\nA: Currently PDF files (.pdf). Text file support coming soon.\r\n\r\n**Q: What's the file size limit?**  \r\nA: Default is 50MB, configurable in settings.\r\n\r\n**Q: Is my data secure?**  \r\nA: Yes - local processing, privacy mode, session isolation, secure API management.\r\n\r\n**Q: Which AI models are supported?**  \r\nA: Google Gemini 1.5 Flash (default), Gemini 1.5 Pro, Gemini 1.0 Pro\r\n\r\n**Q: How does hybrid search work?**  \r\nA: Combines semantic search (meaning) + keyword search (exact matches) with configurable weighting.\r\n\r\n**Q: What is context window?**  \r\nA: Number of previous messages (1-20) included when generating responses.\r\n\r\n## 🛠️ Tech Stack\r\n\r\n- **AI**: Google Gemini 1.5\r\n- **Framework**: Streamlit\r\n- **Embeddings**: Sentence Transformers\r\n- **Vector DB**: FAISS\r\n- **PDF**: PyPDF2, pdfplumber, Tesseract OCR\r\n- **Analytics**: Plotly, Pandas\r\n\r\n## 📞 Support\r\n\r\n- **Issues**: [GitHub Issues](https://github.com/Anupam0202/Contextual-RAG-Chatbot/issues)\r\n- **Documentation**: [GitHub README](https://github.com/Anupam0202/Contextual-RAG-Chatbot)\r\n\r\n## 🙏 Acknowledgments\r\n\r\nBuilt with:\r\n- [Google Gemini AI](https://ai.google/gemini/) - AI model\r\n- [Streamlit](https://streamlit.io/) - Web framework\r\n- [FAISS](https://github.com/facebookresearch/faiss) - Vector search\r\n- [Sentence Transformers](https://www.sbert.net/) - Embeddings\r\n\r\n---\r\n\r\n**Made with ❤️ by 
Anupam**\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanupam0202%2Fcontextual-rag-chatbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanupam0202%2Fcontextual-rag-chatbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanupam0202%2Fcontextual-rag-chatbot/lists"}