{"id":30651288,"url":"https://github.com/shivammishra1603/rag-bot","last_synced_at":"2026-04-14T15:31:22.303Z","repository":{"id":311846590,"uuid":"1044624941","full_name":"ShivamMishra1603/rag-bot","owner":"ShivamMishra1603","description":"RAG Bot is a Retrieval-Augmented Generation (RAG) chatbot built with Streamlit, FAISS, HuggingFace embeddings, and Google Gemini. It lets you upload PDFs, process them into a vector store, and interact with your documents in a natural chat interface with conversational memory and real-time processing.","archived":false,"fork":false,"pushed_at":"2025-08-27T00:41:11.000Z","size":942,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-27T09:21:13.202Z","etag":null,"topics":["chatbot","faiss","google-gemini","huggingface","langchain","rag","streamlit","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShivamMishra1603.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-26T01:17:08.000Z","updated_at":"2025-08-27T00:49:55.000Z","dependencies_parsed_at":"2025-08-27T09:22:01.661Z","dependency_job_id":null,"html_url":"https://github.com/ShivamMishra1603/rag-bot","commit_stats":null,"previous_names":["shivammishra1603/rag-bot"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ShivamMishra1603/rag-bot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShivamMishra1603%2Frag-bot","tags_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShivamMishra1603%2Frag-bot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShivamMishra1603%2Frag-bot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShivamMishra1603%2Frag-bot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ShivamMishra1603","download_url":"https://codeload.github.com/ShivamMishra1603/rag-bot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShivamMishra1603%2Frag-bot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31803159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T11:13:53.975Z","status":"ssl_error","status_checked_at":"2026-04-14T11:13:53.299Z","response_time":153,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","faiss","google-gemini","huggingface","langchain","rag","streamlit","vector-database"],"created_at":"2025-08-31T06:33:14.085Z","updated_at":"2026-04-14T15:31:22.287Z","avatar_url":"https://github.com/ShivamMishra1603.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAG Bot\r\n\r\nRetrieval-Augmented Generation (RAG) chatbot that lets you upload PDFs, index them locally with FAISS, and 
chat using Google’s Gemini models, all wrapped in a clean Streamlit UI.\r\n\r\n## Demo\r\n\r\n![Demo](public/videos/Demo.gif)\r\n\r\n\r\n## Features\r\n\r\n- **Document Upload \u0026 Processing**: Upload multiple PDF files for AI analysis\r\n- **RAG-Powered Responses**: AI answers questions based on uploaded document content\r\n- **Conversational Memory**: Maintains chat history for contextual conversations\r\n- **Vector Store Persistence**: Saves processed documents for future sessions\r\n- **Modern UI**: Clean, WhatsApp-inspired chat interface\r\n- **Real-time Processing**: Live document processing with progress indicators\r\n\r\n\r\n## Installation\r\n\r\n### Prerequisites\r\n- Python 3.8+\r\n- Google API key for the Gemini model\r\n\r\n### Setup Steps\r\n\r\n1. **Clone the repository**\r\n   ```bash\r\n   git clone https://github.com/ShivamMishra1603/rag-bot\r\n   cd rag-bot\r\n   ```\r\n\r\n2. **Install dependencies**\r\n   ```bash\r\n   pip install -r requirements.txt\r\n   ```\r\n\r\n3. **Environment Configuration**\r\n   Create a `.env` file in the root directory:\r\n   ```env\r\n   GOOGLE_API_KEY=your_google_api_key_here\r\n   ```\r\n\r\n4. **Run the application**\r\n   ```bash\r\n   streamlit run app.py\r\n   ```\r\n\r\n## Usage\r\n\r\n### Getting Started\r\n\r\n1. **Launch the Application**\r\n   ```bash\r\n   streamlit run app.py\r\n   ```\r\n\r\n2. **Upload Documents**\r\n   - Use the sidebar to upload one or more PDF files\r\n   - Click \"Process Documents\" to create the vector store\r\n\r\n
3. **Start Chatting**\r\n   - Ask questions about your uploaded documents\r\n   - The AI will provide answers based on the document content\r\n\r\n### System Status Indicators\r\n\r\nThe sidebar shows real-time system status:\r\n- **Vector Store**: Shows if documents are loaded and ready\r\n- **RAG Chain**: Indicates if the conversational system is active\r\n- **Controls**: Clear chat history and reset memory\r\n\r\n\r\n## File Structure\r\n\r\n```\r\nrag-bot/\r\n├── app.py                 # Main Streamlit application\r\n├── requirements.txt       # Python dependencies\r\n├── .env                   # Environment variables (create this)\r\n├── src/                   # Core application modules\r\n│   ├── chain.py           # RAG chain implementation\r\n│   ├── embeddings.py      # Embedding management\r\n│   ├── loaders.py         # Document processing\r\n│   └── vectorstore.py     # Vector database operations\r\n├── vectorstore/           # Persistent vector storage (auto-created)\r\n│   ├── faiss_index.faiss  # FAISS index file (auto-generated)\r\n│   └── faiss_index.pkl    # FAISS metadata file (auto-generated)\r\n└── README.md              # This file\r\n```\r\n\r\n### Core Components\r\n\r\n#### `app.py` - Main Application\r\nThe Streamlit-based web interface that orchestrates all components:\r\n- **Chat Interface**: WhatsApp-style messaging UI with timestamps\r\n- **Document Upload**: PDF file upload and processing workflow\r\n- **RAG System Setup**: Initializes and manages the RAG pipeline\r\n- **Session Management**: Handles chat history and system state\r\n\r\n\r\n#### `src/embeddings.py` - Embedding Management\r\nHandles text vectorization using HuggingFace embeddings:\r\n- **Model**: `BAAI/bge-base-en-v1.5` (base English model)\r\n- **Configuration**: CPU-based processing with normalized embeddings\r\n- **Caching**: Streamlit resource caching for performance\r\n\r\n#### `src/loaders.py` - Document Processing\r\n
Manages PDF document loading and text chunking:\r\n- **PDF Processing**: Uses PyPDFLoader for document extraction\r\n- **Text Splitting**: Recursive character-based splitting\r\n- **Chunk Configuration**:\r\n  - Default chunk size: 1000 characters\r\n  - Overlap: 200 characters\r\n  - Smart separators: `[\"\\n\\n\", \"\\n\", \" \", \"\"]`\r\n- **Metadata Preservation**: Maintains source file information\r\n\r\n#### `src/chain.py` - Conversational RAG Chain\r\nImplements the core conversational retrieval system:\r\n- **ConversationalRAGChain**: Main class managing the RAG pipeline\r\n- **Google Gemini Integration**: Uses a Gemini model (`gemini-1.5-flash` by default) for responses\r\n- **Memory Management**: Conversation buffer with configurable window size\r\n- **Custom Prompting**: Tailored prompts for document-based Q\u0026A\r\n\r\n\r\n#### `src/vectorstore.py` - Vector Database\r\nManages FAISS vector store operations:\r\n- **FAISS Integration**: Facebook AI Similarity Search for vector storage\r\n- **Persistence**: Save/load vector stores to/from disk\r\n- **Document Management**: Add new documents to existing stores\r\n- **Retrieval**: Configurable similarity search (default: top-4 results)\r\n\r\n\r\n\r\n## Dependencies\r\n\r\n### Core Libraries\r\n- **streamlit**: Web application framework\r\n- **langchain**: LLM application framework\r\n- **langchain-community**: Community integrations\r\n- **langchain-google-genai**: Google Generative AI integration\r\n- **langchain-huggingface**: HuggingFace embeddings integration\r\n\r\n### Processing Libraries\r\n- **sentence-transformers**: Sentence embedding models\r\n- **pypdf**: PDF document processing\r\n- **faiss-cpu**: Vector similarity search\r\n- **python-dotenv**: Environment variable management\r\n\r\n\r\n\r\n\r\n## Configuration\r\n\r\n### Model Settings\r\n- **LLM Model**: `gemini-1.5-flash` (configurable in `chain.py`)\r\n- **Temperature**: 0.7 (controls response creativity)\r\n- **Memory Window**: 10 messages (conversation history)\r\n\r\n
### Embedding Settings\r\n- **Model**: `BAAI/bge-base-en-v1.5`\r\n- **Device**: CPU (configurable for GPU)\r\n- **Normalization**: Enabled for better similarity matching\r\n\r\n### Document Processing\r\n- **Chunk Size**: 1000 characters\r\n- **Chunk Overlap**: 200 characters\r\n- **Supported Formats**: PDF files only\r\n\r\n### Vector Store\r\n- **Backend**: FAISS (Facebook AI Similarity Search)\r\n- **Persistence**: Local disk storage in `vectorstore/` directory\r\n- **Retrieval**: Top-4 similar documents per query\r\n\r\n\r\n\r\n## Error Handling\r\n\r\nThe application includes comprehensive error handling:\r\n- **Document Processing**: Graceful handling of corrupt PDFs\r\n- **API Failures**: Fallback messages for API issues\r\n- **Vector Store**: Error recovery for storage operations\r\n- **Memory Management**: Automatic cleanup and reset options\r\n\r\n\r\n## Troubleshooting\r\n\r\n### Common Issues\r\n\r\n1. **\"GOOGLE_API_KEY not found\"**\r\n   - Ensure the `.env` file exists and contains a valid API key\r\n\r\n2. **\"No vector store available\"**\r\n   - Upload and process documents first\r\n\r\n3. **\"Error processing PDF\"**\r\n   - Check that the PDF is not corrupted or password-protected\r\n\r\n4. **Memory issues with large documents**\r\n   - Reduce chunk size or process fewer documents at once\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshivammishra1603%2Frag-bot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshivammishra1603%2Frag-bot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshivammishra1603%2Frag-bot/lists"}