{"id":28863237,"url":"https://github.com/adityabhatt3010/universal-ai-chatbot","last_synced_at":"2026-05-08T14:48:27.067Z","repository":{"id":298128978,"uuid":"998969394","full_name":"AdityaBhatt3010/Universal-AI-ChatBot","owner":"AdityaBhatt3010","description":"A domain-adaptable AI chatbot powered by RAG, FAISS, and LangChain to answer questions from your custom PDFs using HuggingFace LLMs.","archived":false,"fork":false,"pushed_at":"2025-06-12T15:55:18.000Z","size":11599,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-12T16:41:10.315Z","etag":null,"topics":["ai","ai-chatbot","chatbot","faiss","langchain","rag","rag-chatbot"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AdityaBhatt3010.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-09T14:36:03.000Z","updated_at":"2025-06-12T15:55:21.000Z","dependencies_parsed_at":"2025-06-12T16:43:32.069Z","dependency_job_id":"60c75dc3-f332-4aa6-a318-53ca605a7a60","html_url":"https://github.com/AdityaBhatt3010/Universal-AI-ChatBot","commit_stats":null,"previous_names":["adityabhatt3010/phoenix-ai-chatbot","adityabhatt3010/universal-ai-chatbot"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AdityaBhatt3010/Universal-AI-ChatBot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdityaBhatt3010%2FUniversal-AI-ChatBot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/AdityaBhatt3010%2FUniversal-AI-ChatBot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdityaBhatt3010%2FUniversal-AI-ChatBot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdityaBhatt3010%2FUniversal-AI-ChatBot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AdityaBhatt3010","download_url":"https://codeload.github.com/AdityaBhatt3010/Universal-AI-ChatBot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdityaBhatt3010%2FUniversal-AI-ChatBot/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260898760,"owners_count":23079263,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-chatbot","chatbot","faiss","langchain","rag","rag-chatbot"],"created_at":"2025-06-20T07:02:19.231Z","updated_at":"2026-05-08T14:48:27.052Z","avatar_url":"https://github.com/AdityaBhatt3010.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🤖 Universal AI Chatbot (RAG + FAISS + LangChain)\n\nA **domain-adaptable AI chatbot framework** built using **Retrieval-Augmented Generation (RAG)**, **FAISS**, and **LangChain**, capable of answering questions from **custom document-based knowledge** like cybersecurity books, medical encyclopedias, and more.\n\nThis project supports both research (via Jupyter Notebooks) and production deployment (via Python scripts).\n\n![Universal AI ChatBot 
Cover](https://github.com/user-attachments/assets/ecec53e7-687f-411f-bd3f-b0858e324d11) \u003cbr/\u003e\n\n---\n\n## 📌 Table of Contents\n\n* [🔍 What is this Chatbot?](#-what-is-this-chatbot)\n* [🧠 Key Concepts (RAG, FAISS, etc.)](#-key-concepts-rag-faiss-etc)\n* [🛠️ Project Structure](#️-project-structure)\n* [⚙️ How It Works](#️-how-it-works-behind-the-scenes)\n* [📚 Models Used](#-models-used)\n* [🚀 How to Run](#-how-to-run)\n* [🪄 Setup Script](#-setup-script)\n* [📁 Data \u0026 Vectorstore Info](#-data--vectorstore-info)\n* [🐋 Docker Support](#-docker-support)\n* [🎓 Use Cases](#-use-cases)\n* [🙌 Credits](#-credits)\n\n---\n\n## 🔍 What is this Chatbot?\n\nThis is a **plug-and-play AI chatbot engine** that retrieves answers from your **own documents**. Currently, it includes:\n\n* 🧑‍💻 **HackerBot** grounded in Bug Bounty \u0026 Web Hacking books.\n* 🏥 **MedicBot** grounded in Medical Encyclopedias.\n* 🧠 A base Python script (`ChatBot.py`) for creating more bots easily.\n\n\u003e The Jupyter notebooks preserve chat logs, which is useful for debugging and audit trails.\n\n---\n\n## 🧠 Key Concepts (RAG, FAISS, etc.)\n\n### 🔁 Retrieval-Augmented Generation (RAG)\n\nCombines **document retrieval** + **LLM generation**:\n\n1. Retrieves the top-k relevant document chunks.\n2. 
Passes them to a language model to generate the final answer.\n\n### 🔍 FAISS (Facebook AI Similarity Search)\n\nA high-performance library for **vector similarity search**, supporting both exact and approximate nearest-neighbor (ANN) indexes.\n\nUsed to:\n\n* Store text chunks as embeddings.\n* Retrieve the most relevant ones based on query similarity.\n\n### 💡 Semantic Search\n\nGoes **beyond keyword matching**: it uses vector embeddings to find conceptually similar content even if phrased differently.\n\n---\n\n## 🛠️ Project Structure\n\n```\nUniversal-AI-ChatBot/\n│\n├── data/                     # Place your PDF datasets here\n│   └── Instructions.md       # Instructions for dataset placement\n├── vectorstore/              # Stores FAISS + pickle index files\n│   └── Instructions.md       # Instructions for vector DB\n├── HackerBot.ipynb           # Chatbot grounded in Web Hacking books\n├── MedicBot.ipynb            # Chatbot grounded in a Medical encyclopedia\n├── ChatBot.py                # General chatbot template (script version)\n├── Setup_env.ps1             # PowerShell script to auto-setup environment\n├── requirements.txt\n└── README.md\n```\n\n---\n\n## ⚙️ How It Works (Behind the Scenes)\n\n### 🔸 Step 1: Load and Split PDFs\n\n```python\nDirectoryLoader → PyPDFLoader → RecursiveCharacterTextSplitter\n```\n\n* All `.pdf` files in `/data/` are loaded and split into chunks of size 500 (`RecursiveCharacterTextSplitter` measures length in characters by default, not tokens).\n* An overlap of 50 helps preserve context across splits.\n\n---\n\n### 🔸 Step 2: Create Embeddings \u0026 Store in FAISS\n\n```python\ntext_chunks → MiniLM Embeddings → FAISS.from_documents()\n```\n\n* Each chunk is transformed into a vector using MiniLM.\n* FAISS stores them in `/vectorstore/db_faiss/` as `index.faiss` and `index.pkl`.\n\n---\n\n### 🔸 Step 3: Query Retrieval \u0026 Prompt Assembly\n\n```python\nUser Query → Embed → Top-3 Match → Inject into Prompt\n```\n\n* The input is embedded and compared against the FAISS index.\n* The top 3 chunks are selected and formatted into a custom 
prompt.\n\n---\n\n### 🔸 Step 4: Generate Answer via LLM\n\n```python\nPromptTemplate + Mistral LLM → Final Answer\n```\n\n* The prompt is passed to `mistralai/Mistral-7B-Instruct-v0.3` on Hugging Face.\n* The prompt gives the model a strict instruction: “don’t make up answers.”\n\n---\n\n### 🔸 Step 5: Chat Loop (Script Mode)\n\n```python\nwhile True → input() → RetrievalQA → print()\n```\n\n* The interactive command-line chatbot runs until the user types `Exit the Chatbot`.\n\n---\n\n## 📚 Models Used\n\n### 🧠 `mistralai/Mistral-7B-Instruct-v0.3`\n\n\u003e A relatively lightweight, instruction-tuned 7B-parameter model.\n\n* Balances **speed and comprehension**.\n* Follows custom prompt instructions like “No small talk.”\n\n**Usage:**\n\n```python\nHuggingFaceEndpoint(repo_id=\"mistralai/Mistral-7B-Instruct-v0.3\", ...)\n```\n\n---\n\n### 🧬 `sentence-transformers/all-MiniLM-L6-v2`\n\n\u003e Fast \u0026 efficient transformer model for semantic embeddings.\n\n* Converts text into 384-dimensional dense vectors.\n* Ideal for **document retrieval** and similarity scoring.\n\n**Usage:**\n\n```python\nHuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n```\n\n---\n\n## 🚀 How to Run\n\n### ▶️ Using Notebooks (Exploratory Mode)\n\n```bash\njupyter notebook HackerBot.ipynb\n```\n\nor\n\n```bash\njupyter notebook MedicBot.ipynb\n```\n\n### ▶️ Using Python Script (Production Mode)\n\n```bash\npython ChatBot.py\n```\n\n### ✅ Manual Environment Setup\n\n```bash\npython -m venv venv\n.\\venv\\Scripts\\activate           # For Windows\nsource venv/bin/activate          # For Linux/macOS\npip install -r requirements.txt\n```\n\n---\n\n## 🪄 Setup Script\n\nTo simplify setup on Windows, run the included PowerShell script:\n\n```powershell\n.\\Setup_env.ps1\n```\n\nThis script will:\n\n* Create a virtual environment\n* Activate it\n* Install dependencies silently\n* Display a success banner ✅\n\n---\n\n## 📁 Data \u0026 Vectorstore Info\n\n**Note:** No copyrighted books or embeddings are provided.\n\nInstead:\n\n* `data/Instructions.md`: Add your own `.pdf` 
files here.\n* `vectorstore/Instructions.md`: Explains how indexes will be **auto-created** when PDFs are processed.\n\nGenerated files:\n\n* `index.faiss`: vector similarity data\n* `index.pkl`: metadata (e.g., document sources)\n\n---\n\n## 🐋 Docker Support\n\nYou can now run the Universal-AI-ChatBot inside a Docker container!\n\n### 🛠 Prerequisites\n\n* Make sure Docker is installed and running.\n* Verify with:\n\n  ```bash\n  docker --version\n  ```\n\n### 🚀 Build and Run\n\n```bash\n# Build the Docker image\ndocker build -t ai-chatbot .\n\n# Run the Docker container with environment variables\ndocker run --env-file .env ai-chatbot\n```\n\nThe `.env` file must contain your Hugging Face token as:\n\n```env\nHF_TOKEN=your-token-here\n```\n\n---\n\n## 🎓 Use Cases\n\n* 🩺 Medical Bots (grounded in medical PDFs)\n* 🛡️ Cybersecurity Advisors (for bug bounty, web security)\n* 🧠 Legal or Finance Q\\\u0026A Assistants\n* 📄 Compliance Documentation Bots (ISO, SOC2, GDPR, etc.)\n* 📘 Educational Assistants (coursebooks, research guides)\n\n---\n\n## 🔁 Visual Pipeline\n\n```mermaid\ngraph TD\nA[PDF Files in /data] --\u003e B[Text Chunking]\nB --\u003e C[Embedding Chunks with MiniLM-L6-v2]\nC --\u003e D[Store Embeddings in FAISS Vector DB]\n\nE[User Query] --\u003e F[Embed Query with MiniLM-L6-v2]\nF --\u003e G[Semantic Search in FAISS]\nD --\u003e G\n\nG --\u003e H[Retrieve Top-k Relevant Chunks]\n\nH --\u003e I[Insert Context into Prompt Template]\nI --\u003e J[Mistral-7B-Instruct-v0.3]\nJ --\u003e K[Answer Generated]\nK --\u003e L[Display Answer in Chat Loop]\n```\n\n---\n\n## 🙌 Credits\n\n\u003e Special thanks and a shout-out to the community and devs whose work made this possible:\n\n* 🎥 [AIwithHassan on YouTube](https://youtu.be/OP0FYjF-37c?si=HJOGBVR4Izgs_8RM)\n* 💻 
[GitHub - AIwithhassan/medical-chatbot](https://github.com/AIwithhassan/medical-chatbot)\n\n---\n\n## 🙋 Contribution \u0026 Feedback\n\nFeel free to fork, star 🌟, open issues, or contribute new bot variants!\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadityabhatt3010%2Funiversal-ai-chatbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadityabhatt3010%2Funiversal-ai-chatbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadityabhatt3010%2Funiversal-ai-chatbot/lists"}