https://github.com/adityabhatt3010/universal-ai-chatbot
A domain-adaptable AI chatbot powered by RAG, FAISS, and LangChain to answer questions from your custom PDFs using HuggingFace LLMs.
- Host: GitHub
- URL: https://github.com/adityabhatt3010/universal-ai-chatbot
- Owner: AdityaBhatt3010
- License: mit
- Created: 2025-06-09T14:36:03.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-12T15:55:18.000Z (about 1 year ago)
- Last Synced: 2025-06-12T16:41:10.315Z (about 1 year ago)
- Topics: ai, ai-chatbot, chatbot, faiss, langchain, rag, rag-chatbot
- Language: Jupyter Notebook
- Homepage:
- Size: 11.1 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Universal AI Chatbot (RAG + FAISS + LangChain)
A **domain-adaptable AI chatbot framework** built using **Retrieval-Augmented Generation (RAG)**, **FAISS**, and **LangChain**, capable of answering questions from **custom document-based knowledge** like cybersecurity books, medical encyclopedias, and more.
This project supports both research (via Jupyter Notebooks) and production deployment (via Python scripts).

---
## Table of Contents
* [What is this Chatbot?](#what-is-this-chatbot)
* [Key Concepts (RAG, FAISS, etc.)](#key-concepts-rag-faiss-etc)
* [Project Structure](#project-structure)
* [How It Works](#how-it-works-behind-the-scenes)
* [Models Used](#models-used)
* [How to Run](#how-to-run)
* [Setup Script](#setup-script)
* [Data & Vectorstore Info](#data--vectorstore-info)
* [Docker Support](#docker-support)
* [Use Cases](#use-cases)
* [Credits](#credits)
---
## What is this Chatbot?
This is a **plug-and-play AI chatbot engine** capable of retrieving answers from your **own documents**. Currently, it includes:
* **HackerBot**, which answers from indexed Bug Bounty & Web Hacking books.
* **MedicBot**, which answers from indexed Medical Encyclopedias.
* A base Python script (`ChatBot.py`) for creating more bots easily.
> Jupyter chat logs preserve conversations, useful for debugging and audit trails.
---
## Key Concepts (RAG, FAISS, etc.)
### Retrieval-Augmented Generation (RAG)
Combines **document retrieval** + **LLM generation**:
1. Retrieves the top-k relevant document chunks.
2. Passes them to a language model for generating the final answer.
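
The two steps above can be sketched as one small function that does not depend on any particular library. This is only an illustration: `retrieve` and `generate` are hypothetical callables standing in for the FAISS retriever and the LLM call, and the prompt wording is an assumption rather than the project's actual template.

```python
from typing import Callable, List


def answer_with_rag(
    query: str,
    retrieve: Callable[[str, int], List[str]],  # hypothetical: returns the top-k chunks
    generate: Callable[[str], str],             # hypothetical: wraps the LLM call
    k: int = 3,
) -> str:
    """Retrieve k chunks for the query, then ask the model to answer from them."""
    chunks = retrieve(query, k)
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a vector store or a model.
    corpus = ["XSS injects scripts into pages.", "SQLi targets database queries."]
    fake_retrieve = lambda q, k: corpus[:k]
    fake_generate = lambda prompt: f"(model received {len(prompt)} characters)"
    print(answer_with_rag("What is XSS?", fake_retrieve, fake_generate, k=1))
```

Keeping retrieval and generation behind plain callables makes it easy to swap either side (a different index, a different model) without touching the control flow.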
### FAISS (Facebook AI Similarity Search)
A high-performance library for **vector similarity search**, supporting both exact and approximate nearest-neighbor (ANN) search.
Used to:
* Store text chunks as embeddings.
* Retrieve the most relevant ones based on query similarity.
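
To make the store-and-retrieve idea concrete, here is a toy brute-force index in plain Python. It is not FAISS's API and has none of its optimizations; the `FlatIndex` class and the two-dimensional vectors are made up for illustration (real MiniLM embeddings have 384 dimensions).

```python
import math
from typing import List, Sequence, Tuple


class FlatIndex:
    """Toy exact nearest-neighbour index (brute force, L2 distance)."""

    def __init__(self) -> None:
        self._vectors: List[Sequence[float]] = []
        self._texts: List[str] = []

    def add(self, vector: Sequence[float], text: str) -> None:
        """Store one embedding together with the text chunk it represents."""
        self._vectors.append(vector)
        self._texts.append(text)

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[float, str]]:
        """Return (distance, text) pairs for the k stored chunks nearest to the query."""
        scored = [
            (math.dist(query, vec), text)
            for vec, text in zip(self._vectors, self._texts)
        ]
        return sorted(scored, key=lambda pair: pair[0])[:k]


if __name__ == "__main__":
    index = FlatIndex()
    index.add([1.0, 0.0], "web hacking")   # hand-written vectors, not real embeddings
    index.add([0.0, 1.0], "cardiology")
    index.add([0.9, 0.1], "bug bounty")
    for distance, text in index.search([1.0, 0.0], k=2):
        print(f"{distance:.3f}  {text}")
```

With these made-up vectors, a query next to "web hacking" also pulls in "bug bounty" and leaves "cardiology" out, which is the behaviour the retriever relies on.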
### Semantic Search
Goes **beyond keyword matching**: it uses vector embeddings to find conceptually similar content even when it is phrased differently.
---
## Project Structure
```
Universal-AI-ChatBot/
│
├── data/                  # Place your PDF datasets here
│   └── Instructions.md    # Instructions for dataset placement
├── vectorstore/           # Stores FAISS + pickle index files
│   └── Instructions.md    # Instructions for vector DB
├── HackerBot.ipynb        # Chatbot trained on Web Hacking books
├── MedicBot.ipynb         # Chatbot trained on Medical encyclopedia
├── ChatBot.py             # General chatbot template (script version)
├── Setup_env.ps1          # PowerShell script to auto-setup environment
├── requirements.txt
└── README.md
```
---
## How It Works (Behind the Scenes)
### Step 1: Load and Split PDFs
```python
# DirectoryLoader → PyPDFLoader → RecursiveCharacterTextSplitter
```
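* All `.pdf` files in `/data/` are extracted and split into chunks of size 500. `RecursiveCharacterTextSplitter` measures length in characters by default, so this means 500 characters unless a token-based length function is configured.
* An overlap of 50 between consecutive chunks helps preserve context across splits.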
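
The effect of chunk size and overlap can be seen in a plain sliding-window splitter. This is a simplified sketch, not the LangChain implementation: it counts characters (an assumption) and cuts at fixed offsets, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces. The name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(1000))
    parts = split_with_overlap(sample)
    print(len(parts), [len(p) for p in parts])
    # The tail of one chunk is repeated at the head of the next.
    print(parts[0][-50:] == parts[1][:50])
```

Because each window starts `chunk_size - overlap` characters after the previous one, a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.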
---
### Step 2: Create Embeddings & Store in FAISS
```python
# text_chunks → MiniLM Embeddings → FAISS.from_documents()
```
* Each chunk is transformed into a vector using MiniLM.
* FAISS stores them in `/vectorstore/db_faiss/` as `.faiss` and `.pkl`.
---
### Step 3: Query Retrieval & Prompt Assembly
```python
# User Query → Embed → Top-3 Match → Inject into Prompt
```
* Input is embedded and compared against the FAISS index.
* Top 3 chunks are selected and formatted into a custom prompt.
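
Prompt assembly itself is plain string formatting. The sketch below uses only the standard library; the template wording, the `---` separator, and the `build_prompt` name are assumptions for illustration, and the project's actual `PromptTemplate` may be worded differently.

```python
from string import Template
from typing import List

# Hypothetical template wording; the project's actual prompt may differ.
PROMPT = Template(
    "Use only the context below to answer the question.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n$context\n\n"
    "Question: $question\n"
    "Answer:"
)


def build_prompt(question: str, chunks: List[str], top_k: int = 3) -> str:
    """Join the first top_k retrieved chunks and substitute them into the template."""
    context = "\n---\n".join(chunk.strip() for chunk in chunks[:top_k])
    return PROMPT.substitute(context=context, question=question)


if __name__ == "__main__":
    retrieved = [
        "CSRF tricks a logged-in browser into sending unwanted requests.",
        "Anti-CSRF tokens bind a request to the user's session.",
    ]
    print(build_prompt("What is CSRF?", retrieved))
```

Capping the context at `top_k` chunks keeps the prompt short and leaves room in the model's context window for the answer.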
---
### Step 4: Generate Answer via LLM
```python
# PromptTemplate + Mistral LLM → Final Answer
```
* The prompt is passed to `mistralai/Mistral-7B-Instruct-v0.3` on HuggingFace.
* The prompt includes a strict instruction: "don't make up answers."
---
### Step 5: Chat Loop (Script Mode)
```python
# while True → input() → RetrievalQA → print()
```
* Interactive command-line chatbot runs until user types `Exit the Chatbot`.
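
A loop of this shape might look like the sketch below. The `answer` callable is a hypothetical stand-in for the `RetrievalQA` chain, and the exact, case-sensitive match on the exit phrase is an assumption. Input and output are passed in as parameters so the loop can be exercised without a terminal.

```python
from typing import Callable

EXIT_COMMAND = "Exit the Chatbot"


def chat_loop(
    answer: Callable[[str], str],            # hypothetical: wraps retrieval + LLM
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> int:
    """Answer questions until the exit command is typed; return how many were answered."""
    answered = 0
    while True:
        query = read("You: ").strip()
        if query == EXIT_COMMAND:
            write("Goodbye!")
            return answered
        if not query:
            continue  # ignore empty input instead of sending it to the model
        write(f"Bot: {answer(query)}")
        answered += 1
```

Calling `chat_loop(my_chain_fn)` with the defaults gives the interactive command-line behaviour; passing custom `read` and `write` functions makes the same loop easy to test.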
---
## Models Used
### `mistralai/Mistral-7B-Instruct-v0.3`
> A lightweight, instruction-tuned 7B parameter model.
* Balances **speed and comprehension**.
* Follows custom prompt instructions like "No small talk."
**Usage:**
```python
HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.3", ...)
```
---
### `sentence-transformers/all-MiniLM-L6-v2`
> Fast & efficient transformer model for semantic embeddings.
* Converts text into high-dimensional vectors.
* Ideal for **document retrieval** and similarity scoring.
**Usage:**
```python
HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
```
---
## How to Run
### Using Notebooks (Exploratory Mode)
```bash
jupyter notebook HackerBot.ipynb
```
or
```bash
jupyter notebook MedicBot.ipynb
```
### Using Python Script (Production Mode)
```bash
python ChatBot.py
```
### Manual Environment Setup
```bash
python -m venv venv
.\venv\Scripts\activate        # Windows
# source venv/bin/activate     # Linux/macOS equivalent
pip install -r requirements.txt
```
---
## Setup Script
To simplify setup on Windows, run the included PowerShell script:
```powershell
.\Setup_env.ps1
```
This script will:
* Create virtual environment
* Activate it
* Install dependencies silently
* Display success banner
---
## Data & Vectorstore Info
**Note:** No copyrighted books or embeddings are provided.
Instead:
* `data/Instructions.md`: Add your own `.pdf` files here.
* `vectorstore/Instructions.md`: Explains how indexes will be **auto-created** when PDFs are processed.
Generated files:
* `index.faiss`: vector similarity data
* `index.pkl`: metadata (e.g., document sources)
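
Before starting a bot, it can be useful to check whether both generated files already exist, so the PDFs are only re-processed when needed. The sketch below assumes the index lives in `vectorstore/db_faiss/` as described earlier in this README; the helper name `missing_index_files` is hypothetical.

```python
from pathlib import Path
from typing import List

# Location as described in this README; adjust if your layout differs.
INDEX_DIR = Path("vectorstore/db_faiss")
EXPECTED_FILES = ("index.faiss", "index.pkl")


def missing_index_files(index_dir: Path = INDEX_DIR) -> List[str]:
    """Return the expected index files that are not present in index_dir."""
    return [name for name in EXPECTED_FILES if not (index_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_index_files()
    if missing:
        print(f"Index incomplete, missing: {', '.join(missing)}")
    else:
        print("Index found; it can be loaded without re-processing the PDFs.")
```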
---
## Docker Support
You can now run the Universal-AI-ChatBot inside a Docker container!
### Prerequisites
* Make sure Docker is installed and running.
* Verify with:
```bash
docker --version
```
### Build and Run
```bash
# Build the Docker image
docker build -t ai-chatbot .
# Run the Docker container with environment variables
docker run --env-file .env ai-chatbot
```
The `.env` file must contain your Hugging Face token as:
```env
HF_TOKEN=your-token-here
```
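
Inside the container, the token passed through `--env-file` is available as an ordinary environment variable. A script could read it and fail early with a clear message when it is missing, as in this sketch. The `get_hf_token` helper is hypothetical; only the `HF_TOKEN` variable name comes from this README.

```python
import os
from typing import Mapping


def get_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return HF_TOKEN from the environment, or raise with a helpful message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it to .env and pass --env-file .env to docker run."
        )
    return token


if __name__ == "__main__":
    try:
        get_hf_token()
        print("Hugging Face token found.")
    except RuntimeError as err:
        print(err)
```

Checking the token at startup surfaces a configuration problem immediately instead of at the first model call.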
---
## Use Cases
* Medical Bots (trained on medical PDFs)
* Cybersecurity Advisors (for bug bounty, web security)
* Legal or Finance Q&A Assistants
* Compliance Documentation Bots (ISO, SOC2, GDPR, etc.)
* Educational Assistants (coursebooks, research guides)
---
## Visual Pipeline
```mermaid
graph TD
A[PDF Files in /data] --> B[Text Chunking]
B --> C[Embedding Chunks with MiniLM-L6-v2]
C --> D[Store Embeddings in FAISS Vector DB]
E[User Query] --> F[Embed Query with MiniLM-L6-v2]
F --> G[Semantic Search in FAISS]
D --> G
G --> H[Retrieve Top-k Relevant Chunks]
H --> I[Insert Context into Prompt Template]
I --> J[Mistral-7B-Instruct-v0.3]
J --> K[Answer Generated]
K --> L[Display Answer in Chat Loop]
```
---
## Credits
> Special thanks and a shout-out to the community and devs whose work made this possible:
* [AIwithHassan on YouTube](https://youtu.be/OP0FYjF-37c?si=HJOGBVR4Izgs_8RM)
* [GitHub - AIwithhassan/medical-chatbot](https://github.com/AIwithhassan/medical-chatbot)
---
## Contribution & Feedback
Feel free to fork, star, open issues, or contribute new bot variants!
---