https://github.com/vaidehishyara14/ayurveda-pdf-q-a-chatbot
An intelligent chatbot that lets users upload text-based Ayurveda PDFs and ask questions about their content, using RAG (Retrieval-Augmented Generation) to combine semantic search with LLM-generated responses.
- Host: GitHub
- URL: https://github.com/vaidehishyara14/ayurveda-pdf-q-a-chatbot
- Owner: VaidehiShyara14
- Created: 2025-06-28T12:50:33.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-06-28T14:59:37.000Z (5 months ago)
- Last Synced: 2025-06-28T15:21:11.042Z (5 months ago)
- Topics: embeddings, fastapi, fiass, langchain, langchain-groq, llama3, llm, pdf, pdfprocessing, pymupdf, python, question-answering, text-splitting, vector-database
- Language: Python
- Homepage:
- Size: 16.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🌿 Ayurveda Q/A Chatbot
An intelligent chatbot that lets users upload **text-based Ayurveda PDFs** and ask questions about their content, using RAG (Retrieval-Augmented Generation) to combine **semantic search** with **LLM-based responses**.
---
## Features
- Upload Ayurveda PDFs (text-based only)
- Ask natural-language questions about the uploaded content
- Smart text chunking with LangChain
- Semantic search with MiniLM embeddings
- Fast retrieval using **FAISS**
- Powered by **LLaMA 3** via **Groq API**
- Based on the **RAG architecture** (Retrieval-Augmented Generation)
- Easy-to-use interface via **Streamlit**
---
## Tech Stack
| Component | Technology |
|------------------|-----------------------------------------|
| Frontend | Streamlit |
| Backend | FastAPI |
| Embeddings | HuggingFace MiniLM-L6-v2 |
| Vector Search | FAISS |
| Language Model | LLaMA 3 via Groq API |
| PDF Processing | PyMuPDF + LangChain |
| Prompting | LangChain + Custom PromptTemplate (see example below) |
| Environment Vars | python-dotenv |
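
The custom prompt itself isn't included in this summary; a plausible LangChain `PromptTemplate` for keeping answers grounded in the retrieved chunks (the wording here is purely an assumption) might look like:

```python
from langchain.prompts import PromptTemplate

# Illustrative prompt only -- the repository's actual wording may differ.
qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are an Ayurveda assistant. Answer strictly from the context below.\n"
        "If the answer is not in the context, say you don't know.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer:"
    ),
)

# qa_prompt.format(context=..., question=...) produces the final text sent to the LLM.
```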
---
## How It Works (RAG Workflow)
1. **Upload**: User uploads a **text-based PDF**
2. **Text Extraction**: PDF is read using **PyMuPDF**
3. **Chunking**: Text is broken into smaller pieces using **RecursiveCharacterTextSplitter**
4. **Embedding**: Chunks are embedded using **HuggingFace MiniLM**
5. **Vector DB**: Chunks are stored in a **FAISS** vector store
6. **Q&A**:
   - Question → most similar chunks retrieved from the vector store
   - Chunks + question → sent to **LLaMA 3**
   - The LLM generates a final, context-based answer

This is a **Retrieval-Augmented Generation (RAG)** system; a minimal code sketch of the full pipeline is shown below.
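
The repository's module layout isn't shown in this summary, so the following is only a minimal sketch of the pipeline above, assuming the packages implied by the tech stack (`pymupdf`, `langchain`, `langchain-community`, `langchain-groq`, `sentence-transformers`, `faiss-cpu`); function names, chunk sizes, and the Groq model ID are assumptions, not the project's actual code.

```python
# Minimal, illustrative RAG sketch -- names and parameters are assumptions.
import fitz  # PyMuPDF
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq


def build_index(pdf_path: str) -> FAISS:
    # Steps 1-2: extract raw text from the uploaded, text-based PDF.
    doc = fitz.open(pdf_path)
    text = "\n".join(page.get_text() for page in doc)

    # Step 3: split the text into overlapping chunks for retrieval.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = splitter.split_text(text)

    # Steps 4-5: embed the chunks with MiniLM and index them in FAISS.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    return FAISS.from_texts(chunks, embeddings)


def answer(index: FAISS, question: str) -> str:
    # Step 6: retrieve the most similar chunks and let LLaMA 3 answer from them.
    docs = index.similarity_search(question, k=4)
    context = "\n\n".join(d.page_content for d in docs)
    llm = ChatGroq(model="llama3-8b-8192")  # model ID is an assumption; reads GROQ_API_KEY from env
    prompt = f"Answer using only this Ayurveda context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content
```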
---
## 🚀 Getting Started
### 1. Clone the Repository
```bash
git clone https://github.com/vaidehishyara14/ayurveda-pdf-q-a-chatbot.git
cd ayurveda-pdf-q-a-chatbot
```
### 2. Create a Virtual Environment
```bash
python -m venv ayurveda_env
ayurveda_env\Scripts\activate   # on macOS/Linux: source ayurveda_env/bin/activate
```
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
### 4. Add Your `.env` File
Create a `.env` file with your Groq key:
```env
GROQ_API_KEY=your_groq_api_key
```
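The tech stack lists **python-dotenv**, so the backend presumably loads this key at startup; a minimal sketch of that step (only `GROQ_API_KEY` is confirmed by the README, the rest is illustrative):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads GROQ_API_KEY (and anything else) from the local .env file

groq_api_key = os.getenv("GROQ_API_KEY")
if not groq_api_key:
    raise RuntimeError("GROQ_API_KEY is missing; add it to your .env file")
```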
### 5. Run the Backend
```bash
python app.py
```
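Because the backend is started with `python app.py` rather than a `uvicorn` command, `app.py` most likely launches Uvicorn itself. The routes below are hypothetical (the README doesn't document the API) and only illustrate the general FastAPI shape:

```python
# Hypothetical app.py shape -- route names and payloads are assumptions.
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel
import uvicorn

app = FastAPI(title="Ayurveda Q/A Chatbot")


class Question(BaseModel):
    question: str


@app.post("/upload")
async def upload_pdf(file: UploadFile):
    # Save the PDF, then extract, chunk, embed, and index it (see the RAG sketch above).
    return {"filename": file.filename, "status": "indexed"}


@app.post("/ask")
async def ask(q: Question):
    # Retrieve similar chunks and query LLaMA 3 via Groq, then return the answer.
    return {"answer": "..."}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```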
### 6. Run the Streamlit Frontend
```bash
streamlit run streamlit_app.py
```
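How the Streamlit frontend talks to the FastAPI backend isn't documented here; a common pattern (the URL and endpoint names are assumptions, matching the hypothetical routes sketched above) is a plain `requests` call:

```python
# Hypothetical streamlit_app.py snippet -- backend URL and endpoints are assumptions.
import requests
import streamlit as st

st.title("🌿 Ayurveda Q/A Chatbot")

pdf = st.file_uploader("Upload a text-based Ayurveda PDF", type="pdf")
if pdf is not None:
    # Send the raw PDF bytes to the backend for extraction, chunking, and indexing.
    requests.post("http://localhost:8000/upload", files={"file": (pdf.name, pdf.getvalue())})

question = st.text_input("Ask a question about the uploaded PDF")
if st.button("Ask") and question:
    resp = requests.post("http://localhost:8000/ask", json={"question": question})
    st.write(resp.json().get("answer", "No answer returned"))
```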