https://github.com/sathviknayak123/agentic-rag
Agentic RAG with a Llama-3.1-8B model fine-tuned on a medical conversational dataset
agentic-rag chromadb context-aware-chat fastapi huggingface langchain langgraph llama3 ollama peft-fine-tuning-llm qlora quantization rag rouge-evaluation unsloth
- Host: GitHub
- URL: https://github.com/sathviknayak123/agentic-rag
- Owner: SathvikNayak123
- Created: 2024-12-07T16:40:23.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-21T19:59:28.000Z (over 1 year ago)
- Last Synced: 2025-03-01T14:44:19.099Z (over 1 year ago)
- Topics: agentic-rag, chromadb, context-aware-chat, fastapi, huggingface, langchain, langgraph, llama3, ollama, peft-fine-tuning-llm, qlora, quantization, rag, rouge-evaluation, unsloth
- Language: Jupyter Notebook
- Homepage: https://huggingface.co/sathvik123/llama3-ChatDoc
- Size: 215 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Agentic RAG Medical Assistant
Built an advanced medical assistant chatbot using a fine-tuned LLaMA-3.1-8B model and a Retrieval-Augmented Generation (RAG) pipeline with intelligent agent functionality.
- **RAG Architecture**: (diagram not included)
- **Agentic RAG Workflow**: (diagram not included)

## Features
- **History-Aware Responses**: Provides medical advice that takes the conversation history into account, drawing on **20+ medical resources** through a RAG pipeline.
- **Intelligent Query Routing**: Implemented agents to dynamically route off-topic queries to web search (e.g., Wikipedia) and process on-topic queries through document retrieval pipelines.
- **Document Relevance Grading**: Graded retrieved documents for relevance; when they are relevant the model generates a response, otherwise the query is rewritten to improve retrieval.
- **Fine-Tuned LLaMA Model**: Fine-tuned **LLaMA 3.1 8B** using **LoRA**, achieving a **ROUGE-1 score of 0.29** (unigram overlap between generated and reference responses).
- **Low-Latency Interface**: Built an asynchronous chat interface with **FastAPI**, reducing response latency by **40%** for a seamless user experience.
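
The routing and grading behaviour described above can be sketched as plain control flow. This is a minimal illustration, not the repository's implementation: the keyword set `MEDICAL_TERMS` and the helpers `is_on_topic`, `grade`, `rewrite`, and `answer` are hypothetical stand-ins for steps that the project presumably delegates to LLM-driven agents.

```python
# Hypothetical keyword list standing in for an LLM-based topic classifier.
MEDICAL_TERMS = {"fever", "pain", "dose", "symptoms", "diabetes", "infection"}


def tokens(text):
    """Lower-cased words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def is_on_topic(query):
    # Placeholder router: a real agent would ask the LLM to classify the query.
    return bool(tokens(query) & MEDICAL_TERMS)


def grade(query, doc):
    # Placeholder grader: "relevant" means query and document share a medical term.
    return bool(tokens(query) & tokens(doc) & MEDICAL_TERMS)


def rewrite(query):
    # Placeholder rewriter: a real agent would ask the LLM to rephrase the query.
    return query + " symptoms"


def answer(query, retrieve, web_search, max_rewrites=1):
    """Route the query, grade retrieved documents, and rewrite on a miss.

    Returns the response and the list of steps taken, for inspection.
    """
    steps = []
    if not is_on_topic(query):
        steps.append("web_search")
        return web_search(query), steps
    for attempt in range(max_rewrites + 1):
        steps.append("retrieve")
        relevant = [doc for doc in retrieve(query) if grade(query, doc)]
        if relevant:
            steps.append("generate")
            return "Answer grounded in: " + relevant[0], steps
        if attempt < max_rewrites:
            steps.append("rewrite")
            query = rewrite(query)
    steps.append("fallback")
    return "No relevant documents found.", steps
```

Passing `retrieve` and `web_search` in as callables keeps the control flow testable without a vector store or network access. Capping the number of rewrites prevents an unanswerable query from looping indefinitely.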
---
## Tech Stack
### 1. Language Model
- **LLaMA 3.1 (8B)** fine-tuned on **medical conversational datasets** using **PEFT (LoRA)** for domain-specific expertise.
- **Unsloth**: Accelerated fine-tuning with **4-bit quantization**, reducing resource usage without compromising performance.
```bash
# Install Unsloth directly from its GitHub repository
pip install "git+https://github.com/unslothai/unsloth.git"
```
- **Ollama**: Used for model integration and serving.
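
To give a rough sense of why LoRA and 4-bit quantization reduce resource usage, the sketch below does the arithmetic for a single weight matrix and for the model's weights as a whole. The layer size (4096 x 4096), the rank (16), and the nominal 8 billion parameter count are illustrative assumptions, not values taken from the repository. The memory estimate ignores quantization constants, activations, and optimizer state.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters in the full weight matrix that LoRA leaves frozen."""
    return d_in * d_out


def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumed example values (hypothetical, for illustration only).
d = 4096
rank = 16

lora = lora_trainable_params(d, d, rank)
full = full_params(d, d)
print(f"LoRA trains {lora:,} of {full:,} parameters ({lora / full:.2%}) in this layer")

for bits in (16, 4):
    print(f"{bits}-bit weights for 8e9 parameters: {weight_memory_gib(8_000_000_000, bits):.1f} GiB")
```

Under these assumptions the adapter for one square layer holds under 1% of that layer's parameters, and storing the base weights in 4 bits takes a quarter of the memory of 16-bit storage.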
### 2. RAG Pipeline
- **LangChain**: Enabled context-aware responses and integrated the LLaMA model with document retrieval capabilities.
- **ChromaDB**: Stored and retrieved embeddings for efficient and accurate responses.
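
The retrieval step can be illustrated with a toy in-memory store. This is a sketch of the idea only: `embed` uses word counts where the real pipeline would use a learned embedding model. `ToyVectorStore` and `contextualize` are hypothetical names, and ChromaDB's actual indexing and API differ. The `contextualize` helper shows one simple way to make a follow-up question history-aware by prepending recent turns. An LLM-based pipeline would more likely rewrite the follow-up into a standalone question.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a real system would use an embedding model)."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Stores (text, embedding) pairs and returns the k most similar texts."""

    def __init__(self):
        self._items = []

    def add(self, text):
        self._items.append((text, embed(text)))

    def query(self, text, k=2):
        q = embed(text)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def contextualize(history, question, turns=2):
    """Prepend the most recent turns so a follow-up question carries its context."""
    return " ".join(history[-turns:] + [question])


store = ToyVectorStore()
store.add("Insulin dose adjustments for type 2 diabetes.")
store.add("Treating a mild fever at home.")
store.add("Stretching exercises for lower back pain.")

history = ["Tell me about lower back pain."]
follow_up = "What helps with it?"
print(store.query(contextualize(history, follow_up), k=1))
```

On its own, the follow-up "What helps with it?" shares no words with any stored document. With the previous turn prepended, it matches the back-pain entry.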
### 3. Backend
- **FastAPI**: Provided a robust and asynchronous backend for a seamless chat interface.
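
The benefit of an asynchronous backend can be shown with the standard library alone. In the sketch below, `fake_model_call` is a hypothetical stand-in for an awaited request to the model server, and `handle_chat` plays the role of an `async def` route handler. While one request waits on the model, the event loop is free to serve others. The timing only illustrates concurrency and says nothing about the project's measured latency.

```python
import asyncio
import time


async def fake_model_call(prompt, delay=0.2):
    """Stand-in for an awaited call to the model server (hypothetical)."""
    await asyncio.sleep(delay)
    return f"reply to: {prompt}"


async def handle_chat(prompt):
    # In a FastAPI app this would be the body of an `async def` route handler.
    return await fake_model_call(prompt)


async def serve_concurrently(prompts):
    # All handlers wait on the model at the same time instead of one after another.
    return await asyncio.gather(*(handle_chat(p) for p in prompts))


def timed_run(prompts):
    """Run all prompts concurrently and return (replies, elapsed seconds)."""
    start = time.perf_counter()
    replies = asyncio.run(serve_concurrently(prompts))
    return replies, time.perf_counter() - start


if __name__ == "__main__":
    replies, elapsed = timed_run(["a", "b", "c"])
    print(replies, f"{elapsed:.2f}s")
```

Three requests that each wait 0.2 s finish in roughly 0.2 s in total rather than 0.6 s, because the waits overlap.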
### 4. Other Tools
- **Hugging Face**: Hosted and served the optimized model in **GGUF format** for efficient inference.
---
## Setup Instructions
1. **Clone the Repository**
```bash
git clone https://github.com/SathvikNayak123/Agentic-RAG.git
```
2. **Install Dependencies**
```bash
pip install -r requirements.txt
```
3. **Setup**
- Populate the database with medical documents.
- Generate and store embeddings using the fine-tuned LLaMA 3.1 model.
- Install Ollama and pull the model from Hugging Face:
```bash
ollama pull hf.co/sathvik123/llama3-ChatDoc
```
4. **Run the Application**
```bash
uvicorn app:app --reload
```
---
## Results
- **Model Performance**: Achieved a **ROUGE-1 score of 0.29** with the fine-tuned LLaMA 3.1 model.
- **RAG Responses**: Demonstrated accurate and history-aware conversational capabilities.

- **Agent Functionality**: Effectively routed and processed queries based on topic relevance.
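
For readers unfamiliar with the metric reported above, ROUGE-1 measures unigram overlap between a generated response and a reference. The sketch below is a minimal version with whitespace tokenization. The README does not say whether 0.29 is precision, recall, or F1, nor which tokenizer or stemmer was used, so this is an illustration of the metric rather than a reproduction of the evaluation. The example sentences are invented.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 with simple whitespace tokenization."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("take the tablet twice daily", "take one tablet twice a day")
print({name: round(value, 3) for name, value in scores.items()})
```

Here three of the five candidate words appear among the six reference words, giving a precision of 0.6 and a recall of 0.5.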
