https://github.com/arulkumarann/legalrag
A FastAPI-based Retrieval-Augmented Generation (RAG) system for legal document analysis, powered by Pinecone, LangChain, and Llama 3.3
chatbot fastapi langchain llama pinecone rag vector
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/arulkumarann/legalrag
- Owner: arulkumarann
- License: mit
- Created: 2025-03-20T20:33:38.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-21T22:38:54.000Z (about 1 year ago)
- Last Synced: 2025-03-21T23:28:38.844Z (about 1 year ago)
- Topics: chatbot, fastapi, langchain, llama, pinecone, rag, vector
- Language: Python
- Homepage:
- Size: 21.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# **LegalRAG**
A Retrieval-Augmented Generation (RAG) system for legal document analysis, built with **FastAPI**, **Pinecone**, and **Llama 3.3 via Groq Inference**.
## **Overview**
LegalRAG is an AI-powered system that retrieves and analyzes legal documents in response to user queries. It uses **vector-based search** over a Pinecone index to find relevant case law, legal statutes, and precedents, then passes the retrieved passages to Llama 3.3 to generate an answer.
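The retrieve-then-generate flow can be illustrated with a minimal Python sketch of the prompt-assembly step. The `RetrievedChunk` type, the `build_prompt` function, the prompt wording, and the four-chunk limit are all hypothetical; the README does not show how LegalRAG builds its prompts.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """A passage returned by the vector search (hypothetical shape)."""
    text: str
    source: str   # e.g. the file name the passage came from
    score: float  # similarity score reported by the vector store


def build_prompt(query: str, chunks: list[RetrievedChunk], max_chunks: int = 4) -> tuple[str, list[str]]:
    """Assemble an LLM prompt from the top-scoring chunks and return it with the cited sources."""
    # Keep only the highest-scoring passages so the prompt stays within the context window.
    top = sorted(chunks, key=lambda c: c.score, reverse=True)[:max_chunks]
    context = "\n\n".join(f"[{i + 1}] ({c.source}) {c.text}" for i, c in enumerate(top))
    prompt = (
        "You are a legal research assistant. Answer using only the context below "
        "and cite passages by their [number].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    # De-duplicate sources while preserving rank order.
    sources = list(dict.fromkeys(c.source for c in top))
    return prompt, sources
```

The returned `sources` list has the same shape as the `sources` field in the `/chat` response. Sorting by score before truncating keeps the most relevant passages when the retriever returns more than fit comfortably in the prompt.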
## **Features**
- **Legal Document Retrieval** – Uses Pinecone vector search to retrieve relevant case law quickly.
- **Natural Language Querying** – Lets users search legal documents in plain English.
- **Session-Based Handling** – Tracks queries per `session_id` so follow-up questions keep their context.
- **FastAPI Backend** – Exposes chat, session, and source endpoints through a REST API.
- **Concurrency Handling** – Serves multiple legal research sessions at the same time.
- **Llama 3.3 via Groq Inference** – High-speed, low-latency language model inference.
---
## **Installation**
### **1. Clone the repository**
```bash
git clone https://github.com/arulkumarann/legalRAG.git
cd legalRAG
```
### **2. Create a virtual environment**
```bash
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
```
### **3. Install dependencies**
```bash
pip install -r requirements.txt
```
---
## **Configuration**
Create a `.env` file in the project root with the following variables:
```
PINECONE_API_KEY=your_pinecone_api_key
INDEX_NAME=your_pinecone_index_name
GROQ_API_KEY=your_groq_api_key
```
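A small loader that fails fast when one of these variables is missing might look like the following Python sketch. `Settings` and `load_settings` are hypothetical names, and the `.env` file must first be loaded into the process environment, for example with `load_dotenv()` from the python-dotenv package; whether LegalRAG does it that way is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# The three variables listed in the README's .env example.
REQUIRED_VARS = ("PINECONE_API_KEY", "INDEX_NAME", "GROQ_API_KEY")


@dataclass(frozen=True)
class Settings:
    pinecone_api_key: str
    index_name: str
    groq_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required variables, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        pinecone_api_key=env["PINECONE_API_KEY"],
        index_name=env["INDEX_NAME"],
        groq_api_key=env["GROQ_API_KEY"],
    )
```

Passing the mapping explicitly keeps the function easy to test; in the application it would simply be called as `load_settings()` to read `os.environ`.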
---
## **Running the Application**
### **Local Development**
```bash
uvicorn app.main:app --reload
```
The API will be available at **http://localhost:8000** (Uvicorn's default port).
### **Running the Streamlit Interface**
```bash
streamlit run app.py
```
The Streamlit interface will be available at **http://localhost:8501**
---
## **API Endpoints**
### **1. Root Check**
- **GET `/`** – Health check endpoint
### **2. Chat with RAG**
- **POST `/chat`** – Query legal documents with RAG
- **Body:**
```json
{
"session_id": "abc123",
"query": "What are the key points of Smith v. Johnson?"
}
```
- **Response:**
```json
{
"response": "The key points are breach of contract and negligence...",
"sources": ["case_smith_v_johnson.pdf"]
}
```
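To call the chat endpoint from Python without extra dependencies, the request can be built with the standard library's `urllib.request`. This is a client-side sketch: `build_chat_request` and `chat` are hypothetical helpers, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def build_chat_request(base_url: str, session_id: str, query: str) -> urllib.request.Request:
    """Build a POST /chat request carrying the documented JSON body."""
    body = json.dumps({"session_id": session_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, session_id: str, query: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply ({"response": ..., "sources": [...]})."""
    req = build_chat_request(base_url, session_id, query)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running locally, `chat("http://localhost:8000", "abc123", "What are the key points of Smith v. Johnson?")` returns the decoded response dictionary.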
### **3. Session Management**
- **DELETE `/session/{session_id}`** – End a legal research session
- **POST `/session/reset/{session_id}`** – Reset a session's state
- **GET `/sessions/count`** – Get the total number of active sessions
### **4. Retrieve Source Documents**
- **GET `/sources/{session_id}`** – Fetch relevant legal documents for a session
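The README does not say how session state is stored on the server. A minimal sketch, assuming an in-process dictionary guarded by a `threading.Lock`, shows one way the session and source endpoints could be backed; `Session`, `SessionStore`, and all method names are hypothetical.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Session:
    history: list[tuple[str, str]] = field(default_factory=list)  # (query, answer) pairs
    sources: list[str] = field(default_factory=list)              # documents cited so far


class SessionStore:
    """Hypothetical thread-safe in-memory store keyed by session_id."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}
        self._lock = threading.Lock()  # guards _sessions across concurrent requests

    def record(self, session_id: str, query: str, answer: str, sources: list[str]) -> None:
        """Store one chat turn, creating the session on first use."""
        with self._lock:
            session = self._sessions.setdefault(session_id, Session())
            session.history.append((query, answer))
            for src in sources:
                if src not in session.sources:
                    session.sources.append(src)

    def sources(self, session_id: str) -> list[str]:  # would back GET /sources/{session_id}
        with self._lock:
            session = self._sessions.get(session_id)
            return list(session.sources) if session else []

    def reset(self, session_id: str) -> bool:  # would back POST /session/reset/{session_id}
        with self._lock:
            if session_id not in self._sessions:
                return False
            self._sessions[session_id] = Session()
            return True

    def end(self, session_id: str) -> bool:  # would back DELETE /session/{session_id}
        with self._lock:
            return self._sessions.pop(session_id, None) is not None

    def count(self) -> int:  # would back GET /sessions/count
        with self._lock:
            return len(self._sessions)
```

Because this state lives in process memory, it is only shared by requests handled in the same worker process, which fits the single-worker (`--workers 1`) start command in the Deployment section.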
---
## **Llama 3.3 via Groq Inference**
- **Inference Speed:** <15ms per query response
- **Token Throughput:** ~500 tokens/sec
- **Latency:** Low enough for interactive, real-time querying
- **Memory Usage:** Inference runs on Groq's hosted API, so no model weights are loaded locally, unlike on-device LLMs
- **Scalability:** Supports concurrent user queries without degradation
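The throughput figure converts directly into generation time. This back-of-the-envelope helper uses the ~500 tokens/sec quoted above and ignores network round trips and time to first token; the function name is hypothetical.

```python
def estimate_generation_seconds(output_tokens: int, tokens_per_second: float = 500.0) -> float:
    """Time to stream `output_tokens` at a steady decode rate.

    Ignores network latency and time to first token, so real responses take longer.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second
```

At that rate a 250-token answer takes about 0.5 seconds to generate, and a 1,000-token answer about 2 seconds.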
---
## **Concurrency & Session Handling**
This RAG system supports multiple concurrent users by assigning each session a unique `session_id`.
- **Sessions track user queries** to improve context and accuracy.
- **Each session has its own vector search scope** in Pinecone, ensuring faster retrieval of case-specific documents.
- **Session cleanup** – Sessions can be ended (`DELETE /session/{session_id}`) or reset (`POST /session/reset/{session_id}`) on request.
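The README does not say how a session's search scope is implemented. One possible approach, assumed here, is a Pinecone namespace per session. The hypothetical `session_query_kwargs` helper builds keyword arguments for the Pinecone client's `Index.query` method, and the `session-` namespace prefix is an invented convention.

```python
def session_query_kwargs(session_id: str, query_vector: list[float], top_k: int = 5) -> dict:
    """Keyword arguments for Pinecone's Index.query, scoped to one session.

    Assumes (hypothetically) that each session's documents were upserted into
    a namespace named after its session_id.
    """
    if not session_id:
        raise ValueError("session_id must be non-empty")
    return {
        "vector": query_vector,
        "top_k": top_k,
        "namespace": f"session-{session_id}",  # limits the search to this session's vectors
        "include_metadata": True,              # return stored fields such as the source file name
    }
```

With the official client this would be used as `Pinecone(api_key=...).Index(INDEX_NAME).query(**session_query_kwargs(session_id, embedding))`. An alternative is to store a `session_id` metadata field and pass `filter={"session_id": {"$eq": session_id}}`, which keeps all documents in one namespace; which option LegalRAG uses is not stated.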
---
## **Deployment**
### **Deploy on Render**
1. **Build Command:**
```bash
pip install -r requirements.txt
```
2. **Start Command:**
```bash
gunicorn app.main:app -k uvicorn.workers.UvicornWorker --workers 1 --threads 2 --timeout 120
```
3. **Environment Variables:**
- `PINECONE_API_KEY`
- `INDEX_NAME`
- `GROQ_API_KEY`
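The start-command flags can also live in a Gunicorn configuration file, which Gunicorn reads with `-c`. This `gunicorn.conf.py` is a hypothetical addition rather than part of the repository; the `$PORT` binding assumes Render supplies the listening port through that environment variable, as it does for web services.

```python
# gunicorn.conf.py (hypothetical file; mirrors the Start Command flags)
import os

# Bind to the port Render provides via $PORT; fall back to 8000 locally.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"
worker_class = "uvicorn.workers.UvicornWorker"
workers = 1    # a single process, so any in-process state stays in one place
threads = 2    # only used by the gthread worker; has no effect with UvicornWorker
timeout = 120  # seconds a worker may stay silent before it is killed and restarted
```

Start it with `gunicorn -c gunicorn.conf.py app.main:app`. Because Gunicorn's `threads` setting only affects the `gthread` worker type, concurrency under `UvicornWorker` comes from the async event loop inside the single worker.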
---
## **Memory Optimization**
- **Lazy model loading** to reduce memory usage.
- **Batch processing** to prevent memory spikes.
- **Garbage collection** after each query.
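The three techniques can be sketched in plain Python. Everything here is hypothetical: `get_embedder` stands in for whatever model or client LegalRAG loads lazily, and `run_query` only counts items in place of real per-batch work.

```python
import gc
from functools import lru_cache
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


@lru_cache(maxsize=1)
def get_embedder() -> object:
    """Create an expensive object on first use and reuse it afterwards.

    The body is a hypothetical stand-in for loading an embedding model or client.
    """
    return object()


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of at most `size` items so large inputs are processed in bounded chunks."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_query(texts: list[str], batch_size: int = 32) -> int:
    """Process texts batch by batch, then run a garbage-collection pass."""
    embedder = get_embedder()  # loaded on the first query, cached for later ones
    processed = 0
    try:
        for batch in batched(texts, batch_size):
            _ = (embedder, batch)  # placeholder for per-batch work, e.g. embedding calls
            processed += len(batch)
    finally:
        gc.collect()  # reclaim unreachable reference cycles left behind by the query
    return processed
```

`lru_cache(maxsize=1)` turns the loader into a lazily initialized singleton. On Python 3.12 and later, `itertools.batched` offers the same chunking but yields tuples. `gc.collect()` only reclaims reference cycles; other objects are freed as soon as their last reference disappears.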
---
## **License**
This project is open-source under the **MIT License**.