https://github.com/arulkumarann/legalrag

A FastAPI-based Retrieval-Augmented Generation (RAG) system for legal document analysis, powered by Pinecone, LangChain, and Llama 3.3
chatbot fastapi langchain llama pinecone rag vector


Host: GitHub
URL: https://github.com/arulkumarann/legalrag
Owner: arulkumarann
License: mit
Created: 2025-03-20T20:33:38.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-21T22:38:54.000Z (over 1 year ago)
Last Synced: 2025-03-21T23:28:38.844Z (over 1 year ago)
Topics: chatbot, fastapi, langchain, llama, pinecone, rag, vector
Language: Python
Homepage:
Size: 21.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# **LegalRAG**

A Retrieval-Augmented Generation (RAG) system for legal document analysis, built with **FastAPI**, **Pinecone**, and **Llama 3.3 via Groq Inference**.  

## **Overview**  

LegalRAG is an AI-powered system that retrieves and analyzes legal documents based on user queries. It utilizes **vector-based search** to provide relevant case law, legal statutes, and precedents in response to legal queries.  
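To make "vector-based search" concrete, here is a minimal sketch of the idea in plain Python: documents and the query are represented as vectors, and the closest documents by cosine similarity are returned. This is an illustration only, not the project's code or Pinecone's API; the document names and three-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, documents, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
DOCS = {
    "contract_case.pdf": [0.9, 0.1, 0.0],
    "tort_case.pdf": [0.1, 0.9, 0.0],
    "tax_statute.pdf": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for doc_id, score in top_k([1.0, 0.0, 0.0], DOCS):
        print(doc_id, round(score, 3))
```

In a deployed system the vector database performs this ranking over an index rather than a Python dictionary, but the ranking principle is the same.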

## **Features**  

- **Legal Document Retrieval** – Uses Pinecone for fast and efficient case law retrieval.

- **Natural Language Querying** – Allows users to search legal documents using plain English.

- **Session-Based Handling** – Supports session-based query tracking for better user experience.

- **FastAPI Backend** – Provides a scalable and efficient REST API.

- **Concurrency Handling** – Manages multiple legal research sessions efficiently.

- **Llama 3.3 via Groq Inference** – High-speed, low-latency language model inference.

---  

## **Installation**  

### **1. Clone the repository**  

```bash
git clone https://github.com/arulkumarann/legalRAG.git
cd legalRAG
```

### **2. Create a virtual environment**  

```bash
python -m venv venv

# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate
```

### **3. Install dependencies**  

```bash
pip install -r requirements.txt
```

---  

## **Configuration**  

Create a `.env` file and add the following variables:  

```
PINECONE_API_KEY=your_pinecone_api_key
INDEX_NAME=your_pinecone_index_name
GROQ_API_KEY=your_groq_api_key
```
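As a rough sketch of how these variables might be consumed, the snippet below reads the three names from the process environment and fails early if any is missing. It assumes the `.env` values have already been loaded into the environment (for example by a loader such as `python-dotenv`; whether the project uses one is not stated here). The `Settings` class and `load_settings` function are hypothetical names, not part of the repository.

```python
import os
from dataclasses import dataclass

# Variable names taken from the .env example above.
REQUIRED_VARS = ("PINECONE_API_KEY", "INDEX_NAME", "GROQ_API_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three configuration values."""
    pinecone_api_key: str
    index_name: str
    groq_api_key: str


def load_settings(env=None):
    """Read the required variables, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        pinecone_api_key=env["PINECONE_API_KEY"],
        index_name=env["INDEX_NAME"],
        groq_api_key=env["GROQ_API_KEY"],
    )
```

Failing at startup with a list of the missing names is usually easier to debug than an authentication error raised later by the first query.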

---  

## **Running the Application**  

### **Local Development**  

```bash
uvicorn app.main:app --reload
```

The API will be available at **http://localhost:8000**

### **Running the Streamlit Interface**

```bash
streamlit run app.py
```

The Streamlit interface will be available at **http://localhost:8501**

---  

## **API Endpoints**  

### **1. Root Check**  

- **GET `/`** – Health check endpoint  

### **2. Chat with RAG**  

- **POST `/chat`** – Query legal documents with RAG  

  - **Body:**  

    ```json
    {
      "session_id": "abc123",
      "query": "What are the key points of Smith v. Johnson?"
    }
    ```

  - **Response:**  

    ```json
    {
      "response": "The key points are breach of contract and negligence...",
      "sources": ["case_smith_v_johnson.pdf"]
    }
    ```

### **3. Session Management**  

- **DELETE `/session/{session_id}`** – End a legal research session  

- **POST `/session/reset/{session_id}`** – Reset a session's state  

- **GET `/sessions/count`** – Get the total number of active sessions  

### **4. Retrieve Source Documents**  

- **GET `/sources/{session_id}`** – Fetch relevant legal documents for a session  
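The endpoints listed above can be exercised from Python with the standard library alone. The sketch below builds the requests for `/chat`, `/session/{session_id}`, and `/sources/{session_id}` against the local development address; the helper names are hypothetical, and `send` assumes every endpoint replies with a JSON body, which is only shown for `/chat` in this README.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # local development address from this README


def chat_request(session_id, query, base_url=BASE_URL):
    """Build (but do not send) the POST /chat request."""
    body = json.dumps({"session_id": session_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def end_session_request(session_id, base_url=BASE_URL):
    """Build the DELETE /session/{session_id} request."""
    safe_id = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(f"{base_url}/session/{safe_id}", method="DELETE")


def sources_request(session_id, base_url=BASE_URL):
    """Build the GET /sources/{session_id} request."""
    safe_id = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(f"{base_url}/sources/{safe_id}", method="GET")


def send(request, timeout=30):
    """Send a request and decode the response, assuming a JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API to be running locally.
    answer = send(chat_request("abc123", "What are the key points of Smith v. Johnson?"))
    print(answer)
```

Separating request construction from sending keeps the builders easy to inspect without a running server.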

---  

## **Llama 3.3 via Groq Inference**  

- **Inference Speed:** <15ms per query response  

- **Token Throughput:** ~500 tokens/sec  

- **Latency:** Low-latency response optimized for real-time queries  

- **Memory Usage:** Efficient memory footprint compared to traditional on-device LLMs  

- **Scalability:** Supports concurrent user queries without degradation  

---  

## **Concurrency & Session Handling**  

This RAG system supports multiple concurrent users by assigning unique `session_id` values for each session.  

- **Sessions track user queries** to improve context and accuracy.  

- **Each session has its own vector search scope** in Pinecone, ensuring faster retrieval of case-specific documents.  

- **Session cleanup** – Sessions can be ended manually (`DELETE /session/{session_id}`) or reset (`POST /session/reset/{session_id}`).
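One way to picture the session handling described above is an in-memory store keyed by `session_id`, guarded by a lock so concurrent requests do not corrupt it. This is a minimal sketch under that assumption; the `SessionStore` class and its methods are hypothetical and may differ from how the project actually tracks sessions or scopes Pinecone searches.

```python
import threading


class SessionStore:
    """Hypothetical thread-safe store mapping session_id to past queries."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def record_query(self, session_id, query):
        """Append a query, creating the session on first use."""
        with self._lock:
            self._sessions.setdefault(session_id, []).append(query)

    def history(self, session_id):
        """Return a copy of the session's queries (empty if unknown)."""
        with self._lock:
            return list(self._sessions.get(session_id, []))

    def reset(self, session_id):
        """Clear a session's state but keep it active; False if unknown."""
        with self._lock:
            if session_id not in self._sessions:
                return False
            self._sessions[session_id] = []
            return True

    def end(self, session_id):
        """Remove a session entirely; False if it did not exist."""
        with self._lock:
            return self._sessions.pop(session_id, None) is not None

    def count(self):
        """Number of active sessions."""
        with self._lock:
            return len(self._sessions)
```

In this sketch `reset`, `end`, and `count` correspond to `POST /session/reset/{session_id}`, `DELETE /session/{session_id}`, and `GET /sessions/count` respectively. An in-memory store like this is per process, so it only stays consistent when a single worker serves all requests.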

---  

## **Deployment**  

### **Deploy on Render**  

1. **Build Command:**  

   ```bash
   pip install -r requirements.txt
   ```

2. **Start Command:**  

   ```bash
   gunicorn app.main:app -k uvicorn.workers.UvicornWorker --workers 1 --threads 2 --timeout 120
   ```

3. **Environment Variables:**  

   - `PINECONE_API_KEY`  

   - `INDEX_NAME`  

   - `GROQ_API_KEY`  

---  

## **Memory Optimization**  

- Uses **lazy model loading** to reduce memory usage.

- Uses **batch processing** to prevent memory spikes.

- Runs **garbage collection** after query execution.
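The three techniques above can be sketched with the standard library as follows. This is an illustration of the general patterns, not the project's implementation: `get_model` returns a hypothetical placeholder object instead of a real model, and `process_documents` simply measures text lengths as stand-in work.

```python
import gc
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model():
    """Lazy loading: build the object on first call, then reuse it."""
    return {"name": "placeholder-model"}  # hypothetical stand-in for an expensive model


def batched(items, batch_size):
    """Batch processing: yield successive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_documents(texts, batch_size=2):
    """Process texts in small batches, then trigger a collection pass."""
    model = get_model()
    lengths = []
    for batch in batched(texts, batch_size):
        # Stand-in work; a real system would embed or summarize each batch.
        lengths.extend(len(text) for text in batch)
    gc.collect()  # explicit garbage collection after the work is done
    return model["name"], lengths
```

Because nothing is constructed until `get_model()` is first called, importing the module stays cheap, which matters on memory-constrained hosts.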

---  

## **License**  

This project is open-source under the **MIT License**.