An open API service indexing awesome lists of open source software.

https://github.com/persteenolsen/fastapi-jwt-auth-rag-one

Python FastAPI with JWT Auth serving RAG using LLM by Groq
https://github.com/persteenolsen/fastapi-jwt-auth-rag-one

fastapi jwt llm postgresql python rag vector

Last synced: 9 days ago
JSON representation

Python FastAPI with JWT Auth serving RAG using LLM by Groq

Awesome Lists containing this project

README

          

# Python + FastAPI + JWT Auth + RAG + (Mock Embeddings Version)

A lightweight **Retrieval-Augmented Generation (RAG)** API built with **FastAPI**, designed for **learning, testing, and prototyping**.

This version uses **deterministic fake embeddings** instead of real embedding modelsβ€”allowing you to test the full RAG pipeline **without external embedding APIs or costs**.

---

## πŸ“Œ Project Info

- **Version:** 0.0.1
- **Python:** 3.12
- **Last Updated:** 15-04-2026

---

## ✨ Features

### πŸ” Authentication
- JWT-based authentication (HS256)
- Protected endpoints with Bearer tokens
- Simple environment-based credentials

---

### 🧠 RAG Pipeline (Fully Functional)
- Ingest `.txt` files from URLs
- Chunk text with overlap
- Generate embeddings (**mocked**)
- Store vectors in PostgreSQL (`pgvector`)
- Retrieve relevant context for queries
- Generate answers with Groq LLM

---

### πŸ§ͺ Fake Embeddings (Key Feature)
- Deterministic embeddings based on text hashing
- 384-dimensional normalized vectors
- No external API calls required
- Perfect for:
- Local development
- Testing pipelines
- Learning RAG architecture

> ⚠️ Note: These embeddings do **not understand semantics**β€”they only simulate the pipeline.

---

### πŸ€– LLM Integration (Groq)
- Model: `llama-3.1-8b-instant`
- Generates responses from retrieved context
- Temperature-controlled outputs

---

### πŸ”Ž Semantic Retrieval (Simulated)
- Query β†’ fake embedding
- Vector similarity search via `pgvector`
- Top-K document retrieval

---

### πŸ—„οΈ Database (PostgreSQL + pgvector)

Stores:
- Document content
- Embeddings (`VECTOR(384)`)
- Source URL
- Embedding metadata
- Timestamp

Optimizations:
- `ivfflat` index
- Cosine similarity (`<->`)

---

### βš™οΈ Background Processing
- Uses FastAPI `BackgroundTasks`
- Async ingestion pipeline
- Non-blocking embedding + database insert

---

### πŸ§ͺ Debugging Tools
- `/debug/retrieve` β†’ test retrieval without auth or LLM
- Console logs for retrieval inspection

---

## πŸ“‘ API Endpoints

| Method | Endpoint | Description |
|--------|--------------------|--------------------------------------|
| GET | `/` | Health check |
| POST | `/login` | Get JWT token |
| POST | `/ask` | RAG question answering πŸ” |
| POST | `/ingest` | Ingest `.txt` from URL πŸ” |
| POST | `/debug/retrieve` | Test retrieval only |

πŸ” = Requires authentication

---

## βš™οΈ Getting Started

### 1. Clone Repository

```bash
git clone https://github.com/your-username/your-repo.git
cd your-repo
```

---

### 2. Create Virtual Environment

```bash
python -m venv venv
```

Activate:

**Windows:**
```bash
venv\Scripts\activate
```

**Mac/Linux:**
```bash
source venv/bin/activate
```

---

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

---

## πŸ”‘ Environment Variables

Create a `.env` file:

```env
DATABASE_URL=your_postgres_connection
GROQ_API_KEY=your_groq_api_key
SECRET_KEY=your_secret_key

FAKE_USERNAME=admin
FAKE_PASSWORD=password
```

---

## ▢️ Run the Application

```bash
uvicorn app:app --reload
```

Available at:

- 🌐 API: http://127.0.0.1:8000

- πŸ“„ Swagger UI: http://127.0.0.1:8000/docs

---

## πŸ” Authentication Flow

1. Call `/login` with credentials
2. Receive JWT token
3. Use in headers:

```http
Authorization: Bearer
```

---

## 🧠 How It Works

```text
User Query
↓
Fake Embedding (Deterministic)
↓
pgvector Similarity Search
↓
Top-K Chunks
↓
Groq LLM (LLaMA 3.1)
↓
Final Answer + Sources
```

---

## πŸ“₯ Document Ingestion

### `/ingest`
- Accepts `.txt` file URLs
- Fetches and cleans text
- Splits into chunks (with overlap)
- Generates fake embeddings
- Stores in PostgreSQL

---

## πŸ› οΈ Core Components

### πŸ”Ή Chunking
- Fixed-size chunks (default: 500 chars)
- Overlap: 50 chars

---

### πŸ”Ή Fake Embedding Logic
- Seed based on text hash
- Generates reproducible vectors
- Normalized for cosine similarity

---

### πŸ”Ή Retrieval
- Uses `embedding <-> query_vector`
- Returns top-K most similar chunks

---

## πŸ—„οΈ Database Initialization

On startup:
- Enables `pgvector` extension
- Creates `documents` table
- Builds similarity index (`ivfflat`)

---

## πŸ“Œ Use Cases

This version is ideal for:

- πŸ§ͺ Learning RAG systems
- βš™οΈ Backend prototyping
- πŸ’» Local development without API costs
- πŸ” Debugging retrieval pipelines

---

## 🚧 Limitations

- ❌ No real semantic understanding (fake embeddings)
- ❌ Retrieval quality is not meaningful
- βœ… Pipeline behavior is realistic

---

## πŸ“Œ Future Improvements

- πŸ”„ Replace fake embeddings with real models (Hugging Face / OpenAI)
- πŸ“Š Add monitoring/logging
- πŸ” Hybrid search (BM25 + vectors)
- 🧩 Add document formats (PDF, HTML)
- Splitting up the app.py into seperate files and folders for improved structure

---

## πŸ“„ License

MIT License

---

## πŸ™Œ Final Notes

This project demonstrates a **complete RAG architecture** without external dependencies for embeddingsβ€”making it perfect for understanding how everything fits together before scaling to production systems.