https://github.com/persteenolsen/fastapi-jwt-auth-rag-two

Python FastAPI with JWT Auth serving RAG using Real embeddings

# Python + FastAPI + JWT Auth + RAG + Hugging Face embeddings

A production-style **Retrieval-Augmented Generation (RAG)** API built with **FastAPI**.
This project combines **secure JWT authentication**, **vector search with pgvector**, **Hugging Face embeddings**, and **Groq LLMs** to deliver context-aware answers from your own data.

---

## 📌 Project Info

- **Last Updated:** 16-06-2026
- **Python Version:** 3.12

---

## ✨ Features

### 🔐 Authentication
- JWT-based authentication (HS256)
- Protected endpoints using Bearer tokens
- Environment-based credentials
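
To make the HS256 bullet concrete, here is a minimal standard-library sketch of signing and verifying a token. This README does not say which JWT library the project uses, so the function names, the `sub` claim, and the 30-minute lifetime are all assumptions for illustration; a maintained library such as PyJWT is usually the safer choice in real code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(subject: str, secret: str, ttl_seconds: int = 1800) -> str:
    """Build a signed HS256 token carrying `sub` and `exp` claims (hypothetical helper)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str) -> dict:
    """Return the payload if the signature and expiry are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}"
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    # compare_digest avoids leaking timing information about the signature.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload
```

In this sketch the secret would come from an environment variable, in line with the environment-based credentials listed above.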

---

### 🧠 RAG Pipeline
- Ingests `.txt` documents from URLs
- Splits content into **topic-based chunks**
- Generates embeddings using Hugging Face
- Stores vectors in PostgreSQL (`pgvector`)
- Retrieves relevant context for queries

---

### 🤖 LLM Integration (Groq)
- Model: `llama-3.1-8b-instant`
- Context-aware answer generation
- Structured prompting for grounded responses
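
As an illustration of what "structured prompting for grounded responses" could look like, the sketch below assembles a prompt from retrieved chunks. The prompt wording, the `build_prompt` name, and the chunk keys (`content`, `source_url`) are assumptions, not the project's actual prompt.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt from retrieved chunks.

    Each chunk is assumed to be a dict with `content` and `source_url` keys.
    """
    context_blocks = [
        f"[{index}] (source: {chunk['source_url']})\n{chunk['content']}"
        for index, chunk in enumerate(chunks, start=1)
    ]
    context = "\n\n".join(context_blocks) if context_blocks else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Numbering the chunks and naming their sources gives the model something to cite, which makes it easier to return sources alongside the answer.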

---

### 🔎 Semantic Search
- Query → embedding
- Top-K similarity search via `pgvector`
- Distance operator `<->`, which pgvector defines as Euclidean (L2) distance; cosine distance is `<=>`
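
For unit-length embeddings, Euclidean and cosine distance rank results identically, because `‖a − b‖² = 2 · (1 − cos(a, b))` when both vectors have norm 1. The pure-Python sketch below checks this on made-up vectors; the helper names and sample data are illustrative only.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the metric behind pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """One minus cosine similarity, the metric behind pgvector's `<=>` operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(query: list[float], documents: list[list[float]], distance) -> list[int]:
    """Return document indices ordered from nearest to farthest."""
    return sorted(range(len(documents)), key=lambda i: distance(query, documents[i]))


# Made-up 3-dimensional vectors standing in for 384-dimensional embeddings.
query = normalize([1.0, 2.0, 0.5])
documents = [normalize(v) for v in ([0.9, 2.1, 0.4], [-1.0, 0.2, 3.0], [2.0, 0.1, 0.1])]

# Both metrics order the unit vectors the same way.
assert rank(query, documents, l2_distance) == rank(query, documents, cosine_distance)
```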

---

### ๐Ÿ—„๏ธ Vector Database (PostgreSQL + pgvector)
Stores:
- Document content
- Embeddings (384-dim vectors)
- Source URL
- Metadata
- Timestamp

Optimizations:
- `VECTOR(384)` column
- `ivfflat` index for fast retrieval
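
A top-K query against such a table might look like the sketch below. The column names (`content`, `source_url`, `embedding`), the `%s` placeholder style (as used by psycopg), and the helper names are assumptions; only the `documents` table name and the `<->` operator come from this README.

```python
from typing import Sequence

# ORDER BY <distance expression> LIMIT n is the query shape pgvector indexes can serve.
SEARCH_SQL = """
SELECT content, source_url
FROM documents
ORDER BY embedding <-> %s::vector
LIMIT %s
"""


def to_vector_literal(vector: Sequence[float]) -> str:
    """Format floats as pgvector's text input format, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def search_similar(connection, query_vector: Sequence[float], top_k: int = 3) -> list:
    """Fetch the `top_k` nearest rows using any DB-API 2.0 connection."""
    cursor = connection.cursor()
    try:
        cursor.execute(SEARCH_SQL, (to_vector_literal(query_vector), top_k))
        return cursor.fetchall()
    finally:
        cursor.close()
```

Passing the vector as a text literal and casting it with `::vector` avoids needing a driver-specific adapter for the vector type.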

---

### โš™๏ธ Background Processing
- FastAPI `BackgroundTasks`
- Async ingestion pipeline
- Non-blocking embedding + DB insert

---

### 🧪 Debug Tools
- `/debug/retrieve` → test retrieval without LLM
- Console logging for inspection

---

## 📡 API Endpoints

| Method | Endpoint          | Description                    |
|--------|-------------------|--------------------------------|
| POST   | `/token`          | Get JWT access token           |
| POST   | `/ask`            | Ask questions (RAG-powered) 🔐 |
| POST   | `/ingest`         | Ingest `.txt` files from URLs  |
| GET    | `/debug/retrieve` | Debug semantic search          |

🔐 = Requires authentication

---

## โš™๏ธ Getting Started

### 1. Clone the Repository

```bash
git clone https://github.com/persteenolsen/fastapi-jwt-auth-rag-two.git
cd fastapi-jwt-auth-rag-two
```

---

### 2. Create Virtual Environment

```bash
python -m venv venv
```

Activate it:

**Windows (PowerShell):**
```powershell
venv\Scripts\activate
```

**Mac/Linux:**
```bash
source venv/bin/activate
```

---

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

---

## โ–ถ๏ธ Run the Application

```bash
uvicorn main:app --reload
```

Once running:

- ๐ŸŒ API: http://127.0.0.1:8000

- ๐Ÿ“„ Swagger Docs: http://127.0.0.1:8000/docs

Use Swagger UI to:
1. Authenticate via `/token`
2. Copy the JWT token
3. Authorize requests

---

## 🔑 Authentication Flow

1. Call `/token` with credentials
2. Receive JWT access token
3. Use in headers:

```http
Authorization: Bearer <access_token>
```
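
From a script, the same flow could be sketched with the standard library as below. The request shapes are assumptions: `/token` is assumed to accept form-encoded `username` and `password` fields (the common FastAPI OAuth2 convention) and `/ask` a JSON body with a `question` field. Check the Swagger docs for the real schema.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local development address


def token_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST /token request (form field names are assumed)."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/token",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def ask_request(token: str, question: str) -> urllib.request.Request:
    """Build the POST /ask request with the Bearer token attached (body shape is assumed)."""
    body = json.dumps({"question": question}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/ask",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Each request can then be sent with `urllib.request.urlopen(request)` once the server is running.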

---

## 🧠 How RAG Works

```text
User Query
↓
Embedding (Hugging Face)
↓
pgvector Similarity Search
↓
Top-K Relevant Chunks
↓
Groq LLM (LLaMA 3.1)
↓
Final Answer + Sources
```
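
The flow above can be sketched as one small orchestration function. The three callables are injected so the sketch stays independent of Hugging Face, PostgreSQL, and Groq; their names, signatures, and the `source_url` key are assumptions, not the project's actual code.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def answer_question(
    question: str,
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], list[dict]],
    generate: Callable[[str, list[dict]], str],
    top_k: int = 3,
) -> dict:
    """Run query -> embedding -> similarity search -> LLM and collect the sources."""
    query_vector = embed(question)          # Embedding step
    chunks = search(query_vector, top_k)    # Top-K similarity search
    answer = generate(question, chunks)     # LLM call with retrieved context
    sources = sorted({chunk["source_url"] for chunk in chunks})
    return {"answer": answer, "sources": sources}
```

Injecting the steps also makes the pipeline easy to test with stub functions before any external service is configured.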

---

## 📥 Document Ingestion

### `/ingest`
- Accepts `.txt` file URLs
- Downloads and cleans content
- Splits into topic-based chunks
- Generates embeddings
- Stores results in PostgreSQL
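
This README does not define how "topic-based" chunks are detected. As one possible reading, the sketch below treats each blank-line-separated block of text as a topic; the function name and the `min_chars` parameter are hypothetical.

```python
import re


def split_into_topic_chunks(text: str, min_chars: int = 1) -> list[str]:
    """Split text into chunks at blank lines, dropping blocks shorter than `min_chars`.

    Assumes topics in the source file are separated by one or more blank lines.
    """
    normalized = text.replace("\r\n", "\n")
    blocks = re.split(r"\n\s*\n", normalized)
    return [block.strip() for block in blocks if len(block.strip()) >= min_chars]
```

Each returned chunk would then be embedded and stored as its own row.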

---

## 🧾 Embeddings

- Model: `sentence-transformers/all-MiniLM-L6-v2`
- 384-dimensional normalized vectors
- Batch processing with retry support
- Powered via Hugging Face Inference API
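
Batching with retries and normalization could be structured roughly as follows. The `embed_batch` callable stands in for the call to the Hugging Face Inference API, and the batch size, retry count, and backoff values are assumptions rather than the project's settings.

```python
import math
import time
from typing import Callable


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def embed_in_batches(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 16,
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
) -> list[list[float]]:
    """Embed texts batch by batch, retrying a failed batch with linear backoff."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(1, max_retries + 1):
            try:
                raw = embed_batch(batch)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the final attempt
                time.sleep(backoff_seconds * attempt)
        vectors.extend(l2_normalize(v) for v in raw)
    return vectors
```

Normalizing on the client side keeps every stored vector at unit length, whether or not the remote model already normalizes its output.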

---

## ๐Ÿ—„๏ธ Database Initialization

On application startup:
- Creates `pgvector` extension
- Creates `documents` table
- Builds `ivfflat` similarity index
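
The startup steps above map onto three SQL statements. The sketch below is one plausible schema: the column names, the index name, `lists = 100`, and the `vector_l2_ops` operator class (the one that matches `<->`) are assumptions, while the `documents` table, `VECTOR(384)`, and `ivfflat` come from this README.

```python
SCHEMA_STATEMENTS = [
    # 1. Enable the pgvector extension.
    "CREATE EXTENSION IF NOT EXISTS vector",
    # 2. Create the documents table (column names are illustrative).
    """
    CREATE TABLE IF NOT EXISTS documents (
        id BIGSERIAL PRIMARY KEY,
        content TEXT NOT NULL,
        embedding VECTOR(384) NOT NULL,
        source_url TEXT,
        metadata JSONB,
        created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    )
    """,
    # 3. Build an approximate nearest-neighbour index for L2 distance.
    """
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)
    """,
]


def init_db(connection) -> None:
    """Run the schema statements on any DB-API 2.0 connection (for example psycopg)."""
    cursor = connection.cursor()
    try:
        for statement in SCHEMA_STATEMENTS:
            cursor.execute(statement)
        connection.commit()
    finally:
        cursor.close()
```

Because every statement uses `IF NOT EXISTS`, running `init_db` on each startup is safe.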

---

## ๐Ÿ› ๏ธ Text Processing

- Fetches `.txt` files from URLs
- Validates content type
- Cleans and normalizes text
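
The validation and cleaning steps might look like the following sketch. The exact rules the project applies are not described here, so the helper names and the specific normalizations (NFKC, control-character removal, whitespace collapsing) are assumptions.

```python
import re
import unicodedata
from typing import Optional


def is_plain_text(content_type: Optional[str]) -> bool:
    """Accept `text/plain`, ignoring parameters such as `; charset=utf-8`."""
    if not content_type:
        return False
    return content_type.split(";", 1)[0].strip().lower() == "text/plain"


def clean_text(raw: str) -> str:
    """Normalize Unicode, line endings, and whitespace in downloaded text."""
    text = unicodedata.normalize("NFKC", raw)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters except newlines and tabs.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()
```

Keeping single blank lines intact matters if later chunking relies on them as topic boundaries.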

---

## 📌 Future Improvements

- 🔄 Refresh tokens
- 📊 Admin dashboard
- 🔍 Hybrid search (BM25 + vector)
- 📈 Monitoring & logging
- 🧩 Plugin/tool integrations
- Split the code in `app.py` into separate files and folders for a cleaner structure

---

## 📄 License

MIT License

---

## 🙌 Final Notes

This project is designed as a **clean, production-style RAG backend** and can be extended into:
- Chatbots
- Internal knowledge systems
- AI assistants
- Document search platforms

Happy coding :-)