https://github.com/persteenolsen/fastapi-jwt-auth-rag-two
Python FastAPI with JWT Auth serving RAG using Real embeddings
fastapi llp pgvector postgresql python rag-pipeline
- Host: GitHub
- URL: https://github.com/persteenolsen/fastapi-jwt-auth-rag-two
- Owner: persteenolsen
- Created: 2026-04-11T15:09:06.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-06-16T05:34:48.000Z (13 days ago)
- Last Synced: 2026-06-16T07:47:02.345Z (13 days ago)
- Topics: fastapi, llp, pgvector, postgresql, python, rag-pipeline
- Language: Python
- Homepage: https://fastapi-jwt-auth-rag-two.vercel.app/docs
- Size: 27.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Python + FastAPI + JWT Auth + RAG + Hugging Face embeddings
A production-style **Retrieval-Augmented Generation (RAG)** API built with **FastAPI**.
This project combines **secure JWT authentication**, **vector search with pgvector**, **Hugging Face embeddings**, and **Groq LLMs** to deliver context-aware answers from your own data.
---
## Project Info
- **Last Updated:** 16-06-2026
- **Python Version:** 3.12
---
## Features
### Authentication
- JWT-based authentication (HS256)
- Protected endpoints using Bearer tokens
- Environment-based credentials
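
To make the HS256 step concrete, here is a minimal standard-library sketch of how such a token is signed and verified. It is an illustration only: the project most likely relies on a JWT library, and the names `create_token` and `verify_token`, the `sub`/`exp` claims, and the 30-minute lifetime are assumptions, not taken from the repository.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over `header.payload`, which is what HS256 means."""
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_token(subject: str, secret: str, ttl_seconds: int = 1800) -> str:
    """Build a signed token carrying `sub` and `exp` claims (hypothetical choice)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str) -> dict:
    """Return the payload if algorithm, signature, and expiry check out."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload
```

In a setup like this, the secret would come from an environment variable rather than source code, in line with the environment-based credentials mentioned above.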
---
### RAG Pipeline
- Ingests `.txt` documents from URLs
- Splits content into **topic-based chunks**
- Generates embeddings using Hugging Face
- Stores vectors in PostgreSQL (`pgvector`)
- Retrieves relevant context for queries
---
### LLM Integration (Groq)
- Model: `llama-3.1-8b-instant`
- Context-aware answer generation
- Structured prompting for grounded responses
---
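### Semantic Search
- Query → embedding
- Top-K similarity search via `pgvector`
- Distance operator `<->` (L2 distance in `pgvector`; on normalized embeddings it ranks results the same as cosine distance, `<=>`)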
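
In `pgvector`, `<->` is the L2 (Euclidean) distance operator, `<=>` is cosine distance, and `<#>` is negative inner product. For unit-length vectors the squared L2 distance equals `2 - 2 * cosine_similarity`, so both operators produce the same ranking. The pure-Python sketch below illustrates this on toy 3-dimensional vectors; the document ids and vectors are made up for the example.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def l2_distance(a, b):
    """Euclidean distance, the quantity pgvector's `<->` computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's `<=>` computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k, distance):
    """Return the k (doc_id, vector) rows closest to the query."""
    return sorted(rows, key=lambda row: distance(query, row[1]))[:k]


# Toy data standing in for stored embeddings.
rows = [
    ("doc-a", normalize([1.0, 0.0, 0.0])),
    ("doc-b", normalize([0.7, 0.7, 0.0])),
    ("doc-c", normalize([0.0, 0.0, 1.0])),
]
query = normalize([0.9, 0.1, 0.0])

by_l2 = [doc_id for doc_id, _ in top_k(query, rows, 2, l2_distance)]
by_cosine = [doc_id for doc_id, _ in top_k(query, rows, 2, cosine_distance)]
print(by_l2, by_cosine)  # both rankings list doc-a first, then doc-b
```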
---
### Vector Database (PostgreSQL + pgvector)
Stores:
- Document content
- Embeddings (384-dim vectors)
- Source URL
- Metadata
- Timestamp
Optimizations:
- `VECTOR(384)` column
- `ivfflat` index for fast retrieval
---
### Background Processing
- FastAPI `BackgroundTasks`
- Async ingestion pipeline
- Non-blocking embedding + DB insert
---
### Debug Tools
- `/debug/retrieve`: test retrieval without the LLM
- Console logging for inspection
---
## API Endpoints
| Method | Endpoint | Description |
|--------|--------------------|--------------------------------------|
| POST | `/token` | Get JWT access token |
| POST | `/ask` | Ask questions (RAG-powered) 🔒 |
| POST | `/ingest` | Ingest `.txt` files from URLs |
| GET | `/debug/retrieve` | Debug semantic search |
🔒 = Requires authentication
---
## Getting Started
### 1. Clone the Repository
```bash
git clone https://github.com/persteenolsen/fastapi-jwt-auth-rag-two.git
cd fastapi-jwt-auth-rag-two
```
---
### 2. Create Virtual Environment
```bash
python -m venv venv
```
Activate it:
**Windows (PowerShell):**
```powershell
venv\Scripts\activate
```
**Mac/Linux:**
```bash
source venv/bin/activate
```
---
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
---
## Run the Application
```bash
uvicorn main:app --reload
```
Once running:
- API: http://127.0.0.1:8000
- Swagger Docs: http://127.0.0.1:8000/docs
Use Swagger UI to:
1. Authenticate via `/token`
2. Copy the JWT token
3. Authorize requests
---
## Authentication Flow
1. Call `/token` with credentials
2. Receive JWT access token
3. Use in headers:
```http
Authorization: Bearer <access_token>
```
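
As a rough client-side illustration of this flow, the sketch below prepares the two requests with the standard library only. The request shapes are assumptions: `/token` is assumed to take form-encoded `username` and `password` fields (the common FastAPI OAuth2 password-form convention), and `/ask` is assumed to take a JSON body with a `question` field. Check the Swagger docs for the actual schemas.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local development server


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Prepare POST /token. Field names are assumed, not confirmed."""
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(
        f"{BASE_URL}/token",
        data=body.encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_ask_request(token: str, question: str) -> urllib.request.Request:
    """Prepare POST /ask with the Bearer header. `question` is an assumed field."""
    return urllib.request.Request(
        f"{BASE_URL}/ask",
        data=json.dumps({"question": question}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the server running, `send(build_token_request(...))` would return the token response, and the token value (often under an `access_token` key, which is again an assumption here) would be passed to `build_ask_request`.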
---
## How RAG Works
```text
User Query
↓
Embedding (Hugging Face)
↓
pgvector Similarity Search
↓
Top-K Relevant Chunks
↓
Groq LLM (LLaMA 3.1)
↓
Final Answer + Sources
```
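
The flow above can be sketched as a small orchestration function. This is a hypothetical outline, not the project's code: `embed`, `search`, and `generate` are injected callables standing in for the Hugging Face, pgvector, and Groq calls, and the chunk keys `content` and `source_url` as well as the prompt wording are assumptions.

```python
from typing import Callable, Sequence


def build_prompt(question: str, chunks: Sequence[dict]) -> str:
    """Assemble a grounded prompt: numbered context chunks, then the question."""
    context = "\n\n".join(
        f"[{i}] {chunk['content']}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer_question(
    question: str,
    embed: Callable[[str], list],
    search: Callable[[list, int], list],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> dict:
    """Run query -> embedding -> retrieval -> LLM and attach the sources."""
    query_vector = embed(question)          # text to vector
    chunks = search(query_vector, top_k)    # nearest stored chunks
    answer = generate(build_prompt(question, chunks))
    # De-duplicate source URLs so each document is listed once.
    sources = sorted({chunk["source_url"] for chunk in chunks})
    return {"answer": answer, "sources": sources}
```

Passing the three stages in as callables keeps the orchestration testable without network access, since each stage can be replaced by a stub.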
---
## Document Ingestion
### `/ingest`
- Accepts `.txt` file URLs
- Downloads and cleans content
- Splits into topic-based chunks
- Generates embeddings
- Stores results in PostgreSQL
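
The README does not say how topics are detected, so the following is only one plausible heuristic for topic-based chunking: a short line without sentence-ending punctuation is treated as a topic heading, and the lines that follow it become that topic's chunk. The function names, the 60-character limit, and the `General` fallback topic are all assumptions.

```python
def looks_like_heading(line: str, max_length: int = 60) -> bool:
    """Heuristic: short line that does not end like a sentence."""
    return len(line) <= max_length and not line.endswith((".", "?", "!", ",", ";"))


def split_by_topic(text: str) -> list[dict]:
    """Group body lines under the most recent heading-like line."""
    chunks: list[dict] = []
    topic, body = "General", []  # text before any heading falls under "General"

    def flush() -> None:
        # Emit the current topic only if it collected any body text.
        content = " ".join(body).strip()
        if content:
            chunks.append({"topic": topic, "content": content})

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no content
        if looks_like_heading(stripped):
            flush()
            topic, body = stripped.rstrip(":"), []
        else:
            body.append(stripped)
    flush()
    return chunks
```

A heuristic like this misclassifies short body lines that lack a final period, so real input would likely need a stricter heading rule or an explicit delimiter.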
---
## Embeddings
- Model: `sentence-transformers/all-MiniLM-L6-v2`
- 384-dimensional normalized vectors
- Batch processing with retry support
- Powered via Hugging Face Inference API
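
Batching with retries and normalization can be sketched as follows. The remote call is abstracted behind a hypothetical `embed_batch` callable, since the exact Inference API client used by the project is not shown here; the batch size, retry count, and exponential backoff are illustrative defaults.

```python
import math
import time
from typing import Callable, Sequence


def l2_normalize(vector: Sequence[float]) -> list:
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list], list],
    batch_size: int = 16,
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
) -> list:
    """Embed texts batch by batch, retrying each failed batch with backoff."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        for attempt in range(1, max_retries + 1):
            try:
                raw = embed_batch(batch)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last allowed attempt
                # Exponential backoff: 1x, 2x, 4x, ... the base delay.
                time.sleep(backoff_seconds * 2 ** (attempt - 1))
        vectors.extend(l2_normalize(vector) for vector in raw)
    return vectors
```

Normalizing on the client side is harmless if the model already returns unit vectors, and it guarantees the property that the similarity search relies on.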
---
## Database Initialization
On application startup:
- Creates `pgvector` extension
- Creates `documents` table
- Builds `ivfflat` similarity index
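
A startup routine along these lines might look like the sketch below. The column names, index name, and `lists = 100` setting are assumptions chosen to match the fields listed earlier; only the `vector` extension name, the `VECTOR(n)` type, and the `ivfflat` index method with its operator classes are standard `pgvector` syntax. The index uses `vector_l2_ops`, which pairs with the `<->` operator (`vector_cosine_ops` would pair with `<=>`).

```python
EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2

# Idempotent statements, so running them on every startup is safe.
INIT_STATEMENTS = [
    # The pgvector extension is registered in PostgreSQL under the name "vector".
    "CREATE EXTENSION IF NOT EXISTS vector",
    f"""
    CREATE TABLE IF NOT EXISTS documents (
        id BIGSERIAL PRIMARY KEY,
        content TEXT NOT NULL,
        embedding VECTOR({EMBEDDING_DIM}) NOT NULL,
        source_url TEXT,
        metadata JSONB DEFAULT '{{}}'::jsonb,
        created_at TIMESTAMPTZ DEFAULT now()
    )
    """,
    """
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING ivfflat (embedding vector_l2_ops)
    WITH (lists = 100)
    """,
]


def init_db(connection) -> None:
    """Run the setup statements on a DB-API connection (for example psycopg)."""
    cursor = connection.cursor()
    try:
        for statement in INIT_STATEMENTS:
            cursor.execute(statement)
        connection.commit()
    finally:
        cursor.close()
```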
---
## Text Processing
- Fetches `.txt` files from URLs
- Validates content type
- Cleans and normalizes text
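
The validation and cleaning steps could look roughly like this. Both helper names are hypothetical, and the specific rules (NFKC normalization, collapsing whitespace, at most one blank line between paragraphs) are assumptions about what "cleans and normalizes" means here.

```python
import re
import unicodedata


def is_plain_text(content_type: str) -> bool:
    """Accept `text/plain`, ignoring parameters such as `; charset=utf-8`."""
    return content_type.split(";", 1)[0].strip().lower() == "text/plain"


def clean_text(raw: str) -> str:
    """Normalize Unicode forms, line endings, and whitespace."""
    text = unicodedata.normalize("NFKC", raw)              # e.g. non-breaking space -> space
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    text = re.sub(r"[ \t]+", " ", text)                    # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)                   # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)                 # keep at most one blank line
    return text.strip()
```

For example, `is_plain_text` would be applied to the `Content-Type` response header before the body is decoded and passed to `clean_text`.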
---
## Future Improvements
- Refresh tokens
- Admin dashboard
- Hybrid search (BM25 + vector)
- Monitoring & logging
- Plugin/tool integrations
- Splitting `app.py` into separate files and folders for improved structure
---
## License
MIT License
---
## Final Notes
This project is designed as a **clean, production-style RAG backend** and can be extended into:
- Chatbots
- Internal knowledge systems
- AI assistants
- Document search platforms
Happy coding :-)