https://github.com/persteenolsen/fastapi-jwt-auth-rag-one
Python FastAPI with JWT Auth serving RAG using LLM by Groq
https://github.com/persteenolsen/fastapi-jwt-auth-rag-one
fastapi jwt llm postgresql python rag vector
Last synced: 9 days ago
JSON representation
Python FastAPI with JWT Auth serving RAG using LLM by Groq
- Host: GitHub
- URL: https://github.com/persteenolsen/fastapi-jwt-auth-rag-one
- Owner: persteenolsen
- Created: 2026-04-08T10:07:55.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-08T10:29:51.000Z (3 months ago)
- Last Synced: 2026-04-08T12:20:41.258Z (3 months ago)
- Topics: fastapi, jwt, llm, postgresql, python, rag, vector
- Language: Python
- Homepage: https://fastapi-jwt-auth-rag-one.vercel.app
- Size: 7.81 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Python + FastAPI + JWT Auth + RAG + (Mock Embeddings Version)
A lightweight **Retrieval-Augmented Generation (RAG)** API built with **FastAPI**, designed for **learning, testing, and prototyping**.
This version uses **deterministic fake embeddings** instead of real embedding modelsβallowing you to test the full RAG pipeline **without external embedding APIs or costs**.
---
## π Project Info
- **Version:** 0.0.1
- **Python:** 3.12
- **Last Updated:** 15-04-2026
---
## β¨ Features
### π Authentication
- JWT-based authentication (HS256)
- Protected endpoints with Bearer tokens
- Simple environment-based credentials
---
### π§ RAG Pipeline (Fully Functional)
- Ingest `.txt` files from URLs
- Chunk text with overlap
- Generate embeddings (**mocked**)
- Store vectors in PostgreSQL (`pgvector`)
- Retrieve relevant context for queries
- Generate answers with Groq LLM
---
### π§ͺ Fake Embeddings (Key Feature)
- Deterministic embeddings based on text hashing
- 384-dimensional normalized vectors
- No external API calls required
- Perfect for:
- Local development
- Testing pipelines
- Learning RAG architecture
> β οΈ Note: These embeddings do **not understand semantics**βthey only simulate the pipeline.
---
### π€ LLM Integration (Groq)
- Model: `llama-3.1-8b-instant`
- Generates responses from retrieved context
- Temperature-controlled outputs
---
### π Semantic Retrieval (Simulated)
- Query β fake embedding
- Vector similarity search via `pgvector`
- Top-K document retrieval
---
### ποΈ Database (PostgreSQL + pgvector)
Stores:
- Document content
- Embeddings (`VECTOR(384)`)
- Source URL
- Embedding metadata
- Timestamp
Optimizations:
- `ivfflat` index
- Cosine similarity (`<->`)
---
### βοΈ Background Processing
- Uses FastAPI `BackgroundTasks`
- Async ingestion pipeline
- Non-blocking embedding + database insert
---
### π§ͺ Debugging Tools
- `/debug/retrieve` β test retrieval without auth or LLM
- Console logs for retrieval inspection
---
## π‘ API Endpoints
| Method | Endpoint | Description |
|--------|--------------------|--------------------------------------|
| GET | `/` | Health check |
| POST | `/login` | Get JWT token |
| POST | `/ask` | RAG question answering π |
| POST | `/ingest` | Ingest `.txt` from URL π |
| POST | `/debug/retrieve` | Test retrieval only |
π = Requires authentication
---
## βοΈ Getting Started
### 1. Clone Repository
```bash
git clone https://github.com/your-username/your-repo.git
cd your-repo
```
---
### 2. Create Virtual Environment
```bash
python -m venv venv
```
Activate:
**Windows:**
```bash
venv\Scripts\activate
```
**Mac/Linux:**
```bash
source venv/bin/activate
```
---
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
---
## π Environment Variables
Create a `.env` file:
```env
DATABASE_URL=your_postgres_connection
GROQ_API_KEY=your_groq_api_key
SECRET_KEY=your_secret_key
FAKE_USERNAME=admin
FAKE_PASSWORD=password
```
---
## βΆοΈ Run the Application
```bash
uvicorn app:app --reload
```
Available at:
- π API: http://127.0.0.1:8000
- π Swagger UI: http://127.0.0.1:8000/docs
---
## π Authentication Flow
1. Call `/login` with credentials
2. Receive JWT token
3. Use in headers:
```http
Authorization: Bearer
```
---
## π§ How It Works
```text
User Query
β
Fake Embedding (Deterministic)
β
pgvector Similarity Search
β
Top-K Chunks
β
Groq LLM (LLaMA 3.1)
β
Final Answer + Sources
```
---
## π₯ Document Ingestion
### `/ingest`
- Accepts `.txt` file URLs
- Fetches and cleans text
- Splits into chunks (with overlap)
- Generates fake embeddings
- Stores in PostgreSQL
---
## π οΈ Core Components
### πΉ Chunking
- Fixed-size chunks (default: 500 chars)
- Overlap: 50 chars
---
### πΉ Fake Embedding Logic
- Seed based on text hash
- Generates reproducible vectors
- Normalized for cosine similarity
---
### πΉ Retrieval
- Uses `embedding <-> query_vector`
- Returns top-K most similar chunks
---
## ποΈ Database Initialization
On startup:
- Enables `pgvector` extension
- Creates `documents` table
- Builds similarity index (`ivfflat`)
---
## π Use Cases
This version is ideal for:
- π§ͺ Learning RAG systems
- βοΈ Backend prototyping
- π» Local development without API costs
- π Debugging retrieval pipelines
---
## π§ Limitations
- β No real semantic understanding (fake embeddings)
- β Retrieval quality is not meaningful
- β
Pipeline behavior is realistic
---
## π Future Improvements
- π Replace fake embeddings with real models (Hugging Face / OpenAI)
- π Add monitoring/logging
- π Hybrid search (BM25 + vectors)
- π§© Add document formats (PDF, HTML)
- Splitting up the app.py into seperate files and folders for improved structure
---
## π License
MIT License
---
## π Final Notes
This project demonstrates a **complete RAG architecture** without external dependencies for embeddingsβmaking it perfect for understanding how everything fits together before scaling to production systems.