https://github.com/henrymorgandibie/knowledge-rag-api

Production RAG backend — Document360 & SharePoint ingestion, pgvector hybrid search, ACL filtering, and grounded answer generation with citations. Built with FastAPI + Aurora PostgreSQL.
aws document360 fastapi hybrid-search knowledge-base langchain llm openai pgvector postgresql python retrieval-augmented-generation sharepoint vector-search

Host: GitHub
URL: https://github.com/henrymorgandibie/knowledge-rag-api
Owner: HenryMorganDibie
License: mit
Created: 2026-05-09T22:20:19.000Z (about 2 months ago)
Default Branch: master
Last Pushed: 2026-05-31T16:01:52.000Z (about 1 month ago)
Last Synced: 2026-05-31T18:06:07.491Z (about 1 month ago)
Topics: aws, document360, fastapi, hybrid-search, knowledge-base, langchain, llm, openai, pgvector, postgresql, python, retrieval-augmented-generation, sharepoint, vector-search
Language: Python
Homepage:
Size: 162 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Knowledge RAG API

Production-grade Retrieval-Augmented Generation backend for internal technical knowledge bases.

Ingests content from **Document360** and **SharePoint**, processes text, tables, and images, stores embeddings in **Aurora PostgreSQL with pgvector**, and exposes a clean API for retrieval, grounded answer generation, feedback, and diagnostics.

---

## Live Demo

![Knowledge RAG API Swagger UI](docs/images/swagger-ui.jpg)

The full API is documented and testable via Swagger UI at `/docs`. All endpoints — Ingestion, Retrieval, Orchestrator, Feedback, and Debug — are live and interactive.

---

## Architecture

```
                ┌─────────────────────────────────────────────────┐
                │                 Ingestion Layer                 │
                │                                                 │
Document360 ───▶│  Connector → Fingerprint Check → Chunker        │
SharePoint  ───▶│  → Image Describer → Embedder → S3 Upload       │
                │  → Atomic Publish to Aurora PostgreSQL          │
                └────────────────────┬────────────────────────────┘
                                     │
                ┌────────────────────▼────────────────────────────┐
                │          Aurora PostgreSQL + pgvector           │
                │                                                 │
                │  document_sources   (canonical registry)        │
                │  document_revisions (immutable audit trail)     │
                │  document_chunks    (embeddings + BM25 GIN)     │
                │  feedback_logs      (thumbs up/down)            │
                │  ingestion_jobs     (run audit)                 │
                └────────────────────┬────────────────────────────┘
                                     │
                ┌────────────────────▼────────────────────────────┐
                │                 Retrieval Layer                 │
                │                                                 │
                │  Vector Search (HNSW cosine) +                  │
                │  BM25 Full-Text (tsvector/tsquery) +            │
                │  RRF Merge + ACL Filter + Cross-Encoder         │
                └────────────────────┬────────────────────────────┘
                                     │
                ┌────────────────────▼────────────────────────────┐
                │                 APIs (FastAPI)                  │
                │                                                 │
                │  POST /retrieve     Hybrid search + ACL         │
                │  POST /ask          Grounded answer + citations │
                │  POST /feedback     Thumbs up/down capture      │
                │  POST /debug/trace  Full retrieval trace        │
                │  POST /ingest/*     Trigger ingestion sync      │
                └─────────────────────────────────────────────────┘
```

---

## Key Design Decisions

### Fingerprint-Based Change Detection
Every document is SHA-256 fingerprinted on raw content before any processing begins. Unchanged documents are skipped entirely — no re-chunking, no re-embedding, no S3 writes. This keeps incremental syncs fast even at scale.
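
A minimal sketch of the skip decision, assuming the last published fingerprint is kept per source ID. The `registry` dict and both function names are hypothetical; the repository's own pipeline may store and compare fingerprints differently.

```python
import hashlib
from typing import Optional


def fingerprint(raw_content: bytes) -> str:
    """Return the SHA-256 hex digest of a document's raw content."""
    return hashlib.sha256(raw_content).hexdigest()


def needs_reingest(raw_content: bytes, stored_fingerprint: Optional[str]) -> bool:
    """True when the document is new or its content changed since the last sync."""
    return stored_fingerprint != fingerprint(raw_content)


# Hypothetical registry mapping source IDs to the last published fingerprint.
registry = {"doc-1": fingerprint(b"<h1>Setup</h1>")}

print(needs_reingest(b"<h1>Setup</h1>", registry.get("doc-1")))     # False: unchanged, skip
print(needs_reingest(b"<h1>Setup v2</h1>", registry.get("doc-1")))  # True: content changed
print(needs_reingest(b"<h1>New</h1>", registry.get("doc-2")))       # True: never seen
```

Because the hash is taken over raw content, the comparison costs one digest per document and no chunking or embedding calls.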

### Atomic Chunk Publishing
Old chunks are deleted and new chunks inserted in a single database transaction. There is no window where a query can return a mix of stale and fresh chunks for the same document. This is the most critical correctness guarantee in the system.
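
The delete-then-insert pattern can be illustrated with a self-contained sketch. It uses the standard-library `sqlite3` module as a stand-in for Aurora PostgreSQL so that it runs anywhere; the `source_id` and `content` columns and the `publish_chunks` name are assumptions, not the project's actual schema or code.

```python
import sqlite3
from typing import List, Optional


def publish_chunks(conn: sqlite3.Connection, source_id: str,
                   new_chunks: List[Optional[str]]) -> None:
    """Replace all chunks for one document inside a single transaction."""
    # `with conn` commits on success and rolls back if any statement raises,
    # so a failed publish leaves the previous chunks untouched.
    with conn:
        conn.execute("DELETE FROM document_chunks WHERE source_id = ?", (source_id,))
        conn.executemany(
            "INSERT INTO document_chunks (source_id, content) VALUES (?, ?)",
            [(source_id, text) for text in new_chunks],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_chunks (source_id TEXT, content TEXT NOT NULL)")
publish_chunks(conn, "doc-1", ["old A", "old B"])
publish_chunks(conn, "doc-1", ["new A"])

# A failing publish (None violates NOT NULL) must not remove the current chunks.
try:
    publish_chunks(conn, "doc-1", ["newer A", None])
except sqlite3.IntegrityError:
    pass

rows = [r[0] for r in conn.execute("SELECT content FROM document_chunks ORDER BY content")]
print(rows)  # ['new A']
```

The same shape carries over to an async SQLAlchemy session: both statements go inside one transaction block, and the commit happens only after the last insert succeeds.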

### Structure-Aware Chunking
The chunker walks the HTML DOM rather than splitting on raw character count. Every chunk carries its full `section_path` (e.g. `"Setup > Installation > Windows"`) and `heading` so retrieval context is never lost. Tables are serialized to markdown. Images are described by GPT-4o vision so diagrams and screenshots are searchable.
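
As a rough illustration of how a heading trail can be carried into each chunk, the sketch below uses the standard-library `html.parser`. It is deliberately simplified: it handles only `h1` to `h3` and `p`, ignores tables, images, and the token budget, and the `SectionChunker` class is hypothetical rather than the code in `chunker.py`.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class SectionChunker(HTMLParser):
    """Emit one chunk per paragraph, tagged with the heading trail above it."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self) -> None:
        super().__init__()
        self.trail: List[str] = []             # e.g. ["Setup", "Installation"]
        self.chunks: List[Dict[str, str]] = []
        self._tag: Optional[str] = None        # tag whose text is being collected
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        self._tag = None
        if tag in self.HEADINGS:
            # A heading at level N replaces everything at level N and deeper.
            self.trail = self.trail[: self.HEADINGS[tag] - 1] + [text]
        elif text:
            self.chunks.append({
                "section_path": " > ".join(self.trail),
                "heading": self.trail[-1] if self.trail else "",
                "text": text,
            })


html = (
    "<h1>Setup</h1><h2>Installation</h2><h3>Windows</h3>"
    "<p>Run the installer.</p>"
    "<h3>Linux</h3><p>Use the package manager.</p>"
)
chunker = SectionChunker()
chunker.feed(html)
chunker.close()
for chunk in chunker.chunks:
    print(chunk["section_path"], "|", chunk["text"])
```

The first chunk printed carries the path `Setup > Installation > Windows`, and the second replaces only the last level with `Linux`.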

### Hybrid Retrieval with RRF
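Vector search and BM25 full-text search run in parallel. Results are merged using Reciprocal Rank Fusion (RRF): each chunk's score is the sum of its reciprocal-rank contributions across the lists, so a chunk that appears in both ranked lists rises above one that appears in only one list at a similar rank. A cross-encoder reranker (sentence-transformers) handles final precision ordering.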
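
A minimal sketch of the fusion step, assuming each retriever returns an ordered list of chunk IDs. The constant `k = 60` is the value commonly used with RRF and is an assumption here, as is the `rrf_merge` name; the project may use a different constant or tie-breaking rule.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_merge(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


vector_hits = ["c1", "c2", "c3"]
bm25_hits = ["c3", "c1", "c4"]
print(rrf_merge([vector_hits, bm25_hits]))  # ['c1', 'c3', 'c2', 'c4']
```

`c1` and `c3` appear in both lists, so each collects two contributions and outranks `c2` and `c4`, which appear in only one.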

### ACL Filtering
Every chunk stores the ACL groups from its source document. The retrieval layer filters chunks at query time — a user only sees chunks their group has access to. ACL bleed (returning restricted chunks to unauthorized users) is tested explicitly.
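
The filter rule can be sketched in plain Python as a set intersection. This is an in-memory illustration only: in a PostgreSQL-backed retriever the same check would more likely sit in the query's `WHERE` clause, and treating a chunk with no groups as visible to nobody is an assumption of this sketch.

```python
from typing import Dict, Iterable, List, Set


def filter_by_acl(chunks: Iterable[Dict], user_groups: Set[str]) -> List[Dict]:
    """Keep chunks that share at least one ACL group with the caller."""
    # Chunks with no ACL groups match nobody: deny by default (an assumption).
    return [c for c in chunks if user_groups & set(c["acl_groups"])]


chunks = [
    {"chunk_id": "a", "acl_groups": ["engineering"]},
    {"chunk_id": "b", "acl_groups": ["hr"]},
    {"chunk_id": "c", "acl_groups": ["engineering", "it-ops"]},
    {"chunk_id": "d", "acl_groups": []},
]

visible = filter_by_acl(chunks, {"it-ops"})
print([c["chunk_id"] for c in visible])  # ['c']

# Bleed check: every returned chunk must be readable by the caller's groups.
assert all({"it-ops"} & set(c["acl_groups"]) for c in visible)
```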

### Presigned S3 Citation URLs
Source documents are stored in S3. Citation endpoints return time-limited presigned URLs, so callers get temporary access to the original document without any AWS credentials being exposed to them.

---

## Project Structure

```
knowledge-rag-api/
├── api/
│   ├── main.py                  # FastAPI app + lifespan
│   └── routes/
│       ├── health.py
│       ├── ingest.py            # Ingestion triggers
│       ├── retrieval.py         # Hybrid search endpoint
│       ├── orchestrator.py      # Grounded answer endpoint
│       ├── feedback.py          # Thumbs up/down capture
│       └── debug.py             # Retrieval trace endpoint
├── core/
│   ├── config.py                # All settings via env vars
│   ├── database.py              # Async SQLAlchemy + pgvector init
│   ├── models.py                # ORM models
│   └── logger.py                # CloudWatch-friendly JSON logger
├── ingestion/
│   ├── pipeline.py              # Core ingestion with atomic publish
│   ├── connectors/
│   │   ├── document360.py       # Document360 REST API connector
│   │   └── sharepoint.py        # Microsoft Graph / SharePoint connector
│   └── processors/
│       ├── chunker.py           # Structure-aware HTML chunker
│       ├── embedder.py          # OpenAI batch embedding
│       └── image_describer.py   # GPT-4o vision image description
├── retrieval/
│   └── hybrid_retriever.py      # Vector + BM25 + RRF + reranking
├── orchestrator/
│   └── answer_engine.py         # Grounded LLM answer generation
├── storage/
│   └── s3_client.py             # S3/MinIO abstraction + presigned URLs
├── tests/
│   ├── unit/
│   │   ├── test_chunker.py
│   │   ├── test_fingerprint.py
│   │   └── test_retriever.py
│   └── integration/
│       └── test_pipeline.py
├── docs/
│   └── images/
│       └── swagger-ui.jpg       # Live API screenshot
├── docker-compose.yml           # PostgreSQL + pgvector + MinIO
├── Dockerfile
├── requirements.txt
├── alembic.ini
└── .env.example
```

---

## Quickstart

### Option A — GitHub Codespaces (Recommended)

The fastest way to run the full stack with zero local setup.

1. Click the green **Code** button on this repo → **Codespaces** tab → **Create codespace on master**
2. Wait ~60 seconds for the environment to load, then in the terminal:

```bash
cp .env.example .env
# Open .env and add your OPENAI_API_KEY
```

3. Start the database and MinIO storage:

```bash
docker-compose up -d db minio
```

4. Install dependencies:

```bash
pip install -r requirements.txt
```

5. Run the API:

```bash
python -m uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
```

6. Go to the **Ports** tab in VS Code → click the 🌐 globe icon next to port **8000** → add `/docs` to the URL.

> **Tip:** Store your `OPENAI_API_KEY` under repo **Settings → Secrets → Codespaces** so it's injected automatically every time you open the Codespace.

---

### Option B — Local Dev

#### 1. Clone and configure

```bash
git clone https://github.com/HenryMorganDibie/knowledge-rag-api.git
cd knowledge-rag-api
cp .env.example .env
# Fill in OPENAI_API_KEY and optionally Document360/SharePoint credentials
```

#### 2. Start infrastructure

```bash
docker-compose up -d db minio
```

#### 3. Install dependencies

```bash
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

#### 4. Run the API

```bash
python -m uvicorn api.main:app --reload
```

The API will be live at `http://localhost:8000`, with Swagger UI at `http://localhost:8000/docs`.

On first startup, the app automatically:
- Enables the `pgvector` extension
- Creates all tables
- Builds HNSW and GIN indexes

#### 5. Run tests

```bash
pytest tests/ -v
```

---

## API Reference

### `POST /retrieve`
Hybrid vector + BM25 search with ACL filtering and reranking.

```json
{
  "query": "How do I configure SSO?",
  "acl_groups": ["engineering", "it-ops"],
  "top_k": 5,
  "diagnostics": true
}
```

### `POST /ask`
Grounded answer generation with structured citation blocks.

```json
{
  "query": "What are the rate limits for the REST API?",
  "acl_groups": ["engineering"],
  "top_k": 5
}
```

Response:
```json
{
  "answer": "The REST API enforces a limit of 100 requests per minute per API key...",
  "citations": [
    {
      "chunk_id": "3f2a...",
      "section_path": "API Reference > Rate Limiting",
      "heading": "Rate Limiting",
      "excerpt": "The API enforces 100 requests per minute..."
    }
  ],
  "chunks_used": 3
}
```
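
A small client sketch for this endpoint, using only the standard library. It assumes the API is running at `http://localhost:8000` as in the Quickstart; `build_ask_request` and `format_citations` are hypothetical helper names, and the request is built but not sent so the example runs without a server.

```python
import json
import urllib.request
from typing import Dict, List


def build_ask_request(base_url: str, query: str, acl_groups: List[str],
                      top_k: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request with a JSON body."""
    body = json.dumps({"query": query, "acl_groups": acl_groups, "top_k": top_k})
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_citations(response: Dict) -> List[str]:
    """Render each citation as 'section_path: excerpt'."""
    return [f"{c['section_path']}: {c['excerpt']}" for c in response["citations"]]


req = build_ask_request(
    "http://localhost:8000",
    "What are the rate limits for the REST API?",
    ["engineering"],
)
print(req.get_method(), req.full_url)

# To send it against a running instance:
# with urllib.request.urlopen(req) as resp:
#     print(format_citations(json.load(resp)))
```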

### `POST /feedback`
Capture thumbs up/down with optional failure category.

```json
{
  "query": "How do I reset my password?",
  "rating": "negative",
  "failure_category": "wrong_answer",
  "comment": "Answer was about API keys, not user passwords",
  "chunk_ids": ["abc123", "def456"]
}
```

### `POST /debug/trace`
Full retrieval trace showing vector scores, BM25 ranks, RRF merge, and rerank scores.

### `POST /ingest/document360`
Trigger a full Document360 sync (runs in background, returns job ID).

### `POST /ingest/sharepoint`
Trigger a full SharePoint sync.

---

## Production Deployment (AWS)

| Component | AWS Service |
|-----------|-------------|
| API | ECS Fargate (containerized FastAPI) |
| Database | Aurora PostgreSQL + pgvector |
| Raw storage | S3 (raw docs + images) |
| Chunk artifacts | S3 (JSON chunk snapshots) |
| Async ingestion | SQS + EventBridge scheduled triggers |
| Secrets | AWS Secrets Manager |
| Observability | CloudWatch (structured JSON logs) |

To switch from local PostgreSQL to Aurora, update `DATABASE_URL` in your environment:
```
DATABASE_URL=postgresql+asyncpg://user:pass@your-aurora-cluster.rds.amazonaws.com:5432/knowledge_rag
```

To use real AWS S3 instead of MinIO, leave `S3_ENDPOINT_URL` empty and set proper IAM credentials.

---

## Retrieval Quality Evaluation

The system is evaluated across four dimensions:

| Metric | What it tests |
|--------|--------------|
| Chunk boundary coherence | Chunks don't split mid-sentence or mid-table |
| Citation grounding rate | Every claim in the answer maps to a retrieved chunk |
| Stale content prevention | Re-ingested documents never return old chunks |
| ACL safety | Restricted chunks never surface for unauthorized groups |

---

## Environment Variables

See `.env.example` for the full list. Key variables:

| Variable | Description |
|----------|-------------|
| `DATABASE_URL` | PostgreSQL connection string (asyncpg) |
| `OPENAI_API_KEY` | Used for embeddings and LLM answer generation |
| `S3_ENDPOINT_URL` | Leave empty for AWS S3; set for local MinIO |
| `DOCUMENT360_API_KEY` | Document360 API token |
| `AZURE_TENANT_ID` / `AZURE_CLIENT_ID` / `AZURE_CLIENT_SECRET` | Microsoft Graph credentials for SharePoint |
| `EMBEDDING_MODEL` | Default: `text-embedding-3-small` |
| `LLM_MODEL` | Default: `gpt-4o` |
| `CHUNK_SIZE` | Token target per chunk (default: 512) |

---

## License

MIT
