{"id":50958310,"url":"https://github.com/henrymorgandibie/knowledge-rag-api","last_synced_at":"2026-06-18T10:03:09.761Z","repository":{"id":361650534,"uuid":"1234211071","full_name":"HenryMorganDibie/knowledge-rag-api","owner":"HenryMorganDibie","description":"Production RAG backend — Document360 \u0026 SharePoint ingestion, pgvector hybrid search, ACL filtering, and grounded answer generation with citations. Built with FastAPI + Aurora PostgreSQL.","archived":false,"fork":false,"pushed_at":"2026-05-31T16:01:52.000Z","size":166,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-05-31T18:06:07.491Z","etag":null,"topics":["aws","document360","fastapi","hybrid-search","knowledge-base","langchain","llm","openai","pgvector","postgresql","python","retrieval-augmented-generation","sharepoint","vector-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HenryMorganDibie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-09T22:20:19.000Z","updated_at":"2026-05-31T16:01:55.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/HenryMorganDibie/knowledge-rag-api","commit_stats":null,"previous_names":["henrymorgandibie/knowledge-rag-api"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/HenryMorganDibie/knowledge-rag-api","repository_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryMorganDibie%2Fknowledge-rag-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryMorganDibie%2Fknowledge-rag-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryMorganDibie%2Fknowledge-rag-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryMorganDibie%2Fknowledge-rag-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HenryMorganDibie","download_url":"https://codeload.github.com/HenryMorganDibie/knowledge-rag-api/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryMorganDibie%2Fknowledge-rag-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34485169,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","document360","fastapi","hybrid-search","knowledge-base","langchain","llm","openai","pgvector","postgresql","python","retrieval-augmented-generation","sharepoint","vector-search"],"created_at":"2026-06-18T10:03:08.985Z","updated_at":"2026-06-18T10:03:09.751Z","avatar_url":"https://github.com/HenryMorganDibie.png","language":"Python","funding_links":[],"categories":[],"sub
_categories":[],"readme":"# Knowledge RAG API\n\nProduction-grade Retrieval-Augmented Generation backend for internal technical knowledge bases.\n\nIngests content from **Document360** and **SharePoint**, processes text, tables, and images, stores embeddings in **Aurora PostgreSQL with pgvector**, and exposes a clean API for retrieval, grounded answer generation, feedback, and diagnostics.\n\n---\n\n## Live Demo\n\n![Knowledge RAG API Swagger UI](docs/images/swagger-ui.jpg)\n\nThe full API is documented and testable via Swagger UI at `/docs`. All endpoints — Ingestion, Retrieval, Orchestrator, Feedback, and Debug — are live and interactive.\n\n---\n\n## Architecture\n\n```\n                    ┌─────────────────────────────────────────────┐\n                    │              Ingestion Layer                 │\n                    │                                              │\n    Document360 ───▶│  Connector → Fingerprint Check → Chunker   │\n    SharePoint  ───▶│  → Image Describer → Embedder → S3 Upload  │\n                    │  → Atomic Publish to Aurora PostgreSQL       │\n                    └────────────────────┬────────────────────────┘\n                                         │\n                    ┌────────────────────▼────────────────────────┐\n                    │           Aurora PostgreSQL + pgvector       │\n                    │                                              │\n                    │  document_sources  (canonical registry)      │\n                    │  document_revisions (immutable audit trail)  │\n                    │  document_chunks   (embeddings + BM25 GIN)   │\n                    │  feedback_logs     (thumbs up/down)          │\n                    │  ingestion_jobs    (run audit)               │\n                    └────────────────────┬────────────────────────┘\n                                         │\n                    ┌────────────────────▼────────────────────────┐\n                    │              Retrieval 
Layer                 │\n                    │                                              │\n                    │  Vector Search (HNSW cosine) +              │\n                    │  BM25 Full-Text (tsvector/tsquery) +        │\n                    │  RRF Merge + ACL Filter + Cross-Encoder     │\n                    └────────────────────┬────────────────────────┘\n                                         │\n                    ┌────────────────────▼────────────────────────┐\n                    │                  APIs (FastAPI)              │\n                    │                                              │\n                    │  POST /retrieve   Hybrid search + ACL       │\n                    │  POST /ask        Grounded answer + citations│\n                    │  POST /feedback   Thumbs up/down capture    │\n                    │  POST /debug/trace Full retrieval trace      │\n                    │  POST /ingest/*   Trigger ingestion sync     │\n                    └─────────────────────────────────────────────┘\n```\n\n---\n\n## Key Design Decisions\n\n### Fingerprint-Based Change Detection\nEvery document is SHA-256 fingerprinted on raw content before any processing begins. Unchanged documents are skipped entirely — no re-chunking, no re-embedding, no S3 writes. This keeps incremental syncs fast even at scale.\n\n### Atomic Chunk Publishing\nOld chunks are deleted and new chunks inserted in a single database transaction. There is no window where a query can return a mix of stale and fresh chunks for the same document. This is the most critical correctness guarantee in the system.\n\n### Structure-Aware Chunking\nThe chunker walks the HTML DOM rather than splitting on raw character count. Every chunk carries its full `section_path` (e.g. `\"Setup \u003e Installation \u003e Windows\"`) and `heading` so retrieval context is never lost. Tables are serialized to markdown. 
Images are described by GPT-4o vision so diagrams and screenshots are searchable.\n\n### Hybrid Retrieval with RRF\nVector search and BM25 full-text search run in parallel. Results are merged using Reciprocal Rank Fusion — chunks appearing in both ranked lists get a significant boost. A cross-encoder reranker (sentence-transformers) handles final precision ordering.\n\n### ACL Filtering\nEvery chunk stores the ACL groups from its source document. The retrieval layer filters chunks at query time — a user only sees chunks their group has access to. ACL bleed (returning restricted chunks to unauthorized users) is tested explicitly.\n\n### Presigned S3 Citation URLs\nSource documents are stored in S3. Citation endpoints return time-limited presigned URLs — callers get temporary, auth-gated access to the original document without any credentials being exposed.\n\n---\n\n## Project Structure\n\n```\nknowledge-rag-api/\n├── api/\n│   ├── main.py                  # FastAPI app + lifespan\n│   └── routes/\n│       ├── health.py\n│       ├── ingest.py            # Ingestion triggers\n│       ├── retrieval.py         # Hybrid search endpoint\n│       ├── orchestrator.py      # Grounded answer endpoint\n│       ├── feedback.py          # Thumbs up/down capture\n│       └── debug.py             # Retrieval trace endpoint\n├── core/\n│   ├── config.py                # All settings via env vars\n│   ├── database.py              # Async SQLAlchemy + pgvector init\n│   ├── models.py                # ORM models\n│   └── logger.py                # CloudWatch-friendly JSON logger\n├── ingestion/\n│   ├── pipeline.py              # Core ingestion with atomic publish\n│   ├── connectors/\n│   │   ├── document360.py       # Document360 REST API connector\n│   │   └── sharepoint.py        # Microsoft Graph / SharePoint connector\n│   └── processors/\n│       ├── chunker.py           # Structure-aware HTML chunker\n│       ├── embedder.py          # OpenAI batch embedding\n│       └── 
image_describer.py   # GPT-4o vision image description\n├── retrieval/\n│   └── hybrid_retriever.py      # Vector + BM25 + RRF + reranking\n├── orchestrator/\n│   └── answer_engine.py         # Grounded LLM answer generation\n├── storage/\n│   └── s3_client.py             # S3/MinIO abstraction + presigned URLs\n├── tests/\n│   ├── unit/\n│   │   ├── test_chunker.py\n│   │   ├── test_fingerprint.py\n│   │   └── test_retriever.py\n│   └── integration/\n│       └── test_pipeline.py\n├── docs/\n│   └── images/\n│       └── swagger-ui.jpg       # Live API screenshot\n├── docker-compose.yml           # PostgreSQL + pgvector + MinIO\n├── Dockerfile\n├── requirements.txt\n├── alembic.ini\n└── .env.example\n```\n\n---\n\n## Quickstart\n\n### Option A — GitHub Codespaces (Recommended)\n\nThe fastest way to run the full stack with zero local setup.\n\n1. Click the green **Code** button on this repo → **Codespaces** tab → **Create codespace on master**\n2. Wait ~60 seconds for the environment to load, then in the terminal:\n\n```bash\ncp .env.example .env\n# Open .env and add your OPENAI_API_KEY\n```\n\n3. Start the database and MinIO storage:\n\n```bash\ndocker-compose up db minio -d\n```\n\n4. Install dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n5. Run the API:\n\n```bash\npython -m uvicorn api.main:app --reload --host 0.0.0.0 --port 8000\n```\n\n6. Go to the **Ports** tab in VS Code → click the 🌐 globe icon next to port **8000** → add `/docs` to the URL.\n\n\u003e **Tip:** Store your `OPENAI_API_KEY` under repo **Settings → Secrets → Codespaces** so it's injected automatically every time you open the Codespace.\n\n---\n\n### Option B — Local Dev\n\n#### 1. Clone and configure\n\n```bash\ngit clone https://github.com/HenryMorganDibie/knowledge-rag-api.git\ncd knowledge-rag-api\ncp .env.example .env\n# Fill in OPENAI_API_KEY and optionally Document360/SharePoint credentials\n```\n\n#### 2. 
Start infrastructure\n\n```bash\ndocker-compose up db minio -d\n```\n\n#### 3. Install dependencies\n\n```bash\npython -m venv .venv \u0026\u0026 source .venv/bin/activate  # Windows: .venv\\Scripts\\activate\npip install -r requirements.txt\n```\n\n#### 4. Run the API\n\n```bash\npython -m uvicorn api.main:app --reload\n```\n\nThe API will be live at `http://localhost:8000/docs`\n\nOn first startup, the app automatically:\n- Enables the `pgvector` extension\n- Creates all tables\n- Builds HNSW and GIN indexes\n\n#### 5. Run tests\n\n```bash\npytest tests/ -v\n```\n\n---\n\n## API Reference\n\n### `POST /retrieve`\nHybrid vector + BM25 search with ACL filtering and reranking.\n\n```json\n{\n  \"query\": \"How do I configure SSO?\",\n  \"acl_groups\": [\"engineering\", \"it-ops\"],\n  \"top_k\": 5,\n  \"diagnostics\": true\n}\n```\n\n### `POST /ask`\nGrounded answer generation with structured citation blocks.\n\n```json\n{\n  \"query\": \"What are the rate limits for the REST API?\",\n  \"acl_groups\": [\"engineering\"],\n  \"top_k\": 5\n}\n```\n\nResponse:\n```json\n{\n  \"answer\": \"The REST API enforces a limit of 100 requests per minute per API key...\",\n  \"citations\": [\n    {\n      \"chunk_id\": \"3f2a...\",\n      \"section_path\": \"API Reference \u003e Rate Limiting\",\n      \"heading\": \"Rate Limiting\",\n      \"excerpt\": \"The API enforces 100 requests per minute...\"\n    }\n  ],\n  \"chunks_used\": 3\n}\n```\n\n### `POST /feedback`\nCapture thumbs up/down with optional failure category.\n\n```json\n{\n  \"query\": \"How do I reset my password?\",\n  \"rating\": \"negative\",\n  \"failure_category\": \"wrong_answer\",\n  \"comment\": \"Answer was about API keys, not user passwords\",\n  \"chunk_ids\": [\"abc123\", \"def456\"]\n}\n```\n\n### `POST /debug/trace`\nFull retrieval trace showing vector scores, BM25 ranks, RRF merge, and rerank scores.\n\n### `POST /ingest/document360`\nTrigger a full Document360 sync (runs in background, returns job 
ID).\n\n### `POST /ingest/sharepoint`\nTrigger a full SharePoint sync.\n\n---\n\n## Production Deployment (AWS)\n\n| Component | AWS Service |\n|-----------|-------------|\n| API | ECS Fargate (containerized FastAPI) |\n| Database | Aurora PostgreSQL + pgvector |\n| Raw storage | S3 (raw docs + images) |\n| Chunk artifacts | S3 (JSON chunk snapshots) |\n| Async ingestion | SQS + EventBridge scheduled triggers |\n| Secrets | AWS Secrets Manager |\n| Observability | CloudWatch (structured JSON logs) |\n\nTo switch from local PostgreSQL to Aurora, update `DATABASE_URL` in your environment:\n```\nDATABASE_URL=postgresql+asyncpg://user:pass@your-aurora-cluster.rds.amazonaws.com:5432/knowledge_rag\n```\n\nTo use real AWS S3 instead of MinIO, leave `S3_ENDPOINT_URL` empty and set proper IAM credentials.\n\n---\n\n## Retrieval Quality Evaluation\n\nThe system is evaluated across four dimensions:\n\n| Metric | What it tests |\n|--------|--------------|\n| Chunk boundary coherence | Chunks don't split mid-sentence or mid-table |\n| Citation grounding rate | Every claim in the answer maps to a retrieved chunk |\n| Stale content prevention | Re-ingested documents never return old chunks |\n| ACL safety | Restricted chunks never surface for unauthorized groups |\n\n---\n\n## Environment Variables\n\nSee `.env.example` for the full list. 
Key variables:\n\n| Variable | Description |\n|----------|-------------|\n| `DATABASE_URL` | PostgreSQL connection string (asyncpg) |\n| `OPENAI_API_KEY` | Used for embeddings and LLM answer generation |\n| `S3_ENDPOINT_URL` | Leave empty for AWS S3; set for local MinIO |\n| `DOCUMENT360_API_KEY` | Document360 API token |\n| `AZURE_TENANT_ID` / `AZURE_CLIENT_ID` / `AZURE_CLIENT_SECRET` | Microsoft Graph credentials for SharePoint |\n| `EMBEDDING_MODEL` | Default: `text-embedding-3-small` |\n| `LLM_MODEL` | Default: `gpt-4o` |\n| `CHUNK_SIZE` | Token target per chunk (default: 512) |\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenrymorgandibie%2Fknowledge-rag-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhenrymorgandibie%2Fknowledge-rag-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenrymorgandibie%2Fknowledge-rag-api/lists"}