{"id":51400517,"url":"https://github.com/qtuanph/chatbot-rag","last_synced_at":"2026-07-04T06:03:57.763Z","repository":{"id":360179993,"uuid":"1202880781","full_name":"qtuanph/chatbot-rag","owner":"qtuanph","description":"Production-ready Vietnamese RAG platform with LlamaIndex, Qdrant hybrid retrieval, provider management, and real-time analytics.","archived":false,"fork":false,"pushed_at":"2026-06-24T10:24:52.000Z","size":16634,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-24T10:45:54.351Z","etag":null,"topics":["celery","fastapi","llamaindex","nextjs","postgresql","qdrant","rag","redis","vector-database","vietnamese"],"latest_commit_sha":null,"homepage":"https://github.com/qtuanph/chatbot-rag","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qtuanph.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-06T13:53:35.000Z","updated_at":"2026-06-24T10:24:55.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/qtuanph/chatbot-rag","commit_stats":null,"previous_names":["qtuanph/chatbot-rag"],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/qtuanph/chatbot-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qtuanph%2Fchatbot-rag","tags_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/qtuanph%2Fchatbot-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qtuanph%2Fchatbot-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qtuanph%2Fchatbot-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qtuanph","download_url":"https://codeload.github.com/qtuanph/chatbot-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qtuanph%2Fchatbot-rag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35111429,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-04T02:00:05.987Z","response_time":113,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["celery","fastapi","llamaindex","nextjs","postgresql","qdrant","rag","redis","vector-database","vietnamese"],"created_at":"2026-07-04T06:03:53.887Z","updated_at":"2026-07-04T06:03:57.744Z","avatar_url":"https://github.com/qtuanph.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# chatbot-rag\n\n[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL%203.0-blue.svg)](./LICENSE)\n[![Backend: FastAPI](https://img.shields.io/badge/Backend-FastAPI-009688)](https://fastapi.tiangolo.com/)\n[![Frontend: Next.js 
16](https://img.shields.io/badge/Frontend-Next.js%2016-black)](https://nextjs.org/)\n[![Vector DB: Qdrant](https://img.shields.io/badge/Vector%20DB-Qdrant-red)](https://qdrant.tech/)\n[![UI: shadcn/ui](https://img.shields.io/badge/UI-shadcn%2Fui-black)](https://ui.shadcn.com/)\n\nA self-hosted, multi-tenant RAG chatbot platform built for SaaS-style operations and real product integration.\n\n`chatbot-rag` is designed as an AI gateway between tenant applications and enterprise knowledge retrieval. It combines tenant-scoped document ingestion, stateless chat, OpenAI-compatible APIs, hybrid retrieval, usage tracking, and an internal admin console in one deployable stack.\n\n---\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Key Capabilities](#key-capabilities)\n- [Why This Architecture](#why-this-architecture)\n- [System Architecture](#system-architecture)\n- [Technology Stack](#technology-stack)\n- [Product Model](#product-model)\n- [Retrieval Pipeline](#retrieval-pipeline)\n- [Public API Example](#public-api-example)\n- [Quick Start](#quick-start)\n- [Operational Notes](#operational-notes)\n- [Repository Guide](#repository-guide)\n- [Engineering Principles](#engineering-principles)\n- [License](#license)\n\n---\n\n## Overview\n\nMost internal chatbots stop at \"upload files and ask questions.\" This project is intentionally built for a more demanding use case:\n\n- multiple tenants on shared infrastructure\n- strict tenant isolation\n- stateless chat flows\n- integration into tenant software through a familiar API\n- provider-aware retrieval and generation\n- operational visibility for usage, quota, and model behavior\n\nThe result is a platform that is useful not just as a demo chatbot, but as a foundation for embedding AI assistance inside real business software.\n\n---\n\n## Key Capabilities\n\n### Multi-tenant by design\n- tenant-scoped documents\n- tenant-scoped usage and quota\n- tenant-scoped instructions and welcome messages\n- tenant-scoped API 
keys\n\n### Stateless chat\n- no product dependency on persisted chat sessions\n- frontend holds recent transcript in memory only\n- backend receives recent `messages`, injects tenant instruction and retrieved context, then answers\n- premium glassmorphism chat interface for smooth testing\n\n### OpenAI-compatible public API\n- easy integration for tenant applications\n- compatible mental model for existing AI clients and internal tooling\n\n### Hybrid retrieval pipeline\n- Qdrant-backed search\n- PostgreSQL section hydration\n- reranking with NVIDIA NIM by default\n- adaptive rerank skipping for short, obvious queries to save latency and token cost\n\n### Admin-first operations\n- platform-wide tenant management\n- tenant-scoped document operations\n- API key management\n- usage and spend visibility\n- provider/runtime configuration through the webapp\n\n### Self-hosted deployment\n- Docker Compose topology\n- object storage, vector store, queue/cache, reverse proxy, and web UI included\n\n---\n\n## Why This Architecture\n\nThis repository intentionally favors boundaries that scale operationally:\n\n- **Browser -\u003e `/api/bep/*` -\u003e Next.js proxy -\u003e FastAPI**\n  - browser code never holds backend bearer tokens\n- **Route -\u003e Service -\u003e Repository**\n  - HTTP handling, business logic, and data access stay separated\n- **Tenant ID as the primary boundary**\n  - avoids reintroducing legacy user-owned document assumptions\n- **Stateless chat**\n  - simplifies product behavior and reduces persistence complexity\n- **Provider-aware runtime**\n  - 9Router for LLM access\n  - Docker Model Runner for local embeddings\n  - NVIDIA NIM as the reranker happy path\n\n---\n\n## System Architecture\n\n```mermaid\nflowchart LR\n    A[Browser / Tenant App] --\u003e B[Next.js Webapp]\n    B --\u003e C[/api/bep/* Proxy]\n    C --\u003e D[FastAPI Backend]\n\n    D --\u003e E[PostgreSQL]\n    D --\u003e F[Redis]\n    D --\u003e G[Qdrant]\n    D --\u003e H[RustFS]\n 
   D --\u003e I[9Router]\n    D --\u003e J[Docker Model Runner]\n    D --\u003e K[NVIDIA NIM]\n\n    L[Celery Workers] --\u003e E\n    L --\u003e F\n    L --\u003e G\n    L --\u003e H\n    L --\u003e J\n```\n\n### Internal request flow\n\n```text\nBrowser -\u003e Next.js Webapp (Cloudflare Pages) -\u003e /api/bep/* -\u003e Next.js Route Handler -\u003e FastAPI\n```\n\n### Public integration flow\n\n```text\nTenant Software -\u003e OpenAI-compatible API -\u003e FastAPI -\u003e Retrieval + LLM orchestration\n```\n\n---\n\n## Technology Stack\n\n### Application Layer\n- **Frontend:** Next.js 16\n- **UI:** shadcn/ui + Base UI primitives\n- **Backend:** FastAPI\n- **Workers:** Celery\n\n### Data and Infrastructure\n- **Primary database:** PostgreSQL\n- **Vector database:** Qdrant\n- **Cache / queue:** Redis\n- **Object storage:** RustFS (S3-compatible)\n- **Reverse proxy:** Traefik\n\n### AI Runtime\n- **LLM gateway:** 9Router\n- **Default embedding runtime:** Docker Model Runner\n- **Default reranker:** NVIDIA NIM\n- **Local reranker:** optional fallback path\n\n### Retrieval / AI Libraries\n- **LlamaIndex**\n- **qdrant-client**\n- **FastEmbed**\n\n---\n\n## Product Model\n\n### Roles\n\n#### `platform_admin`\n- creates tenants\n- provisions tenant admin accounts\n- manages tenant API keys\n- uploads and manages tenant documents\n- tests chat inside tenant scope\n- reviews cross-tenant usage and spend\n\n#### `tenant_admin`\n- views tenant documents\n- tests chat in tenant scope\n- views tenant usage and quota\n- edits tenant-specific chatbot settings and instructions\n- cannot manage platform-wide resources\n\n### Chat Model\n\nThe product uses **stateless chat**:\n\n- no persisted `chat_sessions` / `chat_messages` product flow\n- no legacy session sidebar dependency\n- transcript lives in frontend memory while the chat stays open\n- backend only needs recent `messages` plus tenant context\n\n### Tenant Integration Model\n\nTenant applications typically need 
only:\n\n- `base_url`\n- `api_key`\n- `model`\n- `messages`\n\n---\n\n## Retrieval Pipeline\n\nAt a high level:\n\n1. accept the latest user query\n2. enforce tenant boundary\n3. run hybrid retrieval in Qdrant\n4. hydrate top sections from PostgreSQL\n5. rerank when useful\n6. build final generation context\n7. call the LLM through 9Router\n\n### Notable implementation details\n\n- payload-indexed tenant/document/section metadata in Qdrant\n- latest-query retrieval by default\n- chat history used for LLM context, not as default RAG expansion\n- adaptive rerank skipping for short, high-confidence queries\n- usage and cost tracking across LLM, embedding, and reranker calls\n- SSE-based streaming for chat and ingestion progress\n\n---\n\n## Public API Example\n\n```http\nPOST /v1/chat/completions\nAuthorization: Bearer \u003ctenant_api_key\u003e\nContent-Type: application/json\n```\n\n```json\n{\n  \"model\": \"chatbot-rag\",\n  \"messages\": [\n    {\n      \"role\": \"user\",\n      \"content\": \"How do I create a warehouse receipt?\"\n    }\n  ],\n  \"stream\": true,\n  \"temperature\": 0.2,\n  \"max_tokens\": 1024\n}\n```\n\n---\n\n## Quick Start\n\n**Backend (API):**\n```bash\ncd chatbot-api\ncp .env.example .env\ndocker compose build\ndocker compose up -d\n```\n\n**Frontend (Webapp):**\n```bash\ncd chatbot-webapp\nnpm install\nnpm run dev\n```\n\n### Useful endpoints\n\n- **Web app (Local):** `http://localhost:3000`\n- **Backend API:** `https://api.qtuanph.dev/v1/health`\n- **Qdrant dashboard:** `http://localhost:6333/dashboard`\n- **9Router:** `http://localhost:2908`\n- **Traefik dashboard:** `http://localhost:8080`\n\n---\n\n## Operational Notes\n\n- Chat uses **SSE** for response streaming\n- Document ingestion progress also uses **SSE**\n- The current stack is better aligned with real deployment than single-machine demos\n- Throughput at scale still depends on:\n  - LLM provider capacity\n  - embedding throughput\n  - reranking throughput\n  - worker 
concurrency\n  - Redis / PostgreSQL / Qdrant sizing\n\nIf your target is production traffic rather than local demo load, scale planning should focus on `api`, `workers`, `ai-proxy`, and retrieval/runtime capacity rather than frontend-only tuning.\n\n---\n\n## Repository Guide\n\nIf you are contributing or maintaining the project, start here:\n\n| Topic | File |\n|---|---|\n| Project guardrails | `AGENTS.md` |\n| Architecture | `docs/1_ARCHITECTURE.md` |\n| Workflows | `docs/2_WORKFLOWS.json` |\n| API contracts | `docs/3_API_CONTRACTS.md` |\n| Deployment | `docs/4_DEPLOYMENT.md` |\n| Runtime snapshot | `docs/7_CURRENT_SETTINGS.json` |\n\n---\n\n## Engineering Principles\n\n- strict tenant isolation\n- stateless chat by default\n- route -\u003e service -\u003e repository separation\n- no backend bearer token in browser code\n- synchronized code and documentation changes\n- no hardcoded \"pass-the-bug\" fixes\n\n---\n\n## License\n\nLicensed under **AGPL-3.0**.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqtuanph%2Fchatbot-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqtuanph%2Fchatbot-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqtuanph%2Fchatbot-rag/lists"}