{"id":31284860,"url":"https://github.com/cbratkovics/chatbot-ai-system","last_synced_at":"2026-04-03T23:36:33.240Z","repository":{"id":312166900,"uuid":"1046574174","full_name":"cbratkovics/chatbot-ai-system","owner":"cbratkovics","description":"Production-grade, multi-tenant chat service with FastAPI + WebSockets, OpenAI/Anthropic orchestration, semantic caching, and K8s/IaC deployment, Includes observability (Prometheus/Grafana/Jaeger), FinOps cost tracking, and DR runbooks.","archived":false,"fork":false,"pushed_at":"2025-09-06T03:57:34.000Z","size":1257,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-06T04:21:31.257Z","etag":null,"topics":["anthropic","docker","fastapi","grafana","kubernetes","llm-orchestration","mlops","multitenancy","openai","postgresql","rag","rbac","redis","vector-search","websockets"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cbratkovics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/security/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-28T22:24:10.000Z","updated_at":"2025-09-06T03:57:38.000Z","dependencies_parsed_at":"2025-08-29T03:52:37.438Z","dependency_job_id":"80e1d8d2-2326-4ad2-82a0-9bc1a0bd0dce","html_url":"https://github.com/cbratkovics/chatbot-ai-system","commit_stats":null,"previous_names":["cbratkovics/chatbot-ai-system"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/cbratkovics/chatbot-ai-system","repository_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbratkovics%2Fchatbot-ai-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbratkovics%2Fchatbot-ai-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbratkovics%2Fchatbot-ai-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbratkovics%2Fchatbot-ai-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cbratkovics","download_url":"https://codeload.github.com/cbratkovics/chatbot-ai-system/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbratkovics%2Fchatbot-ai-system/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276713254,"owners_count":25691387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-24T02:00:09.776Z","response_time":97,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic","docker","fastapi","grafana","kubernetes","llm-orchestration","mlops","multitenancy","openai","postgresql","rag","rbac","redis","vector-search","websockets"],"created_at":"2025-09-24T07:34:02.510Z","updated_at":"2025-12-30T19:49:49.715Z","avatar_url":"https://github.com/cbratkovics.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multi-Tenant AI Chat Platform\n\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)\n[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)\n[![Node 20+](https://img.shields.io/badge/node-20+-green.svg)](https://nodejs.org/)\n[![FastAPI](https://img.shields.io/badge/FastAPI-0.104+-green.svg)](https://fastapi.tiangolo.com/)\n[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)\n[![CI Pipeline](https://github.com/cbratkovics/chatbot-ai-system/actions/workflows/ci.yml/badge.svg)](https://github.com/cbratkovics/chatbot-ai-system/actions)\n[![codecov](https://codecov.io/gh/cbratkovics/chatbot-ai-system/branch/main/graph/badge.svg)](https://codecov.io/gh/cbratkovics/chatbot-ai-system)\n[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)\n\n## 🌐 Live Demo\n\n**Try it now:** [chatbot-ai-system.vercel.app](https://chatbot-ai-system.vercel.app)\n\n\u003e Demo instance uses limited API quotas. For full features, deploy your own instance following the Quick Start below.\n\n---\n\n## Overview\n\nProduction-ready multi-tenant AI chatbot platform with intelligent LLM orchestration, WebSocket streaming, and reliable failover patterns. Built for performance and cost efficiency through semantic caching and provider redundancy.\n\n**Built as a reusable template** - Easily customize for different use cases (customer support, code assistant, education, etc.) 
with pre-configured templates.\n\n---\n\n## What This Project Demonstrates\n\nThis project showcases production-grade LLMOps and AI engineering skills:\n\n| Skill | Implementation | Location |\n|-------|---------------|----------|\n| **Multi-Provider Orchestration** | Unified interface for OpenAI, Anthropic, Llama, Gemini with intelligent routing | [`src/chatbot_ai_system/orchestration/`](src/chatbot_ai_system/orchestration/) |\n| **Semantic Caching** | Redis-backed semantic similarity caching (~73% hit rate) | [`src/chatbot_ai_system/cache/`](src/chatbot_ai_system/cache/) |\n| **WebSocket Streaming** | Real-time token streaming with ~186ms P95 latency | [`src/chatbot_ai_system/websocket/`](src/chatbot_ai_system/websocket/) |\n| **Multi-Tenancy \u0026 Auth** | Tenant isolation, JWT authentication, rate limiting | [`src/chatbot_ai_system/middleware/`](src/chatbot_ai_system/middleware/) |\n| **Observability** | Prometheus, Grafana, Jaeger distributed tracing | [`monitoring/`](monitoring/) |\n| **Infrastructure as Code** | Kubernetes manifests, Docker Compose, CI/CD | [`infrastructure/`](infrastructure/), [`k8s/`](k8s/) |\n| **Template Architecture** | Reusable configurations for multiple use cases | [`use-cases/`](use-cases/) |\n\n**[View full architecture documentation →](docs/architecture/ARCHITECTURE.md)**\n\n---\n\n## Key Features\n\n- **Multi-Provider Orchestration**: Intelligent routing between OpenAI, Anthropic, Llama, and Gemini with automatic failover\n- **WebSocket Streaming**: Token-by-token streaming with ~186ms P95 latency (local benchmarks)\n- **Cost Optimization**: Semantic caching achieving ~73% hit rate and ~70% cost reduction\n- **Production Patterns**: Circuit breakers, rate limiting, health monitoring, and comprehensive observability\n- **Multi-Tenancy Support**: Complete tenant isolation with usage tracking and horizontal scaling\n- **Template-Ready**: Pre-configured use cases (customer support, code assistant) for rapid deployment\n\n---\n\n## 
Verified Performance Metrics (Local Synthetic Benchmarks)\n\n| Metric | Target | Achieved | Evidence |\n|--------|--------|----------|----------|\n| **P95 Latency** | \u003c 200ms | **~186ms** | [`benchmark_summary.json`](benchmarks/results/benchmark_summary.json) |\n| **P99 Latency** | \u003c 300ms | **~245ms** | [`benchmark_summary.json`](benchmarks/results/benchmark_summary.json) |\n| **Throughput** | 400+ RPS | **~250 RPS** | [`benchmark_summary.json`](benchmarks/results/benchmark_summary.json) |\n| **Cache Hit Rate** | ≥ 60% | **~73%** | [`cache_metrics_latest.json`](benchmarks/results/cache_metrics_latest.json) |\n| **Cost Reduction** | ≥ 30% | **~70-73%** | [`cache_metrics_latest.json`](benchmarks/results/cache_metrics_latest.json) |\n| **Provider Failover** | \u003c 500ms | **~463ms** | [`benchmark_summary.json`](benchmarks/results/benchmark_summary.json) |\n| **WebSocket Sessions** | 100+ | **~100** | [`benchmark_summary.json`](benchmarks/results/benchmark_summary.json) |\n\n\u003e **Note**: Results are from local synthetic benchmarks on developer hardware, not production SLAs.\n\n**Run benchmarks yourself:** `python benchmarks/run_all_benchmarks.py`\n\n---\n\n## 🚀 Quick Start\n\n### Docker Compose (Recommended)\n\nThe fastest way to get started:\n\n```bash\n# 1. Clone and configure\ngit clone https://github.com/cbratkovics/chatbot-ai-system.git\ncd chatbot-ai-system\ncp .env.example .env\n# Add your API keys to .env\n\n# 2. Start all services\ndocker compose up -d\n\n# 3. 
Access the application\n# Frontend:  http://localhost:3000\n# API Docs:  http://localhost:8000/docs\n# Health:    http://localhost:8000/health\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eAlternative: Local Development (Poetry + npm)\u003c/strong\u003e\u003c/summary\u003e\n\nFor active development with hot reload:\n\n```bash\n# Backend\npoetry install\ncp .env.example .env\n# Add your API keys to .env\npoetry run uvicorn chatbot_ai_system.server.main:app --reload\n\n# Frontend (new terminal)\ncd frontend\nnpm ci\ncp .env.example .env.local\n# Configure API URLs in .env.local\nnpm run dev\n```\n\n**Access:**\n- Frontend: http://localhost:3000\n- Backend API: http://localhost:8000\n- API Docs: http://localhost:8000/docs\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eTemplate Mode: Use Case Quick Start\u003c/strong\u003e\u003c/summary\u003e\n\nDeploy a pre-configured chatbot for specific use cases:\n\n```bash\n# Example: Customer Support Template\ncp use-cases/customer-support/.env.example .env\ncp use-cases/customer-support/system-prompt.txt src/chatbot_ai_system/config/\n\n# Customize branding in .env\n# Then start with docker compose up -d\n```\n\n**Available Templates:**\n- [`customer-support/`](use-cases/customer-support/) - Professional customer service assistant\n- More templates coming soon!\n\nSee [`use-cases/`](use-cases/) for template documentation.\n\n\u003c/details\u003e\n\n---\n\n## Architecture\n\n```mermaid\nflowchart TB\n    subgraph \"Client Layer\"\n        UI[Next.js UI]\n        WS[WebSocket Client]\n        REST[REST Client]\n    end\n\n    subgraph \"API Gateway\"\n        LB[Load Balancer]\n        ASGI[FastAPI Server]\n    end\n\n    subgraph \"Core Services\"\n        MW[Middleware Stack]\n        ORCH[Provider Orchestrator]\n        CACHE[Semantic Cache]\n    end\n\n    subgraph \"Providers\"\n        OAI[OpenAI API]\n        ANTH[Anthropic API]\n        LLAMA[Meta Llama]\n        
GEM[Google Gemini]\n    end\n\n    subgraph \"Storage\"\n        REDIS[(Redis Cache)]\n        PG[(PostgreSQL)]\n    end\n\n    subgraph \"Observability\"\n        PROM[Prometheus]\n        GRAF[Grafana]\n        TRACE[Jaeger]\n    end\n\n    UI --\u003e LB\n    WS --\u003e LB\n    REST --\u003e LB\n    LB --\u003e ASGI\n    ASGI --\u003e MW\n    MW --\u003e ORCH\n    MW --\u003e CACHE\n    ORCH --\u003e OAI\n    ORCH --\u003e ANTH\n    ORCH --\u003e LLAMA\n    ORCH --\u003e GEM\n    CACHE --\u003e REDIS\n    MW --\u003e PG\n    ASGI --\u003e PROM\n    PROM --\u003e GRAF\n    ASGI --\u003e TRACE\n\n    style UI fill:#e1f5fe\n    style ASGI fill:#c8e6c9\n    style ORCH fill:#ffccbc\n    style REDIS fill:#ffecb3\n    style PROM fill:#f8bbd0\n```\n\n---\n\n## Configuration\n\n### Environment Variables\n\n```env\n# Required API Keys\nOPENAI_API_KEY=sk-...\nANTHROPIC_API_KEY=sk-ant-...\n\n# Infrastructure\nREDIS_URL=redis://localhost:6379/0\nDATABASE_URL=postgresql://user:pass@localhost/chatbot\n\n# Performance Tuning\nRATE_LIMIT_REQUESTS=100\nCACHE_TTL_SECONDS=3600\nSEMANTIC_CACHE_THRESHOLD=0.85\nREQUEST_TIMEOUT=30\n\n# Feature Flags\nENABLE_STREAMING=true\nENABLE_FAILOVER=true\nENABLE_SEMANTIC_CACHE=true\n\n# Frontend Configuration (in frontend/.env.local)\nNEXT_PUBLIC_API_URL=http://localhost:8000\nNEXT_PUBLIC_WS_URL=ws://localhost:8000/ws\nNEXT_PUBLIC_APP_NAME=\"AI Chat System\"\n```\n\n**Full configuration guide:** [`docs/CONFIGURATION.md`](docs/CONFIGURATION.md) (if exists)\n\n---\n\n## Production Deployment\n\nThis project is production-ready and can be deployed to Vercel + Render in under 30 minutes.\n\n### Quick Deploy to Vercel + Render (Recommended)\n\n**Infrastructure:**\n- **Vercel**: Next.js frontend hosting (Free tier)\n- **Render**: FastAPI backend + Redis cache ($14/month)\n- **Total Cost**: $14/month + AI API usage\n\n**Steps:**\n\n1. 
**Deploy Backend to Render:**\n   - Connect your GitHub repository to Render\n   - Render auto-detects `render.yaml` configuration\n   - Set environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY)\n   - Deploy Redis instance ($7/month)\n\n2. **Deploy Frontend to Vercel:**\n   ```bash\n   cd frontend\n   vercel --prod\n   ```\n   - Set environment variables in Vercel dashboard:\n     - `NEXT_PUBLIC_API_URL`: Your Render backend URL\n     - `NEXT_PUBLIC_WS_URL`: Your Render WebSocket URL\n\n3. **Update CORS:**\n   - Add your Vercel domain to `CORS_ORIGINS` in Render dashboard\n\n**Documentation:**\n- Full deployment guide: [`docs/PRODUCTION_DEPLOYMENT.md`](docs/PRODUCTION_DEPLOYMENT.md)\n- Production checklist: [`docs/DEPLOYMENT_CHECKLIST.md`](docs/DEPLOYMENT_CHECKLIST.md)\n\n**Production URLs** (after deployment):\n- Frontend: `https://your-app.vercel.app`\n- Backend API: `https://your-backend.onrender.com`\n- API Docs: `https://your-backend.onrender.com/docs`\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eAlternative: Docker Deployment\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\n# Build production image\ndocker build -f docker/dockerfiles/Dockerfile.production -t chatbot-ai-system:latest .\n\n# Run with production compose\ndocker compose -f docker-compose.prod.yml up -d\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eAlternative: Kubernetes Deployment\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\n# Apply Kubernetes configurations\nkubectl apply -f k8s/namespace.yaml\nkubectl apply -f k8s/configmap.yaml\nkubectl apply -f k8s/deployment.yaml\nkubectl apply -f k8s/service.yaml\nkubectl apply -f k8s/ingress.yaml\n```\n\n**Kubernetes documentation:** [`docs/kubernetes/README.md`](docs/kubernetes/README.md) (if exists)\n\n\u003c/details\u003e\n\n### Scaling Considerations\n\n- **Horizontal Scaling**: Stateless design supports multiple replicas\n- **Database**: PostgreSQL with read replicas for high 
availability\n- **Cache**: Redis Cluster for distributed caching\n- **Load Balancing**: Nginx or cloud load balancer\n- **Monitoring**: Prometheus + Grafana dashboards included\n\n---\n\n## Testing \u0026 Validation\n\n```bash\n# Run all quality checks\nmake lint          # Code linting with ruff\nmake type-check    # Type checking with mypy\nmake test          # Unit tests with pytest\nmake test-cov      # Tests with coverage report\n\n# Individual test suites\npoetry run pytest tests/unit -v           # Unit tests\npoetry run pytest tests/integration -v    # Integration tests\npoetry run pytest tests/e2e -v           # End-to-end tests\n\n# Load testing\nk6 run benchmarks/load_tests/k6_api_test.js\nk6 run benchmarks/load_tests/k6_websocket_test.js\n\n# Verify benchmark claims\npython benchmarks/verify_metrics.py\n```\n\n**CI/CD:** All tests run automatically on pull requests via [GitHub Actions](.github/workflows/ci.yml)\n\n---\n\n## Monitoring \u0026 Observability\n\n### Metrics Collection\n- **Prometheus**: Application and system metrics\n- **Grafana**: Real-time dashboards and alerts\n- **Jaeger**: Distributed tracing for request flows\n\n### Key Metrics Tracked\n- Request latency (P50, P95, P99)\n- Provider availability and failover events\n- Cache hit rates and cost savings\n- Token usage and rate limiting\n- WebSocket connection metrics\n\n**Access monitoring:**\n- Prometheus: `http://localhost:9090`\n- Grafana: `http://localhost:3001`\n- Jaeger: `http://localhost:16686`\n\n---\n\n## Security Features\n\n- **Authentication**: JWT-based with refresh tokens\n- **Rate Limiting**: Token bucket algorithm per tenant\n- **Input Validation**: Pydantic models with strict validation\n- **Secrets Management**: Environment-based configuration\n- **CORS Protection**: Configurable origin restrictions\n- **Content Filtering**: Optional content moderation\n\n**Security documentation:** [`docs/security/SECURITY.md`](docs/security/SECURITY.md)\n\n---\n\n## Technology 
Stack\n\n### Backend\n- **Framework**: FastAPI 0.104+ (async Python 3.12+)\n- **LLM Providers**: OpenAI, Anthropic, Meta Llama, Google Gemini\n- **Caching**: Redis with semantic similarity\n- **Database**: PostgreSQL with SQLAlchemy ORM\n- **Message Queue**: Redis Streams\n\n### Frontend\n- **Framework**: Next.js 14 (App Router)\n- **Language**: TypeScript\n- **UI**: Tailwind CSS + shadcn/ui components\n- **State Management**: React Context + Hooks\n- **WebSocket**: Native WebSocket API\n\n### Infrastructure\n- **Containerization**: Docker, Docker Compose\n- **Orchestration**: Kubernetes-ready\n- **CI/CD**: GitHub Actions\n- **Monitoring**: Prometheus, Grafana, Jaeger\n- **Deployment**: Vercel (frontend) + Render (backend)\n\n---\n\n## Project Structure\n\n```\n├── src/chatbot_ai_system/    # Backend application\n│   ├── server/               # FastAPI app and routes\n│   ├── providers/            # LLM provider implementations\n│   ├── orchestration/        # Routing and failover logic\n│   ├── cache/                # Semantic caching system\n│   ├── middleware/           # Auth, rate limiting, tracing\n│   ├── websocket/            # WebSocket handlers\n│   └── config/               # Configuration management\n├── frontend/                 # Next.js frontend\n│   ├── app/                  # Next.js 14 app directory\n│   ├── components/           # React components\n│   └── config/               # Frontend configuration\n├── use-cases/                # Pre-configured templates\n│   └── customer-support/     # Customer support template\n├── benchmarks/               # Performance testing suite\n│   ├── results/              # Benchmark results\n│   └── load_tests/           # k6 load tests\n├── tests/                    # Test suites\n│   ├── unit/                 # Unit tests\n│   ├── integration/          # Integration tests\n│   └── e2e/                  # End-to-end tests\n├── docs/                     # Documentation\n│   ├── architecture/         # 
Architecture docs\n│   ├── security/             # Security docs\n│   └── deployment/           # Deployment guides\n├── docker/                   # Docker configurations\n│   ├── dockerfiles/          # Dockerfile variants\n│   └── compose/              # Docker Compose files\n├── k8s/                      # Kubernetes manifests\n├── infrastructure/           # IaC and deployment configs\n└── monitoring/               # Monitoring configurations\n```\n\n---\n\n## Contributing\n\nWe welcome contributions! Please read our [Contributing Guide](CONTRIBUTING.md) for details on our code of conduct, development process, and how to submit pull requests.\n\n**Key areas for contribution:**\n- New AI provider integrations\n- Additional use-case templates\n- Performance optimizations\n- Documentation improvements\n- Bug fixes and feature requests\n\n**Community standards:**\n- [Code of Conduct](CODE_OF_CONDUCT.md)\n- [Security Policy](SECURITY.md)\n\n---\n\n## Acknowledgments\n\nBuilt with excellent open-source tools:\n\n- [FastAPI](https://fastapi.tiangolo.com/) - Modern Python web framework\n- [Next.js](https://nextjs.org/) - React framework for production\n- [Redis](https://redis.io/) - In-memory data structure store\n- [PostgreSQL](https://www.postgresql.org/) - Robust relational database\n- [Prometheus](https://prometheus.io/) \u0026 [Grafana](https://grafana.com/) - Monitoring stack\n- OpenAI \u0026 Anthropic for powerful LLM APIs\n\n---\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## Contact\n\n**Christopher J. 
Bratkovics**\n- **LinkedIn:** [linkedin.com/in/cbratkovics](https://linkedin.com/in/cbratkovics)\n- **Portfolio:** [cbratkovics.dev](https://cbratkovics.dev)\n- **GitHub:** [@cbratkovics](https://github.com/cbratkovics)\n\n---\n\n## Project Stats\n\n- **Lines of Code**: ~15,000+\n- **Test Coverage**: 85%+\n- **Docker Images**: Backend, Frontend, Monitoring Stack\n- **Supported Providers**: OpenAI, Anthropic, Meta Llama, Google Gemini\n- **Performance**: ~186ms P95 latency, ~100 concurrent WebSocket sessions (local synthetic benchmarks)\n- **Production-Ready**: Deployed and tested in production environments\n\n---\n\n⭐ **Star this repo** if you find it useful!\n\nBuilt with ❤️ for production AI systems\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcbratkovics%2Fchatbot-ai-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcbratkovics%2Fchatbot-ai-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcbratkovics%2Fchatbot-ai-system/lists"}