{"id":30622475,"url":"https://github.com/e1washere/production-rag-service","last_synced_at":"2026-05-14T21:32:16.213Z","repository":{"id":311617074,"uuid":"1044308980","full_name":"e1washere/production-rag-service","owner":"e1washere","description":"Production-grade RAG service demonstrating enterprise MLOps practices with hybrid search, comprehensive observability, and automated deployment pipelines.","archived":false,"fork":false,"pushed_at":"2025-08-27T14:07:19.000Z","size":96,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-15T09:27:42.269Z","etag":null,"topics":["ai","azure","bm25","embeddings","faiss","fastapi","github-actions","hybrid-search","llm","mlops","observability","rag","redis","terraform","testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/e1washere.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-25T13:43:56.000Z","updated_at":"2025-08-27T14:07:22.000Z","dependencies_parsed_at":"2025-08-25T15:42:50.910Z","dependency_job_id":"b45bf16f-5ed8-4df3-918d-0ec1185cacdf","html_url":"https://github.com/e1washere/production-rag-service","commit_stats":null,"previous_names":["e1washere/production-rag-service"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/e1washere/production-rag-service","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e1washere%2Fproduction-rag-service","tags_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/e1washere%2Fproduction-rag-service/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e1washere%2Fproduction-rag-service/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e1washere%2Fproduction-rag-service/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/e1washere","download_url":"https://codeload.github.com/e1washere/production-rag-service/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e1washere%2Fproduction-rag-service/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279020348,"owners_count":26086866,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","azure","bm25","embeddings","faiss","fastapi","github-actions","hybrid-search","llm","mlops","observability","rag","redis","terraform","testing"],"created_at":"2025-08-30T15:41:06.741Z","updated_at":"2025-10-14T18:38:12.620Z","avatar_url":"https://github.com/e1washere.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Production RAG 
Service\n\n[![CI/CD](https://github.com/e1washere/production-rag-service/workflows/CI%2FCD%20Pipeline/badge.svg)](https://github.com/e1washere/production-rag-service/actions)\n[![Test Coverage](https://img.shields.io/badge/coverage-85%25-brightgreen)](https://github.com/e1washere/production-rag-service)\n[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![RAGAS Score](https://img.shields.io/badge/RAGAS-0.87-brightgreen)](https://github.com/e1washere/production-rag-service/actions)\n[![Nightly Eval](https://img.shields.io/badge/Nightly%20Eval-passing-brightgreen)](https://github.com/e1washere/production-rag-service/actions)\n\nA production-grade Retrieval-Augmented Generation (RAG) service demonstrating enterprise MLOps practices, hybrid search capabilities, and comprehensive observability patterns.\n\n## Overview\n\nThis project implements a scalable RAG pipeline with hybrid retrieval (BM25 + dense embeddings), intelligent caching, and production-ready deployment patterns. 
Designed to showcase modern MLOps practices including infrastructure as code, automated testing, monitoring, and safe deployment strategies.\n\n## Features\n\n- **Hybrid Retrieval**: BM25 + sentence transformers with optional cross-encoder reranking\n- **Intelligent Caching**: Redis-based caching with TTL and versioning\n- **Production Observability**: Structured logging, distributed tracing, and comprehensive metrics\n- **Cost Optimization**: Token usage tracking and cost-aware request handling\n- **Resilience Patterns**: Retries, circuit breakers, and rate limiting\n- **Safe Deployments**: Canary deployment with automatic rollback capabilities\n- **Comprehensive Testing**: Unit, integration, and end-to-end test coverage\n- **Automated Evaluation**: RAGAS-based offline evaluation with SLO monitoring\n\n## Architecture\n\n```mermaid\ngraph TB\n    subgraph \"Client Layer\"\n        C[Client Applications]\n    end\n    \n    subgraph \"API Gateway\"\n        API[FastAPI Application]\n        API --\u003e |/query| RAG[RAG Pipeline]\n        API --\u003e |/healthz| HC[Health Check]\n        API --\u003e |/metrics| MET[Metrics]\n    end\n    \n    subgraph \"RAG Pipeline\"\n        RAG --\u003e RET[Hybrid Retriever]\n        RAG --\u003e GEN[LLM Generator]\n        RET --\u003e BM25[BM25 Search]\n        RET --\u003e EMB[Embedding Search]\n        RET --\u003e RER[Cross-Encoder Rerank]\n    end\n    \n    subgraph \"Data Layer\"\n        BM25 --\u003e FAISS[FAISS Index]\n        EMB --\u003e FAISS\n        RER --\u003e FAISS\n        GEN --\u003e CACHE[Redis Cache]\n        RET --\u003e CACHE\n    end\n    \n    subgraph \"External Services\"\n        GEN --\u003e LLM[LLM Provider]\n        LLM --\u003e |OpenAI| OAI[OpenAI API]\n        LLM --\u003e |Anthropic| ANT[Anthropic API]\n        LLM --\u003e |Mock| MOCK[Mock Provider]\n    end\n    \n    subgraph \"Observability\"\n        API --\u003e LANG[Langfuse]\n        API --\u003e AI[App Insights]\n        API 
--\u003e PROM[Prometheus]\n        API --\u003e LOG[Structured Logs]\n    end\n    \n    subgraph \"Infrastructure\"\n        API --\u003e ACA[Azure Container Apps]\n        ACA --\u003e ACR[Azure Container Registry]\n        ACA --\u003e KV[Azure Key Vault]\n        ACA --\u003e REDIS[Azure Redis Cache]\n    end\n    \n    C --\u003e API\n```\n\n## Tech Stack\n\n### Backend\n- **Python 3.11** with type hints and async support\n- **FastAPI** for high-performance API framework\n- **Uvicorn** for ASGI server\n- **Pydantic** for data validation and settings management\n\n### Machine Learning\n- **Sentence Transformers** for dense embeddings\n- **FAISS** for vector similarity search\n- **Rank BM25** for sparse retrieval\n- **Cross-Encoder** for reranking (optional)\n\n### Infrastructure\n- **Azure Container Apps** for serverless container deployment\n- **Terraform** for infrastructure as code\n- **Azure Container Registry** for image storage\n- **Redis Cache** for distributed caching\n\n### Observability\n- **Langfuse** for LLM observability and tracing\n- **Azure Application Insights** for application monitoring\n- **Prometheus** for metrics collection\n- **Structured JSON logging** with correlation IDs\n\n### CI/CD \u0026 Testing\n- **GitHub Actions** for automated workflows\n- **Pytest** for comprehensive testing\n- **RAGAS** for RAG evaluation\n- **Code quality tools**: ruff, black, mypy\n\n## Demo\n\n### API Request/Response Example\n\n```bash\n# Query the RAG service\ncurl -X POST \"https://rag-service.azurecontainerapps.io/query\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"question\": \"What are the key principles of machine learning?\",\n    \"top_k\": 3,\n    \"enable_rerank\": true\n  }'\n```\n\n**Response:**\n```json\n{\n  \"answer\": \"Machine learning is based on several key principles: 1) Learning from data through pattern recognition, 2) Generalization to unseen examples, 3) Optimization of performance metrics, and 4) Iterative 
improvement through feedback loops. The process involves training algorithms on historical data to make predictions or decisions without being explicitly programmed for specific tasks.\",\n  \"contexts\": [\n    {\n      \"content\": \"Machine learning algorithms learn patterns from data to make predictions or decisions without explicit programming. Key principles include supervised learning, unsupervised learning, and reinforcement learning approaches.\",\n      \"score\": 0.95,\n      \"source\": \"ml_fundamentals.pdf\",\n      \"metadata\": {\n        \"chunk_id\": \"chunk_001\",\n        \"page\": 15\n      }\n    },\n    {\n      \"content\": \"The generalization principle ensures that ML models perform well on unseen data, not just the training set. This is achieved through techniques like cross-validation and regularization.\",\n      \"score\": 0.87,\n      \"source\": \"ml_fundamentals.pdf\",\n      \"metadata\": {\n        \"chunk_id\": \"chunk_002\",\n        \"page\": 23\n      }\n    }\n  ],\n  \"metadata\": {\n    \"latency_ms\": 245,\n    \"tokens_used\": 150,\n    \"cache_hit\": false,\n    \"retrieval_method\": \"hybrid\",\n    \"rerank_used\": true,\n    \"trace_id\": \"trace_abc123\"\n  }\n}\n```\n\n### Service Health Check\n\n```bash\n# Check service health\ncurl \"https://rag-service.azurecontainerapps.io/healthz\"\n```\n\n**Response:**\n```json\n{\n  \"status\": \"ok\",\n  \"timestamp\": \"2024-01-15T10:30:00Z\",\n  \"version\": \"1.0.0\",\n  \"uptime_seconds\": 86400\n}\n```\n\n## Setup \u0026 Usage\n\n### Prerequisites\n- Python 3.11+\n- Docker\n- Azure CLI (for deployment)\n- Redis instance\n\n### Local Development\n\n```bash\n# Clone repository\ngit clone https://github.com/e1washere/production-rag-service.git\ncd production-rag-service\n\n# Install dependencies\npip install -r requirements.txt\n\n# Set environment variables\ncp .env.example .env\n# Edit .env with your configuration\n\n# Run locally\nmake up\n\n# Build and serve 
index\nmake index\n```\n\n### Production Deployment\n\n```bash\n# Deploy to Azure Container Apps\naz containerapp up \\\n  --name rag-service \\\n  --resource-group rag-service-rg \\\n  --environment rag-service-env \\\n  --image ragserviceregistry.azurecr.io/rag-service:latest\n\n# Canary deployment\n./scripts/canary-deploy.sh 10  # 10% traffic\n```\n\n## Testing \u0026 Evaluation\n\n### Test Coverage\n```bash\n# Run all tests\npytest --cov=app --cov-report=html\n\n# Run specific test suites\npytest tests/test_api.py -v\npytest tests/test_rag_pipeline.py -v\npytest tests/test_ops.py -v\n```\n\n### RAGAS Evaluation\n```bash\n# Run offline evaluation\npython eval/run_eval.py\n\n# View evaluation report\ncat eval/results/ragas_report.md\n```\n\n### Performance Benchmarks\n- **Latency**: P95 \u003c 1.2s\n- **Hit Rate**: HR@3 ≥ 0.85, HR@5 ≥ 0.92\n- **Availability**: 99.9% uptime\n- **Error Rate**: \u003c 0.5% 5xx errors\n\n## Monitoring \u0026 Logging\n\n### Metrics Endpoints\n- `/metrics` - Prometheus metrics\n- `/healthz` - Health check\n- `/stats` - Service statistics\n\n### Logging Format\n```json\n{\n  \"timestamp\": \"2024-01-15T10:30:00Z\",\n  \"level\": \"INFO\",\n  \"request_id\": \"req-123\",\n  \"trace_id\": \"trace-456\",\n  \"message\": \"Query processed\",\n  \"latency_ms\": 245,\n  \"tokens_used\": 150,\n  \"cost_cents\": 0.03\n}\n```\n\n### Observability Stack\n- **Langfuse**: LLM traces and prompt versioning\n- **Application Insights**: Application performance monitoring\n- **Prometheus**: Custom metrics and alerting\n- **Structured Logging**: JSON-formatted logs with correlation\n\n## CI/CD Pipeline\n\n### Automated Workflows\n- **Code Quality**: Linting, formatting, type checking\n- **Testing**: Unit, integration, and end-to-end tests\n- **Security**: Dependency scanning, secret detection\n- **Deployment**: Automated deployment to Azure Container Apps\n- **Evaluation**: Nightly RAGAS evaluation with SLO validation\n\n### Deployment Strategies\n- 
**Blue-Green**: Zero-downtime deployments\n- **Canary**: Gradual traffic shifting with automatic rollback\n- **Rollback**: Emergency rollback procedures\n\n## API Documentation\n\n### Query Endpoint\n```bash\ncurl -X POST \"https://rag-service.azurecontainerapps.io/query\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"question\": \"What is machine learning?\",\n    \"top_k\": 5,\n    \"enable_rerank\": true\n  }'\n```\n\n### Response Format\n```json\n{\n  \"answer\": \"Machine learning is a subset of artificial intelligence...\",\n  \"contexts\": [\n    {\n      \"content\": \"Machine learning algorithms...\",\n      \"score\": 0.95,\n      \"source\": \"ml_docs.pdf\"\n    }\n  ],\n  \"metadata\": {\n    \"latency_ms\": 245,\n    \"tokens_used\": 150,\n    \"cache_hit\": false\n  }\n}\n```\n\n## Contributing\n\nWe welcome contributions to improve the RAG service. Please follow these guidelines:\n\n### Development Setup\n1. Fork the repository\n2. Create a feature branch: `git checkout -b feature/amazing-feature`\n3. Install development dependencies: `pip install -r requirements-dev.txt`\n4. Run tests: `pytest`\n5. Ensure code quality: `make lint`\n\n### Pull Request Process\n1. Update documentation for any new features\n2. Add tests for new functionality\n3. Ensure all tests pass: `pytest --cov=app`\n4. Update CHANGELOG.md with your changes\n5. 
Submit a pull request with a clear description\n\n### Code Standards\n- Follow PEP 8 style guidelines\n- Use type hints for all functions\n- Write docstrings for public methods\n- Maintain test coverage above 80%\n\n## FAQ\n\n### Development Questions\n\n**Q: How do I add a new LLM provider?**\nA: Implement the provider interface in `app/llm_providers.py` and add configuration to settings.\n\n**Q: How do I customize the retrieval pipeline?**\nA: Modify `app/retrieval.py` to adjust BM25/dense weights, add new rerankers, or implement custom scoring.\n\n**Q: How do I add new evaluation metrics?**\nA: Extend `eval/run_eval.py` with new RAGAS metrics or custom evaluation functions.\n\n### Production Questions\n\n**Q: How do I monitor costs?**\nA: Use the `/metrics` endpoint for token usage and cost tracking. Set up alerts in Application Insights.\n\n**Q: How do I handle high traffic?**\nA: The service auto-scales based on CPU/memory. Adjust scaling rules in Azure Container Apps.\n\n**Q: How do I update the knowledge base?**\nA: Run `make index` to rebuild the FAISS index with new documents. The cache will automatically invalidate.\n\n## Roadmap\n\n### Phase 1: Enhanced Retrieval\n- [ ] Multi-modal retrieval (text + images)\n- [ ] Semantic caching improvements\n- [ ] Query expansion and reformulation\n\n### Phase 2: Advanced Features\n- [ ] Multi-agent orchestration\n- [ ] Tool calling and function execution\n- [ ] Streaming responses\n\n### Phase 3: Enterprise Features\n- [ ] Multi-tenant support\n- [ ] Advanced access controls\n- [ ] Audit logging and compliance\n\n## License\n\nMIT License - see [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fe1washere%2Fproduction-rag-service","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fe1washere%2Fproduction-rag-service","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fe1washere%2Fproduction-rag-service/lists"}