{"id":31608666,"url":"https://github.com/bielacki/cinerag","last_synced_at":"2026-04-05T08:32:17.278Z","repository":{"id":318198773,"uuid":"1070325498","full_name":"bielacki/cinerag","owner":"bielacki","description":"🎬 Movie discovery RAG system using natural language queries. Features hybrid search (BM25+Vector), reranking, LLM integration, and Grafana monitoring.","archived":false,"fork":false,"pushed_at":"2025-10-05T17:56:50.000Z","size":11359,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-03T15:30:24.928Z","etag":null,"topics":["elasticsearch","fastapi","grafana","hybrid-search","llm","machine-learning","movie-recommendation-app","nlp","prometheus","qdrant","rag","retrieval-augmented-generation","streamlit","tmdb","vector-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bielacki.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-05T17:51:34.000Z","updated_at":"2025-10-05T18:00:37.000Z","dependencies_parsed_at":"2025-10-05T19:35:58.236Z","dependency_job_id":null,"html_url":"https://github.com/bielacki/cinerag","commit_stats":null,"previous_names":["bielacki/cinerag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bielacki/cinerag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bielacki%2Fcinerag","tags_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bielacki%2Fcinerag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bielacki%2Fcinerag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bielacki%2Fcinerag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bielacki","download_url":"https://codeload.github.com/bielacki/cinerag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bielacki%2Fcinerag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31430009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T08:13:15.228Z","status":"ssl_error","status_checked_at":"2026-04-05T08:13:11.839Z","response_time":75,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","fastapi","grafana","hybrid-search","llm","machine-learning","movie-recommendation-app","nlp","prometheus","qdrant","rag","retrieval-augmented-generation","streamlit","tmdb","vector-search"],"created_at":"2025-10-06T08:22:45.987Z","updated_at":"2026-04-05T08:32:17.246Z","avatar_url":"https://github.com/bielacki.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CineRAG - Movie Recommendation RAG System 🎬\n\nA Retrieval-Augmented Generation (RAG) application that helps users discover movies through 
natural language queries using TMDB data, multiple retrieval strategies, and intelligent ranking.\n\n---\n\n## 📋 Problem Description\n\n**The Challenge**: Traditional movie search relies on exact keyword matching or requires users to know specific titles, actors, or genres. Users often remember movies by plot elements, themes, or vague descriptions like \"that movie with blue aliens on another planet.\"\n\n**The Solution**: CineRAG bridges this gap by combining:\n- **Semantic understanding** of movie descriptions through embeddings\n- **Traditional keyword search** for precise matching\n- **Hybrid retrieval** that balances both approaches\n- **RAG architecture** to provide contextually relevant recommendations\n\n**Why RAG for Movies?**\n- Movies have rich metadata (plot, reviews, keywords, cast) perfect for RAG\n- Natural language queries are more intuitive than filters\n- Combining multiple retrieval methods improves accuracy\n- User feedback loop enables continuous improvement\n\n**Target Users**:\n- Movie enthusiasts searching by plot elements\n- Recommendation seekers looking for similar films\n- Users with vague movie memories\n- Data scientists exploring RAG implementations\n\n---\n\n## 🎯 Project Overview\n\nCineRAG is a full-stack RAG application featuring:\n- **Multiple retrieval backends** (BM25, Vector, Hybrid, Reranking)\n- **Comprehensive evaluation** framework with multiple metrics\n- **Interactive Streamlit UI** with user feedback collection\n- **Production-ready monitoring** via Prometheus \u0026 Grafana\n- **Automated data ingestion** from TMDB API\n- **Fully containerized** deployment\n\n### Architecture\n\n![Architecture Diagram](./docs/images/architecture.png)\n\n**Key Components**:\n1. **Data Ingestion** - TMDB API pipeline enriches movies with metadata\n2. **Knowledge Bases** - Elasticsearch (BM25) + Qdrant (vectors)\n3. **Retrieval Layer** - Multiple strategies with evaluation\n4. **FastAPI Backend** - RESTful API with Prometheus metrics\n5. 
**Streamlit UI** - User-friendly interface with feedback\n6. **Monitoring Stack** - Real-time observability\n\n---\n\n## 🏗️ Architecture \u0026 Components\n\n### 1. Data Ingestion Pipeline\n\n**Implementation**: `flows/tmdb_ingest.py`\n\nThe ingestion pipeline fetches and enriches movie data from TMDB.\n\n\u003e **Note**: TMDB has over 1 million movies in their database. For this demonstration project, we use approximately 10,000 movies (500 pages of results). This sample size is sufficient to showcase RAG capabilities while keeping resource requirements manageable for development and testing.\n\n**Process**:\n1. Discover popular movies via TMDB `/discover/movie` endpoint\n2. For each movie, fetch:\n   - Basic metadata (title, year, genres, runtime)\n   - Plot overview and tagline\n   - Keywords and reviews\n   - Director and top 5 cast members\n3. Construct rich `index_text` field combining all metadata\n4. Export to JSON for indexing\n\n**Running Ingestion**:\n\n```bash\n# Full dataset (500 pages = 10000 movies)\nmake ingest\n```\n\n**Output Format**:\n```json\n{\n  \"id\": \"tmdb:movie:19995\",\n  \"title\": \"Avatar\",\n  \"year\": 2009,\n  \"genres\": [\"Action\", \"Adventure\", \"Fantasy\", \"Science Fiction\"],\n  \"keywords\": [\"future\", \"marine\", \"alien planet\", ...],\n  \"people\": {\n    \"director\": [\"James Cameron\"],\n    \"cast\": [\"Sam Worthington\", \"Zoe Saldana\", ...]\n  },\n  \"index_text\": \"Avatar — Enter the World of Pandora. In the 22nd century...\"\n}\n```\n\n---\n\n### 2. 
Knowledge Base Setup\n\n**Elasticsearch (BM25 Search)**:\n- Index: `movies_bm25`\n- Field mapping: `index_text` with standard analyzer\n- Setup: `retrieval/es_setup.py`\n- Indexing: `retrieval/es_index.py`\n\n**Qdrant (Vector Search)**:\n- Collection: `movies_vec`\n- Embedding model: `all-MiniLM-L6-v2` (384 dimensions)\n- Distance: Cosine similarity\n- Payload: Full movie metadata\n- Setup \u0026 upload: `retrieval/qdrant_upsert.py`\n\n**Indexing Commands**:\n```bash\n# Start infrastructure\ndocker compose up -d elasticsearch qdrant\n\n# Index into both systems\nmake es-index\nmake qdrant-upsert\n```\n\n---\n\n### 3. Retrieval Evaluation\n\n**Evaluation Script**: `eval/eval_retrieval.py`\n\n#### Retrieval Approaches Tested\n\n| Backend | Description | Implementation |\n|---------|-------------|----------------|\n| **BM25** | Classic keyword-based search via Elasticsearch | `retrieval/es_search.py` |\n| **Vector** | Semantic search using embeddings via Qdrant | `retrieval/qdrant_search.py` |\n| **Hybrid** | RRF fusion of BM25 + Vector results | `retrieval/hybrid.py` |\n| **Hybrid + Rerank** | Hybrid results re-ranked by cross-encoder | `retrieval/reranker.py` |\n\n#### Evaluation Metrics\n\n**Metrics Implementation**: `eval/eval_metrics.py`\n\n- **Recall@5 / Recall@10**: Percentage of relevant docs in top K results\n- **MRR (Mean Reciprocal Rank)**: Average of reciprocal ranks of first relevant result\n- **nDCG@10**: Normalized Discounted Cumulative Gain considering ranking quality\n\n#### Evaluation Dataset\n\n**File**: `eval/eval_queries.jsonl`\n\nFormat:\n```json\n{\"query\": \"blue aliens on Pandora with human avatars\", \"gold\": [\"tmdb:movie:19995\"]}\n{\"query\": \"heist movie with dream within a dream\", \"gold\": [\"tmdb:movie:27205\"]}\n```\n\n#### Running Evaluation\n\n```bash\n# Run evaluation on all backends\nmake eval\n\n# Custom evaluation\nuv run python -m eval.eval_retrieval \\\n  --eval eval/eval_queries.jsonl \\\n  --backends es qdrant hybrid 
hybrid_rerank \\\n  --k 10 \\\n  --docs movies_docs.json\n```\n\n#### Results\n\n| Backend | Recall@5 | Recall@10 | MRR | nDCG@10 |\n|---------|----------|-----------|-----|---------|\n| BM25 | 0.20 | 0.40 | 0.237 | 0.26 |\n| Vector | 0.40 | 0.80 | 0.306 | 0.418 |\n| Hybrid | 0.60 | 0.80 | 0.495 | 0.567 |\n| **Hybrid + Rerank** | **0.80** | **0.80** | **0.70** | **0.726** |\n\n**Analysis**:\n- **BM25** excels at exact keyword matches (titles, actor names)\n- **Vector search** captures semantic similarity and thematic elements\n- **Hybrid RRF** balances both approaches, improving overall recall\n- **Reranking** provides final quality boost by re-scoring top candidates with a more powerful cross-encoder model\n\n---\n\n### 4. RAG Flow\n\n**Implementation**: `app/main.py`\n\n#### Query Processing Pipeline\n\n1. **User submits query** via UI or API\n2. **Backend selection** (auto-selected based on available models, or user-specified)\n3. **Retrieval**:\n   - Apply year/genre filters if specified\n   - Retrieve top-K candidates from selected backend\n   - For hybrid_rerank: retrieve 50 candidates, rerank to top-K\n4. **Context construction**: Gather full metadata for retrieved movies\n5. **Response generation**: Format results with citations\n6. 
**Metrics collection**: Track latency, backend usage, errors\n\n#### API Endpoints\n\n**POST `/ask`**:\n```bash\ncurl -X POST http://localhost:8000/ask \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"query\": \"space movie with AI\",\n    \"top_k\": 5,\n    \"backend\": \"hybrid_rerank\",\n    \"year\": [2000, 2024],\n    \"genres\": [\"Science Fiction\"]\n  }'\n```\n\n**Response**:\n```json\n{\n  \"answer\": \"Top matches for 'space movie with AI': ...\",\n  \"citations\": [\n    {\"tmdb_id\": \"tmdb:movie:157336\", \"title\": \"Interstellar\", \"year\": 2014}\n  ],\n  \"retrieved\": [\n    {\"score\": 0.952, \"tmdb_id\": \"tmdb:movie:157336\"}\n  ],\n  \"backend\": \"hybrid_rerank\"\n}\n```\n\n**POST `/feedback`**: Collect user thumbs up/down feedback\n\n**GET `/healthz`**: Health check endpoint\n\n**GET `/metrics`**: Prometheus metrics endpoint\n\n---\n\n### 5. LLM Evaluation\n\n**Implementation**: `eval/eval_llm.py`, `eval/eval_llm_metrics.py`\n\nThe system generates natural language answers using LLMs and evaluates them across multiple dimensions.\n\n#### Answer Generation\n\nThe RAG flow (`app/main.py`) uses retrieved movie documents as context to generate answers:\n\n1. **Retrieve** top-K relevant movies\n2. **Build context** from movie metadata (title, year, genres, overview)\n3. **Generate answer** using LLM with prompt engineering\n4. 
**Return** natural language response with citations\n\n#### Evaluation Dataset\n\n**File**: `eval/eval_llm_queries.jsonl`\n\nFormat:\n```json\n{\n  \"qid\": 1,\n  \"query\": \"blue aliens on Pandora with human avatars\",\n  \"gold\": [\"tmdb:movie:19995\"],\n  \"expected_aspects\": [\"blue aliens\", \"Pandora\", \"avatars\", \"humans\"]\n}\n```\n\n#### Evaluation Metrics (LLM-as-a-Judge)\n\n**Metrics Implementation**: `eval/eval_llm_metrics.py`\n\n- **Relevance** (0-1): Does the answer address the user's query?\n- **Faithfulness** (0-1): Is the answer grounded in the provided context without hallucinations?\n- **Coherence** (0-1): Is the answer well-structured and easy to understand?\n- **Aspect Coverage** (0-1): Does the answer cover expected aspects from the query?\n- **Overall** (0-1): Average of all metrics\n\nUses LLM-as-a-judge approach with structured prompts to evaluate each dimension.\n\n#### Running LLM Evaluation\n\n```bash\n# Run LLM evaluation\nmake eval-llm\n\n# Or with custom parameters\nuv run python -m eval.eval_llm \\\n  --eval eval/eval_llm_queries.jsonl \\\n  --docs movies_docs.json \\\n  --backend hybrid_rerank \\\n  --top-k 5 \\\n  --out reports/llm_eval_results.json\n\n# Run both retrieval and LLM evaluation\nmake eval-all\n```\n\n#### Results\n\n| Metric | Mean | Median | StdDev | Min | Max |\n|--------|------|--------|--------|-----|-----|\n| **Relevance** | 0.925 | 1.0 | 0.121 | 0.75 | 1.0 |\n| **Faithfulness** | 0.85 | 1.0 | 0.211 | 0.5 | 1.0 |\n| **Coherence** | 0.775 | 0.75 | 0.079 | 0.75 | 1.0 |\n| **Aspect Coverage** | 0.875 | 1.0 | 0.212 | 0.5 | 1.0 |\n| **Overall** | 0.856 | 0.938 | 0.135 | 0.625 | 1.0 |\n\n**Analysis**:\n- Higher relevance scores indicate better query understanding\n- High faithfulness means minimal hallucinations\n- Coherence measures response quality\n- Aspect coverage shows comprehensive answers\n\n#### LLM Configuration\n\nSupports multiple providers. 
Configure via environment variables:\n\n```bash\n# OpenAI (default)\nexport LLM_PROVIDER=openai\nexport OPENAI_API_KEY=sk-...\nexport OPENAI_MODEL=gpt-4o-mini\n\n# Anthropic\nexport LLM_PROVIDER=anthropic\nexport ANTHROPIC_API_KEY=sk-ant-...\nexport ANTHROPIC_MODEL=claude-3-5-sonnet-20241022\n\n# vLLM (local server)\nexport LLM_PROVIDER=vllm\nexport VLLM_BASE_URL=http://localhost:8001/v1\nexport VLLM_MODEL=qwen2.5-7b-instruct\n```\n\n---\n\n### 6. Interface\n\n**Implementation**: `ui/app.py`\n\nThe Streamlit interface provides an intuitive way to interact with CineRAG.\n\n#### Features\n\n**Search Interface**:\n- Natural language query input\n- Backend selection (auto, hybrid_rerank, hybrid, qdrant, es)\n- Top-K results slider\n- Year range filter (1950-2025)\n- Genre filtering (comma-separated)\n\n**Results Display**:\n- Formatted answer with clickable movie poster links\n- Debug expander showing retrieval scores\n\n![Search Results](./docs/images/ui-main.png)\n\n**User Feedback**:\n- Thumbs up/down buttons\n- Feedback sent to API for monitoring\n- Tracked in Prometheus metrics\n\n![User Feedback](./docs/images/ui-feedback.png)\n\n#### Running the UI\n\n```bash\n# Via Docker Compose (recommended)\ndocker compose up -d ui\n\n# Or locally\nstreamlit run ui/app.py\n```\n\n**Access**: http://localhost:8501\n\n---\n\n### 7. 
Monitoring\n\n**Implementation**: `monitoring/prometheus.yml`, `monitoring/grafana/`\n\n#### Prometheus Metrics Tracked\n\n**In `app/main.py`**:\n\n- `cinerag_requests_total` - Total requests by endpoint\n- `cinerag_errors_total` - Errors by endpoint and type\n- `cinerag_stage_latency_seconds` - Latency histogram per stage\n- `cinerag_backend_requests_total` - Backend usage distribution\n- `cinerag_feedback_total` - User feedback (thumbs up/down)\n\n#### Grafana Dashboards\n\nPre-configured dashboards visualize:\n- Request rate and error rate over time\n- Latency percentiles (p50, p95, p99)\n- Backend usage distribution\n- User feedback sentiment trends\n\n![Grafana Dashboard](./docs/images/grafana-dashboard.png)\n\n\n#### Accessing Monitoring\n\n```bash\n# Start monitoring stack\nmake monitoring\n\n# Or via Docker Compose\ndocker compose up -d prometheus grafana\n```\n\n**URLs**:\n- Prometheus: http://localhost:9090\n- Grafana: http://localhost:3000 (admin/admin)\n\n---\n\n### 8. Containerization\n\n**Implementation**: `docker-compose.yml`\n\nAll services are containerized for easy deployment:\n\n#### Services\n\n| Service | Image | Port | Purpose |\n|---------|-------|------|---------|\n| **elasticsearch** | elasticsearch:8.14.1 | 9200 | BM25 search engine |\n| **qdrant** | qdrant:v1.9.0 | 6333, 6334 | Vector database |\n| **api** | Custom (Dockerfile.api) | 8000 | FastAPI backend |\n| **ui** | Custom (Dockerfile.ui) | 8501 | Streamlit interface |\n| **prometheus** | prometheus:v2.55.1 | 9090 | Metrics collection |\n| **grafana** | grafana:10.4.8 | 3000 | Visualization |\n\n#### Docker Compose Commands\n\n```bash\n# Start all services\ndocker compose up -d\n\n# Start specific services\ndocker compose up -d elasticsearch qdrant\ndocker compose up -d api ui\ndocker compose up -d prometheus grafana\n\n# View logs\ndocker compose logs -f api\n\n# Stop all\ndocker compose down\n\n# Stop and remove volumes\ndocker compose down -v\n```\n\n#### Resource 
Requirements\n\n**Minimum**:\n- 4 GB RAM\n- 10 GB disk space\n- Docker Engine 20.10+\n\n**Recommended**:\n- 8 GB RAM\n- 20 GB disk space (for larger datasets)\n\n---\n\n## 🚀 Getting Started\n\n### Prerequisites\n\n- **Python 3.12**\n- **Docker \u0026 Docker Compose**\n- **TMDB API v4 Bearer Token** - [Get one here](https://developer.themoviedb.org/docs/getting-started)\n- **uv** (recommended, for reproducible builds): `curl -LsSf https://astral.sh/uv/install.sh | sh` or see [installation docs](https://docs.astral.sh/uv/getting-started/installation/)\n\n### Installation \u0026 Setup\n\n#### Step 1: Clone and Setup Environment\n\n```bash\n# Navigate to project\ncd cinerag_project\n\n# Sync dependencies from pyproject.toml (creates venv + installs exact versions)\nuv sync\n\n# Activate the virtual environment\nsource .venv/bin/activate\n```\n\n\u003e **Note**: `uv sync` reads from `pyproject.toml` and `uv.lock` to ensure you get the exact package versions used in development, guaranteeing reproducibility. 
Dependencies are managed in `pyproject.toml` (modern Python standard).\n\n#### Step 2: Configure Environment Variables\n\n```bash\n# Copy example env file\ncp .env.example .env\n\n# Edit .env and add your TMDB token:\necho \"TMDB_API_TOKEN=YOUR_V4_BEARER_TOKEN\" \u003e\u003e .env\n\n# Optional: Add LLM API keys if implementing full RAG\n# echo \"OPENAI_API_KEY=sk-...\" \u003e\u003e .env\n# echo \"OPENAI_MODEL=gpt-4o-mini\" \u003e\u003e .env\n```\n\n#### Step 3: Ingest Movie Data\n\n```bash\n# Full dataset (500 pages = ~10000 movies)\nmake ingest\n```\n\nThis will create `movies_docs.json` in your project root.\n\n#### Step 4: Start Infrastructure\n\n```bash\ndocker compose up -d\n```\n\nThis starts Elasticsearch, Qdrant, and all other services.\n\nWait ~30 seconds for Elasticsearch to be ready.\n\n#### Step 5: Index Data\n\n```bash\n# Index into Elasticsearch (BM25)\nmake es-index\n\n# Upload to Qdrant (vectors)\nmake qdrant-upsert\n```\n\n#### Step 6: Run Application\n\n```bash\n# Start API and UI\nmake api\n\n# Start monitoring (optional)\nmake monitoring\n```\n\n#### Step 7: Access Services\n\n| Service | URL | Credentials |\n|---------|-----|-------------|\n| **Streamlit UI** | http://localhost:8501 | - |\n| **FastAPI Docs** | http://localhost:8000/docs | - |\n| **Prometheus** | http://localhost:9090 | - |\n| **Grafana** | http://localhost:3000 | admin / admin |\n| **Elasticsearch** | http://localhost:9200 | - |\n| **Qdrant** | http://localhost:6333 | - |\n\n![API Documentation (FastAPI Swagger)](./docs/images/api-docs.png)\n\n---\n\n## 📊 Running Evaluations\n\n### Retrieval Evaluation\n\n```bash\n# Run retrieval evaluation (all backends)\nmake eval\n\n# View results\ncat reports/retrieval_results.json\n```\n\n### LLM Evaluation\n\n```bash\n# Run LLM evaluation\nmake eval-llm\n\n# View results\ncat reports/llm_eval_results.json\n```\n\n### Run All Evaluations\n\n```bash\n# Run both retrieval and LLM evaluation\nmake eval-all\n```\n\n### Custom 
Evaluation\n\n```bash\n# Custom retrieval evaluation\nuv run python -m eval.eval_retrieval \\\n  --eval eval/eval_queries.jsonl \\\n  --backends hybrid_rerank \\\n  --k 10 \\\n  --docs movies_docs.json \\\n  --out reports/my_eval.json\n\n# Custom LLM evaluation\nuv run python -m eval.eval_llm \\\n  --eval eval/eval_llm_queries.jsonl \\\n  --docs movies_docs.json \\\n  --backend hybrid_rerank \\\n  --top-k 5 \\\n  --out reports/my_llm_eval.json\n```\n\n---\n\n## 🎬 Usage Examples\n\n### Example Queries\n\nTry these in the UI at http://localhost:8501:\n\n- **\"blue aliens on Pandora with human avatars\"** → Avatar\n- **\"heist movie with dream within a dream\"** → Inception\n- **\"wizard school with chosen one prophecy\"** → Harry Potter\n- **\"time loop action movie with Tom Cruise\"** → Edge of Tomorrow\n- **\"dinosaur theme park that goes wrong\"** → Jurassic Park\n- **\"virtual reality hacker movie from 1999\"** → The Matrix\n\n### API Usage\n\n```bash\n# Basic query\ncurl -X POST http://localhost:8000/ask \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"query\": \"space movie with AI\",\n    \"top_k\": 5,\n    \"backend\": \"auto\"\n  }'\n\n# With filters\ncurl -X POST http://localhost:8000/ask \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"query\": \"romantic comedy in Paris\",\n    \"top_k\": 7,\n    \"backend\": \"hybrid_rerank\",\n    \"year\": [2000, 2024],\n    \"genres\": [\"Romance\", \"Comedy\"]\n  }'\n\n# Send feedback\ncurl -X POST http://localhost:8000/feedback \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"query\": \"space movie with AI\",\n    \"answer\": \"...\",\n    \"citations\": [\"tmdb:movie:157336\"],\n    \"thumb\": \"up\"\n  }'\n```\n\n---\n\n## 🛠️ Technologies\n\n### Core Stack\n- **Vector Database**: Qdrant \n- **Search Engine**: Elasticsearch\n- **Embedding Model**: sentence-transformers `all-MiniLM-L6-v2` (384-dim)\n- **Reranker Model**: `cross-encoder/ms-marco-MiniLM-L-6-v2`\n- **LLM 
Clients**: OpenAI, Anthropic, vLLM support\n- **API Framework**: FastAPI\n- **Web Server**: Uvicorn\n- **UI Framework**: Streamlit\n- **Data Source**: TMDB API\n\n### Monitoring \u0026 DevOps\n- **Metrics**: Prometheus\n- **Visualization**: Grafana\n- **Metrics Client**: prometheus-client\n- **Containerization**: Docker Compose\n\n### ML \u0026 Data Processing\n- **ML Framework**: PyTorch\n- **Transformers**: sentence-transformers\n- **Numerical**: NumPy\n- **Progress**: tqdm\n\n---\n\n## 🔍 Monitoring \u0026 Observability\n\nCineRAG includes production-ready monitoring with Prometheus and Grafana.\n\n### Metrics Collected\n\n**Request Metrics**:\n- `cinerag_requests_total{endpoint}` - Total requests per endpoint\n- `cinerag_errors_total{endpoint, type}` - Error counts by type\n\n**Performance Metrics**:\n- `cinerag_stage_latency_seconds{stage}` - Histogram of latency per stage\n  - Buckets: 0.005s to 10s\n  - Tracks: retrieve, rerank, llm stages\n\n**Usage Metrics**:\n- `cinerag_backend_requests_total{backend}` - Backend usage distribution\n- `cinerag_feedback_total{thumb}` - User satisfaction (up/down)\n\n### Grafana Dashboards\n\nPre-configured dashboards include:\n- **Overview**: Request rate, error rate, latency percentiles\n- **Backend Performance**: Latency comparison across retrieval methods\n- **User Engagement**: Feedback sentiment trends over time\n- **System Health**: Error rates, backend availability\n\nAccess at http://localhost:3000 (admin/admin)\n\n---\n\n## 🧪 Testing \u0026 Development\n\n### Running Components Individually\n\n```bash\n# Elasticsearch only\ndocker compose up -d elasticsearch\n\n# Qdrant only\ndocker compose up -d qdrant\n\n# API locally (no Docker)\nexport ES_URL=http://localhost:9200\nexport QDRANT_URL=http://localhost:6333\nuvicorn app.main:app --reload --port 8000\n\n# UI locally\nexport API_URL=http://localhost:8000\nstreamlit run ui/app.py\n```\n\n### Environment Variables\n\n| Variable | Default | Description 
|\n|----------|---------|-------------|\n| **TMDB API** | | |\n| `TMDB_API_TOKEN` | - | **Required**: TMDB v4 API bearer token (starts with \"eyJ...\") |\n| **LLM Provider** | | |\n| `LLM_PROVIDER` | - | LLM provider: `openai`, `anthropic`, or `vllm` |\n| `OPENAI_API_KEY` | - | OpenAI API key (if using OpenAI) |\n| `OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model name |\n| `ANTHROPIC_API_KEY` | - | Anthropic API key (if using Anthropic) |\n| `ANTHROPIC_MODEL` | `claude-3-5-sonnet-20241022` | Anthropic model name |\n| `VLLM_BASE_URL` | `http://localhost:8001/v1` | vLLM server URL (if using local vLLM) |\n| `VLLM_MODEL` | `qwen2.5-7b-instruct` | vLLM model name |\n| **Search \u0026 Retrieval** | | |\n| `ES_URL` | `http://localhost:9200` | Elasticsearch endpoint URL |\n| `ES_INDEX` | `movies_bm25` | Elasticsearch index name |\n| `QDRANT_URL` | `http://localhost:6333` | Qdrant vector database URL |\n| `QDRANT_COLLECTION` | `movies_vec` | Qdrant collection name |\n| `EMBED_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers embedding model |\n| `RERANK_MODEL` | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-encoder reranking model |\n| **Application** | | |\n| `API_URL` | `http://api:8000` | FastAPI backend endpoint (for UI) |\n| `DOCS_PATH` | `movies_docs.json` | Path to movie documents JSON file |\n| `FEEDBACK_LOG_PATH` | `data/feedback.jsonl` | Path for user feedback storage |\n| **Monitoring** | | |\n| `PROMETHEUS_PORT` | `9090` | Prometheus metrics port |\n| `GRAFANA_PORT` | `3000` | Grafana dashboard port |\n\n---\n\n## 🚧 Current Limitations\n- LLM integration is minimal (returns movie IDs, not full natural language responses)\n- Evaluation dataset is small (expand with more diverse queries)\n- No conversation history/multi-turn dialog\n- Single-language support (English only)\n\n---\n\n## 📚 References \u0026 Attribution\n\n### Data Source\nThis project uses data from **The Movie Database (TMDB)** API but is not endorsed or certified by TMDB. 
TMDB Terms: https://www.themoviedb.org/documentation/api/terms-of-use\n\n![TMDB Logo](./docs/images/tmdb-logo.svg)\n\n### Course\nThis project was developed as part of the [LLM Zoomcamp](https://github.com/DataTalksClub/llm-zoomcamp) by [DataTalks.Club](https://datatalks.club/).\n\n### Key Technologies \u0026 Papers\n- [Sentence-BERT](https://arxiv.org/abs/1908.10084) - Sentence embeddings\n- [Cross-Encoders for Reranking](https://www.sbert.net/examples/applications/cross-encoder/README.html)\n- [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) - Hybrid search\n- [FastAPI](https://fastapi.tiangolo.com/)\n- [Streamlit](https://streamlit.io/)\n- [Elasticsearch](https://www.elastic.co/)\n- [Qdrant](https://qdrant.tech/)\n\n---\n\n## 📄 License\n\n[MIT License](LICENSE) - see LICENSE file for details\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbielacki%2Fcinerag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbielacki%2Fcinerag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbielacki%2Fcinerag/lists"}