{"id":46905533,"url":"https://github.com/smoothemerson/ragscope","last_synced_at":"2026-03-11T01:09:17.217Z","repository":{"id":341048363,"uuid":"1167730795","full_name":"smoothemerson/ragscope","owner":"smoothemerson","description":"Q\u0026A over documents using RAG (FastAPI + ChromaDB + Ollama + MLflow)","archived":false,"fork":false,"pushed_at":"2026-03-09T21:52:56.000Z","size":80,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-10T00:27:39.122Z","etag":null,"topics":["chromadb","docker","fastapi","langchain","llm","llm-evaluation","mlflow","ollama","rag","retrieval-augmented-generation","self-hosted","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smoothemerson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-26T16:10:44.000Z","updated_at":"2026-03-09T21:52:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/smoothemerson/ragscope","commit_stats":null,"previous_names":["smoothemerson/ragscope"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/smoothemerson/ragscope","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smoothemerson%2Fragscope","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smoothemerson%2Fragscope/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smoothemerson%2Fragscope/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smoothemerson%2Fragscope/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smoothemerson","download_url":"https://codeload.github.com/smoothemerson/ragscope/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smoothemerson%2Fragscope/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30364989,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T21:41:54.280Z","status":"ssl_error","status_checked_at":"2026-03-10T21:40:59.357Z","response_time":106,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chromadb","docker","fastapi","langchain","llm","llm-evaluation","mlflow","ollama","rag","retrieval-augmented-generation","self-hosted","vector-database"],"created_at":"2026-03-11T01:09:16.713Z","updated_at":"2026-03-11T01:09:17.199Z","avatar_url":"https://github.com/smoothemerson.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAG API with MLflow Evaluation Dashboard\n\nA portfolio-grade Q\u0026A API that lets you upload PDF/text documents and ask questions about them using Retrieval-Augmented Generation (RAG). Every query is logged as an MLflow run with operational metrics and LLM-as-judge quality scores.\n\n**Fully offline — no external API keys required.**\n\n---\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────────────┐\n│                  Docker Compose                      │\n│                                                      │\n│  ┌──────────────────────────┐    ┌──────────────┐   │\n│  │  FastAPI  :8000          │    │    MLflow    │   │\n│  │  └─ Chroma (embedded)    │    │    :5000     │   │\n│  └────┬─────────────────────┘    └──────────────┘   │\n│       │                                              │\n│       ▼                                              │\n│  ┌──────────┐                                        │\n│  │  Ollama  │  (qwen3.5:9b · llama3.2 · nomic-embed) │\n│  │  :11434  │                                        │\n│  └──────────┘                                        │\n└─────────────────────────────────────────────────────┘\n```\n\nChroma runs **embedded** inside the API container (no separate ChromaDB service). Vector data is persisted to a named Docker volume (`chroma_data`) via `CHROMA_PERSIST_DIR`.\n\n**RAG Pipeline:**\n1. User uploads a document → `POST /ingest`\n2. Text is extracted, chunked (4 000 chars, 20 overlap), and embedded with `nomic-embed-text`\n3. Embeddings are stored in the embedded Chroma vector store (persisted to volume)\n4. User asks a question → `POST /query`\n5. Question is embedded and top-k chunks retrieved from Chroma by cosine similarity\n6. Retrieved chunks + question are passed to `qwen3.5:9b` (configurable via `OLLAMA_MODEL`) via a LangChain `RunnableSequence`\n7. Answer is returned; metrics and quality scores are logged to MLflow under experiment `ragscope`\n\n---\n\n## Prerequisites\n\n- Docker and Docker Compose installed\n- ~10 GB free disk space (for Ollama models)\n\nThe `./mlflow/data` and `./mlflow/artifacts` directories are created automatically by Docker when the bind mounts are resolved on first startup.\n\n---\n\n## Quickstart\n\n**Step 1 — set your hardware profile in `.env`:**\n\n| Hardware | `COMPOSE_PROFILES` value |\n|---|---|\n| CPU | `cpu` |\n| NVIDIA GPU | `gpu-nvidia` |\n| AMD GPU (ROCm) | `gpu-amd` |\n\n```bash\n# .env\nCOMPOSE_PROFILES=cpu        # or gpu-nvidia or gpu-amd\n```\n\n\u003e **Warning:** `COMPOSE_PROFILES` must be exactly one of `cpu`, `gpu-nvidia`, or `gpu-amd`.\n\u003e Any other value (including leaving it blank) will cause no Ollama service to start and the API will fail to connect.\n\n**Step 2 — start the stack:**\n\n```bash\ndocker compose up\n```\n\n\nWait for all three Ollama models to finish pulling (logged in `api` service output). Then:\n\n- FastAPI docs: http://localhost:8000/docs\n- MLflow UI: http://localhost:5000\n\n---\n\n## Example Usage\n\n### Ingest a document\n\n```bash\ncurl -X POST http://localhost:8000/ingest \\\n  -F \"file=@/path/to/your/document.pdf\"\n```\n\n```json\n{\"status\": \"ok\", \"chunks_stored\": 42, \"filename\": \"document.pdf\"}\n```\n\n### Query the RAG pipeline\n\n```bash\ncurl -X POST http://localhost:8000/query \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"question\": \"What is the main topic of the document?\", \"top_k\": 4}'\n```\n\n```json\n{\n  \"answer\": \"The document covers...\",\n  \"sources\": [\"chunk text 1\", \"chunk text 2\"]\n}\n```\n\n### Health check\n\n```bash\ncurl http://localhost:8000/health\n```\n\n```json\n{\"status\": \"ok\", \"chromadb\": \"ok\", \"ollama\": \"ok\"}\n```\n\n---\n\n## MLflow Dashboard\n\nEvery call to `POST /query` creates one MLflow run under the **ragscope** experiment.\n\nAccess the dashboard at **http://localhost:5000** → select `ragscope` experiment.\n\nEach run logs:\n- **GenAI Quality Scores** (via MLflow GenAI scorers, evaluated by `llama3.2`):\n  - `retrieval_groundedness` — is the answer grounded in the retrieved context?\n  - `answer_relevancy` — does the answer address the question?\n  - `hallucination` — does the answer contain information not supported by context?\n  - `safety` — is the answer free of harmful content?\n\nQuality scores use a separate LLM judge (`llama3.2`) via MLflow GenAI's built-in scorers (`RetrievalGroundedness`, `AnswerRelevancy`, `Hallucination`, `Safety`).\n\n---\n\n## Environment Variables\n\n| Variable              | Default               | Description                              |\n|-----------------------|-----------------------|------------------------------------------|\n| `OLLAMA_MODEL`        | `qwen3.5:9b`          | Ollama model for answer generation       |\n| `OLLAMA_JUDGE_MODEL`  | `llama3.2`            | Ollama model for LLM-as-judge scoring    |\n| `OLLAMA_EMBED_MODEL`  | `nomic-embed-text`    | Ollama model for embeddings              |\n| `CHROMA_PERSIST_DIR`  | `/chroma/data`        | Path inside the container where Chroma persists its data (mounted to `chroma_data` volume) |\n| `MLFLOW_TRACKING_URI` | `http://mlflow:5000`  | MLflow tracking server URI               |\n\nOverride any variable by setting it before running `docker compose up`:\n\n```bash\nOLLAMA_MODEL=llama3.1 docker compose up\n```\n\n---\n\n## How It Works\n\n1. **Document Ingestion** (`POST /ingest`):\n   - File uploaded as `multipart/form-data`\n   - PDF → `PyPDFLoader.load_and_split()`; TXT → `TextLoader`\n   - Split with `RecursiveCharacterTextSplitter` (chunk_size=4 000, overlap=20)\n   - Embedded with `nomic-embed-text` via Ollama\n   - Stored in embedded Chroma (persisted to `chroma_data` volume)\n\n2. **Query** (`POST /query`):\n   - Question embedded with `nomic-embed-text`\n   - Top-k chunks retrieved from Chroma by cosine similarity\n   - LangChain `RunnableSequence` (`PromptTemplate | ChatOllama`) runs `qwen3.5:9b` (or `OLLAMA_MODEL`) with retrieved context\n   - Answer extracted from `AIMessage.content` and returned with source chunks\n\n3. **MLflow Logging**:\n   - Experiment name: `ragscope`\n   - `autolog()` enabled on startup via `src/tracking/setup.py`\n   - MLflow GenAI `evaluate()` runs scorers (`RetrievalGroundedness`, `AnswerRelevancy`, `Hallucination`, `Safety`) using judge model (`llama3.2`)\n   - All traces and scores visible in MLflow UI under the GenAI section\n\n4. **Model Warm-up**:\n   - On startup, the API pulls all three Ollama models via `POST /api/pull`\n   - FastAPI does not accept requests until all models are confirmed available\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmoothemerson%2Fragscope","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmoothemerson%2Fragscope","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmoothemerson%2Fragscope/lists"}