{"id":50475987,"url":"https://github.com/berkayildi/rag-on-azure","last_synced_at":"2026-06-01T13:03:29.736Z","repository":{"id":354505737,"uuid":"1223731248","full_name":"berkayildi/rag-on-azure","owner":"berkayildi","description":"Production-grade RAG reference implementation on Azure.","archived":false,"fork":false,"pushed_at":"2026-05-19T21:37:36.000Z","size":842,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-20T00:46:53.329Z","etag":null,"topics":["azure","azure-ai-search","azure-openai","bicep","fastapi","langgraph","llm-evaluation","managed-identity","multi-tenant","rag"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/berkayildi.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/security.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-28T15:46:34.000Z","updated_at":"2026-05-19T21:36:51.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/berkayildi/rag-on-azure","commit_stats":null,"previous_names":["berkayildi/rag-on-azure"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/berkayildi/rag-on-azure","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/berkayildi%2Frag-on-azure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/berkayildi%2Frag-on-azure/tags","releases_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/berkayildi%2Frag-on-azure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/berkayildi%2Frag-on-azure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/berkayildi","download_url":"https://codeload.github.com/berkayildi/rag-on-azure/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/berkayildi%2Frag-on-azure/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33775865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","azure-ai-search","azure-openai","bicep","fastapi","langgraph","llm-evaluation","managed-identity","multi-tenant","rag"],"created_at":"2026-06-01T13:03:28.820Z","updated_at":"2026-06-01T13:03:29.728Z","avatar_url":"https://github.com/berkayildi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# rag-on-azure\n\nA production-shaped Retrieval-Augmented Generation application on Microsoft Azure, built as a reference implementation of LLMOps discipline: Bicep IaC, FastAPI + LangGraph, multi-tenant via JWT-driven OData filters, and a quality gate (`mcp-llm-eval`) measured continuously on every push to `main`. 
Calibrated thresholds, eval results pushed to a public dashboard, no long-lived secrets in the deployed runtime. Intended for portfolio reviewers and LLMOps practitioners studying production patterns; forks are welcome as architectural reference.\n\n## Live demo\n\nThe deployed dev stack runs at `https://rag-dev-ca.ashybay-7602179f.swedencentral.azurecontainerapps.io`. Auth-free probes are public; `/query` requires a JWT minted via [`scripts/mint-token.py`](scripts/mint-token.py).\n\n**Quickest verification** — chains healthz → readyz → signed `/query` against the live stack:\n\n```bash\nmake smoke\n```\n\nOr step-by-step:\n\n```bash\nFQDN=\"https://rag-dev-ca.ashybay-7602179f.swedencentral.azurecontainerapps.io\"\n\n# Liveness — auth-free\ncurl -s \"$FQDN/healthz\"\n# {\"status\":\"ok\"}\n\n# Readiness — auth-free, pings each runtime client\ncurl -s \"$FQDN/readyz\"\n# {\"status\":\"ready\",\"checks\":{\"openai\":\"ok\",\"search\":\"ok\",\"key_vault\":\"ok\"}}\n\n# Prometheus exposition — auth-free, public per design (see docs/security.md)\ncurl -s \"$FQDN/metrics\" | head -20\n\n# Real query — admin-or-tenant JWT required\nTOKEN=$(python scripts/mint-token.py demo)\ncurl -s \"$FQDN/query\" \\\n  -H \"Authorization: Bearer $TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"question\":\"What does PS26/3 say about commission disclosure?\",\"top_k\":5}' \\\n  | jq .\n```\n\nThe retrieval-domain dashboard at [llmshot.vercel.app/retrieval](https://llmshot.vercel.app/retrieval) renders the live eval-gate output as the **rag-on-azure (FCA + HMRC)** tile, refreshed on every push to `main`.\n\n![rag-on-azure on the llmshot retrieval dashboard](docs/assets/llmshot-azure-detail.png)\n\n## Architecture\n\n```mermaid\nflowchart TB\n    subgraph GH[GitHub]\n        Repo[rag-on-azure repo]\n        CI[GitHub Actions ci.yml\u003cbr\u003e10 jobs]\n        GHCR[(GHCR\u003cbr\u003econtainer registry)]\n        LLMB[llm-benchmarks 
repo\u003cbr\u003eretrieval/azure-*.json]\n    end\n\n    subgraph Azure[Azure — rg-dev — Sweden Central]\n        AAD[(Microsoft Entra ID\u003cbr\u003eOIDC federation)]\n        CAE[Container Apps Environment]\n        CA[Container App\u003cbr\u003eFastAPI + LangGraph]\n        AOAI[(Azure OpenAI\u003cbr\u003egpt-4o + text-embedding-3-small)]\n        AISearch[(Azure AI Search\u003cbr\u003ecorpus index)]\n        KV[(Key Vault\u003cbr\u003ejwt-signing-key)]\n        LAW[(Log Analytics\u003cbr\u003e+ Application Insights)]\n        MI[Managed Identity]\n    end\n\n    User[Client\u003cbr\u003eJWT bearer] --\u003e|POST /query| CA\n    CA --\u003e|MI: Cognitive Services User| AOAI\n    CA --\u003e|MI: Search Index Reader| AISearch\n    CA --\u003e|MI: Key Vault Secrets User| KV\n    CA --\u003e LAW\n\n    Ingest[ingest pipeline\u003cbr\u003efetch / chunk / index] --\u003e|MI: Search Contributor| AISearch\n    Ingest --\u003e|MI: embeddings| AOAI\n\n    CI --\u003e|OIDC\u003cbr\u003eno secrets| AAD\n    AAD --\u003e|Owner on RG| Azure\n    CI --\u003e|build + push| GHCR\n    GHCR --\u003e|sha-pinned image| CA\n    CI --\u003e|eval-gate snapshot| AISearch\n    CI --\u003e|GitHub App token| LLMB\n    LLMB -.-\u003e|GitHub Pages| LLMShot\n\n    LLMShot[llmshot.vercel.app\u003cbr\u003eretrieval dashboard]\n```\n\nCI runs ten jobs on every push to `main`:\n\n![All 10 CI jobs green on main](docs/assets/ci-pipeline-green.png)\n\n`lint` → `gitleaks` → `bicep-validate` → `unit-tests` → `integration-tests` → `build` → `bicep-whatif` → `deploy` → `eval-gate` → `publish-benchmarks`. OIDC-federated; no service principal secret in repo settings. 
Full topology in [`docs/architecture.md`](docs/architecture.md).\n\n## Quick start\n\n```bash\ngit clone git@github.com:berkayildi/rag-on-azure.git \u0026\u0026 cd rag-on-azure\naz login                                            # tenant + sub the dev RG lives in\nmake plan                                           # az deployment group what-if; read-only\nmake apply                                          # azd provision; ~3 min for a fresh RG\ncd ingest \u0026\u0026 python -m ingest all \u0026\u0026 cd ..          # seed the corpus into AI Search\nmake smoke                                          # verify end-to-end (healthz + readyz + signed /query)\n```\n\nThat's the six-line summary. The real day-1 runbook (twelve steps including OIDC bootstrap, JWT key plumbing, eval-gate operator setup, and llm-benchmarks GitHub App provisioning) lives in [`docs/deployment.md`](docs/deployment.md).\n\n## Tech stack\n\n| Layer | Components |\n|---|---|\n| **Azure platform** | Container Apps (scale-to-zero), Azure AI Search (Free SKU, hybrid BM25 + HNSW vector), Azure OpenAI (`gpt-4o@2024-11-20` + `text-embedding-3-small`), Key Vault (RBAC), Log Analytics + Application Insights, system-assigned Managed Identity |\n| **Application** | FastAPI 0.115+ (async-first), LangGraph 2.x (linear `understand → retrieve → generate`), Pydantic v2 models throughout, `prometheus-client` for `/metrics`, `pyjwt[crypto]` RS256 verification |\n| **Infrastructure** | Bicep (modular, 6 modules), GitHub Actions (~350 lines, 10 jobs), OIDC federation to Microsoft Entra ID (no long-lived service principal secret), Release Please for versioning |\n| **Observability** | `prometheus-client` `/metrics` endpoint with retrieval/generation/total histograms (LLM-tuned buckets, not HTTP defaults), Application Insights for traces and logs, structured logging with `run_id` correlation on `/ingest` |\n| **Eval \u0026 quality** | [`mcp-llm-eval`](https://github.com/berkayildi/mcp-llm-eval) `==0.9.2` from 
PyPI; 36 grounded golden questions over UK regulatory documents (FCA Policy Statements + HMRC guidance); calibrated thresholds enforced on every push to `main`; results pushed to [llm-benchmarks](https://github.com/berkayildi/llm-benchmarks) and rendered on [llmshot.vercel.app](https://llmshot.vercel.app/retrieval) |\n| **Security tooling** | `gitleaks` (pre-commit + CI step, version-pinned), Dependabot (github-actions + pip ecosystems), GitHub secret scanning + push protection, `mypy --strict` |\n\n## Eval gate\n\nThe eval gate is the load-bearing quality contract. Every push to `main` snapshots the deployed dev AI Search index for a single tenant, runs `mcp-llm-eval evaluate-rag` against `eval/golden.jsonl` (36 questions grounded in real corpus chunks), and asserts retrieval and generation metrics against calibrated thresholds. Threshold misses fail the build; no main commit ships without eval evidence.\n\n**Calibrated thresholds and a representative measurement set** (the latest passing main run is linked from the [Actions tab](https://github.com/berkayildi/rag-on-azure/actions); BM25 retrieval is deterministic against a fixed corpus, so retrieval metrics are stable run-over-run):\n\n| Metric | Threshold | Current |\n|---|---|---|\n| `avg_recall_at_k` | ≥ 0.60 | **0.7778** |\n| `avg_mrr` | ≥ 0.50 | **0.5278** |\n| `avg_ndcg_at_k` | ≥ 0.55 | **0.5908** |\n| `avg_context_relevance` (LLM judge) | ≥ 0.55 | **0.7028** |\n| `avg_citation_faithfulness` (LLM judge) | ≥ 0.90 | **0.9931** |\n| `p95_retrieval_latency_ms` | ≤ 200 | **8.6** |\n| `p95_ttft_ms` | ≤ 5000 | **0** |\n| `max_cost_per_query` | ≤ £0.005 | **£0.0000** |\n\n**Stability proof**: 16 dependabot dependency upgrades through Day 7 (including `azure-search-documents` 11→12, `openai` 1→3, `langgraph` 0.2→2) plus 5 phase merges (`/metrics`, `publish-benchmarks`, `/ingest`, calibration, polish) introduced **zero retrieval-metric drift** (BM25 against an unchanged corpus is deterministic) and **\u003c1% drift** 
on the LLM-judge metrics — well inside judge variance. The full calibration history sits in `eval/.eval-gate.yml`.\n\nThe threshold *floor* is conservative on purpose. The metrics live above it because the corpus is well-shaped and the questions are grounded; tightening lands as a separate calibration commit when there's run-over-run signal that justifies it.\n\n## Project structure\n\n```\nrag-on-azure/\n├── .github/workflows/ci.yml      # 10-job pipeline including eval-gate + publish-benchmarks\n├── infra/                        # Bicep (main + 6 modules: search, openai, containerapp, keyvault, monitor)\n├── app/                          # FastAPI + LangGraph; production code path\n│   ├── src/rag_on_azure/         # api/, nodes/, clients/, metrics, settings, auth, key_vault\n│   └── tests/                    # unit + integration; 168 tests pass\n├── ingest/                       # corpus pipeline (fetch + chunk + index); idempotent content-hash sweep\n│   ├── src/ingest/               # CLI + 4 modules\n│   └── corpus_manifest.yaml      # 9 sources: FCA Policy Statements / Consultations / Finalised Guidance + HMRC guidance\n├── eval/                         # golden.jsonl (36 rows) + .eval-gate.yml (thresholds) + snapshot_corpus.py\n├── scripts/                      # bootstrap-oidc.sh, mint-token.py, seed-corpus.sh\n└── docs/\n    ├── architecture.md           # onboarding-grade reference (Mermaid + components)\n    ├── deployment.md             # day-1 runbook (12 steps)\n    ├── security.md               # threat model + secret inventory + per-route auth posture\n    ├── design/rag-on-azure.md    # full design spec, single source of truth\n    └── assets/                   # screenshots\n```\n\n## Documentation\n\n- [`docs/architecture.md`](docs/architecture.md) — request-flow diagram, component boundaries, audit-grade invariants\n- [`docs/deployment.md`](docs/deployment.md) — clean-checkout to first green CI in twelve steps\n- 
[`docs/security.md`](docs/security.md) — threat model, secret inventory, per-route auth posture, hardening upgrades\n- [`docs/design/rag-on-azure.md`](docs/design/rag-on-azure.md) — full design spec (canonical source of truth, ~600 lines)\n- [`AGENTS.md`](AGENTS.md) — operational quirks (the things that cost 10+ minutes the first time) plus working notes for AI agents (Claude Code) that contribute to this repo\n\n## API surface\n\n- `POST /query` — admin-or-tenant JWT, the only route that touches the LangGraph\n- `GET /healthz` — auth-free liveness probe\n- `GET /readyz` — auth-free readiness probe; pings each runtime client\n- `GET /metrics` — auth-free Prometheus exposition (counters + LLM-tuned histograms + standard process collectors)\n- `POST /ingest` — admin-only (`tenant_admin` JWT claim); schedules the corpus pipeline as a background task and returns 202 + `run_id`\n\nFull route specifications including auth posture and metric definitions in [`docs/design/rag-on-azure.md`](docs/design/rag-on-azure.md) §3.4.\n\n## Benchmark publication\n\nEvery CI run on `main` whose `eval-gate` passes pushes the resulting summary and per-query benchmark JSONs to the [`llm-benchmarks`](https://github.com/berkayildi/llm-benchmarks) repo: latest pointers under `retrieval/azure-{summary,benchmark}.json` for current-state views, plus a timestamped pair under `retrieval/history/` for drift charts. Mechanism is a GitHub App install token (`actions/create-github-app-token@v1`); the job is best-effort (`continue-on-error: true`) and gated on `vars.LLMSHOT_PUSH_ENABLED == 'true'` so forks unconnected to the llmshot ecosystem skip it silently. Full details in [`docs/design/rag-on-azure.md`](docs/design/rag-on-azure.md) §13.\n\n## Roadmap\n\nItems deferred from v1 and tracked for a future v0.x release:\n\n- **Azure Pipelines mirror.** The original v1 spec called for an `azure-pipelines.yml` mirror of the GitHub Actions pipeline (target audience: Azure DevOps shops). 
Deferred because GitHub Actions is now the canonical CI (10 jobs, OIDC, eval-gate, cross-repo App-token publish) and a partial mirror would be worse than none. Full rationale in [`docs/design/rag-on-azure.md`](docs/design/rag-on-azure.md) §6.2.\n- **Two-app-registration split for CI federated identity.** Today, one AAD app holds both `:ref:refs/heads/main` and `:pull_request` federated credentials. Branch protection is the load-bearing control. Production posture splits into a PR-scoped Reader app and a main-scoped Owner app. See [`docs/security.md`](docs/security.md).\n- **Multi-chunk goldens + `avg_precision_at_k` re-add.** The current 36 golden rows each have exactly one `relevant_chunk_ids` entry, which mathematically caps `avg_precision_at_5` at 1/5 — uninformative. The metric was removed during calibration and lands back when the dataset grows multi-chunk relevance.\n- **Multi-tenant scaling.** `queries_total` is labelled by `tenant_id`. Cardinality grows linearly with tenant count; demo has one. Documented to revisit at \u003e100 tenants.\n- **`GET /ingest/{run_id}`** status endpoint for the admin pipeline. Run IDs flow into structured logs today; a polling endpoint adds operational ergonomics for long-running ingests.\n- **Container Apps Job for `/ingest`.** Scheduling the corpus pipeline as a FastAPI `BackgroundTasks` callback works for the demo but is susceptible to scale-to-zero kill mid-run (idempotent retry recovers, but a Job is the prod-grade move).\n\n## Licence\n\nReleased under the [MIT Licence](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fberkayildi%2Frag-on-azure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fberkayildi%2Frag-on-azure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fberkayildi%2Frag-on-azure/lists"}