<div align="center">

# Agentic RAG Legal Challenge 2026

### Evidence-first legal question answering over DIFC regulations, case law, and contracts.

Built for the [Agentic RAG Legal Challenge](https://machinescansee.com) at Dubai AI Week & Machines Can See 2026.

[![Python 3.12+](https://img.shields.io/badge/Python-3.12+-3776AB?logo=python&logoColor=white)](https://python.org)
[![FastAPI](https://img.shields.io/badge/FastAPI-009688?logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com)
[![LangGraph](https://img.shields.io/badge/LangGraph-1C3C3C?logo=langchain&logoColor=white)](https://langchain-ai.github.io/langgraph/)
[![Qdrant](https://img.shields.io/badge/Qdrant-DC244C?logo=qdrant&logoColor=white)](https://qdrant.tech)
[![License: AGPL-3.0](https://img.shields.io/badge/license-AGPL--3.0-blue)](LICENSE)
[![CI](https://github.com/chernistry/shafi/actions/workflows/ci.yml/badge.svg)](https://github.com/chernistry/shafi/actions/workflows/ci.yml)
[![Commits](https://img.shields.io/badge/commits-1598-2ea44f)]()

</div>

---

A staged LangGraph pipeline that answers 900 legal questions over 300 DIFC documents with page-level retrieval, anchor-based filtering, and faithfulness guardrails. Developed over a 12-day competition sprint (March 10--22, 2026) with 1,598 commits, culminating in a multi-agent coordination phase where six AI agents worked in parallel on the final push.

> [!TIP]
> For the full write-up on what worked, what didn't, and the infrastructure-over-retrieval finding, see the [blog article](https://alexchernysh.com/blog/legal-answering-systems).

## Results

| Metric | Value |
|:-------|:------|
| Questions answered | 900 |
| Documents processed | 300 |
| Total pages indexed | 10,109 (+209% via registry enrichment) |
| Null answers | 3 |
| No-page-grounding answers | 3 |
| Corrections applied | 103+ (56 DOI dates, 11 comparisons, 32 booleans, 16 numbers) |
| Eval versions completed | V9.1 through V17 |
| Total commits | 1,598 |

## Key Findings

**Infrastructure beats retrieval.** The single biggest insight from the competition: investing in evaluation infrastructure, submission tooling, and automated correction pipelines produced more score improvement than any retrieval technique change.

**87.5% ablation rejection rate.** Seven of eight retrieval enhancements tested during the competition were rejected after ablation testing. BM25-only retrieval, RAG Fusion, HyDE, FlashRank, Isaacus EQA, step-back prompting, and citation-first retrieval all failed to improve over the baseline. Only the anchor-based page filtering survived.

**TTFT was the biggest remaining lever.** Late in the competition, time-to-first-token analysis revealed a +23.5% potential swing -- larger than any retrieval optimization still on the table.

**DB short-circuit for metadata questions.** 167 of 900 questions were answerable from corpus metadata alone (document titles, dates, registry entries), resolved in <50ms without an LLM call.

## Architecture

Every query flows through a staged LangGraph pipeline with page-first retrieval and anchor-based filtering:

```mermaid
graph LR
    Q([Query]) --> Classify
    Classify --> DB["DB Answerer<br/>(short-circuit)"]
    Classify --> PageRetrieval["Page-Level Retrieval"]
    PageRetrieval --> AnchorFilter["Anchor Filter"]
    AnchorFilter --> Rerank
    Rerank --> Generate
    Generate --> Verify
    Verify --> Emit
    Emit --> Finalize
    DB --> Finalize
    Finalize --> A([Answer + Telemetry])
```

```mermaid
graph TD
    subgraph Retrieval
        Hybrid["Hybrid Search<br/>Dense + BM25"]
        Kanon["Kanon 2 Embedder<br/>1792 dims"]
        Qdrant["Qdrant<br/>10,109 points"]
        Hybrid --> Kanon
        Kanon --> Qdrant
    end

    subgraph Filtering
        Anchor["Anchor Filter<br/>Legal refs, entities,<br/>cross-references"]
        Rerank["Zerank 2<br/>Cohere fallback"]
        Anchor --> Rerank
    end

    subgraph Generation
        Router["Model Router"]
        Mini["gpt-4.1-mini<br/>Strict types"]
        Full["gpt-4.1<br/>Free text"]
        Router --> Mini
        Router --> Full
    end

    subgraph Verification
        Premise["Premise Guard"]
        Conflict["Conflict Detection"]
        Citation["Citation Verification"]
        PostProc["Post-processing"]
    end

    Qdrant --> Anchor
    Rerank --> Router
    Full --> Premise
    Mini --> Premise
    Premise --> Conflict --> Citation --> PostProc
```

### Design Choices

- **DB answerer short-circuit**: 167 metadata-answerable questions resolved via corpus registry lookup in <50ms
- **Page-first retrieval**: Hybrid search (dense + BM25) at page level, then anchor-based filtering for precision
- **Legal-domain embeddings**: Kanon 2 Embedder for DIFC-heavy terminology and statute references
- **Anchor filtering**: Extracts legal references, entity names, and cross-references to localize relevant pages
- **Reranking**: Zerank 2 with Cohere Rerank fallback for final page ordering
- **Model routing**: `gpt-4.1-mini` for strict answer types, `gpt-4.1` for complex free-text reasoning
- **Faithfulness guardrails**: Premise guard, conflict detection, citation verification, and final post-processing bound to `answer_final`
- **Telemetry bound to the final answer**: `used_page_ids`, `cited_page_ids`, per-stage timings, and model metadata in every response

Private phase corpus: 300 documents, 900 questions, 10,307 Qdrant points at 1,792 dims (Kanon-2).

---

## Quick Start

```bash
# 1. Clone and configure
git clone https://github.com/chernistry/shafi && cd shafi
cp .env.example .env
# Put machine-specific secrets in .env.local

# 2. Start the local stack (API + Qdrant)
docker compose up --build -d

# 3. Ingest the DIFC corpus
docker compose --profile tools run --rm ingest

# 4. Ask a question
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "Which laws are administered by the Registrar?"}'
```

The default `docker compose` setup brings up:

- `qdrant` locally, with health checks
- `api` with warmup enabled by default
- `ingest` and `eval` as tool-profile services inside the same Docker network

No extra `QDRANT_URL` juggling is required for the default local workflow.

### Environment Contract

- Host-local `uv run ...` and other shell commands use `QDRANT_URL=http://localhost:6333`
- Docker Compose service containers always use `QDRANT_URL=http://qdrant:6333` internally
- Local config precedence: process env > `.env.local` > `.env` > code defaults
- Keep shared defaults in `.env`; keep workstation-specific overrides and secrets in `.env.local`

---

## Stack

| Layer | Choice |
|:------|:-------|
| API | FastAPI + SSE |
| Orchestration | LangGraph |
| Embeddings | Kanon 2 Embedder (1,792 dims) |
| Vector DB | Qdrant hybrid search |
| Reranker | Zerank 2, Cohere fallback |
| Complex LLM | `gpt-4.1` |
| Strict/simple LLM | `gpt-4.1-mini` |
| Validation | Pydantic v2 + Pyright strict |
| Tooling | `uv`, `ruff`, `pytest`, Docker Compose |

---

## Evaluation

Run the harness locally against the Dockerized API:

```bash
docker compose --profile tools run --rm eval \
  python -m shafi.eval.harness \
  --golden dataset/public_dataset.json \
  --endpoint http://api:8000/query \
  --concurrency 4 \
  --emit-cases \
  --judge \
  --judge-scope free_text \
  --judge-docs-dir dataset/dataset_documents \
  --judge-out data/judge_run.jsonl \
  --out data/eval_run.json
```

Tracked checks:

- Full public-set correctness
- `free_text` judge pass rate, accuracy, grounding, clarity
- Citation coverage
- Answer-type format compliance
- Document-retrieval diagnostics
- TTFT distribution
- Per-stage latency (`classify`, `embed`, `qdrant`, `rerank`, `llm`, `verify`)
- Platform submission projection and archive compliance

---

## Platform Submission

The repository has a separate **platform-native** submission path for warm-up and final runs.

<details>
<summary><strong>Submission workflow</strong></summary>

```bash
# 1. Start local infrastructure
docker compose up --build -d qdrant

# 2. Preflight: build and audit the curated code archive
docker compose --profile tools run --rm eval \
  python -m shafi.submission.platform --archive-only

# 3. Build a warm-up submission package
docker compose --profile tools run --rm eval \
  python -m shafi.submission.platform

# 4. Inspect generated artifacts
#   - platform_runs/<phase>/submission.json
#   - platform_runs/<phase>/preflight_summary.json
#   - platform_runs/<phase>/code_archive.zip
#   - platform_runs/<phase>/code_archive_audit.json

# 5. Submit the inspected artifact and poll status
docker compose --profile tools run --rm eval \
  python -m shafi.submission.platform \
    --submit-existing \
    --submission-path platform_runs/warmup/submission.json \
    --code-archive-path platform_runs/warmup/code_archive.zip \
    --poll
```

The flow downloads phase-specific documents, ingests them into a phase-specific Qdrant collection, then downloads questions and runs the RAG pipeline. Documents are ingested before questions to comply with the "no question-aware indexing" rule.

</details>

---

## Development

```bash
uv sync --extra dev       # install dev dependencies
make lint                 # ruff check src tests scripts
make typecheck            # pyright strict mode
make test                 # pytest tests/unit
make all                  # lint + typecheck + test
make format               # auto-fix with ruff
```

Script index: [docs/scripts_index.md](docs/scripts_index.md)

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full development guide.

---

## Related

- **[Building Legal Answering Systems](https://alexchernysh.com/blog/legal-answering-systems)** -- Full write-up covering the competition experience, architecture decisions, the infrastructure-over-retrieval finding, and the multi-agent sprint
- **[Bernstein](https://github.com/chernistry/bernstein)** -- AI coding agent orchestrator born from this competition's multi-agent coordination patterns
- **[Competition Page](https://machinescansee.com)** -- Agentic RAG Legal Challenge at Dubai AI Week & Machines Can See 2026

---

## License

[AGPL-3.0 1.0.0](LICENSE) -- Free for non-commercial use. Commercial licensing: [alex@alexchernysh.com](mailto:alex@alexchernysh.com). See [COMMERCIAL-LICENSING.md](COMMERCIAL-LICENSING.md).