{"id":43341089,"url":"https://github.com/yafitzdev/fitz-gov","last_synced_at":"2026-03-01T06:01:37.527Z","repository":{"id":335921153,"uuid":"1147496910","full_name":"yafitzdev/fitz-gov","owner":"yafitzdev","description":"Comprehensive RAG Governance Benchmark","archived":false,"fork":false,"pushed_at":"2026-02-09T00:40:11.000Z","size":23330,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-09T06:38:32.822Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yafitzdev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-01T20:46:16.000Z","updated_at":"2026-02-09T00:40:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/yafitzdev/fitz-gov","commit_stats":null,"previous_names":["yafitzdev/fitz-gov"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/yafitzdev/fitz-gov","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafitzdev%2Ffitz-gov","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafitzdev%2Ffitz-gov/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafitzdev%2Ffitz-gov/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafitzdev%2Ffitz-gov/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/yafitzdev","download_url":"https://codeload.github.com/yafitzdev/fitz-gov/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yafitzdev%2Ffitz-gov/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29961856,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T05:59:08.471Z","status":"ssl_error","status_checked_at":"2026-03-01T05:58:04.208Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-02T01:08:12.718Z","updated_at":"2026-03-01T06:01:37.428Z","avatar_url":"https://github.com/yafitzdev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n\u003cdiv align=\"center\"\u003e\n\n# fitz-gov\n\n### A benchmark for measuring whether RAG systems know when to answer, when to push back, and when to shut up.\n\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![PyPI version](https://badge.fury.io/py/fitz-gov.svg)](https://pypi.org/project/fitz-gov/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![Version](https://img.shields.io/badge/version-5.0.0-green.svg)](CHANGELOG.md)\n\n[The Problem](#the-problem) • [Three Modes](#the-three-modes-) • 
[What Makes This Hard](#what-makes-this-hard-) • [Quick Start](#-quick-start) • [GitHub](https://github.com/yafitzdev/fitz-gov)\n\n\u003c/div\u003e\n\n\u003cbr /\u003e\n\n---\n\n```python\nfrom fitz_gov import FitzGovEvaluator, load_tier, Tier\n\ntier0_cases = load_tier(Tier.SANITY)   # 60 sanity cases\ntier1_cases = load_tier(Tier.CORE)     # 2,920 real cases\n\nevaluator = FitzGovEvaluator()\nresult = evaluator.evaluate_tiered(tier0_cases, ..., tier1_cases, ...)\nprint(result)\n```\n\n2,980 test cases. One score that tells you if your RAG system knows what it doesn't know.\n\n---\n\n### About 🧑‍🌾\n\nSolo project by Yan Fitzner ([LinkedIn](https://www.linkedin.com/in/yan-fitzner/), [GitHub](https://github.com/yafitzdev)).\n\n- ~4k lines of Python, 2,980 benchmark cases\n- 107 tests\n- Built for [fitz-ai](https://github.com/yafitzdev/fitz-ai) — used to train and validate its governance classifier (81.3% accuracy on 2,910 hard cases)\n\n---\n\n### The Problem\n\nEvery RAG benchmark today measures the same thing: *did the system get the right answer?* BEIR measures retrieval. RAGAS measures generation quality. But none of them measure the thing that actually matters in production: **does the system know what it doesn't know?**\n\nAsk a typical RAG system \"What was Acme Corp's revenue last quarter?\" and give it context that only mentions Acme Corp's founding date. Most systems will confidently hallucinate a revenue figure. A well-governed system would say \"the provided context doesn't contain revenue information.\"\n\nGive it two analyst reports that directly contradict each other — one says the market is growing 12%, the other says it's shrinking 3%. Most systems will pick one and present it as fact. A well-governed system would flag the contradiction.\n\nThis is the **governance problem**: RAG systems need to make a meta-decision about their own evidence *before* they generate an answer. Should I answer confidently? Hedge? Flag conflicting sources? 
Refuse entirely?\n\nfitz-gov measures that meta-decision.\n\n---\n\n### The Three Modes 🔀\n\nEvery query + context pair maps to one of three governance modes:\n\n**TRUSTWORTHY** ✅ — Sufficient evidence. Answer confidently or with appropriate hedging.\n\u003e *\"What is the boiling point of water at sea level?\"*\n\u003e *Context: \"At standard atmospheric pressure, pure water boils at 100°C.\"*\n\u003e → Answer directly.\n\n**DISPUTED** ⚠️ — Conflicting information. Surface the contradiction, don't pick a side.\n\u003e *\"Is remote work more productive?\"*\n\u003e *Context A: \"Stanford found remote workers were 13% more productive.\"*\n\u003e *Context B: \"Microsoft found remote work decreased collaboration by 25%.\"*\n\u003e → Present both sides.\n\n**ABSTAIN** 🚫 — Irrelevant, insufficient, or wrong entity/time period. Refuse to answer.\n\u003e *\"What are the side effects of ibuprofen?\"*\n\u003e *Context: \"Python was created by Guido van Rossum in 1991.\"*\n\u003e → \"I don't have relevant information to answer this.\"\n\n---\n\n### What Makes This Hard 🧩\n\nThe easy cases are obvious. Nobody confuses a biology passage with a finance question. fitz-gov includes those as a sanity check, but the real benchmark lives in the hard cases:\n\n**Near-miss abstention 🎯**\n\u003e The context discusses the right *topic* but the wrong *entity*, wrong *time period*, or wrong *jurisdiction*. \"What are Tesla's Q4 earnings?\" with context about Ford's Q4 earnings.\n\n**Implicit contradiction 🔇**\n\u003e Sources don't directly say opposite things, but their claims are logically incompatible. One says a company \"exceeded all growth targets\" while another says it \"failed to meet analyst expectations.\"\n\n**Hedged vs. confident 🤔**\n\u003e The context contains a correlation study. The query asks about causation. The system should answer (TRUSTWORTHY) but hedge — not abstain, and not state correlation as proven causation.\n\n**Methodology conflicts vs. 
genuine disputes 📊**\n\u003e Two studies report different numbers for the same thing. Is it because they used different methodologies (TRUSTWORTHY with caveats) or because they genuinely disagree (DISPUTED)?\n\nThese boundary cases are where production RAG systems actually fail, and where fitz-gov separates a good governance classifier from a great one.\n\n\u003e [!NOTE]\n\u003e 62.7% of tier1 cases are rated \"hard.\" This is deliberate — the easy cases exist as a sanity gate, not the benchmark.\n\n---\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cstrong\u003e📦 What is RAG governance?\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nMost RAG systems have two jobs: (1) find relevant documents, (2) generate an answer. But there's a critical third job they skip: **decide whether you should answer at all.**\n\nA governance classifier sits between retrieval and generation. It looks at the query and the retrieved context and makes a meta-decision:\n\n```\nQuery + Retrieved Context\n        │\n        ▼\n┌─────────────────┐\n│   Governance    │──► TRUSTWORTHY → generate answer\n│   Classifier    │──► DISPUTED    → flag contradictions\n│                 │──► ABSTAIN     → refuse to answer\n└─────────────────┘\n```\n\nWithout governance, your RAG system will confidently answer \"The company's Q4 revenue was $2.3 billion\" when the context only mentions Q1-Q3 data. With governance, it says \"I don't have Q4 revenue figures.\"\n\nfitz-gov provides the test cases to measure how well your governance classifier makes these decisions.\n\n\u003c/details\u003e\n\n---\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cstrong\u003e📦 Quick Start\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n```bash\npip install fitz-gov\n```\n\n#### Tiered Evaluation\n\nfitz-gov uses a two-tier system. 
Tier 0 is a 60-case sanity check (95% pass threshold) that gates Tier 1.\n\n```python\nfrom fitz_gov import FitzGovEvaluator, load_tier, Tier, AnswerMode\n\ntier0_cases = load_tier(Tier.SANITY)  # 60 cases\ntier1_cases = load_tier(Tier.CORE)    # 2,920 cases\n\n# Your RAG system classifies each case\ntier0_responses, tier0_modes = your_system.evaluate(tier0_cases)\ntier1_responses, tier1_modes = your_system.evaluate(tier1_cases)\n\nevaluator = FitzGovEvaluator()\nresult = evaluator.evaluate_tiered(\n    tier0_cases, tier0_responses, tier0_modes,\n    tier1_cases, tier1_responses, tier1_modes,\n)\n\nprint(result)\n# TIER 0 (Sanity Check): PASSED  |  95% threshold, achieved 98.3%\n# TIER 1 (Core Benchmark): 69.1%\n#   abstention:         84.8% (581/685)\n#   dispute:            66.8% (451/675)\n#   trustworthy_hedged: 71.2% (826/1160)  |  grounding: 89.3%, relevance: 85.1%\n#   trustworthy_direct: 78.5% (314/400)   |  grounding: 92.1%, relevance: 88.7%\n```\n\n#### Standalone Usage\n\nAny RAG system can be evaluated — fitz-gov is framework-agnostic:\n\n```python\nfrom fitz_gov import FitzGovEvaluator, load_cases, AnswerMode\n\ncases = load_cases()  # 2,980 cases\nevaluator = FitzGovEvaluator()\n\nresponses, modes = [], []\nfor case in cases:\n    response = your_system.query(case.query, case.contexts)\n    mode = your_system.classify_mode(response)\n    responses.append(response)\n    modes.append(mode)\n\nresults = evaluator.evaluate_all(cases, responses, modes)\nprint(f\"Governance accuracy: {results.overall_accuracy:.1%}\")\n```\n\n#### Two-Pass Validation\n\nGrounding and relevance checks use regex + optional LLM validation:\n\n```python\nevaluator = FitzGovEvaluator(\n    llm_validation=True,\n    llm_model=\"qwen2.5:14b\",\n    llm_base_url=\"http://localhost:11434\"\n)\n```\n\n\u003c/details\u003e\n\n---\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cstrong\u003e📦 Interpreting Your Score\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nA 
fitz-gov score is governance mode accuracy across 2,920 test cases.\n\n| Score | Meaning |\n|-------|---------|\n| **90%+** | Exceptional. Almost always makes the right meta-decision. |\n| **75-90%** | Strong. Handles most cases, occasional misjudgments on boundaries. |\n| **60-75%** | Moderate. Gets obvious cases right, struggles with subtlety. |\n| **\u003c 60%** | Frequently making the wrong meta-decision. |\n\nThe score breaks down by category — so you can see exactly *where* your system fails. 90% on abstention but 55% on disputes? It knows when to shut up but doesn't catch contradictions. 40% on trustworthy_direct? It's being overly cautious, refusing to answer even when evidence is clear.\n\n**Four test categories:**\n\n| Category | Cases | Mode | What it catches |\n|----------|------:|------|-----------------|\n| **Abstention** | 685 | ABSTAIN | System answers when it has no relevant evidence |\n| **Dispute** | 675 | DISPUTED | System ignores contradictions between sources |\n| **Trustworthy Hedged** | 1,160 | TRUSTWORTHY | System over-hedges (abstains) or under-hedges (states uncertain things as fact) |\n| **Trustworthy Direct** | 400 | TRUSTWORTHY | System refuses or hedges when evidence clearly supports a confident answer |\n\nTrustworthy cases are evaluated on three dimensions: governance mode, grounding (no hallucinated details via `forbidden_claims`), and relevance (actually addresses the question via `required_elements`). 
A case only passes if all three checks succeed.\n\n→ [Evaluation Guide](docs/evaluation-guide.md) for deeper analysis\n\n\u003c/details\u003e\n\n---\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cstrong\u003e📦 Evaluation Flow\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n```mermaid\nflowchart TD\n    A[Run Benchmark] --\u003e B[Tier 0: Sanity Check\\n60 cases]\n    B --\u003e|\"≥ 95%\"| C[PASS → Tier 1: Core Benchmark\\n2,920 cases]\n    B --\u003e|\"\u003c 95%\"| D[FAIL → Fix fundamentals first]\n    C --\u003e E{Classify governance mode\\nper case}\n    E --\u003e F[Abstention — 685 cases\\nMode check only]\n    E --\u003e G[Dispute — 675 cases\\nMode check only]\n    E --\u003e H[Trustworthy Hedged — 1,160 cases]\n    E --\u003e I[Trustworthy Direct — 400 cases]\n    H --\u003e J{Mode correct?}\n    I --\u003e J\n    J --\u003e|No| K[Fail case]\n    J --\u003e|Yes| L[Grounding check\\nforbidden_claims]\n    L --\u003e M[Relevance check\\nrequired_elements]\n    M --\u003e N[Pass only if all 3 checks succeed]\n```\n\n\u003c/details\u003e\n\n---\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cstrong\u003e📦 Benchmark Stats\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n**2,980 total cases** (60 tier0 sanity + 2,920 tier1 core) across **113+ subcategories**, **17 domains**, and **10 query types**.\n\n- **Mode split:** TRUSTWORTHY 53.4% / ABSTAIN 23.5% / DISPUTED 23.1%\n- **Difficulty:** 62.7% hard / 37.3% medium (tier1), easy (tier0 only)\n- **Multi-source:** 264 cases (9.0%) with source metadata\n- **Domains:** Technology, Medicine, Finance, Science, Education, Environment, Food, Law, Government, Transportation, Sports, Agriculture, History, HR/Workplace, Real Estate, Psychology, Social Media\n- **Query types:** what, how, is, does, why, should, when, which, who, compare\n- **Reasoning types:** Factual, Evaluative, Causal, Comparative, Temporal, 
Procedural\n\n\u003c/details\u003e\n\n---\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cstrong\u003e📦 Data Format\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n```\ndata/\n├── tier0_sanity/               # 60 easy cases (95% gate)\n├── tier1_core/                 # 2,920 medium/hard cases\n│   ├── abstention.json         # 685 cases\n│   ├── dispute.json            # 675 cases\n│   ├── trustworthy_hedged.json # 1,160 cases\n│   └── trustworthy_direct.json # 400 cases\n├── corpus/                     # 5,043 reference documents\n├── queries/                    # 3,800 query-to-document mappings\n└── validation/                 # 250-case human validation sample\n```\n\nEach case:\n\n```json\n{\n  \"id\": \"t1_abstain_medium_001\",\n  \"query\": \"What is the company's revenue for 2024?\",\n  \"contexts\": [\"The company was founded in 2010...\"],\n  \"expected_mode\": \"abstain\",\n  \"category\": \"abstention\",\n  \"subcategory\": \"wrong_entity\",\n  \"difficulty\": \"medium\",\n  \"domain\": \"finance\",\n  \"query_type\": \"what\",\n  \"source_type\": \"single\",\n  \"context_count\": 1,\n  \"reasoning_type\": \"factual\",\n  \"evidence_pattern\": \"absent\",\n  \"forbidden_claims\": [\"\\\\$\\\\d+\\\\s*billion\"],\n  \"required_elements\": [\"revenue\", \"not provided\"]\n}\n```\n\nEvery case has 6 classification attributes for slicing results. 
Trustworthy cases additionally have `forbidden_claims` (grounding) and `required_elements` (relevance) for quality scoring.\n\n\u003c/details\u003e\n\n---\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cstrong\u003e📦 Full Distribution Tables\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n#### Categories (Tier 1)\n\n| Category | Cases | Medium | Hard | Med % |\n|----------|------:|-------:|-----:|------:|\n| Abstention | 685 | 255 | 430 | 37% |\n| Dispute | 675 | 261 | 414 | 39% |\n| Trustworthy Hedged | 1,160 | 428 | 732 | 37% |\n| Trustworthy Direct | 400 | 145 | 255 | 36% |\n\n#### Domain Distribution\n\n| Domain | Cases | % | Domain | Cases | % |\n|--------|------:|--:|--------|------:|--:|\n| Technology | 412 | 14.1% | Transportation | 131 | 4.5% |\n| Medicine | 309 | 10.6% | Sports | 127 | 4.3% |\n| Finance | 296 | 10.1% | Agriculture | 126 | 4.3% |\n| Science | 192 | 6.6% | History | 122 | 4.2% |\n| Government | 155 | 5.3% | HR/Workplace | 121 | 4.1% |\n| Education | 152 | 5.2% | Real Estate | 119 | 4.1% |\n| Environment | 147 | 5.0% | Psychology | 119 | 4.1% |\n| Food | 143 | 4.9% | Social Media | 113 | 3.9% |\n| Law | 136 | 4.7% | | | |\n\n#### Query Type Distribution\n\n| Type | Cases | % | Type | Cases | % |\n|------|------:|--:|------|------:|--:|\n| what | 821 | 28.1% | should | 135 | 4.6% |\n| how | 694 | 23.8% | when | 121 | 4.1% |\n| is | 437 | 15.0% | which | 97 | 3.3% |\n| does | 284 | 9.7% | who | 77 | 2.6% |\n| why | 213 | 7.3% | compare | 41 | 1.4% |\n\n#### Reasoning Type Distribution\n\n| Reasoning Type | Cases | % |\n|----------------|------:|--:|\n| Factual | 1,588 | 54.4% |\n| Evaluative | 596 | 20.4% |\n| Causal | 239 | 8.2% |\n| Comparative | 187 | 6.4% |\n| Temporal | 178 | 6.1% |\n| Procedural | 132 | 4.5% |\n\n#### Evidence Pattern Distribution\n\n| Evidence Pattern | Cases | % |\n|------------------|------:|--:|\n| Direct | 1,039 | 35.6% |\n| Absent | 637 | 21.8% |\n| Conflicting | 587 | 20.1% |\n| Partial | 428 | 
14.7% |\n| Indirect | 195 | 6.7% |\n| Mixed | 34 | 1.2% |\n\n#### Subcategories\n\n**Abstention** (23): wrong_entity, wrong_specificity, temporal_mismatch, missing_data, off_topic_contradiction, wrong_domain, wrong_jurisdiction, outdated_context, wrong_product, cross_domain_insufficient, decoy_keywords, converted_insufficient, converted_off_domain, wrong_version, implicit_only, wrong_granularity, converted_wrong_entity, multi_source_gap, cross_source_irrelevant, code_abstention, topic_adjacent, format_impossible, converted_wrong_scope\n\n**Dispute** (19): numerical_conflict, implicit_contradiction, binary_conflict, opposing_conclusions, temporal_conflict, statistical_direction_conflict, source_authority_conflict, methodology_conflict, interpretation_conflict, competing_theories, scientific_replication, cross_source_contradiction, converted_contradiction, conditional_conflict, converted_consensus_removed, converted_framing_conflict, temporal_source_conflict, contradictory_attribution, converted_version_conflict\n\n**Trustworthy Hedged** (57): evidence_quality, hedged_evidence, different_aspects, causal_uncertainty, mixed_evidence, temporal_uncertainty, version_overlap, methodology_difference, stale_source, evolving_facts, entity_ambiguity, partial_answer, scope_condition, numerical_near_miss, cross_source_partial, implicit_assumptions, adjacent_entity, cross_domain_transfer, hedged_contradiction_corroborated, different_framing, grounding_numerical_hallucination, grounding_attribution_hallucination, grounding_temporal_confusion, grounding_entity_blending, grounding_process_hallucination, grounding_quote_fabrication, grounding_statistical_inference, grounding_code_hallucination, grounding_table_inference, grounding_causal_hallucination, grounding_comparative_hallucination, grounding_geographic_hallucination, grounding_technical_hallucination, grounding_date_hallucination, grounding_location_hallucination, grounding_code_grounding, grounding_medical_hallucination, 
grounding_quote_extension, relevance_partial_answer, relevance_wrong_entity_focus, relevance_temporal_mismatch, relevance_tangent_drift, relevance_related_but_different, relevance_over_answering, relevance_granularity_mismatch, relevance_prerequisite_missing, relevance_scope_mismatch, relevance_format_mismatch, relevance_summarization_vs_answer, relevance_cherry_picking, relevance_false_precision, relevance_assumption_injection, relevance_symptom_only, relevance_status_dump, relevance_feature_dump, relevance_instruction_only, relevance_metric_avoidance\n\n**Trustworthy Direct** (14): technical_documented, clear_explanation, contradiction_resolved, opposing_with_consensus, different_framing, quantitative_answer, cross_source_agreement, direct_factual, multi_source_convergence, authoritative_source, near_complete_evidence, conditional_confidence, step_by_step, definitional\n\n\u003c/details\u003e\n\n---\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cstrong\u003e📦 Contributing\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n1. Fork this repo\n2. Add cases to the appropriate `data/tier0_sanity/` or `data/tier1_core/` JSON file\n3. Run validation: `python -m fitz_gov.cli validate --data-dir data`\n4. 
Submit a PR\n\n→ [Mode Decision Tree](docs/mode-decision-tree.md) — how expected modes are assigned\n\n\u003c/details\u003e\n\n---\n\n### License\n\nMIT\n\n---\n\n### Links\n\n- [GitHub](https://github.com/yafitzdev/fitz-gov)\n- [PyPI](https://pypi.org/project/fitz-gov/)\n- [Changelog](CHANGELOG.md)\n\n**Documentation:**\n- [Evaluation Guide](docs/evaluation-guide.md) — How to interpret scores and diagnose failures\n- [Mode Decision Tree](docs/mode-decision-tree.md) — How expected modes are assigned to test cases\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyafitzdev%2Ffitz-gov","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyafitzdev%2Ffitz-gov","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyafitzdev%2Ffitz-gov/lists"}