{"id":39987044,"url":"https://github.com/frmoretto/clarity-gate","last_synced_at":"2026-02-02T23:23:10.425Z","repository":{"id":329082603,"uuid":"1117411166","full_name":"frmoretto/clarity-gate","owner":"frmoretto","description":"Stop LLMs from hallucinating your guesses as facts. Clarity Gate is a verification protocol for documents destined for LLMs or RAG systems. It automatically places the missing uncertainty markers to prevent confident hallucinations, with HITL for claims that are not directly verifiable.","archived":false,"fork":false,"pushed_at":"2026-01-19T07:18:01.000Z","size":241,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-19T16:10:55.821Z","etag":null,"topics":["ai","ai-safety","clarity-gate","claude","context-engineering","documentation","epistemic-quality","hallucination","llm","methodology","pre-ingestion","prompt-engineering","rag","stream-coding","verification"],"latest_commit_sha":null,"homepage":"https://clarity-gate.org","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/frmoretto.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-12-16T09:23:36.000Z","updated_at":"2026-01-19T07:18:05.000Z","dependencies_parsed_at":"2026-01-19T10:01:42.721Z","dependency_job_id":null,"html_url":"https://github.com/frmoretto/clarity-gate","commit_stats":null,"previous_names":["frmoretto/clarity-
gate"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/frmoretto/clarity-gate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frmoretto%2Fclarity-gate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frmoretto%2Fclarity-gate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frmoretto%2Fclarity-gate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frmoretto%2Fclarity-gate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/frmoretto","download_url":"https://codeload.github.com/frmoretto/clarity-gate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frmoretto%2Fclarity-gate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28813223,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T12:25:15.069Z","status":"ssl_error","status_checked_at":"2026-01-27T12:25:05.297Z","response_time":168,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-safety","clarity-gate","claude","context-engineering","documentation","epistemic-quality","hallucination","llm","methodology","pre-ingestion","prompt-engineering","rag","stream-coding","verification"],"created_at":"2026-01-19T00:00:34.558Z","updated_at":"2026-02-02T23:23:10.415Z","avatar_url":"https://github.com/frmoretto.png","language":null,"readme":"# Clarity Gate — Prevent LLMs from Misinterpreting Facts\n\n\u003e **⚠️ LATEST:** Version 2.1 released (2026-01-27). RFC-001 applied: claim status semantics, bundled scripts. See [CHANGELOG](CHANGELOG.md).\n\n\u003e ✅ **This README passed Clarity Gate verification** (2026-01-13, adversarial mode, Claude Opus 4.5)\n\n**Open-source pre-ingestion verification for epistemic quality in RAG systems.**\n\n[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)\n\n\u003e *\"Detection finds what is; enforcement ensures what should be. In practice: find the missing uncertainty markers before they become confident hallucinations.\"*\n\n---\n\n## The Problem\n\nIf you feed a well-aligned model a document that states \"Revenue will reach $50M by Q4\" as fact (when it's actually a projection), the model will confidently report this as fact.\n\nThe model isn't hallucinating. 
It's faithfully representing what it was told.\n\n**The failure happened before the model saw the input.**\n\n| Document Says | Accuracy Check | Epistemic Check |\n|---------------|----------------|-----------------|\n| \"Revenue will be $50M\" (unmarked projection) | ✅ PASS | ❌ FAIL — projection stated as fact |\n| \"Our approach outperforms X\" (no evidence) | ✅ PASS | ❌ FAIL — ungrounded assertion |\n| \"Users prefer feature Y\" (no methodology) | ✅ PASS | ❌ FAIL — missing epistemic basis |\n\n**Accuracy verification asks:** \"Does this match the source?\"  \n**Epistemic verification asks:** \"Is this claim properly qualified?\"\n\nBoth matter. Accuracy verification has mature open-source tools. Epistemic verification has detection systems (UnScientify, HedgeHunter, BioScope), but as of the 2.0 release (January 13, 2026), I found no open-source pre-ingestion epistemic enforcement system (methodology: deep research conducted via multiple LLMs). Corrections welcome.\n\nClarity Gate is a proposal for that layer.\n\n---\n\n## What Is Clarity Gate?\n\nClarity Gate is an **open-source pre-ingestion verification system** for epistemic quality.\n\n- **Clarity** — Making explicit what's fact, what's projection, what's hypothesis\n- **Gate** — Documents don't enter the knowledge base until they pass verification\n\n### The Gap It Addresses\n\n| Component | Status |\n|-----------|--------|\n| Pre-ingestion gate pattern | ✅ Proven (Adlib, pharma QMS) |\n| Epistemic detection | ✅ Proven (UnScientify, HedgeHunter) |\n| **Pre-ingestion epistemic enforcement** | ❌ Gap (to my knowledge) |\n| **Open-source accessibility** | ❌ Gap |\n\n| Dimension | Enterprise (Adlib) | Clarity Gate |\n|-----------|-------------------|--------------|\n| **License** | Proprietary | Open source (CC BY 4.0) |\n| **Focus** | Accuracy, compliance | Epistemic quality |\n| **Target** | Fortune 500 | Founders, researchers, small teams |\n| **Cost** | Enterprise pricing | Free |\n\n---\n\n## When to 
Use Clarity Gate\n\nMost valuable when:\n\n- Your RAG corpus includes **drafts, docs, tickets, meeting notes**, or user-provided content\n- You care about **correctness** and want a verifiable ingestion gate\n- You need a practical **HITL loop** that scales beyond manual spot checks\n- You want **automated enforcement** of document quality before ingestion\n\n---\n\n## How Clarity Gate Differs from Knowledge Engineering Tools\n\n| Aspect | Semantica / LlamaIndex | Clarity Gate |\n|--------|------------------------|--------------|\n| **Stage** | Post-extraction | Pre-ingestion |\n| **Input** | Structured entities | Raw documents |\n| **Problem** | \"Which value is correct?\" | \"Is this claim properly qualified?\" |\n| **Output** | Resolved knowledge graph | Annotated document (CGD) |\n| **Conflict** | Multi-source disagreement | Unmarked projections/assumptions |\n\n**They're complementary:** Use Clarity Gate *before* Semantica/LlamaIndex.\n\n---\n\n## Quick Start\n\n### Option 1: Claude.ai (Web) — Skill Upload\n\n1. Download [`dist/clarity-gate.skill`](dist/clarity-gate.skill)\n2. Go to claude.ai → Settings → Features → Skills → Upload\n3. Upload the `.skill` file\n4. Ask Claude: *\"Run clarity gate on this document\"*\n\n### Option 2: Claude Desktop\n\nSame as Option 1 — Claude Desktop uses the same skill format as claude.ai.\n\n### Option 3: Claude Code\n\nClone the repo — Claude Code auto-detects skills in `.claude/skills/`:\n\n```bash\ngit clone https://github.com/frmoretto/clarity-gate\ncd clarity-gate\n# Claude Code will automatically detect .claude/skills/clarity-gate/SKILL.md\n```\n\nOr copy `.claude/skills/clarity-gate/` to your project's `.claude/skills/` directory.\n\nAsk Claude: *\"Run clarity gate on this document\"*\n\n### Option 4: Claude Projects\n\nAdd [`skills/clarity-gate/SKILL.md`](skills/clarity-gate/SKILL.md) to project knowledge. 
Claude will search it when needed, though Skills provide better integration.\n\n### Option 5: OpenAI Codex / GitHub Copilot\n\nCopy the canonical skill to the appropriate directory:\n\n| Platform | Location |\n|----------|----------|\n| OpenAI Codex | `.codex/skills/clarity-gate/SKILL.md` |\n| GitHub Copilot | `.github/skills/clarity-gate/SKILL.md` |\n\nUse [`skills/clarity-gate/SKILL.md`](skills/clarity-gate/SKILL.md) (agentskills.io format).\n\n### Option 6: Manual / Other LLMs\n\nUse the [9-point verification](docs/ARCHITECTURE.md#the-9-verification-points) as a manual review process.\n\nFor Cursor, Windsurf, or other AI tools, extract the 9 verification points into your `.cursorrules`. The methodology is tool-agnostic—only SKILL.md is Claude-optimized.\n\n---\n\n## Platform-Specific Skill Locations\n\n| Platform | Skill Location | Frontmatter Format |\n|----------|----------------|-------------------|\n| Claude.ai / Claude Desktop | `.claude/skills/clarity-gate/` | Minimal (`name`, `description` only) |\n| Claude Code | `.claude/skills/clarity-gate/` | Minimal |\n| OpenAI Codex | `.codex/skills/clarity-gate/` | agentskills.io (full) |\n| GitHub Copilot | `.github/skills/clarity-gate/` | agentskills.io (full) |\n| Canonical | `skills/clarity-gate/` | agentskills.io (full) |\n\nPre-built skill file: [`dist/clarity-gate.skill`](dist/clarity-gate.skill)\n\n---\n\n## Format Specification\n\nSee [CLARITY_GATE_FORMAT_SPEC.md](docs/CLARITY_GATE_FORMAT_SPEC.md) for the complete format specification (v2.0).\n\n---\n\n## Two Modes\n\n**Verify Mode (default):**\n```\n\"Run clarity gate on this document\"\n→ Issues report + Two-Round HITL verification\n```\n\n**Annotate Mode:**\n```\n\"Run clarity gate and annotate this document\"\n→ Complete document with fixes applied inline (CGD)\n```\n\nThe annotated output is a **Clarity-Gated Document (CGD)**.\n\n---\n\n## Workflow Overview\n\n```mermaid\nflowchart TD\n  A[Raw Docs\u003cbr\u003enotes, PRDs, transcripts] --\u003e 
B[process\u003cbr\u003eadd epistemic markers\u003cbr\u003ecompute document-sha256]\n  B --\u003e C[CGD\u003cbr\u003esafe for RAG ingestion]\n  C --\u003e|optional| D[promote\u003cbr\u003eadd tier block]\n  D --\u003e E[SOT\u003cbr\u003ecanonical + extractable]\n  C --\u003e F[generate HITL queue\u003cbr\u003eclaim IDs + locations]\n  F --\u003e G[Human review\u003cbr\u003econfirm/reject]\n  G --\u003e H[apply-hitl\u003cbr\u003etransaction + checkpoint]\n  H --\u003e C\n```\n\n---\n\n## The 9 Verification Points\n\n### Epistemic Checks (Core Focus)\n\n1. **Hypothesis vs. Fact Labeling** — Claims marked as validated or hypothetical\n2. **Uncertainty Marker Enforcement** — Forward-looking statements require qualifiers\n3. **Assumption Visibility** — Implicit assumptions made explicit\n4. **Authoritative-Looking Unvalidated Data** — Tables with percentages flagged if unvalidated\n\n### Data Quality Checks (Complementary)\n\n5. **Data Consistency** — Conflicting numbers within document\n6. **Implicit Causation** — Claims implying causation without evidence\n7. **Future State as Present** — Planned outcomes described as achieved\n\n### Verification Routing\n\n8. **Temporal Coherence** — Dates consistent with each other and with present\n9. 
**Externally Verifiable Claims** — Pricing, statistics, competitor claims flagged for verification\n\nSee [ARCHITECTURE.md](docs/ARCHITECTURE.md) for full details and examples.\n\n---\n\n## Two-Round HITL Verification\n\nDifferent claims need different types of verification:\n\n| Claim Type | What Human Checks | Cognitive Load |\n|------------|-------------------|----------------|\n| LLM found source, human witnessed | \"Did I interpret correctly?\" | Low (quick scan) |\n| Human's own data | \"Is this actually true?\" | High (real verification) |\n| No source found | \"Is this actually true?\" | High (real verification) |\n\n**The system separates these into two rounds:**\n\n### Round A: Derived Data Confirmation\n\nQuick scan of claims from sources found in the current session:\n\n```\n## Derived Data Confirmation\n\nThese claims came from sources found in this session:\n\n- [Specific claim from source A] (source link)\n- [Specific claim from source B] (source link)\n\nReply \"confirmed\" or flag any I misread.\n```\n\n### Round B: True HITL Verification\n\nFull verification of claims needing actual checking:\n\n```\n## HITL Verification Required\n\n| # | Claim | Why HITL Needed | Human Confirms |\n|---|-------|-----------------|----------------|\n| 1 | Benchmark scores (100%, 75%→100%) | Your experiment data | [ ] True / [ ] False |\n```\n\n**Result:** Human attention focused on claims that actually need it.\n\n---\n\n## Verification Hierarchy\n\n```mermaid\nflowchart TD\n    A[Claim Extracted] --\u003e B{Source of Truth Exists?}\n    B --\u003e|YES| C[Tier 1: Automated Verification]\n    B --\u003e|NO| D[Tier 2: HITL Two-Round Verification]\n\n    C --\u003e E[Tier 1A: Internal]\n    C --\u003e F[Tier 1B: External]\n    E --\u003e G[PASS / BLOCK]\n    F --\u003e G\n\n    D --\u003e H[Round A]\n    H --\u003e I[Round B]\n    I --\u003e J[APPROVE / REJECT]\n```\n\n### Tier 1A: Internal Consistency (Ready Now)\n\nChecks for contradictions *within* a document — no 
external systems required.\n\n| Check Type | Example |\n|------------|---------|\n| Figure vs. Text | Figure shows β=0.33, text claims β=0.73 |\n| Abstract vs. Body | Abstract claims \"40% improvement,\" body shows 28% |\n| Table vs. Prose | Table lists 5 features, text references 7 |\n\nSee [biology paper example](examples/biology-paper-example.md) for a real case where Clarity Gate detected a Δ=0.40 discrepancy. Try it yourself at [arxiparse.org](https://arxiparse.org).\n\n### Tier 1B: External Verification (Extension Interface)\n\nFor claims verifiable against structured sources. **Users provide connectors.**\n\n### Tier 2: Two-Round HITL (Intelligent Routing)\n\nThe system detects *which* specific claims need human review AND *what kind of review* each needs.\n\n*Example: Most claims in a document typically pass automated checks, with the remainder split between Round A (quick confirmation) and Round B (real verification). (Illustrative — actual ratios vary by document type.)*\n\n---\n\n## Where This Fits\n\n```\nLayer 4: Human Strategic Oversight\nLayer 3: AI Behavior Verification (behavioral evals, red-teaming)\nLayer 2: Input/Context Verification  \u003c-- Clarity Gate\nLayer 1: Deterministic Boundaries (rate limits, guardrails)\nLayer 0: AI Execution\n```\n\nA perfectly aligned model (Layer 3) can confidently produce unsafe outputs from unsafe context (Layer 2). Alignment doesn't inoculate against misleading information.\n\n---\n\n## Prior Art\n\nClarity Gate builds on proven patterns. See [PRIOR_ART.md](docs/PRIOR_ART.md) for the full landscape.\n\n**Enterprise Gates:** Adlib Software, Pharmaceutical QMS  \n**Epistemic Detection:** UnScientify, HedgeHunter, FactBank  \n**Fact-Checking:** FEVER, ClaimBuster  \n**Post-Retrieval:** Self-RAG, RAGAS, TruLens\n\n**The opportunity:** Existing detection tools (UnScientify, HedgeHunter, BioScope) excel at identifying uncertainty markers. 
Clarity Gate proposes a complementary enforcement layer that routes ambiguous claims to human review or marks them automatically. I believe these could work together. Community input on integration is welcome.\n\n---\n\n## Critical Limitation\n\n\u003e **Clarity Gate verifies FORM, not TRUTH.**\n\nThis system checks whether claims are properly marked as uncertain — it cannot verify if claims are actually true.\n\n**Risk:** An LLM can hallucinate facts INTO a document, then \"pass\" Clarity Gate by adding source markers to false claims.\n\n**Mitigation:** Two-Round HITL verification is **mandatory** before declaring PASS. See [SKILL.md](skills/clarity-gate/SKILL.md) for the full protocol.\n\n---\n\n## Non-Goals (By Design)\n\n- Does **not** prove truth automatically — enforces correct labeling and verification workflow\n- Does **not** replace source citations — prevents epistemic category errors\n- Does **not** require a centralized database — file-first and Git-friendly\n\n---\n\n## Roadmap\n\n| Phase | Status | Description |\n|-------|--------|-------------|\n| **Phase 1** | ✅ Ready | Internal consistency checks + Two-Round HITL + annotation (Claude skill) |\n| **Phase 2** | 🔜 Planned | npm/PyPI validators for CI/CD integration |\n| **Phase 3** | 🔜 Planned | External verification hooks (user connectors) |\n| **Phase 4** | 🔜 Planned | Confidence scoring for HITL optimization |\n\nSee [ROADMAP.md](docs/ROADMAP.md) for details.\n\n---\n\n## Documentation\n\n| Document | Description |\n|----------|-------------|\n| [CLARITY_GATE_FORMAT_SPEC.md](docs/CLARITY_GATE_FORMAT_SPEC.md) | Unified format specification (v2.0) |\n| [CLARITY_GATE_PROCEDURES.md](docs/CLARITY_GATE_PROCEDURES.md) | Verification procedures and workflows |\n| [ARCHITECTURE.md](docs/ARCHITECTURE.md) | Full 9-point system, verification hierarchy |\n| [PRIOR_ART.md](docs/PRIOR_ART.md) | Landscape of existing systems |\n| [ROADMAP.md](docs/ROADMAP.md) | Phase 1/2/3 development plan |\n| 
[BENCHMARK_RESULTS.md](docs/research/BENCHMARK_RESULTS.md) | Empirical validation (+19-25% improvement for mid-tier models) |\n| [SKILL.md](skills/clarity-gate/SKILL.md) | Claude skill implementation (v2.0) |\n| [examples/](examples/) | Real-world verification examples |\n\n---\n\n## Related\n\n**arxiparse.org** — Live implementation for scientific papers  \n[arxiparse.org](https://arxiparse.org)\n\n**Source of Truth Creator** — Create epistemically calibrated documents (use before verification)  \n[github.com/frmoretto/source-of-truth-creator](https://github.com/frmoretto/source-of-truth-creator)\n\n**Stream Coding** — Documentation-first methodology where Clarity Gate originated  \n[github.com/frmoretto/stream-coding](https://github.com/frmoretto/stream-coding)\n\n---\n\n## License\n\nCC BY 4.0 — Use freely with attribution.\n\n---\n\n## Author\n\n**Francesco Marinoni Moretto**\n- GitHub: [@frmoretto](https://github.com/frmoretto)\n- LinkedIn: [francesco-moretto](https://www.linkedin.com/in/francesco-moretto/)\n\n---\n\n## Contributing\n\nLooking for:\n\n1. **Prior art** — Open-source pre-ingestion gates for epistemic quality I missed?\n2. **Integration** — LlamaIndex, LangChain implementations\n3. **Verification feedback** — Are the 9 points the right focus?\n4. **Real-world examples** — Documents that expose edge cases\n\nOpen an issue or PR.\n","funding_links":[],"categories":["Sponsors ❤️","Knowledge \u0026 Memory"],"sub_categories":["Community Skills"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrmoretto%2Fclarity-gate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrmoretto%2Fclarity-gate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrmoretto%2Fclarity-gate/lists"}