{"id":50316237,"url":"https://github.com/msitarzewski/duh","last_synced_at":"2026-05-29T00:02:58.013Z","repository":{"id":338883953,"uuid":"1159098466","full_name":"msitarzewski/duh","owner":"msitarzewski","description":"Multi-model consensus engine — because one LLM opinion isn't enough","archived":false,"fork":false,"pushed_at":"2026-03-20T23:03:23.000Z","size":1133,"stargazers_count":23,"open_issues_count":0,"forks_count":9,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-21T13:50:13.785Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/msitarzewski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-02-16T10:09:29.000Z","updated_at":"2026-03-20T23:03:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"f88e75ba-e5c1-41c0-a872-9836b1c19775","html_url":"https://github.com/msitarzewski/duh","commit_stats":null,"previous_names":["msitarzewski/duh"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/msitarzewski/duh","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msitarzewski%2Fduh","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msitarzewski%2Fduh/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msitarzewski%2Fduh/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msitarzewski%2Fduh/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/msitarzewski","download_url":"https://codeload.github.com/msitarzewski/duh/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msitarzewski%2Fduh/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33631002,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-29T00:02:57.722Z","updated_at":"2026-05-29T00:02:58.001Z","avatar_url":"https://github.com/msitarzewski.png","language":"Python","funding_links":["https://github.com/sponsors/msitarzewski"],"categories":[],"sub_categories":[],"readme":"# duh\n\n**The trust layer for AI applications.**\n\nduh is a multi-model consensus engine that sits between your application and LLM providers — arbitrating, verifying, and scoring AI outputs before they reach your users. Think of it as the Cloudflare of AI: a verification and routing layer that makes the models behind it trustworthy.\n\n## Why this exists\n\nSingle-model answers are fragile. They hallucinate. They carry training bias. They give you no way to audit why a conclusion was reached. And if your only provider goes down or changes behavior, you're exposed.\n\nModels are commoditizing. The value is moving above the model layer — into orchestration, verification, and trust. duh captures that layer.\n\nThe output isn't \"an AI answer.\" It's confidence-scored analysis with adversarial fact-checking and preserved dissent. Every decision records who proposed what, who challenged it, what survived review, and what the dissenting positions were. This is transformative synthesis, not answer aggregation.\n\n## What it does\n\n```\nPROPOSE  --\u003e  CHALLENGE  --\u003e  REVISE  --\u003e  COMMIT\n```\n\n1. **Propose** -- The strongest available model answers your question\n2. **Challenge** -- Other models find genuine flaws (forced disagreement, no sycophancy allowed)\n3. **Revise** -- The proposer addresses every valid challenge\n4. **Commit** -- Decision extracted with confidence score, intent classification, and preserved dissent\n\nEvery step is stored. Every challenge is attributed. Every confidence score is domain-capped and calibrated against historical outcomes.\n\n## How to use it\n\nduh runs anywhere in your stack:\n\n| Interface | Use case |\n|-----------|----------|\n| **CLI** | `duh ask \"question\"` -- interactive consensus from the terminal |\n| **REST API** | `POST /api/ask` -- integrate into any application, any language |\n| **WebSocket** | Real-time streaming -- watch models debate live |\n| **Python client** | `pip install duh-client` -- async and sync wrappers |\n| **MCP server** | `duh mcp` -- AI agent integration via Model Context Protocol |\n| **Web UI** | `duh serve` -- consensus streaming, thread browser, 3D decision space |\n\n## Quick start\n\n```bash\nuv add duh\nexport ANTHROPIC_API_KEY=sk-ant-...\nexport OPENAI_API_KEY=sk-...\nexport GOOGLE_API_KEY=AIza...          # optional: Gemini models\nexport PERPLEXITY_API_KEY=pplx-...     # optional: Sonar models (challenger-only)\nduh ask \"What database should I use for a new SaaS product?\"\n```\n\nOr use a `.env` file (see `.env.example`).\n\n## Features\n\n### Consensus \u0026 reasoning\n- **Multi-model consensus** -- Claude, GPT, Gemini, Mistral, and Perplexity debate. Sycophantic challenges detected and flagged.\n- **Voting protocol** -- Fan out to all models in parallel, aggregate via majority or weighted synthesis.\n- **Query decomposition** -- Break complex questions into subtask DAGs, solve in parallel, synthesize results.\n- **Protocol auto-selection** -- Classifies your question and routes to consensus (reasoning) or voting (judgment) automatically.\n- **Question refinement** -- Pre-consensus clarification step catches ambiguous questions before they waste model calls.\n- **Convergence detection** -- Early exit when challenges repeat (Jaccard similarity \u003e= 0.7). No wasted rounds.\n\n### Trust \u0026 verification\n- **Epistemic confidence** -- Rigor scoring (0.5-1.0) + domain-capped confidence (factual 95%, technical 90%, creative 85%, judgment 80%, strategic 70%). Calibrated against historical outcomes via ECE tracking.\n- **Sycophancy detection** -- Identifies deference markers in challenges. Rubber-stamp agreements are flagged, not counted.\n- **Preserved dissent** -- Minority positions are extracted and attributed by model. Disagreement is a feature, not a bug.\n- **Decision taxonomy** -- Auto-classify decisions by intent, category, and genus for structured recall.\n- **Outcome tracking** -- Record success/failure/partial feedback. Calibration improves over time.\n\n### Grounding \u0026 tools\n- **Native web search** -- Anthropic, Google, Mistral, and Perplexity search server-side during consensus. Citations extracted, persisted, and displayed with domain grouping.\n- **Tool-augmented reasoning** -- Web search, file read, and code execution available to models during any phase.\n- **Citations** -- Deduplicated, grouped by hostname, attributed by phase (propose/challenge/revise). Displayed in CLI, Web UI, and API responses.\n\n### Web UI\n- **Live consensus streaming** -- Watch models debate in real-time via WebSocket. Challengers stream in as they finish (parallel, not batched).\n- **Thread browser** -- Search, filter, and revisit past consensus threads with full debate history. Thread detail view mirrors the live consensus view with phase-grouped rendering and phase-level navigation.\n- **3D decision space** -- Interactive scatter plot of decisions by confidence, rigor, and category. InstancedMesh handles 1000+ points.\n- **Calibration dashboard** -- ECE analysis, accuracy by confidence bucket, overall calibration rating.\n- **Shareable threads** -- Public share links for consensus results (no auth required).\n- **Executive overview** -- Auto-generated summary of key decision points after consensus completes.\n\n### Infrastructure\n- **17 models across 5 providers** -- Claude (Opus/Sonnet/Haiku), GPT (5.4/5.2/5 mini/o3), Gemini (3.1 Pro/3 Pro/3 Flash/2.5 Pro/2.5 Flash), Mistral (Large/Medium/Small/Codestral), Perplexity (Sonar/Sonar Pro/Reasoning Pro/Deep Research).\n- **Local models** -- Ollama and LM Studio via the OpenAI-compatible API. Mix cloud and local in the same consensus.\n- **Authentication** -- JWT auth with user accounts, RBAC (admin/contributor/viewer), password reset via SMTP email.\n- **Persistent memory** -- SQLite or PostgreSQL. Every thread, turn, contribution, decision, vote, subtask, and citation stored.\n- **Cost tracking** -- Per-model token costs in real-time with warn thresholds and hard limits.\n- **Export** -- Threads as JSON, Markdown, or PDF. PDF features branded cover page, styled table of contents with dot-leader links, colored section headers, phase-grouped contributions (PROPOSE/CHALLENGE/REVISE), confidence/rigor meters, and consolidated sources with clickable URLs.\n- **Batch processing** -- Process multiple questions from a file with any protocol.\n- **Backup \u0026 restore** -- SQLite copy or JSON export, with merge mode for restores.\n\n## Protocols\n\n### Consensus (default)\n\n```\nPROPOSE  --\u003e  CHALLENGE  --\u003e  REVISE  --\u003e  COMMIT\n```\n\nStrongest model proposes. Others challenge with forced disagreement (4 framing types: flaw, alternative, risk, devil's advocate). Proposer revises, addressing each valid challenge. Decision extracted with confidence score and preserved dissent.\n\nConvergence detection (Jaccard similarity \u003e= 0.7) stops early when challenges repeat.\n\n### Voting\n\n```\nFAN-OUT (all models)  --\u003e  AGGREGATE (majority / weighted)\n```\n\nAll models answer independently in parallel. A meta-judge picks the best answer (majority) or synthesizes all answers weighted by capability (weighted).\n\n### Decomposition\n\n```\nDECOMPOSE  --\u003e  SCHEDULE (topological sort)  --\u003e  SYNTHESIZE\n```\n\nComplex questions are broken into a subtask DAG. Independent subtasks run in parallel. Results synthesized by the strongest model.\n\n## Commands\n\n### Consensus\n\n```bash\nduh ask \"question\"                        # Run consensus (default protocol)\nduh ask \"question\" --refine               # Clarify ambiguous questions first\nduh ask \"question\" --decompose            # Decompose into subtasks first\nduh ask \"question\" --protocol voting      # Use voting protocol\nduh ask \"question\" --protocol auto        # Auto-select by question type\nduh ask \"question\" --tools                # Enable tool use (on by default)\nduh ask \"question\" --no-tools             # Disable tool use\nduh ask \"question\" --rounds 5             # Override max consensus rounds\nduh ask \"question\" --proposer anthropic:claude-opus-4-6   # Override proposer\nduh ask \"question\" --challengers openai:gpt-5.4,google:gemini-3.1-pro  # Override challengers\nduh ask \"question\" --panel anthropic:claude-opus-4-6,openai:gpt-5.4    # Restrict model panel\n```\n\n### Memory \u0026 recall\n\n```bash\nduh recall \"keyword\"                      # Search past decisions\nduh recall \"keyword\" --limit 20           # Limit results\nduh threads                               # List past threads\nduh threads --status complete --limit 50  # Filter by status\nduh show \u003cthread-id\u003e                      # Full debate history (prefix match OK)\nduh feedback \u003cid\u003e --result success        # Record outcome\nduh feedback \u003cid\u003e --result failure --notes \"...\"  # With notes\n```\n\n### Export \u0026 data\n\n```bash\nduh export \u003cthread-id\u003e                    # Export as JSON (default)\nduh export \u003cthread-id\u003e --format markdown  # Export as Markdown\nduh export \u003cthread-id\u003e --format pdf -o report.pdf  # Export as PDF\nduh export \u003cthread-id\u003e --content decision # Decision only (vs full)\nduh export \u003cthread-id\u003e --no-dissent       # Suppress dissent section\nduh backup ./backup.db                    # Backup database\nduh backup ./backup.json --format json    # Backup as JSON\nduh restore ./backup.db                   # Restore (replace)\nduh restore ./backup.db --merge           # Restore (merge with existing)\n```\n\n### Models \u0026 cost\n\n```bash\nduh models                                # List all available models\nduh cost                                  # Cumulative cost breakdown by model\n```\n\n### Calibration\n\n```bash\nduh calibration                           # Confidence calibration analysis\nduh calibration --category technical      # Filter by category\nduh calibration --since 2026-01-01        # Filter by date range\n```\n\n### Server \u0026 integrations\n\n```bash\nduh serve                                 # Start REST API + Web UI\nduh serve --host 0.0.0.0 --port 9000     # Custom host/port\nduh serve --reload                        # Auto-reload for development\nduh mcp                                   # Start MCP server for AI agents\nduh batch questions.txt                   # Batch consensus (text file)\nduh batch questions.jsonl --format json   # Batch with JSON output\nduh batch questions.txt --protocol voting # Batch with voting protocol\n```\n\n### User management\n\n```bash\nduh user-create --email u@x.com --password ...  # Create user\nduh user-list                             # List users\n```\n\n## REST API\n\n```\nPOST /api/ask              Consensus query (any protocol)\nPOST /api/refine           Analyze question for ambiguity\nPOST /api/enrich           Rewrite question with clarifications\nGET  /api/threads          List threads (filter by status)\nGET  /api/threads/:id      Thread with full debate history + citations\nGET  /api/share/:token     Public thread view (no auth)\nGET  /api/threads/:id/export  Export as PDF or Markdown\nGET  /api/recall           Search past decisions\nPOST /api/feedback         Record outcome\nGET  /api/models           List available models\nGET  /api/cost             Cost breakdown by model\nGET  /api/calibration      Confidence calibration analysis\nGET  /api/decisions/space  Decision space data (3D viz)\nWS   /ws/ask               Stream consensus in real-time\n```\n\nAPI key auth, rate limiting, and JWT authentication included. Full reference: [docs/api-reference.md](docs/api-reference.md).\n\n## Supported models\n\n| Provider | Models | Context | Notes |\n|----------|--------|---------|-------|\n| **Anthropic** | Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5 | 200K | Native web search |\n| **OpenAI** | GPT-5.4, GPT-5.2, GPT-5 mini, o3 | 200K-1M | Search on select models |\n| **Google** | Gemini 3.1 Pro, 3 Pro, 3 Flash, 2.5 Pro, 2.5 Flash | 1M | Native grounding search |\n| **Mistral** | Large, Medium, Small, Codestral | 128-256K | Native web search |\n| **Perplexity** | Sonar, Sonar Pro, Reasoning Pro, Deep Research | 128-200K | Always searches (challenger-only) |\n| **Local** | Any Ollama or LM Studio model | Varies | Via OpenAI-compatible API |\n\nSet API keys as environment variables or in `.env`. Models are auto-discovered from available keys.\n\n## Phase 0 benchmark\n\nBefore building duh, we validated the thesis: 50 questions, 4 methods, blind LLM-as-judge evaluation. Consensus consistently outperformed direct answers, self-debate, and ensemble approaches -- especially on questions requiring nuanced judgment and multi-perspective analysis. See [full benchmark results](docs/reference/benchmarks.md).\n\n## Documentation\n\nFull documentation: [docs/](docs/index.md)\n\n- [Installation](docs/getting-started/installation.md)\n- [Quickstart](docs/getting-started/quickstart.md)\n- [How Consensus Works](docs/concepts/how-consensus-works.md)\n- [CLI Reference](docs/cli/index.md)\n- [REST API Reference](docs/api-reference.md)\n- [Python Client](docs/python-client.md)\n- [MCP Server](docs/mcp-server.md)\n- [Batch Mode](docs/batch-mode.md)\n- [Export](docs/export.md)\n- [Python API](docs/python-api/library-usage.md)\n- [Docker Guide](docs/guides/docker.md)\n- [Authentication](docs/guides/authentication.md)\n- [Config Reference](docs/reference/config-reference.md)\n\n## Hosted service\n\n**[duh.bot](https://duh.bot)** -- commercial hosted consensus. Pay-per-question, no infrastructure to manage. Same engine, managed for you.\n\n## Sponsor\n\nIf duh is useful to you, consider [sponsoring the project](https://github.com/sponsors/msitarzewski).\n\n## License\n\n[AGPL-3.0](LICENSE) -- Run it yourself (open source) or use the hosted service at [duh.bot](https://duh.bot).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsitarzewski%2Fduh","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsitarzewski%2Fduh","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsitarzewski%2Fduh/lists"}