{"id":49820176,"url":"https://github.com/fxspeiser/crosscheck-agent","last_synced_at":"2026-05-13T10:04:36.745Z","repository":{"id":353170301,"uuid":"1217236155","full_name":"fxspeiser/crosscheck-agent","owner":"fxspeiser","description":"A compact, polyglot MCP add-on for Claude Code that lets Claude confer with GPT, Grok, Gemini, Mistral, Groq, and DeepSeek — to reason, debate, plan, and peer-review. Pick your runtime: Python, TypeScript, Rust, or Perl. All four share one config and bounded limits on rounds, tokens, and time.","archived":false,"fork":false,"pushed_at":"2026-04-22T18:03:09.000Z","size":56,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-22T20:05:59.593Z","etag":null,"topics":["ai-agents","anthropic","claude-code","codex","deepseek","developer-tools","gemini","grok","groq","llm","mcp","mcp-server","mistral","multi-agent","openai","perl","python","rust","typescript"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fxspeiser.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-21T17:22:53.000Z","updated_at":"2026-04-22T19:25:50.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/fxspeiser/crosscheck-agent","commit_stats":null,"previous_names":["fxspeiser/crosscheck-agent"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/fxspeiser/crosscheck-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fxspeiser%2Fcrosscheck-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fxspeiser%2Fcrosscheck-agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fxspeiser%2Fcrosscheck-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fxspeiser%2Fcrosscheck-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fxspeiser","download_url":"https://codeload.github.com/fxspeiser/crosscheck-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fxspeiser%2Fcrosscheck-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32977316,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T06:31:55.726Z","status":"ssl_error","status_checked_at":"2026-05-13T06:31:51.336Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","anthropic","claude-code","codex","deepseek","developer-tools","gemini","grok","groq","llm","mcp","mcp-server","mistral","multi-agent","openai","perl","python","rust","typescript"],"created_at":"2026-05-13T10:04:35.001Z","updated_at":"2026-05-13T10:04:36.740Z","avatar_url":"https://github.com/fxspeiser.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# crosscheck-agent\n\nConfer with multiple LLMs from inside Claude Code. `crosscheck-agent` is a\ncompact MCP server that lets Claude ask peers from other model families\n(GPT, Grok, Gemini, Mistral, Groq, DeepSeek) to reason, debate, plan, and\npeer-review — then hands the synthesised answer back to Claude.\n\nThe server is **Python, stdlib-only** — no external dependencies, no build\nstep. (Earlier versions shipped TypeScript/Rust/Perl mirrors; those have\nbeen dropped to remove the 4× maintenance tax. Python is canonical.)\n\n```\n       ┌──────────────┐\nClaude │ Claude Code  │     MCP      ┌────────────────────────┐\n tool  │  (your IDE)  │ ───────────▶ │  crosscheck-agent MCP  │\n call  │              │   stdio      │       (Python)         │\n       └──────────────┘              └───────────┬────────────┘\n                                                 │ HTTPS\n                                                 ▼\n                         ┌───────────────────────────────────────┐\n                         │ OpenAI · xAI · Gemini · Mistral · Groq │\n                         │ DeepSeek · Anthropic (…)              │\n                         └───────────────────────────────────────┘\n```\n\n## Tools\n\n| Tool             | What it does                                                                 |\n|------------------|------------------------------------------------------------------------------|\n| `list_providers` | Discover which LLMs are currently available and which are in the active set.|\n| `confer`         | Ask one or more providers the same question in parallel; return every answer.|\n| `debate`         | Bounded round-trip debate; the configured moderator synthesises a result. Optional `structured: true` makes the synthesis a JSON-schema-validated object (consensus, dissent, key claims, citations, open questions). |\n| `plan`           | Collaborative step-by-step planning with risks + alternatives. Honours `structured: true`. |\n| `review`         | Peer code / proposal review across one or more LLMs. Pass `untrusted_input: true` when the snippet may contain prompt injections. |\n| `coordinate`     | Structured **Proposer → Critic(s) → Synthesizer** flow. Each role emits a JSON envelope; the synthesizer emits a validated `StructuredSynthesis`. Persists key claims to the SQLite claim-list with `supports`/`attacks` edges when `session_id` is provided. |\n| `triangulate`    | Run a coordinate flow and return **consensus + minority report** plus per-provider weights drawn from accumulated bench / critic-ballot win-rate. The \"give me one answer, but be honest about disagreement\" tool. |\n| `delegate`       | Cross-model handshake: route a `confer` or `review` call through one named provider, with quota tracking by `session_id` and by requesting provider. Refused calls return `accepted: false` with a structured `reason` and the live quota envelope. |\n| `bench`          | Run repo-scoped goldens (`*.json` in `.crosscheck/goldens/`) against a panel; rule-based verifiers (contains / regex_match / contains_any / contains_all / not_contains / min_length) score each provider, and the win-rate feeds `triangulate`'s weights. |\n| `solve`          | Iterative **propose → verify → retry**. Provider drafts a literal solution; a `shell` (sandboxed subprocess: timeout, RLIMIT_AS, RLIMIT_CPU, isolated tmpdir) or `regex_response` verifier accepts or rejects; failures feed back to the next attempt. Pass `target_path` to also get a unified-diff `patch` preview (file is never modified). |\n| `fetch`          | Retrieve a URL with **deny-by-default allowlist** (`fetch.url_allowlist` prefix list) and persist a sha256-keyed snapshot under `.crosscheck/evidence/`. Cached on repeat unless `force_refresh: true`. Use to ground claims with reproducible evidence. |\n| `pick`           | **Multi-criteria decision-making.** Each provider scores every option on every criterion (0..1); the tool aggregates with criterion weights, returns a ranked list, and surfaces the top-K cross-provider disagreements as `dissent_deltas`. |\n| `scoreboard`     | Read-only snapshot: per-provider weight + wins/losses/abstains + delegations, plus `totals` for sessions/claims/links/delegations and (optional) the last N redacted event lines. The data the UI panel reads. |\n| `update_crosscheck` | Compares your local git HEAD against `main` at https://github.com/fxspeiser/crosscheck-agent. With `apply: true`, runs `git pull --ff-only` in the install directory; the server can't reload itself, so the response asks you to restart Claude Code. The first crosscheck call per server process runs the same check (cached 6h) and attaches an `update_notice` to the result so Claude can offer the upgrade proactively. |\n\n### Ad-hoc panels\n\n`confer`, `debate`, `plan`, and `review` all accept an optional `providers`\narray so you can assemble a panel on the fly instead of using the configured\nactive set. Some useful patterns from inside Claude Code:\n\n```\n# \"I want a fast second opinion from Grok only, skip everyone else.\"\nconfer(question=\"…\", providers=[\"xai\"])\n\n# \"Pit GPT against Gemini, let OpenAI moderate.\"\ndebate(topic=\"…\", providers=[\"openai\", \"gemini\"], moderator=\"openai\")\n\n# \"Review this diff with just my local-fast models.\"\nreview(snippet=\"…\", providers=[\"groq\", \"deepseek\"])\n\n# \"Plan the migration; I want every model in the house.\"\nplan(goal=\"…\", providers=[\"anthropic\",\"openai\",\"xai\",\"gemini\",\"mistral\",\"groq\",\"deepseek\"])\n```\n\nNot sure what's wired up? Call `list_providers` first — it returns every\nknown provider, whether it has an API key in `.env`, and whether it's in\nthe configured active set. If you ask for a provider that has no key,\n`crosscheck` returns a structured error telling you exactly what's missing\nso Claude can self-correct.\n\nEvery run obeys the limits in `crosscheck.config.json`:\n\n- `max_rounds` — hard cap on unsupervised round trips.\n- `token_cap` — total token budget spread across providers × rounds.\n- `max_time_seconds` — wall-clock deadline enforced per run.\n- `cache.{enabled,ttl_seconds,max_entries,dir}` — SHA256 exact-match disk cache for provider responses. Cached calls return with `cache_hit: true` and `elapsed_ms: 0`. LRU eviction at write time; default TTL 7 days.\n- `retries.{max_attempts,backoff_base_s}` — jittered exponential backoff on transient HTTP errors (429, 5xx, network, timeout). Honours upstream `Retry-After`.\n- `rate_limits.{\u003cprovider\u003e|default}.{capacity,refill_per_sec}` — per-provider leaky-bucket rate limiter.\n- `redaction.{enabled,patterns_extra}` — regex scrub for emails, IPv4, AWS keys, GitHub PATs, Slack tokens, OpenAI keys, bearer tokens, 16-digit cards. Applied recursively to ndjson event records and to JSON transcripts at write time.\n- `provider_allowlist` — null = no restriction; otherwise the array is the only set of providers that may run, even when callers ask for others. Blocked providers surface in `blocked_by_allowlist`.\n- `events_log` — path to the ndjson event trace (one structured event per `tool_start` / `provider_call` / `tool_end`). Use `scripts/replay` to inspect.\n- `delegation.{max_per_session,max_per_requester}` — quota knobs for the `delegate` tool.\n- `bench.goldens_dir` — directory holding bench fixtures.\n\nEvery tool result includes a `budget` block (`wall_used_ms`, `wall_remaining_ms`, `cache_hits`, `provider_calls`). When `session_id` is supplied, a `session` block carries cumulative `{calls, wall_ms, cache_hits}` across calls — backed by SQLite at `.crosscheck/sessions.sqlite3` (override via `session_db`).\n\n## Asking Claude to use it\n\nOnce the MCP server is registered, you don't call the tools by hand — Claude\ndoes, based on what you say. A few prompts that work well inside Claude Code:\n\n**Quick sanity check across the panel**\n\n\u003e \"Confer with the panel: is a `uuid.v7()` primary key a bad idea for a\n\u003e high-write Postgres table?\"\n\n**Pick a specific model on the fly**\n\n\u003e \"Ask Grok only — what's the cheapest way to shard this Redis cluster?\"\n\n\u003e \"Just confer with GPT and Gemini on whether this regex is ReDoS-safe.\"\n\n**Debate between specific peers**\n\n\u003e \"Debate this with OpenAI and Gemini, let Claude moderate: should we use\n\u003e Server-Sent Events or WebSockets for the notification stream?\"\n\n**Plan with a hand-picked team**\n\n\u003e \"Plan the auth migration with Anthropic, OpenAI, and xAI. Constraint: zero\n\u003e downtime, Postgres-backed sessions, 3M active users.\"\n\n**Peer code review**\n\n\u003e \"Have Groq and DeepSeek review this migration for race conditions:\n\u003e `\u003cpaste SQL\u003e`\"\n\n**Discover what's wired up**\n\n\u003e \"List the providers crosscheck has available and tell me which ones are\n\u003e missing an API key.\"\n\n**Structured coordination (Proposer → Critic → Synthesizer)**\n\n\u003e \"Coordinate this with anthropic, openai, and gemini, session_id=auth-rewrite-1: 'should we move from JWT to opaque tokens for our internal-API auth?'\"\n\n**Triangulate when you want consensus + dissent**\n\n\u003e \"Triangulate across the panel: what's the right batch size for our embedding pipeline given a 16GB GPU and 4M docs?\"\n\n**Cross-model delegation**\n\n\u003e \"Delegate this code review to xai with requested_by=anthropic, session_id=migrations-2: paste-the-SQL-here.\"\n\n**Bench the panel**\n\n\u003e \"Run bench against alpha, beta, gamma using the goldens in .crosscheck/goldens/ and rank them.\"\n\n**Self-update**\n\n\u003e \"Check whether crosscheck-agent has an update.\"  *(if Claude already saw an `update_notice` on a previous tool call, it will surface it without prompting.)*\n\n\u003e \"Yes, upgrade it.\"  → Claude calls `update_crosscheck` with `apply: true`, then reminds you to restart Claude Code so the new server code loads.\n\n**Replay the event log**\n\n```bash\nscripts/replay --tail 50                # last 50 events\nscripts/replay --tool coordinate        # only coordinate events\nscripts/replay --provider gemini        # all Gemini calls\nscripts/replay --kind provider_call --since 5m\n```\n\nClaude will call `list_providers`, `confer`, `debate`, `plan`, or `review`\nunder the hood, pass the subset you named, and stream the responses back.\nIf you name a provider that isn't configured, crosscheck returns a\nstructured error so Claude can ask you what to do instead of guessing.\n\n## Quick start\n\n```bash\ngit clone https://github.com/\u003cyou\u003e/crosscheck-agent.git\ncd crosscheck-agent\n\n# 1. Interactive setup — writes .env + crosscheck.config.json\nbash scripts/setup.sh\n\n# 2. Sanity-check the server starts\npython3 servers/python/crosscheck_server.py\n\n# 3. Register with Claude Code\nclaude mcp add crosscheck -- python3 \"$PWD/servers/python/crosscheck_server.py\"\n```\n\nThen inside Claude Code:\n\n```\n/mcp\n# call confer / debate / plan / review\n```\n\nRequires Python 3.10+. No `pip install` needed.\n\n## Tuning limits from the terminal\n\nThe `scripts/crosscheck` CLI edits `crosscheck.config.json` in place.\n\n```bash\nscripts/crosscheck config show\nscripts/crosscheck config set max_rounds 5\nscripts/crosscheck config set token_cap 16000\nscripts/crosscheck config set max_time_seconds 300\nscripts/crosscheck config set providers anthropic,openai,xai,gemini\nscripts/crosscheck config set moderator openai\n\nscripts/crosscheck providers list\nscripts/crosscheck providers enable gemini\nscripts/crosscheck providers disable xai\n\nscripts/crosscheck doctor     # audit: keys present, config sane\n```\n\nOptionally add this line to your shell rc file to make the CLI globally available:\n\n```bash\nexport PATH=\"$PATH:/path/to/crosscheck-agent/scripts\"\n```\n\n## Providers\n\n| Provider   | Env var              | Default model             | Endpoint                |\n|------------|----------------------|---------------------------|-------------------------|\n| Anthropic  | `ANTHROPIC_API_KEY`  | `claude-opus-4-5`         | native                  |\n| OpenAI     | `OPENAI_API_KEY`     | `gpt-5`                   | Chat Completions        |\n| xAI (Grok) | `XAI_API_KEY`        | `grok-4-latest`           | OpenAI-compatible       |\n| Google     | `GEMINI_API_KEY`     | `gemini-2.5-pro`          | Gemini API              |\n| Mistral    | `MISTRAL_API_KEY`    | `mistral-large-latest`    | OpenAI-compatible       |\n| Groq       | `GROQ_API_KEY`       | `llama-3.3-70b-versatile` | OpenAI-compatible       |\n| DeepSeek   | `DEEPSEEK_API_KEY`   | `deepseek-chat`           | OpenAI-compatible       |\n\nAny provider without a key in `.env` is silently skipped.\n\n## Configuration\n\n`crosscheck.config.json`:\n\n```json\n{\n  \"max_rounds\": 3,\n  \"token_cap\": 8000,\n  \"max_time_seconds\": 120,\n  \"providers\": [\"anthropic\", \"openai\", \"xai\"],\n  \"moderator\": \"anthropic\",\n  \"temperature\": 0.4,\n  \"log_transcripts\": true,\n  \"transcript_dir\": \".crosscheck/transcripts\"\n}\n```\n\nWhen `log_transcripts` is on, every conferral / debate is persisted as JSON\nunder `.crosscheck/transcripts/` (also gitignored).\n\n### Picking a sensible `token_cap`\n\n`token_cap` is the total completion-token budget for a single tool call,\nsplit across providers × rounds. The default (60000, for coding work) gives\neach call ~6.6k tokens with the default 3 rounds × 3 providers.\n\nOne gotcha worth knowing: OpenAI's GPT-5 and o-series are **reasoning\nmodels**. crosscheck-agent reserves `max_completion_tokens` per call, and\nOpenAI counts that reservation against your tier's per-request / per-minute\nlimit *before* the call runs. On lower usage tiers, a 6.6k reservation can\ntrip a 429. If you see `HTTP 429: exceeded your current quota` on OpenAI\nonly (Anthropic + xAI still work), drop the cap:\n\n```bash\nscripts/crosscheck config set token_cap 20000   # ~2.2k per call\n```\n\nOr raise your OpenAI usage tier. crosscheck-agent automatically uses\n`max_completion_tokens` and omits `temperature` when the model name starts\nwith `gpt-5`, `o1`, `o3`, or `o4`.\n\n## Layout\n\n```\ncrosscheck-agent/\n├── .env.example\n├── crosscheck.config.example.json\n├── scripts/\n│   ├── setup.sh            # interactive wizard\n│   └── crosscheck          # config + providers CLI\n└── servers/\n    └── python/             # stdlib-only MCP server (canonical)\n```\n\n## Security\n\n- `.env` is gitignored. The setup wizard chmods it to `600`.\n- The `crosscheck` CLI never prints API keys, only whether they exist.\n- Keys are read at startup and never written anywhere except stderr on an\n  HTTP error (which may echo the remote error payload — be mindful if you\n  pipe logs to third-party tools).\n\n## Contributing\n\nIssues and PRs welcome. Keep the tool surface (`confer`, `debate`, `plan`,\n`review`, `list_providers`) stable and dependency-light. Python stdlib only.\n\n## Credits\n\nBuilt by [Frank Speiser](https://github.com/fspeiser) with pair-programming\nassistance from Claude (Anthropic). Mistakes are Frank's; good ideas are shared.\n\n## License\n\nMIT. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffxspeiser%2Fcrosscheck-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffxspeiser%2Fcrosscheck-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffxspeiser%2Fcrosscheck-agent/lists"}