An open API service indexing awesome lists of open source software.

https://github.com/raydocs/fusion-deck

🃏 Three models gang up and out-argue the lone star. A Claude Code skill: a panel of Opus 4.8 + GPT-5.5 + Gemini 3.1 Pro judged into one answer — plus plan, context, orchestrate & handoff workflows. Runs on the subscriptions already on your machine.
https://github.com/raydocs/fusion-deck

ai-agents anthropic claude claude-code code-review developer-tools ensemble fusion gemini llm multi-model prompt-engineering

Last synced: about 13 hours ago
JSON representation

🃏 Three models gang up and out-argue the lone star. A Claude Code skill: a panel of Opus 4.8 + GPT-5.5 + Gemini 3.1 Pro judged into one answer — plus plan, context, orchestrate & handoff workflows. Runs on the subscriptions already on your machine.

Awesome Lists containing this project

README

          

# fusion-deck


fusion-deck — a panel of models, one judged answer

> 🃏 Three B-tier models gang up and out-argue the one A-tier star.
> A Claude Code skill that turns a panel of models into one judged answer — plus a workflow toolkit that
> plans, investigates, gathers context, splits, optimizes, refactors, and hands off.

![Claude Code Skill](https://img.shields.io/badge/Claude%20Code-Skill-8A63D2)
![License: MIT](https://img.shields.io/badge/License-MIT-3da639.svg)

**English** · [简体中文](#简体中文)

---

## The story

OpenRouter published a fun result: a **panel of models, judged by one of them, beats the best single
frontier model** (“Fusion beats frontier”). Two snags with just using theirs:

1. The single strongest model in that test — **Claude Fable 5 — is off the table for me. I can't run it.**
2. OpenRouter's Fusion is a **metered API**: every call costs.

So fusion-deck does the same trick, **on your own machine**: it rounds up **three models you already pay a
flat subscription for** — Claude Opus 4.8, GPT‑5.5 (via the `codex` CLI), and Gemini 3.1 Pro (via
Antigravity CLI `agy`; legacy `gemini` is opt-in) — has Opus 4.8 judge them, and **beats the lone star
anyway**. No extra per‑token API meter:
it just rides the CLIs you're already logged into. Three cobblers, one Zhuge Liang. 🧠

```mermaid
flowchart LR
Q(["Your question"]) --> O["Opus 4.8"]
Q --> G["GPT-5.5"]
Q --> M["Gemini 3.1 Pro"]
O --> J{{"Opus 4.8 judges
consensus · conflicts · blind spots"}}
G --> J
M --> J
J --> R(["One cross-checked answer"])
```

> **The catch:** the full panel needs all three subscriptions/CLIs. Missing one? No drama — it runs with
> whatever you've got and always tells you exactly which models answered.

## The proof

OpenRouter's **DRACO** deep‑research benchmark — 100 tasks across 10 domains:

| Setup | DRACO | vs. best solo model |
| --- | --- | --- |
| 🃏 **fusion-deck's panel** — Opus 4.8 + GPT‑5.5 + Gemini 3.1 Pro, judged by Opus 4.8 | **68.3%** | **+3.0** 🟢 |
| Opus 4.8 + GPT‑5.5, judged by Opus 4.8 | 67.6% | +2.3 |
| 🌟 Claude Fable 5 — the lone star, solo | 65.3% | — _(baseline)_ |
| GPT‑5.5, solo | 60.0% | −5.3 |
| Opus 4.8, solo | 58.8% | −6.5 |

The three underdogs land **68.3% — that's +3.0 over the star (Fable 5) and ~+9.5 over Opus 4.8 on its
own.** Three independent tries catch each other's mistakes; even the *same* model run twice and judged
jumps +6.7. Not luck — that's the whole point.

*Data: OpenRouter, “[Fusion beats frontier](https://openrouter.ai/blog/announcements/fusion-beats-frontier/).”
fusion-deck runs the same panel locally via Claude, `codex`, and `agy` — no router, nothing
leaves for a third party.*

## Two superpowers

**① Think hard — open the panel.**
`/fusion ` and `/fusion-review ` fan your question (or your code) out to the panel,
blind and in parallel, then Opus 4.8 judges it into **one cross‑checked answer** — or one prioritized
findings list, must‑fix first. For the calls where being confidently wrong is expensive.

**② Work smart — run the workflow.** This is the part people sleep on:

- 🧩 **`/fusion-plan `** → a real plan: the goal, a concrete “done‑when”, the steps, the
risks. Stop hand‑holding the AI through vague asks — pin down what you actually meant first.
- 📦 **`/fusion-context `** → a tidy, **token‑budgeted context pack of only the files that matter**.
The model finally reasons about your real code instead of drowning in the whole repo.
- 🔀 **`/fusion-orchestrate `** → **splits the work into pieces, runs each in a focused sub‑agent, and
verifies each one before starting the next.** Big changes done carefully — not one hopeful mega‑prompt.
- 🔎 **`/fusion-investigate `** → evidence first, then the panel adjudicates
the competing theories. A root‑cause report, not a confident guess.
- ⏱️ **`/fusion-optimize `** → a measure → change → re‑measure loop: baseline first, one change at a
time, the panel calls continue/stop. No baseline, no bragging.
- ♻️ **`/fusion-refactor `** → structure analysis → behavior‑preserving plan → one steered agent.
Cleaner code, same behavior (proven by the tests that stay green).
- 🤝 **`/fusion-handoff `** → a clean handoff note (done / verified / risks / next steps) so the next
agent — or future‑you — picks up in seconds.

**Power-user modes:** `/fusion-plan --deep` (a polished design doc with a critique pass) · `/fusion-context
--discover` (let an agent curate the pack, evidence-gated) · `/fusion-orchestrate --worktrees` (isolate
parallel siblings in their own git worktrees). All opt-in; the plain commands stay simple.

Chain them and you go from a vague one‑liner to a verified, shipped change:

```text
fuzzy idea → /fusion-plan → /fusion-context → /fusion-orchestrate → /fusion-handoff
```

Under the hood it's tuned to actually *get* you: panelists answer **blind** (no echo chamber), the judge
**reconciles** consensus vs. contradictions (it doesn't average), context is **curated not dumped**, and
every step is **verified before the next**.

## Which command? (when to use what)

Not sure which to reach for? Match your situation below — and when the task is easy, just ask Claude
directly; the panel is for the calls where being wrong is expensive.

| When you're trying to… | Reach for | Panel? |
| --- | --- | --- |
| Settle a hard call or trade-off (*"optimistic or pessimistic locking?"*) | `/fusion` | yes |
| Vet code, a diff, or a plan before it ships | `/fusion-review` | yes |
| Find the root cause of a bug, or *"why is it built like this?"* | `/fusion-investigate` | by exception |
| Turn a vague idea into a concrete, checkable plan | `/fusion-plan` · `--deep` for a design doc | no |
| Hand the *right* files to another model or agent | `/fusion-context` · `--discover` to auto-curate | no |
| Execute a big, multi-step change carefully | `/fusion-orchestrate` · `--worktrees` to parallelize | no |
| Make something measurably faster or smaller | `/fusion-optimize` | by exception |
| Clean up structure **without** changing behavior | `/fusion-refactor` | no |
| Pass work to the next agent (or future-you) | `/fusion-handoff` | no |
| Re-anchor a drifting session (situation→command + invariants) | `/fusion-remind` | no |

Typical flows: a **feature** is `plan → context → orchestrate → handoff`; a **bug** is
`investigate → plan → orchestrate`. A one-off hard question is just `/fusion`.

## Install

```bash
git clone https://github.com/raydocs/fusion-deck.git
bash fusion-deck/install.sh
```

Then run **`/reload-skills`** in Claude Code (or restart). Done — `/fusion`, `/fusion-plan`, … are ready.

**For the full 3‑model panel**, install the two optional CLIs (and be logged into each):

- [`codex`](https://developers.openai.com/codex) — adds the GPT‑5.5 panelist
- [`agy`](https://antigravity.google/docs/cli-install) — adds the Gemini 3.1 Pro panelist via Antigravity CLI
- Legacy `gemini` is still available only when explicitly enabled with `FUSION_GEMINI_BACKEND=gemini`
or `FUSION_ALLOW_LEGACY_GEMINI=1`.

Check anytime:

```bash
bash ~/.claude/skills/fusion-deck/scripts/detect_panel.sh # which models are available right now
bash ~/.claude/skills/fusion-deck/scripts/smoke_test.sh # offline self-check (never calls a paid model)
```

## Examples

```text
/fusion Should we use optimistic or pessimistic locking for the booking flow? Trade-offs at our scale.
/fusion-review git diff main...HEAD
/fusion-investigate the cart total is wrong for multi-currency orders
/fusion-plan add a /health endpoint with a test
/fusion-context the checkout flow, so I can hand it to another agent
/fusion-orchestrate docs/plans/add-health.md
/fusion-optimize cut p95 latency of /search under load; stop at 200ms
/fusion-refactor the payments module
/fusion-handoff the auth refactor
```

## Good to know

- **Where the savings come from.** It reuses the subscriptions you're already logged into (Claude /
`codex` / Antigravity `agy`) — no per‑token API bill the way OpenRouter's Fusion API charges. *You just
need the three subscriptions.* The full panel costs more quota and runs as slow as its slowest model, so only
`/fusion` and `/fusion-review` open the whole table by default, `/fusion-investigate` and
`/fusion-optimize` call it only at their decision points, and the rest are fast single‑model commands.
- **Nothing is faked.** Every panel answer states which models actually answered; a smaller panel is never
dressed up as the full one.
- **No secrets in the repo.** Auth lives in the CLIs; nothing private is hardcoded.

## License

[MIT](LICENSE)

---

## 简体中文

> 🃏 三个臭皮匠合起来,比那个独苗状元还能打——这回状元叫 Fable。
> 一个 Claude Code 技能:把一桌模型拧成一个被评审过的答案,外加一套会规划、会查根因、会备上下文、会拆活、会调优、会重构、会交接的工作流。

[English](#fusion-deck) · **简体中文**

### 来历

OpenRouter 发了个挺好玩的结论:**一桌模型 + 其中一个当评审,分数能压过最强的单个前沿模型**(《Fusion beats frontier》)。但直接用他们的有俩坎:

1. 那场里最能打的单模型 —— **Claude Fable 5,我这儿根本用不了,被封了。**
2. OpenRouter 的 Fusion 是 **按量计费的 API**:一调一掏钱。

所以 fusion-deck 把这套搬到**你自己电脑上**:拉上**三个你本来就按月订阅、早就登录好的模型** —— Claude Opus 4.8、GPT‑5.5(走 `codex`)、Gemini 3.1 Pro(默认走 Antigravity CLI `agy`,旧 `gemini` 只做显式兼容)—— 让 Opus 4.8 当评审,**照样把那个单飞的状元比下去**。不额外按 token 收费,直接复用你已经登录的订阅。三个臭皮匠,顶个诸葛亮。🧠

```mermaid
flowchart LR
Q(["你的问题"]) --> O["Opus 4.8"]
Q --> G["GPT-5.5"]
Q --> M["Gemini 3.1 Pro"]
O --> J{{"Opus 4.8 评审
共识 · 冲突 · 盲点"}}
G --> J
M --> J
J --> R(["一个交叉核对过的答案"])
```

> **小前提:** 想凑齐整桌,你得有这三家的订阅 / CLI。少一个也不耽误 —— 它会用现有的接着跑,而且每次都老老实实告诉你这回到底上了谁。

### 实测

OpenRouter 的 **DRACO** 深度研究基准 —— 10 个领域、100 道题:

| 配置 | DRACO | 比最强单模型 |
| --- | --- | --- |
| 🃏 **fusion-deck 的阵容** —— Opus 4.8 + GPT‑5.5 + Gemini 3.1 Pro,Opus 4.8 评审 | **68.3%** | **+3.0** 🟢 |
| Opus 4.8 + GPT‑5.5,Opus 4.8 评审 | 67.6% | +2.3 |
| 🌟 Claude Fable 5 —— 独苗状元,单飞 | 65.3% | —(基准) |
| GPT‑5.5,单飞 | 60.0% | −5.3 |
| Opus 4.8,单飞 | 58.8% | −6.5 |

三个臭皮匠落在 **68.3% —— 比状元 Fable 5(65.3%)高 3.0 分**,比 Opus 4.8 单飞高了将近 9.5 分。三次各自独立的尝试会互相挑错;哪怕同一个模型跑两遍再合并,也能高 6.7 分。不是运气,这就是整件事的核心。

*数据来自 OpenRouter 的《[Fusion beats frontier](https://openrouter.ai/blog/announcements/fusion-beats-frontier/)》(DRACO 基准)。fusion-deck 是用你本机的 Claude / `codex` / `agy` 直接跑同一套阵容 —— 不经过任何 router,也不往第三方发东西。*

### 两样看家本领

**① 想得狠 —— 开一桌。**
`/fusion <问题>`、`/fusion-review <代码 / diff>`:把问题(或代码)甩给一桌模型,各自盲答、并行跑,再由 Opus 4.8 评审成**一个交叉核对过的答案** —— 或者一份排好优先级、必改的排最前的问题清单。专治"答错了很贵"的场合。

**② 干得巧 —— 跑工作流。** 这部分最容易被低估:

- 🧩 **`/fusion-plan <一句模糊的话>`** → 一份真计划:目标、怎样算做完、分几步、有哪些坑。别再手把手哄着 AI 猜你想要啥 —— 先把你真正的意思钉死。
- 📦 **`/fusion-context <任务>`** → 一份卡着 token 预算、**只装该看的文件**的上下文包。让模型对着你真正的代码动脑子,而不是被整个仓库淹死。
- 🔀 **`/fusion-orchestrate <任务>`** → **把活拆成小块,每块交给一个专注的子 agent,做完一块先验过再开下一块。** 大改动也能稳稳落地,而不是赌一个超长 prompt 一把梭。
- 🔎 **`/fusion-investigate `** → 先把证据摆清楚,再让一桌模型给互相打架的几个假设当裁判。最后给你一份能指到根因的报告,而不是一拍脑袋的猜测。
- ⏱️ **`/fusion-optimize <指标>`** → 量一下 → 改一处 → 再量一遍的循环:先立基线,一次只动一处,该接着干还是收手让一桌模型拍板。没基线,就不准吹优化。
- ♻️ **`/fusion-refactor <目标>`** → 先看结构哪儿乱、哪儿重复,再排一份"只动结构、不动行为"的计划,然后盯着一个 agent 一步步落地。代码更干净,行为照旧——测试从头到尾绿着,就是没改坏的凭据。
- 🤝 **`/fusion-handoff <工作>`** → 一份干净的交接(做了啥 / 验了啥 / 有啥风险 / 下一步),下一个 agent —— 或者明天的你 —— 接手秒上手。

**进阶玩法:** `/fusion-plan --deep`(产出一份正式设计文档,中途还会自己挑一遍刺)· `/fusion-context --discover`(让 agent 自己挑文件,但每个都得拿得出证据)· `/fusion-orchestrate --worktrees`(并行的几路各跑在自己的 git worktree 里,互不踩脚)。都是可选项,平时用基础命令照样省心。

串起来用,一句模糊需求就能走到一个验证过、能交付的改动:

```text
模糊想法 → /fusion-plan → /fusion-context → /fusion-orchestrate → /fusion-handoff
```

底层都是冲着"更懂你"调的:几个模型**盲答**(不搞回声室)、评审**分清共识和冲突**(不是求平均)、上下文**精挑而非乱塞**、每一步**验过再走**。

### 用哪个?(什么时候用什么)

拿不准用哪个?对着下面找你的处境就行——简单活儿直接问 Claude,一桌模型是留给"答错了很贵"的场合的。

| 你想干的 | 用 | 开整桌? |
| --- | --- | --- |
| 拍一个难决定 / 权衡(*"乐观锁还是悲观锁?"*) | `/fusion` | 是 |
| 上线前审一段代码 / diff / 方案 | `/fusion-review` | 是 |
| 查一个 bug 的根因,或*"这玩意儿为啥长这样?"* | `/fusion-investigate` | 按需 |
| 把一句模糊想法变成能落地、能验收的计划 | `/fusion-plan` · `--deep` 出设计文档 | 否 |
| 把**该看的**文件挑给另一个模型 / agent | `/fusion-context` · `--discover` 自动挑 | 否 |
| 稳稳执行一个多步的大改动 | `/fusion-orchestrate` · `--worktrees` 并行 | 否 |
| 想把啥改得更快 / 更小(数字看得见) | `/fusion-optimize` | 按需 |
| 只整理结构、**不改**行为 | `/fusion-refactor` | 否 |
| 把活交给下一个 agent(或明天的你) | `/fusion-handoff` | 否 |
| 长会话跑偏了,或新 agent 要一眼看清地图和铁律 | `/fusion-remind` | 否 |

常见流程:**功能** = `plan → context → orchestrate → handoff`;**改 bug** = `investigate → plan → orchestrate`。临时一个难问题,直接 `/fusion`。

### 安装

```bash
git clone https://github.com/raydocs/fusion-deck.git
bash fusion-deck/install.sh
```

然后在 Claude Code 里跑一下 **`/reload-skills`**(或者直接重启),就齐活了 —— `/fusion`、`/fusion-plan`…… 拿来就能用。

**想凑齐三个模型的完整阵容**,再装两个可选 CLI(并各自登录好):

- [`codex`](https://developers.openai.com/codex) —— 接上 GPT‑5.5
- [`agy`](https://antigravity.google/docs/cli-install) —— 通过 Antigravity CLI 接上 Gemini 3.1 Pro
- 旧 `gemini` 只在显式设置 `FUSION_GEMINI_BACKEND=gemini` 或 `FUSION_ALLOW_LEGACY_GEMINI=1` 时启用。

想随时检查一下:

```bash
bash ~/.claude/skills/fusion-deck/scripts/detect_panel.sh # 现在能用上哪几个模型
bash ~/.claude/skills/fusion-deck/scripts/smoke_test.sh # 本地自检(不花钱、不碰付费模型)
```

### 来几个例子

```text
/fusion 预订流程到底用乐观锁还是悲观锁?按我们这个量级帮我权衡下
/fusion-review git diff main...HEAD
/fusion-investigate 多币种订单的购物车总价算错了
/fusion-plan 加一个带测试的 /health 接口
/fusion-context 把结账流程整理一下,我要交给另一个 agent
/fusion-orchestrate docs/plans/add-health.md
/fusion-optimize 把 /search 的 p95 延迟压下来,目标 200ms
/fusion-refactor 支付模块
/fusion-handoff 这次的鉴权重构
```

### 几点说明

- **省钱省在哪。** 它复用你电脑里已经登录的订阅(Claude / `codex` / Antigravity `agy`),不像 OpenRouter Fusion 那样按 token 收 API 费 —— **前提是你有这三家的订阅。** 整桌一起上更费额度、也得等最慢的那个,所以默认只有 `/fusion` 和 `/fusion-review` 开整桌,`/fusion-investigate` 和 `/fusion-optimize` 只在关键决策点才开桌,其余命令走单模型,图个快。
- **不糊弄。** 每个面板答案都会写明这回到底是哪几个模型回答的;小阵容绝不冒充满配。
- **仓库里不放密钥。** 登录的事交给各家 CLI,绝不往代码里塞私密信息。

### 许可证

[MIT](LICENSE)