{"id":50944242,"url":"https://github.com/luckeyfaraday/fusion-engine","last_synced_at":"2026-06-17T18:07:09.035Z","repository":{"id":365022713,"uuid":"1270166508","full_name":"luckeyfaraday/fusion-engine","owner":"luckeyfaraday","description":"Self-hosted Fusion API and OpenRouter Fusion alternative for building reliable multi-model LLM ensembles — fan out one prompt to N models, then synthesize one answer with a judge model. OpenAI-compatible API, CLI, and eval harness.","archived":false,"fork":false,"pushed_at":"2026-06-15T14:39:48.000Z","size":75,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-15T15:22:16.528Z","etag":null,"topics":["ai-agents","benchmarking","fastapi","fusion-api","judge-model","llm","llm-ensemble","llm-orchestration","model-fusion","openai-compatible","openrouter","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luckeyfaraday.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-15T12:58:13.000Z","updated_at":"2026-06-15T14:42:30.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/luckeyfaraday/fusion-engine","commit_stats":null,"previous_names":["luckeyfaraday/fusion-engine"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/luckeyfaraday/fusion-engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckeyfaraday%2Ffusion-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckeyfaraday%2Ffusion-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckeyfaraday%2Ffusion-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckeyfaraday%2Ffusion-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/luckeyfaraday","download_url":"https://codeload.github.com/luckeyfaraday/fusion-engine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckeyfaraday%2Ffusion-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34459761,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","benchmarking","fastapi","fusion-api","judge-model","llm","llm-ensemble","llm-orchestration","model-fusion","openai-compatible","openrouter","python"],"created_at":"2026-06-17T18:07:07.924Z","updated_at":"2026-06-17T18:07:09.028Z","avatar_url":"https://github.com/luckeyfaraday.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fusion Engine: Fusion API and OpenRouter Fusion Alternative\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/luckeyfaraday/fusion-engine/actions/workflows/ci.yml\"\u003e\u003cimg alt=\"CI\" src=\"https://img.shields.io/github/actions/workflow/status/luckeyfaraday/fusion-engine/ci.yml?branch=main\u0026amp;label=CI\u0026amp;style=for-the-badge\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/luckeyfaraday/fusion-engine/blob/main/LICENSE\"\u003e\u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/luckeyfaraday/fusion-engine?style=for-the-badge\"\u003e\u003c/a\u003e\n  \u003cimg alt=\"Python 3.10-3.12\" src=\"https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-3776AB?style=for-the-badge\u0026amp;logo=python\u0026amp;logoColor=white\"\u003e\n  \u003cimg alt=\"Version 0.1.0\" src=\"https://img.shields.io/badge/version-0.1.0-7C3AED?style=for-the-badge\"\u003e\n  \u003cimg alt=\"OpenRouter\" src=\"https://img.shields.io/badge/OpenRouter-Fusion%20API-111827?style=for-the-badge\"\u003e\n  \u003cimg alt=\"OpenAI-compatible API\" src=\"https://img.shields.io/badge/API-OpenAI--compatible-10A37F?style=for-the-badge\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#quick-start\"\u003eQuick Start\u003c/a\u003e ·\n  \u003ca href=\"#what-is-fusion-engine\"\u003eWhat It Does\u003c/a\u003e ·\n  \u003ca href=\"#openai-compatible-api-and-fastapi-server\"\u003eHTTP API\u003c/a\u003e ·\n  \u003ca href=\"#benchmarking-llm-ensembles--does-fusion-actually-beat-one-model\"\u003eBenchmarks\u003c/a\u003e ·\n  \u003ca href=\"#openrouter-fusion-alternative\"\u003eOpenRouter Fusion Alternative\u003c/a\u003e ·\n  \u003ca href=\"#contributing-and-security\"\u003eContributing\u003c/a\u003e ·\n  \u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\n\u003c/p\u003e\n\n**Fusion Engine is a self-hosted Fusion API and OpenRouter Fusion alternative\nfor building reliable AI model ensembles.** Send one prompt to *N* large\nlanguage models in parallel through [OpenRouter](https://openrouter.ai), collect\nevery response with token, latency, and cost metadata, then have a configurable\n**judge model** synthesize one higher-quality answer.\n\nUse Fusion Engine as a Python library, CLI, FastAPI service, or\nOpenAI-compatible `/v1/chat/completions` API for AI agents that need\nmulti-model LLM orchestration, fused tool calling, transparent model comparison,\nand eval-driven panel tuning.\n\nIf you are looking for a **Fusion API**, **OpenRouter Fusion**, **Fusion Fable**,\nor **Fable Fusion** style system, this project provides the self-hosted Python\nimplementation: model panels, judge prompts, OpenRouter routing, OpenAI-style\nchat completions, and benchmarkable fusion results.\n\nOne prompt in → many models answer → one fused answer out. You control the\npanel composition, the judge prompts, and where it runs — no vendor lock-in.\n\n| Widget | What Fusion Engine Gives You |\n|---|---|\n| Fusion API | Self-hosted multi-model fusion with transparent cost, token, and latency data. |\n| OpenRouter Fusion | Bring your own OpenRouter models and panel definitions instead of using a black box. |\n| Fable Fusion / Fusion Fable | Keyword-aligned multi-model synthesis for agents, research, code, and creative workflows. |\n| OpenAI-compatible gateway | `/v1/models` and `/v1/chat/completions` endpoints for existing AI clients. |\n| Eval harness | GSM8K, HumanEval, MMLU, GPQA, paired baselines, and cost-aware reporting. |\n\n---\n\n## What Is Fusion Engine?\n\nFusion Engine is an open-source Python framework for **parallel LLM inference**,\n**LLM ensemble routing**, and **judge-model synthesis**. It lets you build a\ncustom panel of OpenRouter models, run them concurrently, and combine their\nanswers with a judge prompt you own.\n\nKey capabilities:\n\n- **Multi-model LLM orchestration** — fan out one prompt to several providers and\n  models through OpenRouter.\n- **AI model ensemble synthesis** — use a judge model to resolve disagreements\n  and produce one final response.\n- **OpenAI-compatible API** — expose a panel as `fusion/\u003cpanel\u003e` for clients that\n  call `/v1/models` and `/v1/chat/completions`.\n- **Tool-calling support** — let panelists propose tool calls and have the judge\n  select one synthesized next action.\n- **Benchmark harness** — compare fusion against single-model and judge-alone\n  baselines on GSM8K, HumanEval, MMLU, GPQA, or custom JSONL evals.\n- **Self-hosted control** — keep panel definitions, judge prompts, logs, costs,\n  and deployment under your control.\n\n---\n\n## LLM Summary\n\nFor search engines, AI agents, and LLM retrieval systems, Fusion Engine is:\n\n- **Project type:** self-hosted OpenRouter Fusion API, LLM ensemble framework,\n  and OpenAI-compatible LLM gateway.\n- **Related search terms:** Fusion API, OpenRouter Fusion, Openrouter Fusion,\n  Fusion Fable, Fable Fusion, multi-model fusion, model fusion API, LLM fusion\n  engine, and AI model ensemble.\n- **Primary job:** send one prompt to multiple LLMs, collect independent answers,\n  and synthesize one response with a judge model.\n- **Interfaces:** Python API, command-line interface, FastAPI HTTP server, and\n  OpenAI-compatible chat completions endpoint.\n- **Best use cases:** AI agent backend, model comparison, code review,\n  research synthesis, brainstorming, security review, and eval-driven model\n  routing.\n- **Evaluation support:** HumanEval code execution, GSM8K numeric grading, MMLU\n  multiple choice, GPQA science questions, and custom benchmark datasets.\n- **Not a:** model provider, vector database, RAG framework, or hosted SaaS.\n\n---\n\n## Search and LLM Discovery Keywords\n\nFusion Engine is written to be discoverable for developers comparing or building\nsystems around these terms:\n\n- Fusion API\n- OpenRouter Fusion / Openrouter Fusion\n- Fusion Fable / Fable Fusion\n- self-hosted Fusion API\n- OpenRouter Fusion alternative\n- multi-model fusion API\n- LLM fusion engine\n- AI model ensemble\n- judge model synthesis\n- OpenAI-compatible model router\n\n---\n\n## Why Use a Multi-Model LLM Ensemble?\n\nA single model has blind spots. Asking several models the same question and\nfusing their answers gives you:\n\n- **Higher reliability** — agreement across independent models is a strong\n  signal; disagreement is a flag worth surfacing.\n- **Coverage** — different models are strong at different things (reasoning,\n  code, recency, tone). The judge keeps the best of each.\n- **Control** — you own the panel, the judge prompt, and the data path. Run it\n  locally, pin your own models, swap judges per task.\n\n---\n\n## Architecture\n\n```\n                           ┌──────────────────────────────────────┐\n                           │           FusionEngine                │\n                           │                                       │\n   \"Analyze the         ┌──┤  1. load panel (budget/quality/code)  │\n    competitive    ─────┘  │  2. fan out the prompt in parallel    │\n    landscape...\"          │                                       │\n                           │        ┌─────────────────────┐        │\n                           │   ┌───▶│  model A (OpenRouter)│───┐    │\n                           │   │    └─────────────────────┘   │    │\n       prompt ─────────────┼───┼───▶│  model B (OpenRouter)│───┼─┐  │\n                           │   │    └─────────────────────┘   │ │  │\n                           │   └───▶│  model C (OpenRouter)│───┘ │  │\n                           │        └─────────────────────┘     │  │\n                           │                                     ▼  │\n                           │   3. collect responses   ┌────────────┐\n                           │      (text, latency,      │  COLLECT   │\n                           │       tokens, cost)       └─────┬──────┘\n                           │                                 │       │\n                           │   4. judge synthesizes    ┌─────▼──────┐\n                           │      all responses ──────▶│   JUDGE    │\n                           │      (judges/\u003cpanel\u003e.md)  │   model    │\n                           │                           └─────┬──────┘\n                           └─────────────────────────────────┼──────┘\n                                                             ▼\n                                              ┌───────────────────────────┐\n                                              │   FusionResult            │\n                                              │   • synthesized answer     │\n                                              │   • per-model responses    │\n                                              │   • cost / latency / usage │\n                                              └───────────────────────────┘\n```\n\n**Flow:** `prompt → parallel dispatch → N models → collect → judge → synthesized answer`\n\n---\n\n## Quick start\n\n```bash\n# 1. Get the code\ngit clone https://github.com/\u003cowner\u003e/fusion-engine.git\ncd fusion-engine\n\n# 2. Install the CLI and core dependencies\npython3 -m pip install -e .\n\n# 3. Configure your OpenRouter key (required)\nexport OPENROUTER_API_KEY=sk-or-v1-...\n#   ...or: cp .env.example .env  and edit it\n\n# 4. Run a fusion query\nfusion run \"What are the implications of quantum computing on cryptography?\" -p budget\n```\n\nUseful CLI flags:\n\n```bash\n# Show each model's answer + latency + cost, not just the synthesis\nfusion run \"Compare REST vs gRPC for microservices\" -p quality -v\n\n# Code-focused panel with live web search enabled\nfusion run \"Review this auth flow for vulnerabilities\" -p code --web-search\n\n# List configured panels and their member models\nfusion panels\n```\n\n---\n\n## Panels\n\nA **panel** is a named set of member models plus a judge. Panels live in\n`panels/*.json` — those files are the source of truth. Model IDs use\nOpenRouter's `provider/model` form.\n\n| Panel | Member models | Judge (model · template) | Best for | Est. cost / query\\* |\n|-------|---------------|--------------------------|----------|---------------------|\n| `budget` | `xiaomi/mimo-v2.5`, `deepseek/deepseek-v4-flash`, `xiaomi/mimo-v2.5-pro` | `qwen/qwen3.7-plus` · `default` | Drafts, summaries, brainstorming, high-volume runs | $0.02–0.05 |\n| `quality` | `anthropic/claude-fable-5`, `openai/gpt-5.5` | `anthropic/claude-opus-4` · `deep_research` | High-stakes analysis, research, hard reasoning | $0.50–1.00 |\n| `code` | `openai/codex`, `anthropic/claude-opus-4`, `deepseek/deepseek-v4-pro` | `anthropic/claude-opus-4` · `code_review` | Code review, debugging, security analysis, codegen | $0.30–0.60 |\n| `self_fuse` | `deepseek/deepseek-v4-pro` ×2 (independent samples) | `deepseek/deepseek-v4-pro` · `default` | Measuring how much fusion alone helps, model held constant | ~$0.01 |\n\n\\* From each panel file's `estimated_cost_per_query`. Actual cost depends on\nprompt/response length and OpenRouter's live per-model pricing (OpenRouter is the\nsource of truth for rates). Run with `-v` to see the exact per-query cost.\n\n### Adding or editing a panel\n\nEach panel is a JSON file in `panels/`. The schema:\n\n```json\n{\n  \"schema_version\": 1,\n  \"name\": \"quality\",\n  \"description\": \"Frontier models for maximum answer quality.\",\n  \"models\": [\n    { \"slug\": \"anthropic/claude-fable-5\", \"role\": \"panelist\", \"max_tokens\": 8192 },\n    { \"slug\": \"openai/gpt-5.5\",           \"role\": \"panelist\", \"max_tokens\": 8192 }\n  ],\n  \"judge_model\": \"anthropic/claude-opus-4\",\n  \"judge_template\": \"deep_research\",\n  \"estimated_cost_per_query\": { \"min\": 0.50, \"max\": 1.00, \"currency\": \"USD\", \"unit\": \"query\" }\n}\n```\n\n`judge_template` names a file in `judges/` (without the `.md`). `max_tokens`, if\nset on a model entry, is forwarded to OpenRouter for that panel member. Drop in a\nnew `\u003cname\u003e.json` and it becomes selectable with `-p \u003cname\u003e`.\n\n---\n\n## Judge prompt templates\n\nAfter the panel responds, the judge model is given a **synthesis prompt** plus\nall the collected answers. Judge templates live in `judges/*.md` and are\nselected per panel via the `judge_template` field. The repo ships\n`default`, `deep_research`, `code_review`, `creative`, and `tool_synthesis`.\nThey generally\ninstruct the judge to:\n\n1. Read every panel response without assuming any one is correct.\n2. Identify points of **agreement** (treat as high-confidence) and\n   **disagreement** (surface, don't silently drop).\n3. Resolve conflicts on the merits, keeping the strongest reasoning from each.\n4. Produce **one** answer — not a list of \"Model A said… Model B said…\".\n5. Flag remaining uncertainty rather than papering over it.\n\nSpecialized templates tune this per use case — e.g. `code_review` (used by the\n`code` panel) prioritizes correctness and security; `deep_research` (used by\n`quality`) weights depth and rigor; `creative` favors originality. Because the\ntemplates are plain Markdown you check into the repo, you can edit synthesis\nbehavior without touching code.\n\n---\n\n## Python Library Usage\n\n`FusionEngine.fuse()` is **async** and takes an explicit list of model slugs (or\npanel model dictionaries with `slug` and optional `max_tokens`) plus a judge\nmodel. The CLI resolves a panel *name* like `quality` by reading\n`panels/*.json`.\n\n```python\nimport asyncio\nimport json\nfrom pathlib import Path\n\nfrom fusion import FusionEngine  # run from the project dir; see note below\n\n\ndef load_panel(name: str):\n    cfg = json.loads(Path(f\"panels/{name}.json\").read_text())\n    return cfg[\"models\"], cfg[\"judge_model\"]\n\n\nasync def main():\n    panel, judge_model = load_panel(\"quality\")\n\n    # Reads OPENROUTER_API_KEY from the environment by default.\n    engine = FusionEngine()\n    result = await engine.fuse(\n        \"What are the implications of quantum computing on cryptography?\",\n        panel=panel,\n        judge_model=judge_model,\n        web_search=False,\n    )\n\n    # The fused, synthesized answer (from the judge):\n    print(result.answer)\n\n    # Per-model detail (list[PanelResponse]):\n    for r in result.panel_responses:\n        status = r.error or f\"{r.latency_ms:7.0f}ms  ${r.cost_usd:.4f}\"\n        print(f\"{r.model:40s} {status}\")\n        if r.ok:\n            print(r.content)\n\n    # Run-level metadata:\n    print(\"judge:\", result.judge_response.model)\n    print(\"total cost: $%.4f\" % result.total_cost)\n\n\nasyncio.run(main())\n```\n\n`PanelResponse` exposes `model`, `content`, `tokens_in`, `tokens_out`,\n`latency_ms`, `cost_usd`, `error`, and the `ok` property. `FusionResult` exposes\n`answer`, `panel_responses`, `judge_response`, `total_cost`, `total_latency_ms`,\nand the `successful_panel` property.\n\n\u003e **Imports.** The public module import is currently\n\u003e `from fusion import FusionEngine, FusionResult, PanelResponse`. Installing with\n\u003e `python3 -m pip install -e .` also gives you the `fusion` and `fusion-engine`\n\u003e console scripts.\n\n---\n\n## OpenAI-Compatible API and FastAPI Server\n\nPrefer to call Fusion over HTTP — from another service or an agent — instead of\nshelling out to the CLI? `server.py` is a thin\n[FastAPI](https://fastapi.tiangolo.com) wrapper over the same engine, sharing\npanel resolution with the CLI via `panels.py`.\n\n```bash\npython3 -m pip install -e \".[server]\"\nexport OPENROUTER_API_KEY=sk-or-v1-...\nuvicorn server:app --host 127.0.0.1 --port 8000   # or: python3 server.py\n```\n\nIf you expose the API beyond localhost, set `FUSION_SERVER_API_KEY` and send\n`Authorization: Bearer \u003cvalue\u003e` on endpoints that spend credits (`/fuse` and\n`/v1/chat/completions`).\n\n| Method \u0026 path | Purpose |\n|---|---|\n| `GET /health` | Liveness, plus whether an OpenRouter key is configured. |\n| `GET /panels` | List configured panels with members, judge, and est. cost. |\n| `GET /panels/{name}` | Full JSON config for one panel. |\n| `POST /fuse` | Run a fusion; return the synthesized answer + per-model detail. |\n| `GET /v1/models` | OpenAI-compatible model list, one model per panel (`fusion/\u003cpanel\u003e`). |\n| `POST /v1/chat/completions` | OpenAI-compatible chat completion with fused tool-call support. |\n\n`POST /fuse` body — only `prompt` plus one of `panel`/`models` is required:\n\n```json\n{\n  \"prompt\": \"Compare REST vs gRPC for microservices\",\n  \"panel\": \"quality\",\n  \"judge_model\": null,\n  \"judge_template\": null,\n  \"web_search\": false\n}\n```\n\nPass `models` (a list of OpenRouter slugs) instead of `panel` to fuse an ad-hoc\nset, and `judge_model` / `judge_template` to override a panel's defaults.\n\n```bash\ncurl -s localhost:8000/fuse -H 'content-type: application/json' \\\n  -d '{\"prompt\":\"Explain CRDTs\",\"panel\":\"budget\"}' | jq .answer\n```\n\nThe response is the full `FusionResult` as JSON — `answer`, `panel_responses[]`\n(each with content, tokens, latency, cost, error), `judge_response`,\n`total_cost`, and `total_latency_ms`. Interactive docs live at `/docs`.\n\n---\n\n## OpenRouter Fusion Alternative\n\nOpenRouter offers a hosted multi-model \"fusion\" feature. Fusion Engine is a\nself-hosted OpenRouter Fusion alternative with more control:\n\n| | Fusion Engine (this project) | Hosted Fusion |\n|---|---|---|\n| **Judge prompts** | Yours — plain Markdown in `judges/`, editable per panel | Provider-defined |\n| **Panel composition** | Yours — any OpenRouter models, defined in `panels/*.json` | Limited / provider-curated |\n| **Where it runs** | Locally (or any host you control) | Provider-side |\n| **Transparency** | Full per-model responses, latency, tokens, cost | Aggregated |\n| **Vendor lock-in** | None — it's your code; swap providers freely | Tied to the provider's feature |\n| **Cost** | Pay only OpenRouter token costs | Same, plus whatever the feature adds |\n\nYou still use OpenRouter for the actual model calls (one key, many providers) —\nbut the orchestration, judging, and policy are yours.\n\n---\n\n## Benchmarking LLM Ensembles — Does Fusion Actually Beat One Model?\n\nThe whole premise is that a panel beats any single model. Don't take it on\nfaith — measure it. The `evals/` harness scores a panel's fusion against the\nright baselines on the same items.\n\n```bash\nexport OPENROUTER_API_KEY=sk-or-v1-...\npython3 evals/run_eval.py --panel quality --dataset evals/datasets/sample.jsonl\n# iterate cheaply with --limit 5\n```\n\n### Industry-standard benchmarks\n\nDon't hand-write items — pull real benchmarks with `evals/prepare.py`, then point\nthe runner at the generated dataset:\n\n```bash\npython3 -m pip install -e \".[eval]\"             # only needed for mmlu / gpqa\npython3 evals/prepare.py gsm8k --limit 100       # -\u003e evals/datasets/gsm8k.jsonl\npython3 evals/run_eval.py --panel quality --dataset evals/datasets/gsm8k.jsonl\n```\n\n| Benchmark | Tests | Grader | Source |\n|---|---|---|---|\n| `gsm8k` | grade-school math reasoning | `numeric` | GitHub, ungated |\n| `humaneval` | Python synthesis, run against unit tests | `code_exec` | GitHub, ungated |\n| `mmlu` | 57-subject knowledge (multiple choice) | `multiple_choice` | HF `cais/mmlu` |\n| `gpqa` | graduate-level science (multiple choice) | `multiple_choice` | HF `Idavidrein/gpqa` — **gated** (accept terms + `huggingface-cli login`) |\n\n`gsm8k`/`humaneval` download directly (httpx); `mmlu`/`gpqa` use the `datasets`\nlibrary. `--limit N` takes a seeded random sample to bound cost. `code_exec` runs\nmodel-generated code — **sandbox it** (container/VM) for untrusted models.\n\nA dataset is JSONL, one item per line (the format `prepare.py` emits, and what you\nwrite for a custom set):\n\n```json\n{\"id\": \"mc1\", \"prompt\": \"...\", \"target\": \"B\", \"grader\": \"multiple_choice\", \"category\": \"science\"}\n```\n\nFor each item the harness runs three kinds of **system** and grades each answer:\n\n- `fusion:\u003cpanel\u003e` — the whole panel + judge\n- `single:\u003cmodel\u003e` — each panel member on its own\n- `judge_alone:\u003cmodel\u003e` — the judge model alone, with no panel\n\nThat last one is the baseline most \"ensembles win\" claims forget: fusion adds the\npanel *on top of* the judge, so it has to beat the judge answering solo — and the\nbest single member — to justify its extra cost. The report prints per-system\naccuracy/cost/latency plus a **paired** comparison (Δaccuracy, win/tie/loss, and a\nbootstrap 95% CI) so you can tell a real gain from noise, and weigh it against the\nN× cost. Graders ship for `multiple_choice`, `numeric`, `exact_match`, and\n`contains` (in `evals/graders.py`); add your own there.\n\nBeyond a one-off check, this is how you **tune panels** — swap models, judges, or\ntemplates and keep what moves the metric for *your* workload.\n\n---\n\n## Contributing and Security\n\n| Resource | Link |\n|---|---|\n| Contributing guide | [`CONTRIBUTING.md`](CONTRIBUTING.md) |\n| Security policy | [`SECURITY.md`](SECURITY.md) |\n| CI workflow | [`.github/workflows/ci.yml`](.github/workflows/ci.yml) |\n| License | [`LICENSE`](LICENSE) |\n\nIssues and pull requests should include enough context to reproduce the behavior,\nespecially for model, panel, judge-template, and benchmark changes. Security\nreports should follow the private disclosure path in `SECURITY.md`.\n\n---\n\n## Roadmap\n\n- **Web UI** — a browser front-end (on top of the HTTP API) for running fusions\n  and diffing model answers.\n- **Streaming** — stream panel responses and the synthesis as they arrive.\n- **Result caching** — cache by `(prompt, panel)` to avoid paying twice for\n  identical runs.\n- **More evals** — add an LLM-as-judge grader (on a neutral model) for\n  open-ended tasks, more benchmarks (MATH, SWE-bench), and per-call result\n  caching so re-runs are free. (Benchmarks + harness already live in `evals/`:\n  GSM8K, HumanEval, MMLU, GPQA with numeric/code-exec/multiple-choice graders.)\n- **Package namespace** — add a stable `fusion_engine` import package while\n  preserving the current `fusion` module import.\n\n---\n\n## License\n\nMIT. See `LICENSE`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluckeyfaraday%2Ffusion-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluckeyfaraday%2Ffusion-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluckeyfaraday%2Ffusion-engine/lists"}