{"id":51033727,"url":"https://github.com/launchapp-dev/animus-plugin-testkit","last_synced_at":"2026-06-22T03:01:53.901Z","repository":{"id":360025933,"uuid":"1247690534","full_name":"launchapp-dev/animus-plugin-testkit","owner":"launchapp-dev","description":"Conformance test harness + benchmarks for Animus provider plugins","archived":false,"fork":false,"pushed_at":"2026-06-11T21:52:59.000Z","size":131,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-11T23:19:16.286Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/launchapp-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-23T16:43:40.000Z","updated_at":"2026-06-11T21:53:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/launchapp-dev/animus-plugin-testkit","commit_stats":null,"previous_names":["launchapp-dev/animus-plugin-testkit"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/launchapp-dev/animus-plugin-testkit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/launchapp-dev%2Fanimus-plugin-testkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/launchapp-dev%2Fanimus-plugin-testkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/launchapp-dev%2Fanimus-plugin-testkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/launchapp-dev%2Fanimus-plugin-testkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/launchapp-dev","download_url":"https://codeload.github.com/launchapp-dev/animus-plugin-testkit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/launchapp-dev%2Fanimus-plugin-testkit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34632723,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-22T02:00:06.391Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-22T03:01:53.349Z","updated_at":"2026-06-22T03:01:53.895Z","avatar_url":"https://github.com/launchapp-dev.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# animus-plugin-testkit\n\nConformance test harness, mock CLIs, and benchmark suite for\n[Animus](https://github.com/launchapp-dev) plugins\n(v0.4.x stdio JSON-RPC protocol).\n\nThis repo lets a plugin author run the same set of scenarios the official\nAnimus CI runs, locally, without any network, API keys, or a live LLM\naccount. Every scenario drives the plugin through the same handshake the real\ndaemon uses and validates the streaming notification + response shape against\n[`animus-protocol`](https://github.com/launchapp-dev/animus-protocol).\n\nStatus: **v0.3.0** — provider, subject, transport, trigger, and log-storage\nplugin conformance suites; concurrent-cancel dispatcher; oai-style scenario\nvariants.\n\n## Crates\n\n| Crate                                       | Description                                                         |\n| ------------------------------------------- | ------------------------------------------------------------------- |\n| `testkit-core`                              | Shared types: `ScenarioFile`, `TestResult`, `MatrixReport`.         |\n| `plugin-harness`                            | Bin `animus-plugin-harness` — runs scenarios against a plugin.      |\n| `plugin-bench`                              | Bin `animus-plugin-bench` — TTFT, throughput, end-to-end duration.  |\n| `provider-conformance`                      | Baseline scenarios for provider plugins (10 scenarios).             |\n| `subject-conformance`                       | Baseline scenarios for subject backend plugins (5 scenarios).       |\n| `transport-conformance`                     | Baseline scenarios for transport backend plugins (4 scenarios).    |\n| `trigger-conformance`                       | Baseline scenarios for trigger backend plugins (3 scenarios).      |\n| `log-storage-conformance`                   | Baseline scenarios for log storage backend plugins (4 scenarios).  |\n| `mock-cli-claude` / `-codex` / `-gemini` /  | Fake LLM CLIs that emit canonical streams for each scenario.        |\n| `mock-cli-opencode`                         |                                                                     |\n| `mock-cli-oai`                              | Mock OpenAI-compatible HTTP server for `animus-provider-oai`.       |\n\n## Quickstart\n\n```bash\n# 1. Build the workspace (mocks + harness + bench).\ncargo build --release --workspace\n\n# 2. Build the plugin you want to test (here: animus-provider-claude).\ncd ../animus-provider-claude\ncargo build --release\ncd -\n\n# 3. Run conformance — provider (default), subject, transport, trigger, or log-storage.\n./target/release/animus-plugin-harness conformance \\\n  --plugin ../animus-provider-claude/target/release/animus-provider-claude\n\n./target/release/animus-plugin-harness conformance --kind subject \\\n  --plugin ../animus-subject-default/target/release/animus-subject-default\n\n./target/release/animus-plugin-harness conformance --kind transport \\\n  --plugin ../animus-transport-http/target/release/animus-transport-http\n\n./target/release/animus-plugin-harness conformance --kind trigger \\\n  --plugin ../animus-trigger-webhook/target/release/animus-trigger-webhook\n\n./target/release/animus-plugin-harness conformance --kind log-storage \\\n  --plugin ../animus-log-storage-file/target/release/animus-log-storage-file\n\n# 4. (Optional) save a machine-readable report.\n./target/release/animus-plugin-harness conformance \\\n  --plugin ../animus-provider-claude/target/release/animus-provider-claude \\\n  --report-json ./report-claude.json\n```\n\nThe harness injects `CLAUDE_BIN`, `CODEX_BIN`, `GEMINI_BIN`,\n`OPENCODE_BIN`, and `MOCK_SCENARIO` into the plugin's environment before\nspawning, so the provider plugin transparently uses our mock CLIs instead\nof the real binaries on `$PATH`. **No network, no API keys.**\n\n## Provider scenarios\n\nThe 10 baseline scenarios live in [`scenarios/`](./scenarios/):\n\n| Name                       | What it exercises                                                          |\n| -------------------------- | -------------------------------------------------------------------------- |\n| `streaming-short`          | 3-delta short completion, final aggregated text.                           |\n| `streaming-medium`         | ~40 deltas — sanity check buffering.                                       |\n| `streaming-long`           | ~300 deltas — back-pressure, large output assembly.                        |\n| `tool-call-single`         | One `tool_use` + `tool_result` round-trip surrounded by output.            |\n| `tool-call-parallel`       | Two parallel `tool_use` blocks resolved in one envelope.                   |\n| `tool-call-single-oai`     | Stateless OpenAI-style: `ToolCall` only, host owns tool execution.         |\n| `tool-call-parallel-oai`   | Same as above, parallel.                                                   |\n| `error-recovery`           | Mid-stream garbled line that the provider parser must ignore.              |\n| `cancellation`             | Concurrent dispatcher issues `agent/cancel` mid-flight (see below).        |\n| `resume-session`           | `agent/resume` against a prior session id.                                 |\n\nPlugins that don't advertise the relevant `$harness/*` capability for a\nscenario are SKIPPED (not failed). Plugins opt in by adding to their\n`initialize` capabilities:\n\n- `$harness/cancellation-loop-v2` — opt-in to the v0.3.0 concurrent-cancel test\n- `$harness/oai-style` — opt-in to the stateless OpenAI tool-call scenarios\n\n## Cancellation: concurrent dispatcher (v0.3.0)\n\nThe harness now spawns a side-task per scenario when `cancel_after_ms` is\nset. It watches for the first notification to learn the session id, waits\nthe configured delay, then issues `agent/cancel { session_id }` via the\nsame stdio pipe. The plugin should terminate the run with\n`BackendError::Cancelled` (`REQUEST_CANCELLED`, `-32002`) or emit a\nnon-recoverable `error` notification within the scenario timeout.\n\nA `fake-cancellable-plugin` test fixture lives at\n`crates/plugin-harness/src/bin/fake_cancellable_plugin.rs` and is\nexercised by `crates/plugin-harness/tests/cancellation.rs` to verify the\nwire dance end-to-end without depending on any real provider.\n\n## Subject / transport / trigger / log-storage conformance (v0.3.0)\n\nEach is a separate crate that exports `pub fn baseline_scenarios() -\u003e\nVec\u003cTestScenario\u003e` plus a `pub async fn run_conformance(plugin_path:\n\u0026Path) -\u003e Result\u003cMatrixReport\u003e`. External CI pipelines can depend on the\ncrate directly:\n\n```toml\n[dev-dependencies]\nsubject-conformance = { git = \"https://github.com/launchapp-dev/animus-plugin-testkit\", tag = \"v0.3.0\" }\n```\n\n| Suite     | Scenarios                                                              |\n| --------- | ---------------------------------------------------------------------- |\n| Subject   | `handshake`, `advertise-kinds`, `subject-list`, `subject-crud-round-trip`, `subject-watch-stream` |\n| Transport | `handshake`, `start-shutdown`, `schema-health`, `serve-and-accept`     |\n| Trigger   | `handshake`, `watch-fires-event`, `event-payload-shape`                |\n| Log storage | `handshake`, `schema-health`, `store-query-round-trip`, `tail-replay` |\n\nTrigger backends that need an external stimulus (a webhook POST, a Slack\nmessage, a cron tick) will SKIP `watch-fires-event` and\n`event-payload-shape`. The handshake still PASSes.\n\nLog-storage conformance runs each scenario in an isolated temporary\ndirectory and sets `ANIMUS_LOG_FILE_PATH` to a scenario-local JSONL file\nbefore spawning the plugin.\n\n## Smoke tests (proof the harness works)\n\nCaptured 2026-05-24 against v0.3.0:\n\n```text\n$ animus-plugin-harness conformance --kind subject \\\n  --plugin animus-subject-default/target/release/animus-subject-default\n\n==\u003e conformance report: animus-subject-default v0.1.1\n    kind: subject_backend   protocol: 1.0.0\n  [PASS]  handshake                     0ms\n  [PASS]  advertise-kinds               0ms\n  [PASS]  subject-list                  8ms\n  [PASS]  subject-crud-round-trip       8ms\n  [PASS]  subject-watch-stream          8ms\nsummary: total 5   passed 5   failed 0   skipped 0\nOVERALL: PASS\n```\n\n```text\n$ animus-plugin-harness conformance --kind transport \\\n  --plugin animus-transport-http/target/release/animus-transport-http\n\n==\u003e conformance report: animus-transport-http v0.1.0\n    kind: transport_backend   protocol: 1.0.0\n  [PASS]  handshake                     0ms\n  [PASS]  start-shutdown                6ms\n  [PASS]  schema-health                 5ms\n  [PASS]  serve-and-accept             83ms\nsummary: total 4   passed 4   failed 0   skipped 0\nOVERALL: PASS\n```\n\n```text\n$ animus-plugin-harness conformance --kind trigger \\\n  --plugin animus-trigger-webhook/target/release/animus-trigger-webhook\n\n==\u003e conformance report: animus-trigger-webhook v0.1.1\n    kind: trigger_backend   protocol: 1.0.0\n  [PASS]  handshake                     0ms\n  [SKIP]  watch-fires-event             0ms\n  [SKIP]  event-payload-shape           0ms\nsummary: total 3   passed 1   failed 0   skipped 2\nOVERALL: PASS\n```\n\n```text\n$ animus-plugin-harness conformance \\\n  --plugin animus-provider-claude/target/release/animus-provider-claude\n\n==\u003e conformance report: animus-provider-claude v0.2.1\n    kind: provider   protocol: 1.0.0\n  [SKIP]  cancellation                 34ms\n  [PASS]  error-recovery              409ms\n  [PASS]  resume-session              395ms\n  [PASS]  streaming-long              402ms\n  [PASS]  streaming-medium            391ms\n  [PASS]  streaming-short             406ms\n  [PASS]  tool-call-parallel          399ms\n  [SKIP]  tool-call-parallel-oai       34ms\n  [PASS]  tool-call-single            374ms\n  [SKIP]  tool-call-single-oai         33ms\nsummary: total 10   passed 7   failed 0   skipped 3\nOVERALL: PASS\n```\n\nThe 3 SKIPs are expected: animus-provider-claude does not advertise the\nopt-in capabilities `$harness/cancellation-loop-v2` or\n`$harness/oai-style`. Provider plugins that want those tests to run\nshould advertise those capabilities in their `initialize` capabilities\nlist once their backend supports the relevant semantics.\n\n## Adding scenarios\n\nDrop a YAML file into `scenarios/` (or your own directory and pass\n`--scenarios \u003cdir\u003e`):\n\n```yaml\nname: my-scenario\ndescription: ...\ntimeout_ms: 8000\nmethod: run            # or `resume`\nrequires_capabilities: [\"agent/resume\"]   # optional gate\ncancel_after_ms: 100   # optional — triggers the concurrent-cancel dispatcher\nrequest:\n  prompt: \"hello\"\n  model: claude-sonnet-4-6\nexpected_notifications:\n  - kind: output\n    contains: \"Hello\"\nexpected_response:\n  output_contains: \"Hello\"\n  exit_code: 0\nmock:\n  tool: claude\n  mock_scenario: streaming-short\n```\n\nSee [`docs/writing-scenarios.md`](./docs/writing-scenarios.md) for the full\nmatcher surface.\n\n## Benchmarks\n\n```bash\n./target/release/animus-plugin-bench \\\n  --plugin ../animus-provider-claude/target/release/animus-provider-claude \\\n  --iterations 10 \\\n  --scenario streaming-medium\n```\n\nThe benchmark runner can also compare provider plugins, mock scenarios, and\nmodel ids as a matrix:\n\n```bash\n./target/release/animus-plugin-bench \\\n  --plugin ../animus-provider-claude/target/release/animus-provider-claude \\\n  --plugin ../animus-provider-oai/target/release/animus-provider-oai \\\n  --suite full \\\n  --model claude-sonnet-4-6,gpt-5-mini \\\n  --iterations 20 \\\n  --warmup 2 \\\n  --report-json ./bench-report.json \\\n  --report-csv ./bench-summary.csv\n```\n\nSuites:\n\n| Suite | Scenarios |\n| ----- | --------- |\n| `smoke` | `streaming-short` |\n| `streaming` | `streaming-short`, `streaming-medium`, `streaming-long` |\n| `tools` | `tool-call-single`, `tool-call-parallel` |\n| `full` | `streaming`, `tools`, `error-recovery` |\n\nReports include TTFT (time-to-first-token), end-to-end duration, p95 latency,\nnotification count, output bytes, and throughput per\nplugin/scenario/model cell. `--mock-scenario` remains as a backward-compatible\nalias for `--scenario`. There's also a `criterion` micro-bench for the scenario\nloader (`cargo bench -p plugin-bench`).\n\n## CI Integration\n\nSee [`docs/ci-integration.md`](./docs/ci-integration.md). The shipped\n[`provider-matrix.yml`](.github/workflows/matrix.yml) workflow runs the\nharness against every published `launchapp-dev/animus-provider-*` repo every\nMonday morning.\n\n## Known Limitations\n\n- **Trigger conformance is shallow without an external stimulus.** The\n  `webhook` backend (and any backend that needs an inbound HTTP POST,\n  Slack event, cron tick, ...) will SKIP `watch-fires-event` and\n  `event-payload-shape`. A future revision could spin up a stimulus\n  injector per backend kind.\n- **Subject CRUD requires create+get to share state.** The harness keeps a\n  single plugin process alive across the round-trip so backends with\n  in-process state (the default task store) can be exercised. Backends\n  that persist to a global path may surface cross-test contamination.\n- **Provider plugins must advertise opt-in capabilities** for the new\n  cancellation + oai-style tests to run. Plugins that don't are SKIPPED\n  (not failed). Update each plugin's `initialize` capabilities to opt in.\n- The harness depends only on published `animus-protocol` crates — it\n  intentionally has **no dependency** on the in-tree `animus-cli`.\n\n## Versioning\n\nThis release is pinned to `animus-protocol v0.1.9`. Protocol bumps in the\npatch range remain compatible; minor bumps may require harness changes.\n\n## License\n\nMIT — see [LICENSE](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaunchapp-dev%2Fanimus-plugin-testkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flaunchapp-dev%2Fanimus-plugin-testkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaunchapp-dev%2Fanimus-plugin-testkit/lists"}