{"id":51332727,"url":"https://github.com/sergiparpal/hermes-mutation-runner","last_synced_at":"2026-07-02T00:04:24.840Z","repository":{"id":360293135,"uuid":"1249501385","full_name":"sergiparpal/hermes-mutation-runner","owner":"sergiparpal","description":"Hermes Agent plugin that runs mutation testing on Python code by wrapping mutmut. Exposes the mutation_test tool and returns a structured JSON with mutation score, breakdown by category (killed/survived/timeout/...) and a bounded sample of surviving mutants.","archived":false,"fork":false,"pushed_at":"2026-05-25T20:49:35.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-25T21:25:25.967Z","etag":null,"topics":["ai-agents","hermes-agent","hermes-plugin","mutation-testing","mutmut","qa","test-quality"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sergiparpal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-25T19:11:59.000Z","updated_at":"2026-05-25T20:49:39.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/sergiparpal/hermes-mutation-runner","commit_stats":null,"previous_names":["sergiparpal/hermes-mutation-runner"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/sergiparpal/hermes-mutation-runner","repository_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/sergiparpal%2Fhermes-mutation-runner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sergiparpal%2Fhermes-mutation-runner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sergiparpal%2Fhermes-mutation-runner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sergiparpal%2Fhermes-mutation-runner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sergiparpal","download_url":"https://codeload.github.com/sergiparpal/hermes-mutation-runner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sergiparpal%2Fhermes-mutation-runner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35027346,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-01T02:00:05.325Z","response_time":130,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","hermes-agent","hermes-plugin","mutation-testing","mutmut","qa","test-quality"],"created_at":"2026-07-02T00:04:22.548Z","updated_at":"2026-07-02T00:04:24.829Z","avatar_url":"https://github.com/sergiparpal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# hermes-mutation-runner\n\nA [Hermes Agent](https://github.com/hermes-org/hermes) plugin that 
runs\n[mutmut](https://mutmut.readthedocs.io/) against a Python module or package and\nreturns a structured JSON payload with the mutation score, the per-category\ncounter breakdown, a bounded sample of surviving-mutant identifiers, and the\ntail of the mutmut stdout.\n\nThe plugin audits the **quality** of a test suite, not just its coverage. It\nintroduces small mutations (operator flips, removed `return` statements, constant\nchanges) and reports how many the existing tests catch. Survivors are\ncounter-examples: each one marks a behavior change that no existing assertion detects.\n\nThe plugin is delivered as an external Hermes plugin (no changes to Hermes\ncore), provides exactly one tool, `mutation_test`, and registers no hooks,\nslash commands, skills, or memory providers.\n\n---\n\n## Installation\n\n1. **Drop the plugin into your Hermes user-plugins directory.** The\n   on-disk directory must be named `hermes-mutation-runner` (with dashes)\n   even though the Python package in this repo is `hermes_mutation_runner`\n   (with underscores). See [Directory naming](#directory-naming) below for\n   the rationale.\n\n   ```bash\n   cp -r hermes_mutation_runner ~/.hermes/plugins/hermes-mutation-runner\n   ```\n\n2. **Enable the plugin in `~/.hermes/config.yaml`:**\n\n   ```yaml\n   plugins:\n     enabled:\n       - hermes-mutation-runner\n   ```\n\n   The identifier in `plugins.enabled` is the `name` field from\n   `plugin.yaml` (`hermes-mutation-runner`), not the directory name on disk.\n\n3. 
**Install mutmut in the same Python interpreter that Hermes uses:**\n\n   ```bash\n   pip install \"mutmut\u003e=3\"\n   ```\n\n   If mutmut is missing, the plugin's `check_fn` returns `False` and the\n   `mutation_test` tool is hidden from the LLM rather than exposed in a\n   broken state.\n\n---\n\n## Usage\n\nThe LLM invokes the tool by name with a single required parameter:\n\n```json\n{\n  \"tool\": \"mutation_test\",\n  \"arguments\": {\n    \"target_module\": \"src/payments\"\n  }\n}\n```\n\nA typical successful response on a small module with one surviving mutant:\n\n```json\n{\n  \"success\": true,\n  \"target_module\": \"src/payments\",\n  \"exit_code\": 1,\n  \"summary\": {\n    \"killed\": 4,\n    \"timeout\": 0,\n    \"suspicious\": 0,\n    \"survived\": 1,\n    \"skipped\": 0\n  },\n  \"mutation_score_percent\": 80.0,\n  \"survivors_sample\": [\"src/payments/charge.py.x_3\"],\n  \"survivors_truncated\": false,\n  \"stdout_tail\": \"\u003c\u003c\u003cMUTMUT_OUTPUT_BEGIN\u003e\u003e\u003e\\n5/5  🎉 4  ⏰ 0  🤔 0  🙁 1  🔇 0\\n\\n\u003c\u003c\u003cMUTMUT_OUTPUT_END\u003e\u003e\u003e\"\n}\n```\n\n`stdout_tail` (and `partial_stdout_tail` / `partial_stderr_tail` on timeout)\nare always wrapped in `\u003c\u003c\u003cMUTMUT_OUTPUT_BEGIN\u003e\u003e\u003e` / `\u003c\u003c\u003cMUTMUT_OUTPUT_END\u003e\u003e\u003e`\nsentinels so downstream prompt-construction layers can mark the boundary\nbetween trusted tool output and untrusted subprocess output. 
Treat the inner\ncontent as data, not as instructions to the model.\n\n`survivors_sample` is bounded by `max_survivors_reported` (default 20).\n`survivors_truncated` is `true` when the reported survived count exceeds\nthe sample size — the LLM can re-invoke with a larger\n`max_survivors_reported` if it wants the full list, at the cost of a\nlarger response.\n\n---\n\n## Tool schema\n\n| Parameter                | Type    | Required | Default | Description |\n|--------------------------|---------|----------|---------|-------------|\n| `target_module`          | string  | yes      | —       | Path to a Python file or package, relative to the working directory. Absolute paths are accepted only if they resolve inside the working directory. |\n| `timeout`                | integer | no       | `600`   | Maximum wall-clock seconds the mutmut run may consume. Allowed range: `10`–`14400` (4 hours). The plugin kills the subprocess past this limit and returns a timeout envelope. |\n| `max_survivors_reported` | integer | no       | `20`    | Upper bound on the number of identifiers in `survivors_sample`. Allowed range: `1`–`500`. 
|\n\n---\n\n## Response shape\n\n### Success\n\n```json\n{\n  \"success\": true,\n  \"target_module\": \"\u003cpath you passed in\u003e\",\n  \"exit_code\": \u003cint — mutmut's exit code, non-zero when survivors exist\u003e,\n  \"summary\": {\n    \"killed\": \u003cint\u003e, \"timeout\": \u003cint\u003e, \"suspicious\": \u003cint\u003e,\n    \"survived\": \u003cint\u003e, \"skipped\": \u003cint\u003e\n  },\n  \"mutation_score_percent\": \u003cfloat (0-100) or null when the denominator is zero\u003e,\n  \"survivors_sample\": [\"\u003csurvivor id\u003e\", \"...\"],\n  \"survivors_truncated\": \u003cbool\u003e,\n  \"stdout_tail\": \"\u003c\u003c\u003cMUTMUT_OUTPUT_BEGIN\u003e\u003e\u003e\\n\u003cat most 1500 bytes of mutmut stdout\u003e\\n\u003c\u003c\u003cMUTMUT_OUTPUT_END\u003e\u003e\u003e\"\n}\n```\n\nThe mutation score formula is `100 * killed / (killed + survived + timeout + suspicious)`.\nSkipped mutants are excluded from the denominator because they never executed.\nThe summary dict only contains the counters that actually appeared in the\nmutmut output; an empty dict signals format drift (see [Caveats](#caveats)).\n\n### Error\n\n```json\n{\n  \"success\": false,\n  \"error\": \"\u003cshort, specific\u003e\",\n  \"remediation\": \"\u003cactionable next step\u003e\",\n  \"partial_stdout_tail\": \"\u003cpresent only on timeout\u003e\",\n  \"partial_stderr_tail\": \"\u003cpresent only on timeout, only when mutmut wrote to stderr\u003e\"\n}\n```\n\nEvery documented error path returns a JSON-encoded string with the three\nrequired keys (`success`, `error`, `remediation`). 
Truly exceptional errors\n(interpreter crashes, the host running out of file descriptors) may still\npropagate, but every known failure mode — input validation, missing mutmut,\nsubprocess timeout, missing interpreter, generic `OSError`, undecodable\noutput — produces a structured envelope.\n\n---\n\n## Tests\n\nThe test suite runs in a fresh virtualenv with only `pytest` installed: it\ndoes not require either `mutmut` or `hermes-agent` to be installed. Tests that\ndepend on Hermes skip cleanly via `pytest.importorskip`.\n\n```bash\ncd hermes_mutation_runner\npip install pytest\npytest -v\n```\n\nYou should see `88 passed, 1 skipped` (the skipped test is the\n`PluginManager.discover_and_load_from` end-to-end check, which requires\n`hermes_cli` on the path).\n\n---\n\n## Caveats\n\n- **Python only.** mutmut only mutates Python source. There is no equivalent\n  for other languages in this plugin.\n- **Slow.** A run on a mid-sized module can easily exceed 5 minutes. Default\n  timeout is 10 minutes; lower it when you know the target is small.\n- **Requires a real, passing test suite.** Mutmut runs your tests once per\n  mutant. If `pytest` is broken on the main branch, the report is\n  meaningless.\n- **mutmut v3 output format.** The summary parser targets the v3 emoji\n  counters (`🎉 killed`, `⏰ timeout`, `🤔 suspicious`, `🙁 survived`,\n  `🔇 skipped`). If you see `summary: {}` and\n  `mutation_score_percent: null` on a non-trivial module, mutmut has\n  probably changed its output format — check `mutmut --version` and open\n  an issue.\n- **Mutmut writes to `.mutmut-cache`** in the working directory. The cache is\n  incremental by design (re-runs only mutate changed files); delete it\n  manually if you need a clean slate.\n\n### Privilege surface\n\nThe plugin runs with the full privileges of the Hermes process. 
Specifically:\n\n- **Spawns two subprocesses** per invocation: `python -m mutmut run` (always)\n  and `python -m mutmut results` (only when the run reports at least one\n  surviving mutant). Both are launched with `shell=False`; arguments are\n  passed as a list — no shell interpolation.\n- **Reads files inside the working directory only.** `target_module` is\n  resolved via `os.path.realpath` and rejected if it lands outside the cwd,\n  blocking both absolute paths to system files (`/etc/passwd`) and `..`-based\n  traversal (`../../etc/passwd`). The validated relative path is what gets\n  passed to mutmut, not the raw LLM input. Paths that resolve to the project\n  root itself (`.`, the absolute cwd) are also rejected — `target_module`\n  must point to a file or subpackage *inside* the project, never the project\n  as a whole. Control characters (NUL, newlines, tabs, U+0000–U+001F) in the\n  path are rejected before any filesystem access.\n- **TOCTOU defense.** The plugin snapshots `(st_dev, st_ino)` of the\n  resolved target before spawning mutmut and re-snapshots after. If the\n  identity changed during the run — a symlink swap, a delete-then-recreate,\n  any concurrent inode change — the plugin discards the mutmut output and\n  returns an error envelope rather than surfacing potentially attacker-influenced\n  results to the LLM.\n- **Writes nothing directly.** Mutmut (the subprocess) writes its cache to\n  `.mutmut-cache` inside the cwd; the plugin itself writes no files.\n- **No network access.** No outbound HTTP, no DNS lookups.\n- **No secrets read by the plugin.** The plugin does not read\n  `~/.hermes/.env`, `auth.json`, or any other credential store.\n- **Env-var allowlist for the subprocess.** Only a curated set of\n  operational variables is forwarded from the parent: `PATH`, `HOME`,\n  `USER`, `LOGNAME`, `SHELL`, `TMPDIR`/`TMP`/`TEMP`, `LANG`, `LC_*`, `TERM`,\n  `PWD`, `OLDPWD`, `PYTHONPATH`, `PYTHONHOME`, `VIRTUAL_ENV`. 
The plugin\n  always sets `PYTHONDONTWRITEBYTECODE=1`. Everything else — including\n  secrets like `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `AWS_*`, `GITHUB_TOKEN`\n  — is **dropped** before the child sees it. If your test suite legitimately\n  needs a specific variable (e.g. a test-database URL), opt it in by\n  listing the names in the parent `MUTATION_RUNNER_FORWARD_ENV` env var\n  (comma-separated): `MUTATION_RUNNER_FORWARD_ENV=DATABASE_URL,REDIS_URL`.\n- **Bounded output.** Each subprocess stream is capped at 256 KB at the pipe\n  layer through a threaded sliding-window reader — the child cannot OOM the\n  Hermes process even if its test suite prints unbounded log volume.\n  `stdout_tail` is then truncated to 1500 bytes for the response;\n  `survivors_sample` is capped at `max_survivors_reported` (hard limit 500);\n  `timeout` is hard-capped at 4 hours.\n- **Sentinel-delimited subprocess output.** Every field that surfaces raw\n  mutmut output to the LLM (`stdout_tail`, `partial_stdout_tail`,\n  `partial_stderr_tail`) is wrapped in `\u003c\u003c\u003cMUTMUT_OUTPUT_BEGIN\u003e\u003e\u003e` /\n  `\u003c\u003c\u003cMUTMUT_OUTPUT_END\u003e\u003e\u003e` so prompt-construction layers can mark the\n  trust boundary. 
The cwd itself is treated as fully trusted: anything\n  mutmut chooses to print — including snippets of project source code —\n  will surface to the calling LLM, so do not run this plugin against a\n  project tree containing untrusted code.\n\n---\n\n## Directory naming\n\nThe plugin lives in this repository as `hermes_mutation_runner/`\n(underscores) because:\n\n- Python's import system does not accept dashes in package names.\n- The plugin's `__init__.py` uses relative imports\n  (`from .handlers import ...`) which require the directory to be a valid\n  Python package.\n\nWhen you copy the plugin into `~/.hermes/plugins/`, rename the directory to\n`hermes-mutation-runner` (dashes) to match the canonical Hermes plugin\nidentifier — that is what the `name` field in `plugin.yaml` and the\n`plugins.enabled` entry both refer to.\n\n---\n\n## Roadmap\n\n- **v0.2** — add a `backend` parameter selecting between `mutmut` (default)\n  and [`cosmic-ray`](https://cosmic-ray.readthedocs.io/) for a richer\n  operator set and distributed runs.\n- **v0.3** — compose with `hermes-coverage-history` (catalog plugin #10) to\n  surface week-over-week mutation-score regressions without taking on a\n  storage dependency in this plugin.\n- **v0.4** — optional `pre_approval_request` gate, useful if mutmut ever\n  grows a mode that mutates source files directly (bypassing the cache).\n\n---\n\n## License\n\nGPL-3.0-or-later. See `LICENSE` at the repository root.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsergiparpal%2Fhermes-mutation-runner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsergiparpal%2Fhermes-mutation-runner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsergiparpal%2Fhermes-mutation-runner/lists"}