An open API service indexing awesome lists of open source software.

https://github.com/sumithr/sumo-qa

MCP server that brings senior-QA discipline to AI coding assistants — test planning, TDD, mutation testing, code review.
https://github.com/sumithr/sumo-qa

ai-agents claude-code code-review istqb mcp mcp-server mutation-testing python qa senior-qa tdd testing

Last synced: 1 day ago
JSON representation

MCP server that brings senior-QA discipline to AI coding assistants — test planning, TDD, mutation testing, code review.

Awesome Lists containing this project

README

          


sumo-qa — strong QA, crouching rikishi mark

# sumo-qa MCP

[![tests](https://github.com/sumithr/sumo-qa/actions/workflows/test.yml/badge.svg)](https://github.com/sumithr/sumo-qa/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/sumo-qa?cacheSeconds=300&v=0.46.0)](https://pypi.org/project/sumo-qa/)
[![Python](https://img.shields.io/pypi/pyversions/sumo-qa?cacheSeconds=300&v=0.46.0)](https://pypi.org/project/sumo-qa/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue)](LICENSE)

An MCP server that brings senior-QA discipline to AI coding assistants — test planning, TDD, mutation testing, code review.

> [!IMPORTANT]
> sumo-qa is an advisor, not an oracle. Like any AI tool it can be wrong. Your judgment and your team's standards are the final word.

> ### 🚀 New here? **[5-minute demo →](DEMO.md)**
> One install line, one prompt on your repo, the workflow runs on real code.

## Why it exists

Ask a stock AI assistant to QA a change and you get the junior answer: "add unit tests, consider edge cases, maybe test performance." That's a checklist.

sumo-qa makes the agent work the way a senior QA does:

- Names 3–7 risks tied to specific files and lines, not categories
- Picks one design technique per risk from an ISTQB-grounded catalogue (boundary-value, decision-table, property-based, mutation)
- Runs your test suite fresh in the current turn before any "safe to merge" claim
- Holds TDD's red phase before any production code is written
- Keeps production code locked while strengthening tests against mutation survivors
- Won't ship a plan without measurable entry and exit criteria

The discipline lives in a library of [skill files](skills/), each followed literally — every one has an Iron Law and a HARD-GATE callout the LLM can't talk past. Skills route automatically from natural-language prompts; you don't need to remember to invoke them.

## Install

```bash
pip install sumo-qa && sumo-qa-install
```

`sumo-qa-install` with no flag configures every host it detects. Target a single host with `--claude-code`, `--vscode --workspace `, or `--jetbrains`.

On Windows PowerShell, use (`&&` isn't a valid separator in Windows PowerShell, and pip's script directory is often off PATH, so use the module form):

```powershell
py -m pip install sumo-qa; if ($?) { py -m sumo_qa.installer }
```

If `sumo-qa-install` isn't on your PATH (e.g. `pip install --user` without `~/.local/bin` exported), use the PATH-proof module form: `python -m pip install sumo-qa && python -m sumo_qa.installer`.

Prefer one command that installs, configures, and verifies in a single shot? From a clone of this repo, run `./install.sh` (macOS/Linux) or `.\install.ps1` (Windows) — thin wrappers that route to the same `pip install` + `python -m sumo_qa.installer` + `sumo-qa-doctor` steps, with `--update`, `--doctor`, per-host `--host`, and an ownership-aware `--uninstall` flag. CI-verified on Linux/macOS/Windows. Details: [docs/INSTALL.md#one-command-wrapper-installsh--installps1](docs/INSTALL.md#one-command-wrapper-installsh--installps1).

Restart your host or open a fresh chat afterwards.

### Plugin install from a local clone (Claude Code, session-scoped)

Prefer the plugin experience over pip? Clone the repo and pass `--plugin-dir` to `claude` on each invocation. This loads the `.claude-plugin/plugin.json` manifest directly — no pip install needed, skills + hooks + MCP server come from this checkout.

**Prerequisite:** `uv` on PATH (Astral's package runner — one-line install, no Python prerequisite). Skip if `uv --version` already resolves:

```bash
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Homebrew
brew install uv
```

Then clone and launch:

```bash
git clone https://github.com/sumithr/sumo-qa.git
claude --plugin-dir /path/to/sumo-qa
```

Session-scoped: every `claude` invocation needs the flag — plain `claude` (no flag) starts a session with no sumo-qa loaded. Use `/reload-plugins` inside the session to pick up edits without restarting.

> Persistent marketplace install (one-time setup, no flag on every launch): add this repo as a Claude Code plugin marketplace and install the plugin —
>
> ```text
> /plugin marketplace add sumithr/sumo-qa
> /plugin install sumo-qa@sumo-qa
> ```
>
> The marketplace catalog (`.claude-plugin/marketplace.json`) is generated from the canonical source and schema-validated in CI. **This flow is wired but the live `marketplace add` → `install` round-trip has not yet been verified end-to-end in a Claude Code session** — until that is confirmed, the `pip install sumo-qa && sumo-qa-install` path above remains the recommended persistent install. Full architecture + dev-iteration detail: [docs/INSTALL.md#plugin-format-install-claude-code--codex](docs/INSTALL.md#plugin-format-install-claude-code--codex).

### Something not working?

```bash
# pip-install path (after `pip install sumo-qa`)
sumo-qa-doctor # or `python -m sumo_qa.doctor` if not on PATH

# plugin-install path (no pip install required) — inside a Claude Code session
!sumo-qa-doctor # the plugin ships bin/sumo-qa-doctor on PATH

# plugin-install path from outside Claude Code
uvx --from /path/to/plugin/source sumo-qa-doctor
```

Read-only setup diagnostics — checks Python + sumo-qa version, install mode, the MCP `initialize` + `tools/list` handshake, and every host config the installer touches (Claude Code, Claude Desktop, VS Code workspace, JetBrains detection, Codex plugin). Each failure prints the exact `Fix:` command. `--json` for machine output. Details: [docs/INSTALL.md#diagnosing-setup-with-sumo-qa-doctor](docs/INSTALL.md#diagnosing-setup-with-sumo-qa-doctor).

Per-host flags, schema differences, and troubleshooting: [docs/INSTALL.md](docs/INSTALL.md). Want to install from a local clone — to try an unreleased branch or run with your team's standards / knowledge packs editable in place? See [docs/INSTALL.md#install-from-a-local-clone](docs/INSTALL.md#install-from-a-local-clone). For the pip path use `python scripts/dev_install.py`; for the Claude Code plugin path use `claude --plugin-dir /path/to/sumo-qa` ([Anthropic's documented local-dev mode](https://code.claude.com/docs/en/plugins#test-your-plugins-locally)).

### Verify it's wired

In any host, ask:

> load the QA classifications

You should get the canonical change-classification names back. If you do, you're wired.

### Run it from the terminal

Beyond the host integration, sumo-qa ships memorable commands for the QA-native repo loop:

```bash
sumo-qa analyze # map the current repo into .sumo-qa/repo-map.json
sumo-qa status # is the map present, current, and fresh? what next?
sumo-qa report # compose the .sumo-qa artifacts into qa-report.html
```

`analyze [path]` writes the schema-validated `.sumo-qa/repo-map.json` artifact and prints a concise summary; `status [path]` reports the artifact's presence, schema version, freshness against `HEAD`, and the next command to run; `report [path]` composes the persisted `.sumo-qa` artifacts into a self-contained static HTML QA report at `.sumo-qa/qa-report.html`, with honest not-available states for anything missing. All take `--json` for automation. (Bare `sumo-qa` launches the MCP server for hosts; `sumo-qa-doctor` runs setup diagnostics.)

### Update

```bash
pip install --upgrade sumo-qa && sumo-qa-install
```

Restart the host. The SessionStart hook re-injects the latest content; skills and knowledge refresh from the upgraded package.

## What's included

```mermaid
%%{init: {'theme':'base', 'themeVariables': {
'fontFamily':'Charter, "Iowan Old Style", Georgia, serif',
'fontSize':'15px',
'primaryTextColor':'#1B1B1B',
'lineColor':'#1B1B1B'
}}}%%
flowchart LR
LLM{{"Host LLM"}}

subgraph Inputs ["sumo-qa content"]
direction TB
Knowledge[("Knowledge")]
Standards[("Standards")]
end

Skills["Skills"]
Output(["Output"])

Knowledge -- cited by --> Skills
Standards -- cited by --> Skills
LLM == follows ==> Skills
Skills == produces ==> Output

classDef host fill:#7A1F1F,stroke:#1B1B1B,stroke-width:2px,color:#FAF7F2
classDef skills fill:#FAF7F2,stroke:#1B1B1B,stroke-width:2.5px,color:#1B1B1B
classDef data fill:#F0EAE0,stroke:#8A7B5C,stroke-width:1.5px,color:#1B1B1B
classDef out fill:#E8EDDF,stroke:#3F4A2E,stroke-width:2px,color:#1B1B1B
classDef group fill:none,stroke:#8A7B5C,stroke-width:1px,color:#5C4D00,stroke-dasharray: 4 4

class LLM host
class Skills skills
class Knowledge,Standards data
class Output out
class Inputs group

linkStyle 0,1 stroke:#8A7B5C,stroke-width:1.2px,stroke-dasharray:5 4
linkStyle 2,3 stroke:#1B1B1B,stroke-width:2.5px
```

| Layer | What |
|---|---|
| **Skills** ([`skills/`](skills/)) | Iron-Law procedures across the QA lifecycle: deciding approach, preparing for work, TDD scaffolding, diff review, strengthening tests, finding test data, answering testing questions, repo strategy — plus the planning → parallel subagent execution → finishing chain. |
| **MCP entry points** | A thin tool surface — skill tools, knowledge loaders, a capabilities-discovery tool, repo-map tools, test-data tools, an ingestion tool, and external-skill lifecycle tools. Each is file IO or small deterministic logic; no inference. |
| **Progressive skill loading** | A read-only loader (`sumo_qa_list_skill_manifests` + `sumo_qa_load_skill_context`) that fetches a skill in slices — a routing manifest, one section, one lazy module, or the full body — so a host pays the routing slice on each revisit instead of the whole body every time. `section`/`module`/`full` are **canonical** (verbatim from the file — cite/follow them); the **manifest** paths are *compact navigation aids*, not a substitute for the procedure text. Load the full body (or the exact section) when a workflow actually needs the wording. See [docs/TOOLS.md](docs/TOOLS.md#which-path-to-use--canonical-vs-compact) and [docs/SKILLS.md](docs/SKILLS.md#progressive-loading--manifest--section--module--full). |
| **Knowledge catalogues** ([`knowledge/`](knowledge/)) | Classifications, approaches, principles, techniques. The agent picks from these instead of recalling from training data. Editable as plain markdown. Specialty-tool picks are deliberately not catalogued — the discipline is observe the risk surface, web-search current options for the user's stack, cite when naming a tool. |

## Host support

Every host calls the same MCP server and reads the same SKILL.md files. What differs is how each host exposes them — that's a host-API difference, not a sumo-qa choice.

These hosts are verified end-to-end with `sumo-qa-install`:

| Host | Slash | Setup |
|---|---|---|
| **Claude Code** | `/sumo-qa-deciding-approach` (hyphens) | `sumo-qa-install --claude-code` |
| **VS Code + Copilot** (Agent mode, Claude Sonnet 4.5 or equivalent) | Natural language | `sumo-qa-install --vscode --workspace ` writes `.vscode/mcp.json` |
| **JetBrains AI Assistant** | `/sumo_qa_deciding_approach` (underscores) | One-time UI setup; `sumo-qa-install --jetbrains` prints the fields to paste |
| **JetBrains Junie** | Natural language | Drop the JSON `sumo-qa-install --jetbrains` prints into `~/.junie/mcp/sumo-qa.json` (global) or `/.junie/mcp/` (per project) |

In Claude Code, type `/` then `sumo-qa-` to see the skills as hyphenated entries (symlinked into `~/.claude/skills/`). The same skills are also registered through MCP with underscores (`/sumo_qa_load_classifications`, `/sumo_qa_find_test_data`); both routes call the same SKILL.md.

Natural language works everywhere. *"Review my changes"*, *"plan QA for this story"*, *"load the QA classifications"* — the agent routes by tool description. Slash and natural-language paths produce the same result.

**Other MCP hosts** (Cursor, Codex, OpenCode, etc.): `pip install sumo-qa` ships a standard stdio MCP server, so it should work with anything that speaks MCP. Follow your host's MCP-server setup docs and point it at the absolute path of the `sumo-qa` script. Not verified end-to-end by us, so we don't ship instructions.

### Host adapter folders

`sumo-qa` ships first-class plugin manifest folders for hosts that consume them directly. Both folders are generated from a single canonical source (`pyproject.toml`'s `[tool.sumo-qa.plugin]` overlay) — see [docs/host-adapters.md](docs/host-adapters.md) for the architecture.

| Host | Manifest | Install status today | Source-of-truth contract |
|---|---|---|---|
| Claude Code | `.claude-plugin/plugin.json` + `.claude-plugin/marketplace.json` (requires `uv` — see [INSTALL.md](docs/INSTALL.md#prerequisite-uv)) | `claude --plugin-dir /path/to/sumo-qa` (session-scoped), or `/plugin marketplace add sumithr/sumo-qa` → `/plugin install sumo-qa@sumo-qa` (persistent; wired + schema-valid, live install not yet verified) | Both schema-validated against the published JSON Schemas in CI |
| OpenAI Codex | `.codex-plugin/plugin.json` | Not verified end-to-end yet — treat as TBD | MCP `initialize` handshake smoke in CI (no published schema) |

Adding a new host is one new template under `plugin_packaging/templates/` plus the canonical-source line that describes it. The `plugin-packaging` CI workflow re-runs the generator on every PR and fails if any committed adapter file diverges from the canonical source.

## See it in action

Ten transcripts showing the workflow on real code — diff reviews refusing to call safe-to-merge from stale CI, TDD cycles with the red output surfaced verbatim, mutation survivors walked one at a time, formal test plans gated on entry/exit criteria, and the case where the right answer is "no tests needed, stop here":

- [tests/scenarios/worked-examples/](tests/scenarios/worked-examples/) — start with [02 — review my changes](tests/scenarios/worked-examples/02-review-my-changes.md) for a representative end-to-end
- [tests/scenarios/SCENARIOS.md](tests/scenarios/SCENARIOS.md) — the scenario specs (prompt → expected shape → anti-patterns the skill prevents)

## When sumo-qa doesn't fit

If your QA intent has no native fit (Playwright E2E, accessibility audits, k6 load testing, type checking), sumo-qa searches for an external skill through its MCP server, offers a `[y/N]` install gate, installs through the Skills CLI, then loads the installed `SKILL.md` back into the conversation.

```mermaid
%%{init: {'theme':'base', 'themeVariables': {
'fontFamily':'Charter, "Iowan Old Style", Georgia, serif',
'fontSize':'13px',
'primaryTextColor':'#1B1B1B',
'lineColor':'#1B1B1B'
}}}%%
flowchart LR
Intent(["QA intent
no native fit"])
Search["search
sumo_qa_search_external_skills"]
Gate{"[y/N]"}
Install["install
sumo_qa_install_external_skill"]
Locate["locate & load
check_installed · execute"]
Out(["external SKILL.md
in the conversation"])
Stop(["stop"])

Intent ==> Search ==> Gate
Gate -->|y| Install ==> Locate ==> Out
Gate -->|N| Stop

classDef io fill:#FAF7F2,stroke:#1B1B1B,stroke-width:2px,color:#1B1B1B
classDef step fill:#FAF7F2,stroke:#1B1B1B,stroke-width:2.5px,color:#1B1B1B
classDef gate fill:#7A1F1F,stroke:#1B1B1B,stroke-width:2px,color:#FAF7F2
classDef stop fill:#F0EAE0,stroke:#8A7B5C,stroke-width:1.5px,color:#1B1B1B
classDef done fill:#E8EDDF,stroke:#3F4A2E,stroke-width:2px,color:#1B1B1B

class Intent io
class Search,Install,Locate step
class Gate gate
class Stop stop
class Out done
```

- The host does not run `npx` directly; `sumo_qa_search_external_skills`, `sumo_qa_check_external_skill_installed`, `sumo_qa_install_external_skill`, and `sumo_qa_execute_external_skill` own the lifecycle.
- Search returns the Skills CLI's text output verbatim (ANSI stripped); the host LLM reads it as the user would. No structured parser to drift out of date.
- Node.js is required for the Skills CLI. If `npx` is missing, the MCP tool returns an actionable error and stops. It doesn't elevate via sudo.
- The external skill suggests tool-specific setup, but sumo-qa's setup standard overrides it — any machine-level / global install in the returned skill body is translated to a repo-pinned + CI-reproducible equivalent — while sumo-qa keeps the confirmation gates, test evidence, and risk-to-test mapping.

## Support

Filing a clear issue gets it fixed faster. Pick the template that matches the problem:

| Symptom | Template |
|---|---|
| `pip install` / `sumo-qa-install` failed, or first-run setup is broken | [Install / setup problem](.github/ISSUE_TEMPLATE/install_setup_problem.yml) |
| Install worked, but the host (Claude Code, VS Code + Copilot, JetBrains, Cursor, …) does not surface tools or skills correctly | [Host compatibility problem](.github/ISSUE_TEMPLATE/host_compatibility.yml) |
| The wrong sumo-qa skill ran for a prompt (or none ran when one should have) | [Skill routed wrong](.github/ISSUE_TEMPLATE/skill_routed_wrong.yml) |
| A skill ran, but its QA output was generic, wrong, or missed something | [QA output quality issue](.github/ISSUE_TEMPLATE/qa_output_quality.yml) |
| You want a new workflow, skill, or host integration | [Feature / workflow request](.github/ISSUE_TEMPLATE/feature_request.yml) |
| Reproducible defect that does not fit the above | [Bug report](.github/ISSUE_TEMPLATE/bug_report.yml) |

## License

[Apache 2.0](LICENSE). See [NOTICE](NOTICE) for attribution requirements that apply to forks and redistributors.

## More docs

- [AGENTS.md](AGENTS.md) — AI-agent bootstrap and per-host setup
- [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) — three layers, host delivery, knowledge authority
- [docs/SKILLS.md](docs/SKILLS.md) — every skill with its Iron Law
- [docs/TOOLS.md](docs/TOOLS.md) — every MCP entry point
- [docs/INSTALL.md](docs/INSTALL.md) — per-host install detail and troubleshooting
- [docs/CONTENT-FORMATS.md](docs/CONTENT-FORMATS.md) — schemas + worked examples for adding team standards, knowledge, change rules, and test data (incl. swapping ISTQB out)
- [docs/CONFIGURATION.md](docs/CONFIGURATION.md) — env vars
- [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) — local dev
- [docs/TEST-DATA.md](docs/TEST-DATA.md) — known-good test-data catalogue
- [docs/REPO-MAP.md](docs/REPO-MAP.md) — QA-native repo-map artifact under `.sumo-qa/`: schema, scanner, and the scan / diff-impact / query tools that consume it (issues #155, #156)
- [docs/RISK-LEDGER.md](docs/RISK-LEDGER.md) — risk-to-test traceability ledger: the structured appendix to the markdown-first verdict, its row schema and evidence-status vocabulary, and when not to use it (issue #144)
- [docs/SCORECARD.md](docs/SCORECARD.md) — QA readiness scorecard: composes the risk ledger + context bundle + optional coverage/mutation into a *derived* readiness recommendation (ready / blocked / insufficient_evidence / ready-with-accepted-residuals) — an evidence summary, not a predictive quality score (issue #151)
- [docs/EXPORT.md](docs/EXPORT.md) — deterministic export of already-structured QA test cases to versioned JSON, a markdown table, or (flat-only) CSV: the case schema, the format set, the side-effect-free contract, and the import-mapping caveat (issue #148)
- [docs/QA-REPORT.md](docs/QA-REPORT.md) — local QA report: the static `.sumo-qa/qa-report.html` page composed from the persisted artifacts, the four honest artifact states, and a readiness verdict derived by the #151 `QaScorecard` engine (issue #157)
- [docs/PERSONA.md](docs/PERSONA.md) — optional Sumo-sensei voice (off by default)