https://github.com/sbroenne/pytest-codingagents

Combatting cargo cult programming in Agent Instructions, Skills, and Custom Agents for GitHub Copilot and other coding agents since 2026.

agent agent-skills copilot-instructions copilot-sdk custom-agents github-copilot prompts

Host: GitHub
URL: https://github.com/sbroenne/pytest-codingagents
Owner: sbroenne
License: mit
Created: 2026-02-11T15:57:02.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-02-19T11:37:54.000Z (5 months ago)
Last Synced: 2026-02-19T13:45:35.333Z (5 months ago)
Topics: agent, agent-skills, copilot-instructions, copilot-sdk, custom-agents, github-copilot, prompts
Language: Python
Homepage:
Size: 892 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing/index.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS

# pytest-codingagents

**Test-driven prompt engineering for GitHub Copilot.**

Everyone copies instruction files from blog posts, adds "you are a senior engineer" to agent configs, and includes skills found on Reddit. But does any of it work? Are your instructions making your agent better — or just longer?

**You don't know, because you're not testing it.**

pytest-codingagents gives you a complete **test→optimize→test loop** for GitHub Copilot configurations:

1. **Write a test** — define what the agent *should* do

2. **Run it** — see it fail (or pass)

3. **Optimize** — call `optimize_instruction()` to get a concrete suggestion

4. **A/B confirm** — use `ab_run` to prove the change actually helps

5. **Ship it** — you now have evidence, not vibes

Currently supports **GitHub Copilot** via [copilot-sdk](https://www.npmjs.com/package/github-copilot-sdk) with **IDE personas** for VS Code, Claude Code, and Copilot CLI environments.

```python
import pytest

from pytest_codingagents import CopilotAgent, optimize_instruction


async def test_docstring_instruction_works(ab_run):
    """Prove the docstring instruction actually changes output, and get a fix if it doesn't."""
    # Two configurations that differ only in the instruction under test.
    baseline = CopilotAgent(instructions="Write Python code.")
    treatment = CopilotAgent(
        instructions="Write Python code. Add Google-style docstrings to every function."
    )

    # Run the same task against both configurations.
    b, t = await ab_run(baseline, treatment, "Create math.py with add(a, b) and subtract(a, b).")
    assert b.success and t.success

    # If the treatment ignored the instruction, fail with a suggested rewrite.
    if '"""' not in t.file("math.py"):
        suggestion = await optimize_instruction(
            treatment.instructions or "",
            t,
            "Agent should add docstrings to every function.",
        )
        pytest.fail(f"Docstring instruction was ignored.\n\n{suggestion}")

    # The baseline must lack docstrings, otherwise the instruction made no difference.
    assert '"""' not in b.file("math.py"), "Baseline should not have docstrings"
```
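The `'"""' in ...` check in the test above is deliberately simple, but it passes as soon as a single triple-quoted string appears anywhere in the file. A stricter check could parse the generated source and list the functions that lack a docstring. The helper below is a minimal sketch using only the standard library; `functions_missing_docstrings` is a hypothetical name and not part of pytest-codingagents, and it assumes `t.file("math.py")` returns the file contents as a string, as the substring check above suggests.

```python
import ast


def functions_missing_docstrings(source: str) -> list[str]:
    """Return the names of functions in `source` that have no docstring.

    Hypothetical helper for illustration; not part of pytest-codingagents.
    """
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        # Cover both regular and async function definitions.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        # get_docstring returns None only when no docstring is present;
        # an empty docstring still counts as documented here.
        and ast.get_docstring(node) is None
    ]
```

Inside the test, the treatment assertion could then read `assert functions_missing_docstrings(t.file("math.py")) == []`, which fails if even one function was left undocumented.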

## Install

```bash
uv add pytest-codingagents
```

Authenticate via the `GITHUB_TOKEN` environment variable (CI) or the GitHub CLI (local): log in with `gh auth login` and confirm with `gh auth status`.
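Agent tests need live credentials, so it can help to skip them cleanly on machines that have none. The sketch below checks the two credential sources mentioned above. `has_copilot_credentials` is a hypothetical helper, not part of pytest-codingagents, and how the plugin itself resolves credentials is an assumption here.

```python
import os
import shutil
import subprocess


def has_copilot_credentials(env=None, which=shutil.which) -> bool:
    """Best-effort check that a GitHub credential is available.

    Hypothetical helper for illustration; not part of pytest-codingagents.
    `env` and `which` are injectable so the check can be tested offline.
    """
    env = os.environ if env is None else env
    # CI path: a non-empty GITHUB_TOKEN is taken as sufficient.
    if env.get("GITHUB_TOKEN"):
        return True
    # Local path: fall back to the GitHub CLI, if it is installed.
    if which("gh") is None:
        return False
    # `gh auth status` exits non-zero when no account is logged in.
    result = subprocess.run(["gh", "auth", "status"], capture_output=True, check=False)
    return result.returncode == 0
```

In a `conftest.py`, this could feed a marker such as `pytest.mark.skipif(not has_copilot_credentials(), reason="no GitHub credentials")`.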

## What You Can Test

| Capability | What it proves | Guide |
|---|---|---|
| **A/B comparison** | Config B actually produces different (and better) output than Config A | [Getting Started](https://sbroenne.github.io/pytest-codingagents/getting-started/) |
| **Instruction optimization** | Turn a failing test into a ready-to-use instruction fix | [Optimize Instructions](https://sbroenne.github.io/pytest-codingagents/how-to/optimize/) |
| **Instructions** | Your custom instructions change agent behavior, not just vibes | [Getting Started](https://sbroenne.github.io/pytest-codingagents/getting-started/) |
| **Skills** | That domain knowledge file is helping, not being ignored | [Skill Testing](https://sbroenne.github.io/pytest-codingagents/how-to/skills/) |
| **Models** | Which model works best for your use case and budget | [Model Comparison](https://sbroenne.github.io/pytest-codingagents/getting-started/model-comparison/) |
| **Custom Agents** | Your custom agent configurations actually work as intended | [Getting Started](https://sbroenne.github.io/pytest-codingagents/getting-started/) |
| **MCP Servers** | The agent discovers and uses your custom tools | [MCP Server Testing](https://sbroenne.github.io/pytest-codingagents/how-to/mcp-servers/) |
| **CLI Tools** | The agent operates command-line interfaces correctly | [CLI Tool Testing](https://sbroenne.github.io/pytest-codingagents/how-to/cli-tools/) |

## AI Analysis

> **See it in action:** [Basic Report](https://sbroenne.github.io/pytest-codingagents/demo/basic-report.html) · [Model Comparison](https://sbroenne.github.io/pytest-codingagents/demo/model-comparison-report.html) · [Instruction Testing](https://sbroenne.github.io/pytest-codingagents/demo/instruction-testing-report.html)

Every test run produces an HTML report with AI-powered insights:

- **Diagnoses failures** — root cause analysis with suggested fixes

- **Compares models** — leaderboards ranked by pass rate and cost

- **Evaluates instructions** — which instructions produce better results

- **Recommends improvements** — actionable changes to tools, instructions, and skills

```bash
uv run pytest tests/ --aitest-html=report.html --aitest-summary-model=azure/gpt-5.2-chat
```
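To make the "ranked by pass rate and cost" ordering in the list above concrete, here is one plausible way such a leaderboard could be sorted: highest pass rate first, with the cheaper model winning ties. This is an illustrative sketch only. `ModelStats`, `leaderboard`, the model names, and the numbers are all made up, and the report's actual ranking logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Aggregated results for one model (hypothetical structure)."""

    model: str
    passed: int
    total: int
    cost_usd: float

    @property
    def pass_rate(self) -> float:
        # Guard against division by zero when no tests ran.
        return self.passed / self.total if self.total else 0.0


def leaderboard(stats: list[ModelStats]) -> list[ModelStats]:
    """Sort by pass rate (descending), then by cost (ascending)."""
    return sorted(stats, key=lambda s: (-s.pass_rate, s.cost_usd))


# Made-up placeholder data, not real measurements.
rows = [
    ModelStats("model-a", passed=8, total=10, cost_usd=0.40),
    ModelStats("model-b", passed=9, total=10, cost_usd=0.90),
    ModelStats("model-c", passed=9, total=10, cost_usd=0.30),
]
ranked = [s.model for s in leaderboard(rows)]
# ranked == ["model-c", "model-b", "model-a"]
```

Sorting on a tuple keeps the tie-breaking rule explicit: two models with the same pass rate are separated purely by cost.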

## Documentation

Full docs at **[sbroenne.github.io/pytest-codingagents](https://sbroenne.github.io/pytest-codingagents/)** — API reference, how-to guides, and demo reports.

## License

MIT
