https://github.com/thekevinscott/skillet
Evaluation-driven Claude Code skill development
https://github.com/thekevinscott/skillet
claude-code evaluation llm
Last synced: 4 months ago
JSON representation
Evaluation-driven Claude Code skill development
- Host: GitHub
- URL: https://github.com/thekevinscott/skillet
- Owner: thekevinscott
- License: mit
- Created: 2025-12-11T21:00:24.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-02-16T03:08:03.000Z (4 months ago)
- Last Synced: 2026-02-16T08:50:27.901Z (4 months ago)
- Topics: claude-code, evaluation, llm
- Language: Python
- Homepage: https://skillet.run
- Size: 1.46 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Roadmap: ROADMAP.md
Awesome Lists containing this project
README
# skillet
[](https://github.com/thekevinscott/skillet/actions/workflows/test.yml)
[](LICENSE)
[](https://www.python.org/downloads/)
Evaluation-driven Claude Code skill development.
## Install
```bash
pip install pyskillet
```
## Why
Anthropic [recommends building evaluations before writing skills](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices#build-evaluations-first):
> Create evaluations BEFORE writing extensive documentation. This ensures your Skill solves real problems rather than documenting imagined ones.
But they don't provide tooling:
> We do not currently provide a built-in way to run these evaluations.
skillet fills that gap.
## Workflow
### 1. Capture evals with `/skillet:add`
When Claude does something wrong, capture it:
```
> Write a code review comment for this SQL query...
Claude: This code has a SQL injection vulnerability...
> /skillet:add
Claude: What did you expect instead?
> Should start with **issue** (blocking): using conventional comments format
Claude: Eval saved to ~/.skillet/evals/conventional-comments/001.yaml
```
### 2. Run baseline eval
```bash
skillet eval conventional-comments
```
```
Eval Results (baseline, no skill)
==================================
Evals: 5
Samples: 3 per eval
Total runs: 15
Pass rate: 0% (0/15)
```
### 3. Create the skill
```bash
skillet create conventional-comments
```
```
Found 5 evals for 'conventional-comments', drafting SKILL.md...
Created ~/.claude/skills/conventional-comments/
└── SKILL.md (draft from 5 evals)
```
### 4. Eval with skill
```bash
skillet eval conventional-comments ~/.claude/skills/conventional-comments
```
```
Eval Results (with skill)
=========================
Skill: ~/.claude/skills/conventional-comments
Evals: 5
Samples: 3 per eval
Total runs: 15
Pass rate: 80% (12/15)
```
### 5. Tune the skill
```bash
skillet tune conventional-comments ~/.claude/skills/conventional-comments
```
```
Round 1/5: Pass rate 80% (12/15)
Improving skill...
Round 2/5: Pass rate 93% (14/15)
Improving skill...
Round 3/5: Pass rate 100% (15/15)
Target reached!
Best skill saved to ~/.claude/skills/conventional-comments/SKILL.md
```
## Commands
```bash
skillet eval [skill] # run evals (baseline or with skill)
skillet create # create skill from evals
skillet tune # iteratively improve skill
skillet compare # compare baseline vs skill from cache
skillet show # inspect cached eval results
skillet lint # lint a SKILL.md for common issues
skillet generate-evals # generate candidate evals from a skill
```
## Evals
Evals are stored in `~/.skillet/evals//`:
```yaml
# ~/.skillet/evals/conventional-comments/001.yaml
timestamp: 2025-01-15T10:30:00Z
name: conventional-comments
prompt: "Write a code review comment for..."
expected: "Should start with **issue** (blocking):"
```
## Python API
```python
import asyncio
from pathlib import Path
from skillet import evaluate
async def main():
# Baseline (no skill)
baseline = await evaluate("conventional-comments")
print(f"Baseline: {baseline['pass_rate']}%")
# With skill
result = await evaluate(
"conventional-comments",
skill_path=Path("~/.claude/skills/conventional-comments").expanduser(),
)
print(f"With skill: {result['pass_rate']}%")
asyncio.run(main())
```
Tune a skill programmatically:
```python
from skillet import tune
from skillet.tune import TuneConfig
result = await tune(
"conventional-comments",
Path("~/.claude/skills/conventional-comments").expanduser(),
config=TuneConfig(max_rounds=10, target_pass_rate=90.0),
)
print(f"Final pass rate: {result.result.final_pass_rate}%")
```
See the [Python API reference](https://skillet.run/reference/python-api) for all functions and options.
## Documentation
Full documentation available at the [docs site](https://skillet.run):
- [Getting Started](https://skillet.run/getting-started)
- [Concepts](https://skillet.run/concepts/skills-vs-agents) — Skills vs agents, capability vs regression, balanced problem sets
- [CLI Reference](https://skillet.run/reference/cli)
- [Eval Format](https://skillet.run/reference/eval-format)
- [Python API](https://skillet.run/reference/python-api)
## Roadmap
See [ROADMAP.md](ROADMAP.md) for future ideas and planned features.
## License
MIT