{"id":48385700,"url":"https://github.com/lukasmetzler/agenteval","last_synced_at":"2026-04-08T00:01:01.536Z","repository":{"id":349034104,"uuid":"1198697450","full_name":"lukasmetzler/agenteval","owner":"lukasmetzler","description":"Lint, benchmark, and score your AI coding instructions. Stop guessing, start  measuring.","archived":false,"fork":false,"pushed_at":"2026-04-06T17:38:32.000Z","size":4890,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-06T23:02:45.701Z","etag":null,"topics":["agents","ai","benchmark","bun","claude","cli","code-quality","copilot","cursor","developer","developer-tools","evaluation","instructions","lint","testing","typescript"],"latest_commit_sha":null,"homepage":"https://github.com/lukasmetzler/agenteval#get-started-in-10-seconds","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lukasmetzler.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-01T17:09:27.000Z","updated_at":"2026-04-06T17:37:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"e4ef6d08-3e71-4ba1-a927-f098293bcf80","html_url":"https://github.com/lukasmetzler/agenteval","commit_stats":null,"previous_names":["lukasmetzler/agenteval"],"tags_count":54,"template":false,"template_full_name":null,"purl":"pkg:github/lukasmetzler/agenteval","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasmetzler%2Fagenteval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasmetzler%2Fagenteval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasmetzler%2Fagenteval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasmetzler%2Fagenteval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lukasmetzler","download_url":"https://codeload.github.com/lukasmetzler/agenteval/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasmetzler%2Fagenteval/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31533824,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"ssl_error","status_checked_at":"2026-04-07T16:28:06.951Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","benchmark","bun","claude","cli","code-quality","copilot","cursor","developer","developer-tools","evaluation","instructions","lint","testing","typescript"],"created_at":"2026-04-05T22:03:33.629Z","updated_at":"2026-04-08T00:01:01.485Z","avatar_url":"https://github.com/lukasmetzler.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# agenteval\n\nYour CLAUDE.md is untested. So is your AGENTS.md, your copilot-instructions.md, and your .cursorrules.\n\nagenteval is a linter, benchmarker, and CI gate for AI coding instructions. It finds dead references, token bloat, contradictions, and stale instructions before your agent does. Then it scores agent performance so you can measure whether your instruction changes actually help.\n\n[![CI](https://github.com/lukasmetzler/agenteval/actions/workflows/ci.yml/badge.svg)](https://github.com/lukasmetzler/agenteval/actions/workflows/ci.yml)\n[![npm](https://img.shields.io/npm/v/agenteval-cli?label=npm)](https://www.npmjs.com/package/agenteval-cli)\n[![npm downloads](https://img.shields.io/npm/dm/agenteval-cli)](https://www.npmjs.com/package/agenteval-cli)\n[![Version](https://img.shields.io/github/v/release/lukasmetzler/agenteval?label=release)](https://github.com/lukasmetzler/agenteval/releases)\n[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n\n![agenteval demo](demo/demo.gif)\n\n## Install\n\n```bash\nnpm install -g agenteval-cli\n```\n\nOr pick your preferred method:\n\n```bash\nbrew tap lukasmetzler/agenteval \u0026\u0026 brew install agenteval   # Homebrew\ncurl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash  # Shell\n```\n\nNo Bun, no Node at runtime. The binary is self-contained.\n\n## Quick Start\n\n```bash\nagenteval lint                    # Find problems in your instruction files\nagenteval lint --explain          # Same, with explanations for each rule\nagenteval harvest --dry-run       # Preview what AI commits are in your history\nagenteval ci                      # Run all tasks, fail on regressions\n```\n\n## What It Catches\n\n- Dead references to files, paths, and headings that don't exist\n- Filler phrases that waste context tokens (\"make sure to\", \"it is important that\")\n- Contradictions between instruction files (\"always use X\" and \"never use X\")\n- Content overlap and duplication across files\n- Token budget overruns that crowd out code context\n- Vague instructions without actionable specifics\n- Stale instructions referencing code that was refactored weeks ago\n- Invalid skill metadata (per Anthropic spec)\n- Broken markdown links and heading anchors\n\n## Supported Formats\n\n| Format | Pattern |\n|--------|---------|\n| Claude Code | `CLAUDE.md` |\n| OpenAI Codex / AGENTS | `AGENTS.md` |\n| GitHub Copilot | `.github/copilot-instructions.md` |\n| Scoped Copilot | `.github/instructions/*.instructions.md` |\n| Anthropic Skills | `.claude/skills/*/SKILL.md` |\n| Cursor | `.cursorrules`, `.cursor/rules/*.mdc` |\n\n## Commands\n\n| Command | What it does | Guide |\n|---------|-------------|-------|\n| `agenteval lint` | Static analysis of instruction files | [Linting](docs/lint.md) |\n| `agenteval harvest` | Build eval tasks from AI commit history | [Harvesting](docs/harvest.md) |\n| `agenteval harvest --live` | Score working tree changes before committing | [Harvesting](docs/harvest.md) |\n| `agenteval run --task \u003cfile\u003e` | Run an AI agent, score the result | [Running Evals](docs/run.md) |\n| `agenteval compare \u003cA\u003e \u003cB\u003e` | Diff two runs side by side | [Results](docs/results.md) |\n| `agenteval ci` | Run all tasks, gate on regressions | [CI Guide](docs/ci.md) |\n| `agenteval trends` | Score history and trend analysis | [Trends](docs/trends.md) |\n| `agenteval init` | Create a starter config | [Configuration](docs/configuration.md) |\n| `agenteval update` | Self-update to the latest version | |\n| `agenteval doctor` | Check environment health | |\n\n## The Pipeline\n\n```mermaid\nflowchart LR\n    A[\"Your CLAUDE.md\"] --\u003e B[\"agenteval lint\"]\n    B --\u003e C[\"Fix quality issues\"]\n    D[\"Git history\"] --\u003e E[\"agenteval harvest\"]\n    E --\u003e F[\"Task YAML files\"]\n    F --\u003e G[\"agenteval run\"]\n    G --\u003e H[\"Scored results\"]\n    H --\u003e I[\"agenteval compare\"]\n    I --\u003e J{{\"Did my instructions improve?\"}}\n\n    style A fill:#2d333b,stroke:#444,color:#e6edf3\n    style D fill:#2d333b,stroke:#444,color:#e6edf3\n    style J fill:#1a7f37,stroke:#2ea043,color:#fff\n```\n\nLint catches problems statically. Harvest builds benchmarks from your git history. Run scores agent performance. Compare tells you what changed. CI gates regressions before they merge.\n\n## CI Integration\n\nAdd agenteval to your GitHub Actions workflow with one line:\n\n```yaml\n- uses: lukasmetzler/agenteval@v0\n  with:\n    command: ci                   # or: lint, harvest --dry-run\n```\n\nOr use the CLI directly in any CI system:\n\n```bash\nagenteval ci --min-score 0.7 --max-regression 0.05\n```\n\nSee the [CI Guide](docs/ci.md) for thresholds, configuration, and examples.\n\n## Installation Options\n\n| Method | Command | Updates via |\n|--------|---------|-------------|\n| **npm** | `npm install -g agenteval-cli` | `npm update -g agenteval-cli` |\n| **Homebrew** | `brew tap lukasmetzler/agenteval \u0026\u0026 brew install agenteval` | `brew upgrade agenteval` |\n| **Shell** | `curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh \\| bash` | `agenteval update` |\n| **GitHub Action** | `uses: lukasmetzler/agenteval@v0` | Always latest |\n| **Binary** | [GitHub Releases](https://github.com/lukasmetzler/agenteval/releases) | `agenteval update` |\n| **Source** | `git clone ... \u0026\u0026 bun install \u0026\u0026 bun run build` | `git pull \u0026\u0026 bun run build` |\n\n## Documentation\n\n| Guide | What it covers |\n|-------|---------------|\n| [Core Concepts](docs/concepts.md) | Instructions, tasks, assertions, harnesses, scoring |\n| [Getting Started](docs/getting-started.md) | Installation, first run, full walkthrough |\n| [Linting](docs/lint.md) | All lint rules, output formats, CI integration |\n| [Running Evals](docs/run.md) | Task definitions, harness adapters, scoring pipeline |\n| [Harvesting](docs/harvest.md) | AI commit detection, task generation, live review |\n| [CI Guide](docs/ci.md) | Regression detection, thresholds, GitHub Actions example |\n| [Trends](docs/trends.md) | Score history and trend analysis |\n| [Configuration](docs/configuration.md) | Every config option with types and defaults |\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flukasmetzler%2Fagenteval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flukasmetzler%2Fagenteval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flukasmetzler%2Fagenteval/lists"}