{"id":47605051,"url":"https://github.com/kiloloop/agent-estimate","last_synced_at":"2026-05-21T04:08:19.028Z","repository":{"id":339047422,"uuid":"1160254873","full_name":"kiloloop/agent-estimate","owner":"kiloloop","description":"The first open-source effort estimation tool built for AI coding agents. PERT + METR + wave planning.","archived":false,"fork":false,"pushed_at":"2026-03-20T08:05:01.000Z","size":264,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-20T18:46:39.131Z","etag":null,"topics":["ai-agents","claude-code","cli","developer-tools","effort-estimation","estimation","github-actions","metr","multi-agent","pert","python","wave-planning"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kiloloop.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-17T18:12:58.000Z","updated_at":"2026-03-20T08:04:49.000Z","dependencies_parsed_at":"2026-02-23T16:00:36.823Z","dependency_job_id":null,"html_url":"https://github.com/kiloloop/agent-estimate","commit_stats":null,"previous_names":["haoranc/agent-estimate","kiloloop/agent-estimate"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/kiloloop/agent-estimate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiloloop%2Fagent-estimate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiloloop%2Fagent-estimate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiloloop%2Fagent-estimate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiloloop%2Fagent-estimate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kiloloop","download_url":"https://codeload.github.com/kiloloop/agent-estimate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiloloop%2Fagent-estimate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31291084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T13:12:26.723Z","status":"ssl_error","status_checked_at":"2026-04-01T13:12:25.102Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","claude-code","cli","developer-tools","effort-estimation","estimation","github-actions","metr","multi-agent","pert","python","wave-planning"],"created_at":"2026-04-01T19:08:39.582Z","updated_at":"2026-05-21T04:08:19.004Z","avatar_url":"https://github.com/kiloloop.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# agent-estimate\n\n[![PyPI Version](https://img.shields.io/pypi/v/agent-estimate)](https://pypi.org/project/agent-estimate/)\n[![Python Versions](https://img.shields.io/pypi/pyversions/agent-estimate)](https://pypi.org/project/agent-estimate/)\n[![License](https://img.shields.io/pypi/l/agent-estimate)](https://github.com/kiloloop/agent-estimate/blob/main/LICENSE)\n[![CI](https://github.com/kiloloop/agent-estimate/actions/workflows/ci.yml/badge.svg)](https://github.com/kiloloop/agent-estimate/actions/workflows/ci.yml)\n\n**Know before you build.**\n\nPERT estimates for AI-agent tasks — how long, which model's reliable enough, and the human-equivalent cost. In one command.\n\n**[Website](https://kiloloop.com/agent-estimate/)** · [Compare](https://kiloloop.com/agent-estimate/compare/) · [PyPI](https://pypi.org/project/agent-estimate/)\n\n## Why\n\nAI agents can write the code — but *how long will the task actually take?* Manual estimation is slow and biased toward optimism; no estimate means scope creep and missed deadlines. The gap between \"agents can do it\" and \"we know when it'll be done\" is where projects break down.\n\n`agent-estimate` closes that gap in one command: a three-point PERT timeline calibrated on real agent runs, plus a human-speed comparison so you see the compression before you spend the compute. It sizes the task, picks a tier, routes it to a model, and flags when the work runs past that model's reliability horizon — calibrated forecasts in seconds, not meetings.\n\nMulti-model matters because the models aren't interchangeable. Opus 4.7, GPT-5.5, and Gemini 3.1 have different reliability horizons ([METR p80](https://metr.org/)) and different costs per turn. A safe 40-minute job for one model is a coin flip for another. agent-estimate models the whole fleet, not a single agent — so the number reflects who actually runs the work.\n\n## Quick Start\n\n\u003e First estimate: 30 seconds to install. Every one after: instant.\n\n### With your agent (recommended)\n\nPaste this into your Claude Code or Codex session:\n\n~~~\nInstall the agent-estimate plugin (https://github.com/kiloloop/agent-estimate) and\nestimate this task for me: \"Implement OAuth 2.0 flow (Google + GitHub)\". Tell me the\nexpected time, the human-speed equivalent, and the compression ratio.\n~~~\n\nYour agent installs the tool, runs the estimate, and reads back the numbers. Nothing to memorize — describe the task in plain English and let the agent translate to flags.\n\nFor a whole backlog:\n\n~~~\nEstimate every open issue in this repo with agent-estimate, group them into parallel\nwaves, and tell me the total wall-clock time for a 3-agent fleet versus doing them\nsequentially myself.\n~~~\n\n### Manual\n\n```bash\npip install agent-estimate\nagent-estimate estimate \"your task description here\"\n```\n\nNo config required — sensible defaults for a 3-agent fleet (Claude, Codex, Gemini). Point it at a file or GitHub issues when you're ready:\n\n```bash\nagent-estimate estimate --file tasks.txt\nagent-estimate estimate --repo myorg/myrepo --issues 11,12,14\n```\n\n## How It Works\n\nagent-estimate produces three-point [PERT](https://en.wikipedia.org/wiki/Program_evaluation_and_review_technique) estimates calibrated for agents, not humans:\n\n- **Tier classification** — auto-sizes tasks XS→XL from complexity signals\n- **PERT math** — optimistic / most-likely / pessimistic, weighted to an expected value\n- **Human comparison** — a per-task-type multiplier, so you see the compression\n- **METR thresholds** — warns when an estimate exceeds a model's p80 reliability horizon\n- **Wave planning** — schedules independent tasks in parallel across the fleet\n- **Review overhead** — models review cycles as additive cost (`standard`, `complex`, `3-round`)\n- **Modifiers** — `--spec-clarity`, `--warm-context`, `--agent-fit` tune the estimate\n\n### Task types\n\n| Type | Flag | Models |\n|------|------|--------|\n| Coding | (default) | Feature work, fixes, refactors |\n| Research | `--type research` | Audits, investigations, analysis |\n| Documentation | `--type documentation` | API docs, guides, changelogs |\n| Brainstorm | `--type brainstorm` | Ideation, spikes, design exploration |\n| Config/SRE | `--type config` | Deploys, infra, CI/CD |\n| Frontend/UI | `--type frontend` | Content patches vs. component builds |\n| App dev | `--type app_dev` | App shells, desktop/mobile builds |\n\n### METR thresholds (defaults)\n\n| Model | p80 threshold |\n|-------|---------------|\n| Opus 4.7 | 90 min |\n| GPT-5.5 | 90 min |\n| GPT-5.4 | 60 min |\n| Gemini 3.1 Pro | 45 min |\n| Sonnet 4.6 | 30 min |\n| Haiku 4.5 | 15 min |\n\n`opus_4_x` is a forward-compatible alias that resolves to the current Opus threshold. Legacy keys (`opus_4_6`, GPT-5/5.2/5.3, Gemini 3 Pro, Sonnet) stay supported. Estimates are calibrated against Claude Code (Opus 4.7, high thinking) and Codex (GPT-5.4/5.5, extra-high) — shift with `--spec-clarity` and `--warm-context` for other setups.\n\n## Examples\n\nReal estimates from production use — including the misses.\n\n**The tool, estimating its own docs.** We sized this v0.7.0 skill-and-README refresh at ~30 minutes. It took 28.\n\n**An honest over-estimate.** We pre-registered a UI mockup build at ~95 minutes with no prior app-dev data. Two agents did it in parallel in 12 and 25 minutes — a 4–8x over-estimate. agent-estimate now ships an `app_dev` prior shaped by that result. The miss stays in the README because calibration means showing where you were wrong.\n\n**Two tasks, one model** — what the tool prints, including the METR reliability flag:\n\n```text\n$ agent-estimate estimate \"Implement auth\" \"Add tests\" --model opus\n\nTask             Tier   PERT (O/M/P)    Expected   Human-eq\n───────────────────────────────────────────────────────────\nImplement auth   M      25/50/90m       57.8m      160m\nAdd tests        S      12/23/40m       24.0m       75m\n\nTimeline ──────────────────────────────\n  best 37m   ·   expected 81.8m   ·   worst 130m\n  human-equivalent: 235m  →  2.87× compression\n\n  ⚠ METR warning: \"Implement auth\" exceeds Opus p80\n```\n\n~82 minutes expected versus ~4 hours by hand — plus a flag that the auth task runs past Opus's p80 reliability horizon, so you split it or add a checkpoint before dispatching.\n\n**Three tasks, three agents, in parallel:**\n\n```bash\n$ agent-estimate estimate --file tasks.txt\n```\n\n| Metric | Value |\n| --- | --- |\n| Wave 0 | All 3 tasks in parallel (Claude + Codex + Gemini) |\n| Expected case | 131m |\n| Human-speed equivalent | 709.5m |\n| **Compression ratio** | **5.42x** |\n| Estimated cost | $4.84 |\n\n~2 hours wall-clock versus ~12 hours sequential. You see the compression before you commit the compute. More in [`examples/`](./examples/) — coding S/M, research, documentation, multi-agent.\n\n## Integrations\n\n### Claude Code plugin\n\n```\n/plugin marketplace add kiloloop/agent-estimate\n/plugin install agent-estimate@agent-estimate-marketplace\n```\n\n```\n/estimate Add a login page with OAuth\n/estimate --file spec.md\n/estimate --issues 1,2,3 --repo myorg/myrepo\n/validate-estimate observation.yaml\n/calibrate\n```\n\n### GitHub Action\n\n```yaml\n- uses: kiloloop/agent-estimate@v0\n  with:\n    issues: '11,12,14'\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eFull workflow example\u003c/summary\u003e\n\n```yaml\nname: Estimate\non:\n  pull_request:\n    types: [opened, synchronize]\n\npermissions:\n  contents: read\n  pull-requests: write\n\njobs:\n  estimate:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - uses: kiloloop/agent-estimate@v0\n        with:\n          issues: '11,12,14'\n          output-mode: summary+pr-comment\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eAction inputs and outputs\u003c/summary\u003e\n\n| Input | Required | Default | Description |\n|-------|----------|---------|-------------|\n| `issues` | yes | — | GitHub issue numbers (comma-separated) |\n| `repo` | no | current repo | GitHub repo (owner/name) |\n| `format` | no | `markdown` | Output format: `markdown` or `json` |\n| `output-mode` | no | `summary` | `summary`, `pr-comment`, `step-output`, `summary+pr-comment` |\n| `config` | no | — | Path to agent config YAML |\n| `title` | no | `Agent Estimate Report` | Report title |\n| `review-mode` | no | `standard` | Review tier: `none`, `standard`, `complex`, `3-round` |\n| `spec-clarity` | no | `1.0` | Spec clarity modifier (0.3–1.3) |\n| `warm-context` | no | `1.0` | Warm context modifier (0.3–1.15) |\n| `agent-fit` | no | `1.0` | Agent fit modifier (0.9–1.2) |\n| `task-type` | no | — | Category: `coding`, `brainstorm`, `research`, `config`, `documentation`, `frontend`, `app_dev` |\n| `python-version` | no | `3.12` | Python version to use |\n| `version` | no | latest | `agent-estimate` version to install |\n| `token` | no | `${{ github.token }}` | GitHub token |\n\n| Output | Description |\n|--------|-------------|\n| `report` | Full estimation report content |\n| `expected-minutes` | Expected minutes (when `format: json`) |\n\n\u003c/details\u003e\n\n### Skill layout\n\nSkills follow the [oacp-skills](https://github.com/kiloloop/oacp-skills) convention:\n\n```\nskills/estimate/\n  skill.yaml            # machine-readable metadata\n  README.md             # human-readable docs\n  shared/INTENT.md      # shared intent across runtimes\n  claude/SKILL.md       # Claude Code skill definition\n  codex/SKILL.md        # Codex skill definition\n```\n\nBoth runtime slices cover the same CLI (`estimate`, `validate`, `calibrate`), phrased for their respective ecosystems.\n\n## Configuration\n\n### Agent fleet\n\nPass a config to model your own fleet:\n\n```yaml\nagents:\n  - name: Claude\n    capabilities: [planning, implementation, review]\n    parallelism: 2\n    cost_per_turn: 0.12\n    model_tier: frontier\n  - name: Codex\n    capabilities: [implementation, debugging, testing]\n    parallelism: 3\n    cost_per_turn: 0.08\n    model_tier: production\nsettings:\n  friction_multiplier: 1.15\n  inter_wave_overhead: 0.25\n  review_overhead: 0.2\n  metr_fallback_threshold: 45.0\n```\n\n```bash\nagent-estimate estimate \"Ship packaging flow\" --config ./my_agents.yaml\n```\n\n### Output formats\n\n```bash\nagent-estimate estimate \"Refactor auth pipeline\" --format json   # machine-readable\nagent-estimate estimate --repo myorg/myrepo --issues 11,12,14    # from GitHub issues\nagent-estimate estimate --file tasks.txt                          # from file\n```\n\n### Calibration\n\nValidate estimates against observed outcomes and build a calibration database:\n\n```bash\nagent-estimate validate observation.yaml --db ~/.agent-estimate/calibration.db\n```\n\n## Project\n\n- **[Website](https://kiloloop.com/agent-estimate/)** — landing page, live demo, and the [estimate comparison view](https://kiloloop.com/agent-estimate/compare/).\n- **[OACP](https://github.com/kiloloop/oacp)** — coordinate the agents you just estimated. Open Agent Coordination Protocol for multi-agent async workflows.\n- **[oacp-skills](https://github.com/kiloloop/oacp-skills)** — the skill bundle agent-estimate's `/estimate` ships in.\n- **[kiloloop](https://github.com/kiloloop)** — the rest of the ecosystem.\n\n## Contributing\n\nSee [CONTRIBUTING.md](./CONTRIBUTING.md) for the full workflow.\n\n```bash\npip install -e '.[dev]'\nruff check .\npytest -q\n```\n\n## Community\n\n- [Code of Conduct](./CODE_OF_CONDUCT.md)\n- [Security Policy](./SECURITY.md)\n- [Support](./SUPPORT.md)\n- [Changelog](./CHANGELOG.md)\n\n## License\n\nApache License 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkiloloop%2Fagent-estimate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkiloloop%2Fagent-estimate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkiloloop%2Fagent-estimate/lists"}