{"id":30750606,"url":"https://github.com/laiso/ts-bench","last_synced_at":"2026-01-28T01:37:42.888Z","repository":{"id":312446170,"uuid":"1006431218","full_name":"laiso/ts-bench","owner":"laiso","description":"Measure and compare the performance of AI coding agents on TypeScript tasks.","archived":false,"fork":false,"pushed_at":"2026-01-22T06:14:08.000Z","size":290,"stargazers_count":196,"open_issues_count":5,"forks_count":10,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-22T20:42:13.538Z","etag":null,"topics":["ai-agents","llm","typescript"],"latest_commit_sha":null,"homepage":"https://medium.com/@laiso/introducing-ts-bench-a-reproducible-benchmark-for-evaluating-ai-coding-agents-typescript-19bcf960cb7c","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/laiso.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["laiso"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":null,"thanks_dev":null,"custom":null}},"created_at":"2025-06-22T08:54:16.000Z","updated_at":"2026-01-21T22:37:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"fc0f0ea9-880c-49fd-831d-25736f8402e3","html_url":"https://github.com/laiso/ts-bench","commit_stats":null,"previous_names":["laiso/ts-bench"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/laiso/ts-bench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laiso%2Fts-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laiso%2Fts-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laiso%2Fts-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laiso%2Fts-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/laiso","download_url":"https://codeload.github.com/laiso/ts-bench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laiso%2Fts-bench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28831895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T23:29:49.665Z","status":"ssl_error","status_checked_at":"2026-01-27T23:25:58.379Z","response_time":168,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","llm","typescript"],"created_at":"2025-09-04T07:01:45.915Z","updated_at":"2026-01-28T01:37:42.883Z","avatar_url":"https://github.com/laiso.png","language":"TypeScript","funding_links":["https://github.com/sponsors/laiso"],"categories":["TypeScript"],"sub_categories":[],"readme":"# ts-bench: TypeScript Agent Benchmark\n\n**ts-bench** is a transparent and reproducible benchmark project for evaluating the TypeScript code editing capabilities of AI coding agents.\n\n## Leaderboard\n\n\u003c!-- BEGIN_LEADERBOARD --\u003e\n| Rank | Agent | Model | Success Rate | Solved | Avg Time | Result |\n|:----:|:------|:------|:--------------:|:------:|:----------:|:-----:|\n| 1 | opencode | openai/gpt-5 | **96.0%** | 24/25 | 64.8s | [#415419](https://github.com/yukukotani/ts-bench/actions/runs/17366415419) |\n| 2 | gemini | gemini-3-pro-preview | **96.0%** | 24/25 | 188.1s | [#914072](https://github.com/laiso/ts-bench/actions/runs/19484914072) |\n| 3 | gemini | gemini-3-flash-preview | **92.0%** | 23/25 | 99.7s | [#081278](https://github.com/laiso/ts-bench/actions/runs/20326081278) |\n| 4 | goose | claude-sonnet-4-20250514 | **92.0%** | 23/25 | 122.2s | [#186071](https://github.com/laiso/ts-bench/actions/runs/17373186071) |\n| 5 | opencode | anthropic/claude-sonnet-4-20250514 | **92.0%** | 23/25 | 127.8s | [#043809](https://github.com/laiso/ts-bench/actions/runs/17375043809) |\n| 6 | claude | glm-4.6 | **92.0%** | 23/25 | 132.3s | [#009680](https://github.com/laiso/ts-bench/actions/runs/18133009680) |\n| 7 | codex | gpt-5.2 | **92.0%** | 23/25 | 140.4s | [#260672](https://github.com/laiso/ts-bench/actions/runs/20157260672) |\n| 8 | gemini | gemini-2.5-pro | **92.0%** | 23/25 | 168.5s | [#052819](https://github.com/laiso/ts-bench/actions/runs/17351052819) |\n| 9 | codex | gpt-5 | **88.0%** | 22/25 | 91.7s | [#734992](https://github.com/laiso/ts-bench/actions/runs/17344734992) |\n| 10 | opencode | opencode/grok-code | **88.0%** | 22/25 | 97.0s | [#083421](https://github.com/laiso/ts-bench/actions/runs/17355083421) |\n\u003c!-- END_LEADERBOARD --\u003e\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n## 🤖 Supported Agents\n\nCurrently supported agents:\n\n* [Claude Code](https://www.anthropic.com/claude-code)\n* [Codex CLI](https://developers.openai.com/codex/cli/)\n* [Gemini CLI](https://cloud.google.com/gemini/docs/codeassist/gemini-cli)\n* [OpenCode](https://opencode.ai/)\n* [Goose CLI](https://block.github.io/goose/)\n* [Qwen Code](https://qwenlm.github.io/qwen-code-docs/)\n* [Aider](https://aider.chat/)\n\n## 📖 Vision \u0026 Principles\n\nThis project is strongly inspired by benchmarks like [Aider Polyglot](https://aider.chat/2024/12/21/polyglot.html). Rather than measuring the performance of large language models (LLMs) alone, it focuses on evaluating the **agent layer**—the entire AI coding assistant tool, including prompt strategies, file operations, and iterative logic.\n\nBased on this vision, the benchmark is designed according to the following principles:\n\n* **TypeScript-First**: Focused on TypeScript, which is essential in modern development. Static typing presents unique challenges and opportunities for AI agents, making it a crucial evaluation target.\n* **Agent-Agnostic**: Designed to be independent of any specific AI agent, allowing fair comparison of multiple CLI-based agents such as `Aider` and `Claude Code`.\n* **Baseline Performance**: Uses self-contained problem sets sourced from Exercism to serve as a **baseline** for measuring basic code reading and editing abilities. It is not intended to measure performance on **large-scale editing tasks or complex bug fixes across entire repositories** like SWE-bench.\n\n## 📊 Results \u0026 Methodology\n\nAll benchmark results are generated and published via GitHub Actions.\n\n* **➡️ [View All Benchmark Runs Here](https://github.com/laiso/ts-bench/actions/workflows/benchmark.yml)**\n* **📜 [Read the Benchmark Methodology](docs/METHODOLOGY.md)**\n\nEach results page provides a formatted summary and downloadable artifacts containing raw data (JSON).\n\n## Documentation\nFor detailed documentation, see:\n\n- [Environment Setup](docs/environment.md): Details on setting up the local and Docker environments.\n- [Leaderboard Operation Design](docs/leaderboard.md): Explains how the leaderboard is updated and maintained.\n\n## 🚀 Getting Started\n\n### Installation\n\n```bash\nbun install\n```\n\n### Usage\n\nRun the benchmark with the following commands. Use `--help` to see all available options.\n\n```bash\n# Run the default 25 problems with Claude Code (Sonnet 3.5)\nbun src/index.ts --agent claude --model claude-3-5-sonnet-20240620\n\n# Run only the 'acronym' problem with Aider (GPT-4o)\nbun src/index.ts --agent aider --model gpt-4o --exercise acronym\n```\n\nRunning the benchmark with `--save-result` now also regenerates the local leaderboard dataset under `public/data/latest-results.json`, so you no longer need separate leaderboard flags after exporting results.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaiso%2Fts-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flaiso%2Fts-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaiso%2Fts-bench/lists"}