https://github.com/steel-dev/leaderboard
Open leaderboard for browser agents
ai-agents browser-automation llm rpa
- Host: GitHub
- URL: https://github.com/steel-dev/leaderboard
- Owner: steel-dev
- License: mit
- Created: 2025-02-27T18:44:24.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-03-04T23:00:06.000Z (3 months ago)
- Last Synced: 2025-03-04T23:24:40.956Z (3 months ago)
- Topics: ai-agents, browser-automation, llm, rpa
- Language: Astro
- Homepage: https://leaderboard.steel.dev
- Size: 714 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Browser Agent Leaderboard
This repository presents the current standings of various web agents evaluated on the **WebVoyager** benchmark ([paper](https://arxiv.org/abs/2401.13919)). The WebVoyager benchmark comprises 643 tasks across 15 popular websites, assessing agents' abilities to perform diverse web navigation and interaction tasks.
---

Steel is an open-source browser API purpose-built for AI agents.

## Leaderboard
| Rank | Model | Organization | WebVoyager Score | Source | Open Source | New | SOTA |
| ---- | --------------- | -------------- | ---------------- | ------------------------------------------------------------------------------------------------- | ----------- | --- | ---- |
| 1 | Browser Use | Browser Use | 89.1% | [Source](https://browser-use.com/posts/sota-technical-report) | Yes | Yes | Yes |
| 2 | Operator | OpenAI | 87% | [Source](https://openai.com/index/introducing-operator/) | No | Yes | |
| 3 | Kura | Kura | 87% | [Source](https://www.trykura.com/benchmarks) | No | Yes | |
| 4 | Skyvern 2.0 | Skyvern | 85.85% | [Source](https://blog.skyvern.com/skyvern-2-0-state-of-the-art-web-navigation-with-85-8-on-webvoyager-eval/) | Yes | Yes | |
| 5 | Project Mariner | Google | 83.5% | [Source](https://deepmind.google/technologies/project-mariner/) | No | | |
| 6 | Proxy | Convergence AI | 82% | [Source](https://convergence.ai/training-web-agents-with-web-world-models-dec-2024/) | No | | |
| 7 | Agent-E | Emergence AI | 73.1% | [Source](https://www.emergence.ai/blog/agent-e-sota) | No | | |
| 8 | Runner H 0.1 | H Company | 67% | [Source](https://www.hcompany.ai/blog/a-research-update) | No | | |
| 9 | WILBUR | Academic Research | 60.6% | [Source](https://arxiv.org/abs/2404.05902) | No | | |
| 10 | WebVoyager | Academic Research | 59.1% | [Source](https://arxiv.org/abs/2401.13919) | Yes | | |
| 11 | Computer Use | Anthropic | 52% | [Source](https://www.hcompany.ai/blog/a-research-update) | No | | |

**Notes:**
- **Open Source**: Indicates whether the agent's source code is publicly available.
- **New**: Denotes recently introduced models.
- **SOTA**: Signifies models that have achieved state-of-the-art performance.

## Contributing
We encourage contributions to keep this leaderboard up-to-date. If you have information about new models or updated scores, please submit a pull request or open an issue.
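
Before opening a pull request with a new or updated row, it can help to check that the ranks still follow the scores. The following Python sketch is one possible way to do that; it is not part of this repository. The function names `parse_rows` and `ranks_are_consistent` are hypothetical, and the sample table is a shortened copy of the leaderboard that assumes the same column order (rank, model, organization, score).

```python
def parse_rows(markdown: str) -> list[dict]:
    """Parse leaderboard table rows into dicts with rank, model, and score."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Header and separator lines are skipped: only data rows start with a numeric rank.
        if len(cells) < 4 or not cells[0].isdigit():
            continue
        rows.append({
            "rank": int(cells[0]),
            "model": cells[1],
            "score": float(cells[3].rstrip("%")),
        })
    return rows


def ranks_are_consistent(rows: list[dict]) -> bool:
    """Return True if ranks run 1..N and scores never increase down the table."""
    ranks = [r["rank"] for r in rows]
    scores = [r["score"] for r in rows]
    sequential = ranks == list(range(1, len(rows) + 1))
    non_increasing = all(a >= b for a, b in zip(scores, scores[1:]))
    return sequential and non_increasing


# Shortened sample using the first four rows of the leaderboard.
SAMPLE = """
| Rank | Model | Organization | WebVoyager Score |
| ---- | ----- | ------------ | ---------------- |
| 1 | Browser Use | Browser Use | 89.1% |
| 2 | Operator | OpenAI | 87% |
| 3 | Kura | Kura | 87% |
| 4 | Skyvern 2.0 | Skyvern | 85.85% |
"""

rows = parse_rows(SAMPLE)
print(ranks_are_consistent(rows))
```

Tied scores (such as the two 87% entries) pass the check because the comparison only requires that scores do not increase from one row to the next.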
## License
This project is licensed under the MIT License.