https://github.com/steel-dev/leaderboard

Open leaderboard for browser agents

ai-agents browser-automation llm rpa


# Browser Agent Leaderboard

This repository presents the current standings of various web agents evaluated on the **WebVoyager** benchmark ([paper](https://arxiv.org/abs/2401.13919)). The WebVoyager benchmark comprises 643 tasks across 15 popular websites, assessing agents' abilities to perform diverse web navigation and interaction tasks.

---
![Steel.dev - Open-source Browser API for AI Agents & Apps](/public/github_hero.png)
Steel is an open-source browser API purpose-built for AI agents.

## Leaderboard

| Rank | Model | Organization | WebVoyager Score | Source | Open Source | New | SOTA |
| ---- | --------------- | -------------- | ---------------- | ------------------------------------------------------------------------------------------------- | ----------- | --- | ---- |
| 1 | Browser Use | Browser Use | 89.1% | [Source](https://browser-use.com/posts/sota-technical-report) | Yes | Yes | Yes |
| 2 | Operator | OpenAI | 87% | [Source](https://openai.com/index/introducing-operator/) | No | Yes | |
| 3 | Kura | Kura | 87% | [Source](https://www.trykura.com/benchmarks) | No | Yes | |
| 4 | Skyvern 2.0 | Skyvern | 85.85% | [Source](https://blog.skyvern.com/skyvern-2-0-state-of-the-art-web-navigation-with-85-8-on-webvoyager-eval/) | Yes | Yes | |
| 5 | Project Mariner | Google | 83.5% | [Source](https://deepmind.google/technologies/project-mariner/) | No | | |
| 6 | Proxy | Convergence AI | 82% | [Source](https://convergence.ai/training-web-agents-with-web-world-models-dec-2024/) | No | | |
| 7 | Agent-E | Emergence AI | 73.1% | [Source](https://www.emergence.ai/blog/agent-e-sota) | No | | |
| 8 | Runner H 0.1 | H Company | 67% | [Source](https://www.hcompany.ai/blog/a-research-update) | No | | |
| 9 | WILBUR | Academic Research | 60.6% | [Source](https://arxiv.org/abs/2404.05902) | No | | |
| 10 | WebVoyager | Academic Research | 59.1% | [Source](https://arxiv.org/abs/2401.13919) | Yes | | |
| 11 | Computer Use | Anthropic | 52% | [Source](https://www.hcompany.ai/blog/a-research-update) | No | | |

**Notes:**

- **Open Source**: Indicates whether the agent's source code is publicly available.
- **New**: Denotes recently introduced models.
- **SOTA**: Marks the agent that currently holds the highest reported WebVoyager score on this leaderboard.

## Contributing

We encourage contributions to keep this leaderboard up-to-date. If you have information about new models or updated scores, please submit a pull request or open an issue.
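Before opening a pull request, it can help to confirm that a new row keeps the table sorted. The following is a minimal sketch, assuming rows follow the column layout of the leaderboard table (rank first, score in the fourth column). `parse_rows` and `check_order` are hypothetical helpers written for illustration and are not part of this repository.

```python
# Hypothetical pre-submission check; not part of this repository.
SAMPLE = """
| Rank | Model | Organization | WebVoyager Score | Source | Open Source | New | SOTA |
| ---- | ----- | ------------ | ---------------- | ------ | ----------- | --- | ---- |
| 1 | Browser Use | Browser Use | 89.1% | [Source](https://browser-use.com/posts/sota-technical-report) | Yes | Yes | Yes |
| 2 | Operator | OpenAI | 87% | [Source](https://openai.com/index/introducing-operator/) | No | Yes | |
| 3 | Kura | Kura | 87% | [Source](https://www.trykura.com/benchmarks) | No | Yes | |
"""


def parse_rows(markdown: str) -> list[dict]:
    """Extract rank, model, and numeric score from each data row."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Header and separator rows do not start with a numeric rank.
        if len(cells) < 4 or not cells[0].isdigit():
            continue
        rows.append({
            "rank": int(cells[0]),
            "model": cells[1],
            "score": float(cells[3].rstrip("%")),
        })
    return rows


def check_order(rows: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the table is consistent."""
    problems = []
    for i, row in enumerate(rows):
        # Ranks are assumed to run 1, 2, 3, ... with no gaps.
        if row["rank"] != i + 1:
            problems.append(
                f"{row['model']}: expected rank {i + 1}, found {row['rank']}"
            )
        # Scores must not increase going down the table; ties are allowed.
        if i > 0 and row["score"] > rows[i - 1]["score"]:
            problems.append(
                f"{row['model']}: score {row['score']} exceeds the row above"
            )
    return problems


if __name__ == "__main__":
    for problem in check_order(parse_rows(SAMPLE)):
        print(problem)
```

The check treats equal scores as valid because the current table already lists two agents at 87%. How ties should be ordered is left to the maintainers.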

## License

This project is licensed under the MIT License.