https://github.com/steel-dev/leaderboard

Open leaderboard for browser agents

ai-agents browser-automation llm rpa

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/steel-dev/leaderboard
Owner: steel-dev
License: mit
Created: 2025-02-27T18:44:24.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-03-04T23:00:06.000Z (3 months ago)
Last Synced: 2025-03-04T23:24:40.956Z (3 months ago)
Topics: ai-agents, browser-automation, llm, rpa
Language: Astro
Homepage: https://leaderboard.steel.dev
Size: 714 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Browser Agent Leaderboard

This repository presents the current standings of various web agents evaluated on the **WebVoyager** benchmark ([paper](https://arxiv.org/abs/2401.13919)). The WebVoyager benchmark comprises 643 tasks across 15 popular websites, assessing agents' abilities to perform diverse web navigation and interaction tasks.

---

![Steel.dev - Open-source Browser API for AI Agents & Apps](/public/github_hero.png)

Steel is an open-source browser API purpose-built for AI agents.

## Leaderboard

| Rank | Model           | Organization      | WebVoyager Score | Source                                                                                                       | Open Source | New | SOTA |
| ---- | --------------- | ----------------- | ---------------- | ------------------------------------------------------------------------------------------------------------ | ----------- | --- | ---- |
| 1    | Browser Use     | Browser Use       | 89.1%            | [Source](https://browser-use.com/posts/sota-technical-report)                                                | Yes         | Yes | Yes  |
| 2    | Operator        | OpenAI            | 87%              | [Source](https://openai.com/index/introducing-operator/)                                                     | No          | Yes |      |
| 3    | Kura            | Kura              | 87%              | [Source](https://www.trykura.com/benchmarks)                                                                 | No          | Yes |      |
| 4    | Skyvern 2.0     | Skyvern           | 85.85%           | [Source](https://blog.skyvern.com/skyvern-2-0-state-of-the-art-web-navigation-with-85-8-on-webvoyager-eval/) | Yes         | Yes |      |
| 5    | Project Mariner | Google            | 83.5%            | [Source](https://deepmind.google/technologies/project-mariner/)                                              | No          |     |      |
| 6    | Proxy           | Convergence AI    | 82%              | [Source](https://convergence.ai/training-web-agents-with-web-world-models-dec-2024/)                         | No          |     |      |
| 7    | Agent-E         | Emergence AI      | 73.1%            | [Source](https://www.emergence.ai/blog/agent-e-sota)                                                         | No          |     |      |
| 8    | Runner H 0.1    | H Company         | 67%              | [Source](https://www.hcompany.ai/blog/a-research-update)                                                     | No          |     |      |
| 9    | WILBUR          | Academic Research | 60.6%            | [Source](https://arxiv.org/abs/2404.05902)                                                                   | No          |     |      |
| 10   | WebVoyager      | Academic Research | 59.1%            | [Source](https://arxiv.org/abs/2401.13919)                                                                   | Yes         |     |      |
| 11   | Computer Use    | Anthropic         | 52%              | [Source](https://www.hcompany.ai/blog/a-research-update)                                                     | No          |     |      |

**Notes:**

- **Open Source**: Indicates whether the agent's source code is publicly available.

- **New**: Denotes recently introduced models.

- **SOTA**: Signifies models that have achieved state-of-the-art performance.
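The ordering in the table can be reproduced with a short script. The following is a minimal sketch, not the repository's actual code: the `Entry` class, the `rank` function, and the in-memory list are hypothetical names chosen for illustration, and the four rows are copied from the table above. It assumes ranks are assigned sequentially by descending score, with tied scores kept in their input order, which matches how Operator and Kura (both 87%) appear as ranks 2 and 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, not the repo's schema)."""
    model: str
    organization: str
    score: float  # WebVoyager score in percent
    open_source: bool


# A few rows copied from the table, deliberately out of order.
ENTRIES = [
    Entry("Skyvern 2.0", "Skyvern", 85.85, True),
    Entry("Operator", "OpenAI", 87.0, False),
    Entry("Browser Use", "Browser Use", 89.1, True),
    Entry("Kura", "Kura", 87.0, False),
]


def rank(entries):
    """Return (rank, entry) pairs sorted by descending score.

    sorted() is stable, so entries with equal scores keep their
    input order and receive consecutive ranks.
    """
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return list(enumerate(ordered, start=1))


for position, entry in rank(ENTRIES):
    print(f"{position}. {entry.model} ({entry.organization}): {entry.score}%")
```

Sequential ranks are used here only because they mirror the table; a leaderboard that prefers shared ranks for ties would need a different assignment rule.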

## Contributing

We encourage contributions to keep this leaderboard up to date. If you have information about new models or updated scores, please submit a pull request or open an issue.

## License

This project is licensed under the MIT License.
