https://github.com/dkealvaro/agent2bench
Agent2Bench is a benchmark that tests LLMs' abilities in everyday computer tasks such as booking flights, downloading programs, or exiting Vim.
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/dkealvaro/agent2bench
- Owner: DKeAlvaro
- Created: 2025-02-10T10:19:30.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-10T11:28:10.000Z (8 months ago)
- Last Synced: 2025-02-10T11:33:43.128Z (8 months ago)
- Topics: benchmark, computer-vision, llms
- Language: CSS
- Homepage: https://dkealvaro.github.io/Agent2Bench/
- Size: 17.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0