https://github.com/ServiceNow/BrowserGym
BrowserGym, a gym environment for web task automation in the Chromium browser.
- Host: GitHub
- URL: https://github.com/ServiceNow/BrowserGym
- Owner: ServiceNow
- License: other
- Created: 2024-02-07T14:30:16.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-29T13:07:16.000Z (6 months ago)
- Last Synced: 2024-10-29T15:56:42.375Z (6 months ago)
- Language: Python
- Homepage:
- Size: 963 KB
- Stars: 303
- Watchers: 8
- Forks: 39
- Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-web-agents - BrowserGym by ServiceNow - A gym environment for web task automation.  (Benchmarks & Research / Dev Tools)
- awesome-ai-agents - ServiceNow/BrowserGym
README

🛠️ [Setup](#%EF%B8%8F-setup) -
🏋 [Usage](#-usage) -
💻 [Demo](#-demo) -
🌐 [Ecosystem](#-ecosystem) -
🚀 [AgentLab](https://github.com/ServiceNow/AgentLab) -
🌟 [Contributors](#-contributors) -
📄 [Paper](https://arxiv.org/abs/2412.05467) -
📝 [Citation](#-citing-this-work)

```sh
pip install browsergym
```

> [!WARNING]
> BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research.
> It is not meant to be a consumer product. Use with caution!

> [!TIP]
> 🚀 Check out [AgentLab](https://github.com/ServiceNow/AgentLab)✨ !
> A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.

https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85

_Example of a GPT4-V agent executing openended tasks (top row, chat interactive), as well as WebArena and WorkArena tasks (bottom row)._

BrowserGym includes the following benchmarks by default:
- [MiniWoB](https://miniwob.farama.org/)
- [WebArena](https://webarena.dev/)
- [VisualWebArena](https://jykoh.com/vwa)
- [WorkArena](https://github.com/ServiceNow/WorkArena)
- [AssistantBench](https://github.com/oriyor/assistantbench)
- [WebLINX](https://github.com/McGill-NLP/weblinx) (static benchmark)

Designing new web benchmarks with BrowserGym is easy: simply inherit from the [`AbstractBrowserTask`](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/task.py#L7C7-L7C26) class.
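
As a rough illustration of the shape such a task can take, the sketch below uses a local stand-in base class instead of importing BrowserGym, so it runs on its own. The method names (`setup`, `validate`, `teardown`) and their return shapes are assumptions modeled on the linked `task.py` and may not match the installed version. `FakePage`, `TaskSketch` and `VisitUrlTask` are hypothetical names used only for this example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class FakePage:
    """Hypothetical stand-in for a browser page (a real task would get a Playwright page)."""

    def __init__(self) -> None:
        self.url = "about:blank"

    def goto(self, url: str) -> None:
        self.url = url


class TaskSketch(ABC):
    """Local stand-in for AbstractBrowserTask; the real interface may differ."""

    def __init__(self, seed: int) -> None:
        self.seed = seed

    @abstractmethod
    def setup(self, page: FakePage) -> tuple[str, dict]:
        """Prepare the page and return (goal, info)."""

    @abstractmethod
    def validate(self, page: FakePage, chat_messages: list) -> tuple[float, bool, str, dict]:
        """Return (reward, done, message_to_user, info)."""

    def teardown(self) -> None:
        """Release resources held by the task (nothing to release here)."""


class VisitUrlTask(TaskSketch):
    """Toy task: the episode succeeds once the page is at the target URL."""

    target_url = "https://example.com/"

    def setup(self, page: FakePage) -> tuple[str, dict]:
        page.goto("about:blank")
        return f"Navigate to {self.target_url}", {}

    def validate(self, page: FakePage, chat_messages: list) -> tuple[float, bool, str, dict]:
        done = page.url == self.target_url
        return (1.0 if done else 0.0), done, "", {}


# Drive the task by hand, playing the role of the environment and the agent.
page = FakePage()
task = VisitUrlTask(seed=0)
goal, _ = task.setup(page)
reward_before, done_before, _, _ = task.validate(page, chat_messages=[])
page.goto(VisitUrlTask.target_url)  # what a navigation action would do
reward_after, done_after, _, _ = task.validate(page, chat_messages=[])
task.teardown()
```

A real benchmark would subclass the BrowserGym class directly and register its tasks as Gymnasium environments; the stand-in only shows the setup/validate split.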
## 🛠️ Setup
To use browsergym, install one of the following packages:
```sh
pip install browsergym # (recommended) everything below
pip install browsergym-experiments # experiment utilities (agent, loop, benchmarks) + everything below
pip install browsergym-core # core functionalities only (no benchmark, just the openended task)
pip install browsergym-miniwob # core + miniwob
pip install browsergym-webarena # core + webarena
pip install browsergym-visualwebarena # core + visualwebarena
pip install browsergym-workarena # core + workarena
pip install browsergym-assistantbench # core + assistantbench
pip install weblinx-browsergym # core + weblinx
```

Then set up Playwright by running:
```sh
playwright install chromium
```

Finally, each benchmark has its own specific setup that requires additional steps:
- for MiniWoB++, see [miniwob/README.md](browsergym/miniwob/README.md)
- for WebArena, see [webarena/README.md](browsergym/webarena/README.md)
- for VisualWebArena, see [visualwebarena/README.md](browsergym/visualwebarena/README.md)
- for WorkArena, see [WorkArena](https://github.com/ServiceNow/WorkArena)
- for AssistantBench, see [assistantbench/README.md](browsergym/assistantbench/README.md)

### 🏗️ Development setup
To install browsergym locally for development, use the following commands:
```sh
git clone git@github.com:ServiceNow/BrowserGym.git
cd BrowserGym
make install
```

Contributions are welcome! 😊

## 🏋 Usage
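
The examples in this section leave the agent itself as a placeholder. As a minimal sketch of what can fill that slot, the function below maps an observation dictionary to an action string. It assumes that observations expose `goal` and `last_action_error` entries and that the environment accepts action strings such as `noop()` and `send_msg_to_user(...)`; check both against the action set of your installed version. `scripted_agent` is a hypothetical name, and a real agent would typically query an LLM here.

```python
def scripted_agent(obs: dict, step: int) -> str:
    """Return an action string for the current observation (toy policy)."""
    # If the previous action failed, report the error instead of retrying blindly.
    error = obs.get("last_action_error") or ""
    if error:
        message = "Last action failed: " + error
        return f"send_msg_to_user({message!r})"
    # On the first step, acknowledge the goal; afterwards, do nothing.
    if step == 0:
        message = "Working on: " + (obs.get("goal") or "")
        return f"send_msg_to_user({message!r})"
    return "noop()"


first_action = scripted_agent({"goal": "find cats"}, step=0)
idle_action = scripted_agent({"goal": "find cats"}, step=1)
error_action = scripted_agent({"last_action_error": "timeout"}, step=3)
```

Using `!r` to quote the message keeps the generated action a valid Python call even when the text contains quotes.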
Boilerplate code to run an agent on an interactive, open-ended task:
```python
import gymnasium as gym
import browsergym.core  # register the openended task as a gym environment

# start an openended environment
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
)
# run the environment <> agent loop until termination
obs, info = env.reset()
while True:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
# release the environment
env.close()
```

MiniWoB
```python
import gymnasium as gym
import browsergym.miniwob  # register miniwob tasks as gym environments

# start a miniwob task
env = gym.make("browsergym/miniwob.choose-list")
...

# list all the available miniwob tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))
```

WorkArena
```python
import gymnasium as gym
import browsergym.workarena  # register workarena tasks as gym environments

# start a workarena task
env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
...

# list all the available workarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
print("\n".join(env_ids))
```

WebArena
```python
import gymnasium as gym
import browsergym.webarena  # register webarena tasks as gym environments

# start a webarena task
env = gym.make("browsergym/webarena.310")
...

# list all the available webarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")]
print("\n".join(env_ids))
```

VisualWebArena
```python
import gymnasium as gym
import browsergym.visualwebarena  # register visualwebarena tasks as gym environments

# start a visualwebarena task
env = gym.make("browsergym/visualwebarena.721")
...

# list all the available visualwebarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")]
print("\n".join(env_ids))
```

AssistantBench
```python
import gymnasium as gym
import browsergym.assistantbench  # register assistantbench tasks as gym environments

# start an assistantbench task
env = gym.make("browsergym/assistantbench.validation.3")
...

# list all the available assistantbench tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/assistantbench")]
print("\n".join(env_ids))
```

## 💻 Demo
If you want to experiment with a demo agent in BrowserGym, follow these steps:
```sh
# conda setup
conda env create -f demo_agent/environment.yml
conda activate demo_agent

# or pip setup
pip install -r demo_agent/requirements.txt

# then download the browser for playwright
playwright install chromium
```

Our demo agent uses `openai` as a backend, so be sure to set your `OPENAI_API_KEY`.
Launch the demo agent as follows:
```sh
# openended (interactive chat mode)
python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com

# miniwob
python demo_agent/run_demo.py --task_name miniwob.click-test

# workarena
python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop

# webarena
python demo_agent/run_demo.py --task_name webarena.4

# visualwebarena
python demo_agent/run_demo.py --task_name visualwebarena.398
```

You can customize your experience by changing the `model_name` to your preferred LLM (it uses `gpt-4o-mini` by default), adding screenshots for your VLMs with `use_screenshot`, and much more!
```sh
python demo_agent/run_demo.py --help
```

## 🌐 Ecosystem
- [AgentLab](https://github.com/ServiceNow/AgentLab): Seamlessly run agents on benchmarks, collect and analyse traces.
- [WorkArena(++)](https://github.com/ServiceNow/WorkArena): A benchmark for web agents on the ServiceNow platform.
- [WebArena](https://github.com/web-arena-x/webarena): A benchmark of realistic web tasks on self-hosted domains.
- [VisualWebArena](https://github.com/web-arena-x/visualwebarena): A benchmark of realistic visual web tasks on self-hosted domains.
- [MiniWoB(++)](https://miniwob.farama.org/): A collection of over 100 web tasks on synthetic web pages.
- [WebLINX](https://github.com/McGill-NLP/weblinx): A dataset of real-world web interaction traces.
- [AssistantBench](https://github.com/oriyor/assistantbench): A benchmark of realistic and time-consuming tasks on the open web.

## 🌟 Contributors
[Contributors](https://github.com/ServiceNow/BrowserGym/graphs/contributors)
## 📝 Citing This Work
Please use the following BibTeX to cite our work:
```tex
@inproceedings{workarena2024,
title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
pages = {11642--11662},
year = {2024},
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
volume = {235},
series = {Proceedings of Machine Learning Research},
month = {21--27 Jul},
publisher = {PMLR},
url = {https://proceedings.mlr.press/v235/drouin24a.html},
}
```