https://github.com/ServiceNow/BrowserGym

BrowserGym, a gym environment for web task automation in the Chromium browser.

Host: GitHub
URL: https://github.com/ServiceNow/BrowserGym
Owner: ServiceNow
License: other
Created: 2024-02-07T14:30:16.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-29T13:07:16.000Z (about 1 year ago)
Last Synced: 2024-10-29T15:56:42.375Z (about 1 year ago)
Language: Python
Homepage:
Size: 963 KB
Stars: 303
Watchers: 8
Forks: 39
Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-web-agents - BrowserGym by ServiceNow - A gym environment for web task automation. ![GitHub Repo stars](https://img.shields.io/github/stars/ServiceNow/BrowserGym?style=social) (Benchmarks & Research / Dev Tools)
awesome-ai-agents - ServiceNow/BrowserGym

![BrowserGym banner](https://github.com/user-attachments/assets/4853f210-43ac-4107-a0d2-95c9c614dbe7)

🛠️ [Setup](#%EF%B8%8F-setup) - 🏋 [Usage](#-usage) - 💻 [Demo](#-demo) - 🌐 [Ecosystem](#-ecosystem) - 🚀 [AgentLab](https://github.com/ServiceNow/AgentLab) - 🌟 [Contributors](#-contributors) - 📄 [Paper](https://arxiv.org/abs/2412.05467) - 📝 [Citation](#-citing-this-work)

[![pypi](https://badge.fury.io/py/browsergym.svg)](https://pypi.org/project/browsergym/)

[![PyPI - License](https://img.shields.io/pypi/l/browsergym?style=flat-square)](http://www.apache.org/licenses/LICENSE-2.0)

[![PyPI - Downloads](https://img.shields.io/pypi/dm/browsergym-core?style=flat-square)](https://pypistats.org/packages/browsergym-core)

[![GitHub star chart](https://img.shields.io/github/stars/ServiceNow/BrowserGym?style=flat-square)](https://star-history.com/#ServiceNow/BrowserGym)

[![Code Format](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml/badge.svg)](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml)

[![Tests](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml)

```sh
pip install browsergym
```



> [!WARNING]
> BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research.
> It is not meant to be a consumer product. Use with caution!

> [!TIP]
> 🚀 Check out [AgentLab](https://github.com/ServiceNow/AgentLab)✨ !
> A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.

https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85

_Example of a GPT4-V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row)._

BrowserGym includes the following benchmarks by default:

 - [MiniWoB](https://miniwob.farama.org/)

 - [WebArena](https://webarena.dev/)

 - [VisualWebArena](https://jykoh.com/vwa)

 - [WorkArena](https://github.com/ServiceNow/WorkArena)

 - [AssistantBench](https://github.com/oriyor/assistantbench)

 - [WebLINX](https://github.com/McGill-NLP/weblinx) (static benchmark)

Designing new web benchmarks with BrowserGym is easy: it only requires inheriting from the [`AbstractBrowserTask`](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/task.py#L7C7-L7C26) class.

## 🛠️ Setup

To use browsergym, install one of the following packages:

```sh
pip install browsergym  # (recommended) everything below
pip install browsergym-experiments  # experiment utilities (agent, loop, benchmarks) + everything below
pip install browsergym-core  # core functionalities only (no benchmark, just the openended task)
pip install browsergym-miniwob  # core + miniwob
pip install browsergym-webarena  # core + webarena
pip install browsergym-visualwebarena  # core + visualwebarena
pip install browsergym-workarena  # core + workarena
pip install browsergym-assistantbench  # core + assistantbench
pip install weblinx-browsergym  # core + weblinx
```

Then set up Playwright by running:

```sh
playwright install chromium
```

Finally, each benchmark has its own setup with additional steps to follow:

 - for MiniWoB++, see [miniwob/README.md](browsergym/miniwob/README.md)

 - for WebArena, see [webarena/README.md](browsergym/webarena/README.md)

 - for VisualWebArena, see [visualwebarena/README.md](browsergym/visualwebarena/README.md)

 - for WorkArena, see [WorkArena](https://github.com/ServiceNow/WorkArena)

 - for AssistantBench, see [assistantbench/README.md](browsergym/assistantbench/README.md)
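
After installing, it can be handy to check which benchmark packages are actually importable before running anything. The following is a minimal sketch, not part of BrowserGym: `installed_benchmarks` is a hypothetical helper, and it assumes each `browsergym-<name>` distribution exposes a `browsergym.<name>` module (the import lines in the Usage section confirm this for `core`, `miniwob`, `webarena` and `workarena`; the remaining names are assumptions).

```python
import importlib.util

# ASSUMPTION: module names mirror the pip package names listed above.
DEFAULT_MODULES = (
    "core", "miniwob", "webarena", "visualwebarena", "workarena", "assistantbench",
)


def installed_benchmarks(names=DEFAULT_MODULES):
    """Return {module_suffix: bool} telling whether browsergym.<suffix> can be found."""
    found = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(f"browsergym.{name}")
        except ImportError:
            # Raised when the parent `browsergym` package itself is missing.
            spec = None
        found[name] = spec is not None
    return found


if __name__ == "__main__":
    for name, ok in installed_benchmarks().items():
        print(f"browsergym.{name}: {'installed' if ok else 'missing'}")
```

This only checks that the modules can be located; it does not verify the benchmark-specific setup described in the READMEs above.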

### 🏗️ Development setup

To install browsergym locally for development, use the following commands:

```sh
git clone git@github.com:ServiceNow/BrowserGym.git
cd BrowserGym
make install
```

Contributions are welcome! 😊

## 🏋 Usage

Boilerplate code to run an agent on an interactive, open-ended task:

```python
import gymnasium as gym
import browsergym.core  # register the openended task as a gym environment

# start an openended environment
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
)

# run the environment <> agent loop until termination
obs, info = env.reset()
while True:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

# release the environment
env.close()
```
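
The `action = ...` placeholder above is where an agent plugs in. As a rough illustration, the loop can be factored into a helper that takes any Gymnasium-style environment and a policy function, with a step cap so a misbehaving agent cannot run forever. This is a sketch under assumptions, not BrowserGym's own API: `run_episode` and `echo_policy` are hypothetical names, and the observation key `"goal"` and the `send_msg_to_user("...")` action string are assumed here and should be checked against the action set you configure.

```python
import json
from typing import Any, Callable


def run_episode(env: Any, policy: Callable[[dict], str], max_steps: int = 20) -> dict:
    """Drive a Gymnasium-style env until it ends or max_steps is reached."""
    obs, info = env.reset()
    total_reward = 0.0
    steps = 0
    terminated = truncated = False
    while steps < max_steps and not (terminated or truncated):
        action = policy(obs)  # the agent decides from the latest observation
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
    return {
        "steps": steps,
        "reward": total_reward,
        "terminated": terminated,
        "truncated": truncated,
    }


def echo_policy(obs: dict) -> str:
    """Toy policy: acknowledge the goal in the chat.

    ASSUMPTION: observations carry a "goal" entry and the action space
    accepts Python-like strings such as send_msg_to_user("...").
    """
    goal = obs.get("goal", "")
    # json.dumps yields a double-quoted, escaped string literal.
    return f"send_msg_to_user({json.dumps('Working on: ' + goal)})"
```

With a real environment this would be called as `run_episode(env, echo_policy)` followed by `env.close()`; a real agent would replace `echo_policy` with a function that calls an LLM.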

MiniWoB

```python
import gymnasium as gym
import browsergym.miniwob  # register miniwob tasks as gym environments

# start a miniwob task
env = gym.make("browsergym/miniwob.choose-list")
...

# list all the available miniwob tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))
```

WorkArena

```python
import gymnasium as gym
import browsergym.workarena  # register workarena tasks as gym environments

# start a workarena task
env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
...

# list all the available workarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
print("\n".join(env_ids))
```

WebArena

```python
import gymnasium as gym
import browsergym.webarena  # register webarena tasks as gym environments

# start a webarena task
env = gym.make("browsergym/webarena.310")
...

# list all the available webarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")]
print("\n".join(env_ids))
```

VisualWebArena

```python
import gymnasium as gym
import browsergym.visualwebarena  # register visualwebarena tasks as gym environments

# start a visualwebarena task
env = gym.make("browsergym/visualwebarena.721")
...

# list all the available visualwebarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")]
print("\n".join(env_ids))
```

AssistantBench

```python
import gymnasium as gym
import browsergym.assistantbench  # register assistantbench tasks as gym environments

# start an assistantbench task
env = gym.make("browsergym/assistantbench.validation.3")
...

# list all the available assistantbench tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/assistantbench")]
print("\n".join(env_ids))
```
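
Each listing snippet above filters the Gymnasium registry by one prefix. When several benchmarks are installed, the same idea can be generalized to group every registered id by benchmark in one pass. A minimal sketch, assuming ids follow the `browsergym/<benchmark>[.<task>]` pattern seen in the examples; `group_env_ids` is a hypothetical helper, not a BrowserGym function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_env_ids(env_ids: Iterable[str], namespace: str = "browsergym/") -> Dict[str, List[str]]:
    """Group environment ids by the benchmark name that follows the namespace."""
    groups = defaultdict(list)
    for env_id in env_ids:
        if not env_id.startswith(namespace):
            continue  # skip environments registered by other libraries
        name = env_id[len(namespace):]
        benchmark = name.split(".", 1)[0]  # "miniwob.choose-list" -> "miniwob"
        groups[benchmark].append(env_id)
    return {benchmark: sorted(ids) for benchmark, ids in sorted(groups.items())}


if __name__ == "__main__":
    sample = [
        "browsergym/openended",
        "browsergym/miniwob.choose-list",
        "browsergym/webarena.310",
        "CartPole-v1",
    ]
    for benchmark, ids in group_env_ids(sample).items():
        print(f"{benchmark}: {len(ids)} task(s)")
```

With the benchmark packages imported, the real registry can be passed in directly: `group_env_ids(gym.envs.registry.keys())`.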

## 💻 Demo

If you want to experiment with a demo agent in BrowserGym, follow these steps:

```sh
# conda setup
conda env create -f demo_agent/environment.yml
conda activate demo_agent

# or pip setup
pip install -r demo_agent/requirements.txt

# then download the browser for playwright
playwright install chromium
```

Our demo agent uses `openai` as a backend, so be sure to set your `OPENAI_API_KEY` environment variable.
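
If you wrap the demo in your own script, a small pre-flight check gives a clearer error than a failed API call deep inside a run. A minimal sketch; `require_env` is a hypothetical helper and not part of the demo agent.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before launching the demo agent.")
    return value


if __name__ == "__main__":
    require_env("OPENAI_API_KEY")
    print("OPENAI_API_KEY found")
```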

Launch the demo agent as follows:

```sh
# openended (interactive chat mode)
python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com

# miniwob
python demo_agent/run_demo.py --task_name miniwob.click-test

# workarena
python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop

# webarena
python demo_agent/run_demo.py --task_name webarena.4

# visualwebarena
python demo_agent/run_demo.py --task_name visualwebarena.398
```

You can customize your experience by changing the `model_name` to your preferred LLM (it uses `gpt-4o-mini` by default), adding screenshots for your VLMs with `use_screenshot`, and much more! To list all available options, run:

```sh
python demo_agent/run_demo.py --help
```
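
To launch the demo on several tasks in a row, the command lines above can be assembled programmatically. The sketch below uses only the `--task_name` and `--start_url` flags shown in this section; `demo_command` is a hypothetical helper, and the script path assumes you run it from the repository root.

```python
import shlex
import sys
from typing import List, Optional


def demo_command(
    task_name: str,
    start_url: Optional[str] = None,
    script: str = "demo_agent/run_demo.py",  # ASSUMPTION: run from the repo root
) -> List[str]:
    """Build the argv list for one demo run, using the current Python interpreter."""
    cmd = [sys.executable, script, "--task_name", task_name]
    if start_url is not None:
        cmd += ["--start_url", start_url]
    return cmd


if __name__ == "__main__":
    for task in ["miniwob.click-test", "webarena.4"]:
        cmd = demo_command(task)
        print(shlex.join(cmd))
        # To actually launch the run:
        # import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems when a start URL contains characters such as `&` or `?`.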

## 🌐 Ecosystem

- [AgentLab](https://github.com/ServiceNow/AgentLab): Seamlessly run agents on benchmarks, collect and analyse traces.

- [WorkArena(++)](https://github.com/ServiceNow/WorkArena): A benchmark for web agents on the ServiceNow platform.

- [WebArena](https://github.com/web-arena-x/webarena): A benchmark of realistic web tasks on self-hosted domains.

- [VisualWebArena](https://github.com/web-arena-x/visualwebarena): A benchmark of realistic visual web tasks on self-hosted domains.

- [MiniWoB(++)](https://miniwob.farama.org/): A collection of over 100 web tasks on synthetic web pages.

- [WebLINX](https://github.com/McGill-NLP/weblinx): A dataset of real-world web interaction traces.

- [AssistantBench](https://github.com/oriyor/assistantbench): A benchmark of realistic and time-consuming tasks on the open web.

## 🌟 Contributors

[![BrowserGym contributors](https://contrib.rocks/image?repo=ServiceNow/BrowserGym&max=2000)](https://github.com/ServiceNow/BrowserGym/graphs/contributors)

## 📝 Citing This Work

Please use the following BibTeX to cite our work:

```tex
@inproceedings{workarena2024,
    title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
    author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages = {11642--11662},
    year = {2024},
    editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume = {235},
    series = {Proceedings of Machine Learning Research},
    month = {21--27 Jul},
    publisher = {PMLR},
    url = {https://proceedings.mlr.press/v235/drouin24a.html},
}
```
