{"id":17526884,"url":"https://github.com/ServiceNow/BrowserGym","last_synced_at":"2025-03-06T06:30:58.814Z","repository":{"id":225903822,"uuid":"754167312","full_name":"ServiceNow/BrowserGym","owner":"ServiceNow","description":"BrowserGym, a gym environment for web task automation in the Chromium browser.","archived":false,"fork":false,"pushed_at":"2024-10-29T13:07:16.000Z","size":986,"stargazers_count":303,"open_issues_count":22,"forks_count":39,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-10-29T15:56:42.375Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ServiceNow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-07T14:30:16.000Z","updated_at":"2024-10-29T13:07:19.000Z","dependencies_parsed_at":"2024-05-09T21:38:11.264Z","dependency_job_id":"bc1c9af8-ed48-4cc5-a98c-6d757b52f520","html_url":"https://github.com/ServiceNow/BrowserGym","commit_stats":null,"previous_names":["servicenow/browsergym"],"tags_count":61,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FBrowserGym","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FBrowserGym/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FBrowserGym/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FBrowserGym/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/owners/ServiceNow","download_url":"https://codeload.github.com/ServiceNow/BrowserGym/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242161453,"owners_count":20081874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-20T15:02:35.638Z","updated_at":"2025-03-06T06:30:58.802Z","avatar_url":"https://github.com/ServiceNow.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\n![BrowserGym banner](https://github.com/user-attachments/assets/4853f210-43ac-4107-a0d2-95c9c614dbe7)\n\n🛠️ [Setup](#%EF%B8%8F-setup) -\n🏋 [Usage](#-usage) -\n💻 [Demo](#-demo) -\n🌐 [Ecosystem](#-ecosystem) -\n🚀 [AgentLab](https://github.com/ServiceNow/AgentLab) -\n🌟 [Contributors](#-contributors) -\n📄 [Paper](https://arxiv.org/abs/2412.05467) -\n📝 [Citation](#-citing-this-work)\n\n[![pypi](https://badge.fury.io/py/browsergym.svg)](https://pypi.org/project/browsergym/)\n[![PyPI - License](https://img.shields.io/pypi/l/browsergym?style=flat-square)](http://www.apache.org/licenses/LICENSE-2.0)\n[![PyPI - Downloads](https://img.shields.io/pypi/dm/browsergym-core?style=flat-square)](https://pypistats.org/packages/browsergym-core)\n[![GitHub star chart](https://img.shields.io/github/stars/ServiceNow/BrowserGym?style=flat-square)](https://star-history.com/#ServiceNow/BrowserGym)\n[![Code 
Format](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml/badge.svg)](https://github.com/ServiceNow/BrowserGym/actions/workflows/code_format.yml)\n[![Tests](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/ServiceNow/BrowserGym/actions/workflows/unit_tests.yml)\n\n```sh\npip install browsergym\n```\n\n\u003c/div\u003e\n\n\u003e [!WARNING]\n\u003e BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research.\n\u003e It is not meant to be a consumer product. Use with caution!\n\n\u003e [!TIP]\n\u003e 🚀 Check out [AgentLab](https://github.com/ServiceNow/AgentLab)✨!\n\u003e A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.\n\nhttps://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85\n\n_Example of a GPT-4V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row)._\n\nBrowserGym includes the following benchmarks by default:\n - [MiniWoB](https://miniwob.farama.org/)\n - [WebArena](https://webarena.dev/)\n - [VisualWebArena](https://jykoh.com/vwa)\n - [WorkArena](https://github.com/ServiceNow/WorkArena)\n - [AssistantBench](https://github.com/oriyor/assistantbench)\n - [WebLINX](https://github.com/McGill-NLP/weblinx) (static benchmark)\n\nDesigning new web benchmarks with BrowserGym is easy: it only requires inheriting from the [`AbstractBrowserTask`](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/task.py#L7C7-L7C26) class.\n\n## 🛠️ Setup\n\nTo use BrowserGym, install one of the following packages:\n```sh\npip install browsergym  # (recommended) everything below\npip install browsergym-experiments  # experiment utilities (agent, loop, benchmarks) + everything below\npip install browsergym-core  # core functionalities only (no benchmark, just the 
openended task)\npip install browsergym-miniwob  # core + miniwob\npip install browsergym-webarena  # core + webarena\npip install browsergym-visualwebarena  # core + visualwebarena\npip install browsergym-workarena  # core + workarena\npip install browsergym-assistantbench  # core + assistantbench\npip install weblinx-browsergym  # core + weblinx\n```\n\nThen set up Playwright by running:\n```sh\nplaywright install chromium\n```\n\nFinally, each benchmark requires its own additional setup steps:\n - for MiniWoB++, see [miniwob/README.md](browsergym/miniwob/README.md)\n - for WebArena, see [webarena/README.md](browsergym/webarena/README.md)\n - for VisualWebArena, see [visualwebarena/README.md](browsergym/visualwebarena/README.md)\n - for WorkArena, see [WorkArena](https://github.com/ServiceNow/WorkArena)\n - for AssistantBench, see [assistantbench/README.md](browsergym/assistantbench/README.md)\n\n### 🏗️ Development setup\n\nTo install BrowserGym locally for development, use the following commands:\n```sh\ngit clone git@github.com:ServiceNow/BrowserGym.git\ncd BrowserGym\nmake install\n```\n\nContributions are welcome! 😊\n\n## 🏋 Usage\n\nBoilerplate code to run an agent on an interactive, open-ended task:\n```python\nimport gymnasium as gym\nimport browsergym.core  # register the openended task as a gym environment\n\n# start an open-ended environment\nenv = gym.make(\n    \"browsergym/openended\",\n    task_kwargs={\"start_url\": \"https://www.google.com/\"},  # starting URL\n    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat\n)\n# run the environment \u003c\u003e agent loop until termination\nobs, info = env.reset()\nwhile True:\n    action = ...  
# implement your agent here\n    obs, reward, terminated, truncated, info = env.step(action)\n    if terminated or truncated:\n        break\n# release the environment\nenv.close()\n```\n\nMiniWoB\n```python\nimport gymnasium as gym\nimport browsergym.miniwob  # register miniwob tasks as gym environments\n\n# start a miniwob task\nenv = gym.make(\"browsergym/miniwob.choose-list\")\n...\n\n# list all the available miniwob tasks\nenv_ids = [id for id in gym.envs.registry.keys() if id.startswith(\"browsergym/miniwob\")]\nprint(\"\\n\".join(env_ids))\n```\n\nWorkArena\n```python\nimport gymnasium as gym\nimport browsergym.workarena  # register workarena tasks as gym environments\n\n# start a workarena task\nenv = gym.make(\"browsergym/workarena.servicenow.order-ipad-pro\")\n...\n\n# list all the available workarena tasks\nenv_ids = [id for id in gym.envs.registry.keys() if id.startswith(\"browsergym/workarena\")]\nprint(\"\\n\".join(env_ids))\n```\n\nWebArena\n```python\nimport gymnasium as gym\nimport browsergym.webarena  # register webarena tasks as gym environments\n\n# start a webarena task\nenv = gym.make(\"browsergym/webarena.310\")\n...\n\n# list all the available webarena tasks\nenv_ids = [id for id in gym.envs.registry.keys() if id.startswith(\"browsergym/webarena\")]\nprint(\"\\n\".join(env_ids))\n```\n\nVisualWebArena\n```python\nimport gymnasium as gym\nimport browsergym.visualwebarena  # register visualwebarena tasks as gym environments\n\n# start a visualwebarena task\nenv = gym.make(\"browsergym/visualwebarena.721\")\n...\n\n# list all the available visualwebarena tasks\nenv_ids = [id for id in gym.envs.registry.keys() if id.startswith(\"browsergym/visualwebarena\")]\nprint(\"\\n\".join(env_ids))\n```\n\nAssistantBench\n```python\nimport gymnasium as gym\nimport browsergym.assistantbench  # register assistantbench tasks as gym environments\n\n# start an assistantbench task\nenv = gym.make(\"browsergym/assistantbench.validation.3\")\n...\n\n# list all the available 
assistantbench tasks\nenv_ids = [id for id in gym.envs.registry.keys() if id.startswith(\"browsergym/assistantbench\")]\nprint(\"\\n\".join(env_ids))\n```\n\n## 💻 Demo\n\nIf you want to experiment with a demo agent in BrowserGym, follow these steps:\n```sh\n# conda setup\nconda env create -f demo_agent/environment.yml\nconda activate demo_agent\n\n# or pip setup\npip install -r demo_agent/requirements.txt\n\n# then download the browser for Playwright\nplaywright install chromium\n```\n\nOur demo agent uses `openai` as a backend, so be sure to set your `OPENAI_API_KEY` environment variable.\n\nLaunch the demo agent as follows:\n```sh\n# openended (interactive chat mode)\npython demo_agent/run_demo.py --task_name openended --start_url https://www.google.com\n\n# miniwob\npython demo_agent/run_demo.py --task_name miniwob.click-test\n\n# workarena\npython demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop\n\n# webarena\npython demo_agent/run_demo.py --task_name webarena.4\n\n# visualwebarena\npython demo_agent/run_demo.py --task_name visualwebarena.398\n```\n\nYou can customize the demo, for example by changing `model_name` to your preferred LLM (`gpt-4o-mini` by default) or by adding screenshots for your VLMs with `use_screenshot`. To list all available options, run:\n\n```sh\npython demo_agent/run_demo.py --help\n```\n\n## 🌐 Ecosystem\n\n- [AgentLab](https://github.com/ServiceNow/AgentLab): Seamlessly run agents on benchmarks, collect and analyse traces.\n- [WorkArena(++)](https://github.com/ServiceNow/WorkArena): A benchmark for web agents on the ServiceNow platform.\n- [WebArena](https://github.com/web-arena-x/webarena): A benchmark of realistic web tasks on self-hosted domains.\n- [VisualWebArena](https://github.com/web-arena-x/visualwebarena): A benchmark of realistic visual web tasks on self-hosted domains.\n- [MiniWoB(++)](https://miniwob.farama.org/): A collection of over 100 web tasks on synthetic web pages.\n- [WebLINX](https://github.com/McGill-NLP/weblinx): A dataset of 
real-world web interaction traces.\n- [AssistantBench](https://github.com/oriyor/assistantbench): A benchmark of realistic and time-consuming tasks on the open web.\n\n## 🌟 Contributors\n\n[![BrowserGym contributors](https://contrib.rocks/image?repo=ServiceNow/BrowserGym\u0026max=2000)](https://github.com/ServiceNow/BrowserGym/graphs/contributors)\n\n## 📝 Citing This Work\n\nPlease use the following BibTeX to cite our work:\n```tex\n@inproceedings{workarena2024,\n    title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},\n    author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},\n    booktitle = {Proceedings of the 41st International Conference on Machine Learning},\n    pages = {11642--11662},\n    year = {2024},\n    editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},\n    volume = {235},\n    series = {Proceedings of Machine Learning Research},\n    month = {21--27 Jul},\n    publisher = {PMLR},\n    url = {https://proceedings.mlr.press/v235/drouin24a.html},\n}\n```\n","funding_links":[],"categories":["Python","4. Web Browsing Agents","Benchmarks \u0026 Research","📋 Contents","Agent Harnessing and Evaluation","Benchmark/Evaluator","🌍 Environments \u0026 Benchmarks"],"sub_categories":["4.2 Research Oriented Web Browsing Framework","Dev Tools","📈 9. Evaluation, Benchmarks \u0026 Datasets","Benchmark Reality Check (real-world tool use)","Tools"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FServiceNow%2FBrowserGym","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FServiceNow%2FBrowserGym","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FServiceNow%2FBrowserGym/lists"}