{"id":17526883,"url":"https://github.com/ServiceNow/WorkArena","last_synced_at":"2025-03-06T06:31:02.063Z","repository":{"id":227342846,"uuid":"764848724","full_name":"ServiceNow/WorkArena","owner":"ServiceNow","description":"WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?","archived":false,"fork":false,"pushed_at":"2024-10-07T15:46:42.000Z","size":25051,"stargazers_count":107,"open_issues_count":6,"forks_count":7,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-07T15:49:43.836Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://servicenow.github.io/WorkArena/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ServiceNow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-28T20:25:35.000Z","updated_at":"2024-10-07T15:44:55.000Z","dependencies_parsed_at":"2024-10-31T00:31:22.012Z","dependency_job_id":null,"html_url":"https://github.com/ServiceNow/WorkArena","commit_stats":null,"previous_names":["servicenow/workarena"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FWorkArena","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FWorkArena/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FWorkArena/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2FWorkArena/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/ServiceNow","download_url":"https://codeload.github.com/ServiceNow/WorkArena/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242161458,"owners_count":20081876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-20T15:02:35.609Z","updated_at":"2025-03-06T06:31:02.054Z","avatar_url":"https://github.com/ServiceNow.png","language":"Python","readme":"\u003ca href=\"./assets/WorkArena_banner.png\"\u003e\n  \u003cimg src=\"./assets/WorkArena_banner.png\" width=\"1000\" /\u003e\n\u003c/a\u003e\n\n# WorkArena: A Benchmark for Evaluating Agents on Knowledge Work Tasks \n[[Benchmark Contents]](#benchmark-contents) ♦ [[Getting Started]](#getting-started) ♦ [[Live Demo]](#live-demo) ♦ [[BrowserGym]](https://github.com/ServiceNow/BrowserGym) ♦ [[Citing This Work]](#citing-this-work) ♦ [Join us on Discord!](https://discord.gg/rDkP69X7)\n\n## Join Our Discord Community\n\nWant to brainstorm ideas, troubleshoot issues, or just geek out with fellow agent builders? Our official Discord server is the perfect place to connect and collaborate. Come hang out with us to:\n\n- Exchange tips, tricks, and success stories\n- Get real-time support and feedback\n- Stay updated on the latest features and announcements\n\n[Join us on Discord!](https://discord.gg/rDkP69X7)\n\n---\n\n### Explore the BrowserGym Ecosystem\n\nLooking for more tools and resources? 
Check out these open-source projects:\n\n- **[AgentLab](https://github.com/ServiceNow/AgentLab)**\n- **[BrowserGym](https://github.com/ServiceNow/BrowserGym)**\n\nBoth are part of the broader [BrowserGym ecosystem](https://arxiv.org/abs/2412.05467).\n\n### Papers\n*  [ICML 2024] WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? [[Paper]](https://arxiv.org/abs/2403.07718)\n \n*  [NeurIPS 2024] WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [[Paper]](https://arxiv.org/abs/2407.05291)\n \n\n`WorkArena` is a suite of browser-based tasks tailored to gauge web agents' effectiveness in supporting routine tasks for knowledge workers. \nBy harnessing the ubiquitous [ServiceNow](https://www.servicenow.com/what-is-servicenow.html) platform, this benchmark will be instrumental in assessing the widespread state of such automations in modern knowledge work environments.\n\nThe preferred way to evaluate on WorkArena is with [AgentLab](https://github.com/ServiceNow/AgentLab/), which will conduct parallel experiments through [BrowserGym](https://github.com/ServiceNow/BrowserGym) and report on a [unified leaderboard](https://huggingface.co/spaces/ServiceNow/browsergym-leaderboard).\n\nhttps://github.com/ServiceNow/WorkArena/assets/2374980/68640f09-7d6f-4eb1-b556-c294a6afef70\n\n## Getting Started\n\nTo set up WorkArena, you will need to get your own ServiceNow instance, install our Python package, and upload some data to your instance. Follow the steps below to achieve this.\n\n### a) Create a ServiceNow Developer Instance\n\n1. Go to https://developer.servicenow.com/ and create an account.\n2. Click on `Request an instance` and select the `Washington` release (initializing the instance will take a few minutes).\n3. Once the instance is ready, you should see your instance URL and credentials. 
If not, click _Return to the Developer Portal_, then navigate to _Manage instance password_ and click _Reset instance password_.\n4. You should now see your URL and credentials. Based on this information, set the following environment variables:\n    * `SNOW_INSTANCE_URL`: The URL of your ServiceNow developer instance\n    * `SNOW_INSTANCE_UNAME`: The username, which should be \"admin\"\n    * `SNOW_INSTANCE_PWD`: The password. Make sure you place the value in single quotes '' and be mindful of [escaping special shell characters](https://onlinelinuxtools.com/escape-shell-characters). Running `echo $SNOW_INSTANCE_PWD` should print the correct password.\n5. Log in to your instance via a browser using the admin credentials. Close any popup that appears on the main screen (e.g., agreeing to analytics).\n\n**Warning:** Feel free to look around the platform, but please make sure you revert any changes (e.g., changes to list views, pinning some menus, etc.) as these changes will be persistent and affect the benchmarking process.\n\n### b) Install WorkArena and Initialize your Instance\n\nRun the following command to install WorkArena in the [BrowserGym](https://github.com/servicenow/browsergym) environment:\n```\npip install browsergym-workarena\n```\n\nThen, install [Playwright](https://github.com/microsoft/playwright):\n```\nplaywright install\n```\n\nFinally, run this command in a terminal to upload the benchmark data to your ServiceNow instance:\n```\nworkarena-install\n```\nYour installation is now complete! 🎉\n\n\n## Benchmark Contents\n\nAt the moment, WorkArena-L1 includes `19,912` unique instances drawn from `33` tasks that cover the main components of the ServiceNow user interface, otherwise referred to as \"atomic\" tasks. WorkArena++ contains 682 tasks, each one sampling among thousands of potential configurations. 
WorkArena++ uses the atomic components presented in WorkArena, and composes them into real-world use cases evaluating planning, reasoning, and memorizing abilities of agents. \n\nThe following videos show an agent built on `GPT-4-vision` interacting with every atomic component of the benchmark. As emphasized by our results, this benchmark is not solved and thus, the performance of the agent is not always on point.\n\n### Knowledge Bases\n\n**Goal:** The agent must search for specific information in the company knowledge base.\n\n_The agent interacts with the user via BrowserGym's conversational interface._\n\nhttps://github.com/ServiceNow/WorkArena/assets/1726818/352341ba-b501-46ac-bfa6-a6c9be1ac2b7\n\n### Forms\n\n**Goal:** The agent must fill a complex form with specific values for each field.\n\nhttps://github.com/ServiceNow/WorkArena/assets/1726818/e2c2b5cb-3386-4f3c-b073-c8c619e0e81b\n\n### Service Catalogs\n\n**Goal:** The agent must order items with specific configurations from the company's service catalog.\n\nhttps://github.com/ServiceNow/WorkArena/assets/1726818/ac64db3b-9abf-4b5f-84a7-e2d9c9cee863\n\n### Lists\n\n**Goal:** The agent must filter a list according to some specifications.\n\n_In this example, the agent struggles to manipulate the UI and fails to create the filter._\n\nhttps://github.com/ServiceNow/WorkArena/assets/1726818/7538b3ef-d39b-4978-b9ea-8b9e106df28e\n\n### Menus\n\n**Goal:** The agent must navigate to a specific application using the main menu.\n\nhttps://github.com/ServiceNow/WorkArena/assets/1726818/ca26dfaf-2358-4418-855f-80e482435e6e\n\n### Dashboards\n\n**Goal:** The agent must answer a question that requires reading charts and (optionally) performing simple reasoning over them.\n\n*Note: For demonstration purposes, a human is controlling the cursor since this is a pure retrieval task*\n\nhttps://github.com/ServiceNow/WorkArena/assets/1726818/0023232c-081f-4be4-99bd-f60c766e6c3f\n\n## Live Demo\n\nRun this code to see WorkArena in action.\n\nNote: the following example executes WorkArena's oracle (cheat) function to solve each task. To evaluate an agent, calls to `env.step()` must be used instead.\n\n- To run a demo of WorkArena-L1 (ICML 2024) tasks using BrowserGym, use the following script:\n```python\nimport random\n\nfrom browsergym.core.env import BrowserEnv\nfrom browsergym.workarena import ALL_WORKARENA_TASKS\nfrom time import sleep\n\n\nrandom.shuffle(ALL_WORKARENA_TASKS)\nfor task in ALL_WORKARENA_TASKS:\n    print(\"Task:\", task)\n\n    # Instantiate a new environment\n    env = BrowserEnv(task_entrypoint=task,\n                    headless=False)\n    env.reset()\n\n    # Cheat functions use Playwright to automatically solve the task\n    env.chat.add_message(role=\"assistant\", msg=\"On it. Please wait...\")\n    cheat_messages = []\n    env.task.cheat(env.page, cheat_messages)\n\n    # Send cheat messages to chat\n    for cheat_msg in cheat_messages:\n        env.chat.add_message(role=cheat_msg[\"role\"], msg=cheat_msg[\"message\"])\n\n    # Post solution to chat\n    env.chat.add_message(role=\"assistant\", msg=\"I'm done!\")\n\n    # Validate the solution\n    reward, stop, message, info = env.task.validate(env.page, cheat_messages)\n    if reward == 1:\n        env.chat.add_message(role=\"user\", msg=\"Yes, that works. 
Thanks!\")\n    else:\n        env.chat.add_message(role=\"user\", msg=f\"No, that doesn't work. {info.get('message', '')}\")\n\n    sleep(3)\n    env.close()\n```\n\n\n\n- To run a demo of WorkArena-L2 (WorkArena++) tasks using BrowserGym, use the following script. Change the filter on line 6 to `l3` to sample L3 tasks.\n\n```python\nimport random\n\nfrom browsergym.core.env import BrowserEnv\nfrom browsergym.workarena import get_all_tasks_agents\n \nAGENT_L2_SAMPLED_SET = get_all_tasks_agents(filter=\"l2\")\n \nAGENT_L2_SAMPLED_TASKS, AGENT_L2_SEEDS = [sampled_set[0] for sampled_set in AGENT_L2_SAMPLED_SET], [\n    sampled_set[1] for sampled_set in AGENT_L2_SAMPLED_SET\n]\nfrom time import sleep\n\nfor (task, seed) in zip(AGENT_L2_SAMPLED_TASKS, AGENT_L2_SEEDS):\n    print(\"Task:\", task)\n\n    # Instantiate a new environment\n    env = BrowserEnv(task_entrypoint=task,\n                    headless=False)\n    env.reset()\n\n    # Cheat functions use Playwright to automatically solve the task\n    env.chat.add_message(role=\"assistant\", msg=\"On it. Please wait...\")\n    \n    for i in range(len(env.task)):\n        sleep(1)\n        env.task.cheat(page=env.page, chat_messages=env.chat.messages, subtask_idx=i)\n        sleep(1)\n        reward, done, message, info = env.task.validate(page=env.page, chat_messages=env.chat.messages)\n   \n    if reward == 1:\n        env.chat.add_message(role=\"user\", msg=\"Yes, that works. Thanks!\")\n    else:\n        env.chat.add_message(role=\"user\", msg=f\"No, that doesn't work. {info.get('message', '')}\")\n\n    sleep(3)\n    env.close()\n```\n\n## Citing This Work\n\nPlease use the following BibTeX to cite our work:\n\n### WorkArena\n```\n@misc{workarena2024,\n      title={WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?}, \n      author={Alexandre Drouin and Maxime Gasse and Massimo Caccia and Issam H. Laradji and Manuel Del Verme and Tom Marty and Léo Boisvert and Megh Thakkar and Quentin Cappart and David Vazquez and Nicolas Chapados and Alexandre Lacoste},\n      year={2024},\n      eprint={2403.07718},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n### WorkArena++\n```\n@misc{boisvert2024workarenacompositionalplanningreasoningbased,\n      title={WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks}, \n      author={Léo Boisvert and Megh Thakkar and Maxime Gasse and Massimo Caccia and Thibault Le Sellier De Chezelles and Quentin Cappart and Nicolas Chapados and Alexandre Lacoste and Alexandre Drouin},\n      year={2024},\n      eprint={2407.05291},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https://arxiv.org/abs/2407.05291}, \n}\n```\n","funding_links":[],"categories":["Python","Benchmarks \u0026 Research","Agent Harnessing and Evaluation","🔬 Web Agent Benchmarks","🌍 Environments \u0026 Benchmarks"],"sub_categories":["Dev Tools","Benchmark Reality Check (real-world tool use)","Paid Platforms"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FServiceNow%2FWorkArena","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FServiceNow%2FWorkArena","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FServiceNow%2FWorkArena/lists"}