{"id":13754203,"url":"https://github.com/web-arena-x/webarena","last_synced_at":"2025-12-14T23:05:25.362Z","repository":{"id":183421097,"uuid":"670111980","full_name":"web-arena-x/webarena","owner":"web-arena-x","description":"Code repo for \"WebArena: A Realistic Web Environment for Building Autonomous Agents\"","archived":false,"fork":false,"pushed_at":"2025-09-02T14:39:15.000Z","size":6216,"stargazers_count":1121,"open_issues_count":76,"forks_count":174,"subscribers_count":19,"default_branch":"main","last_synced_at":"2025-09-02T14:39:18.719Z","etag":null,"topics":["agent","nlp"],"latest_commit_sha":null,"homepage":"https://webarena.dev","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/web-arena-x.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-07-24T10:17:55.000Z","updated_at":"2025-09-02T07:49:35.000Z","dependencies_parsed_at":"2023-10-24T01:51:16.629Z","dependency_job_id":"725d145a-b417-4636-be1d-46a2b54f27ed","html_url":"https://github.com/web-arena-x/webarena","commit_stats":null,"previous_names":["web-arena-x/webarena"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/web-arena-x/webarena","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/web-arena-x%2Fwebarena","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/web-arena-x%2Fwebarena/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/web-arena-x%2Fwebarena/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/web-arena-x%2Fwebarena/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/web-arena-x","download_url":"https://codeload.github.com/web-arena-x/webarena/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/web-arena-x%2Fwebarena/sbom","scorecard":{"id":1236806,"data":{"date":"2025-08-25","repo":{"name":"github.com/web-arena-x/webarena","commit":"daee18de46d4b8e3c98c8cf5e5c4ef6de2f7a8eb"},"scorecard":{"version":"v5.2.1-41-g40576783","commit":"40576783fda6698350fcbbeaea760ff827433034"},"score":2.9,"checks":[{"name":"Code-Review","score":1,"reason":"Found 3/17 approved changesets -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source 
repository.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#binary-artifacts"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/pre-commit.yml:1","Warn: no topLevel permission defined: .github/workflows/tests.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/pre-commit.yml:12: update your workflow using https://app.stepsecurity.io/secureworkflow/web-arena-x/webarena/pre-commit.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/pre-commit.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/web-arena-x/webarena/pre-commit.yml/main?enable=pin","Warn: 
third-party GitHubAction not pinned by hash: .github/workflows/pre-commit.yml:17: update your workflow using https://app.stepsecurity.io/secureworkflow/web-arena-x/webarena/pre-commit.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/tests.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/web-arena-x/webarena/tests.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/tests.yml:20: update your workflow using https://app.stepsecurity.io/secureworkflow/web-arena-x/webarena/tests.yml/main?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/tests.yml:25","Warn: pipCommand not pinned by hash: .github/workflows/tests.yml:28","Info:   0 out of   4 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 third-party GitHubAction dependencies pinned","Info:   0 out of   2 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file 
detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 20 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":0,"reason":"76 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2018-66 / GHSA-562c-5r94-xh97","Warn: Project is vulnerable to: PYSEC-2019-179 / GHSA-5wv5-4vpf-pj6m","Warn: Project is vulnerable to: PYSEC-2023-62 / GHSA-m2qf-hxjv-5gpq","Warn: Project is vulnerable to: PYSEC-2021-356 / GHSA-2ww3-fxvq-293j","Warn: Project is vulnerable to: PYSEC-2024-167 / GHSA-cgvx-9447-vcch","Warn: Project is vulnerable to: PYSEC-2021-859 / GHSA-f8m6-h2c7-8h9x","Warn: Project is vulnerable to: PYSEC-2019-106 / 
GHSA-mr7p-25v2-35wr","Warn: Project is vulnerable to: PYSEC-2022-5 / GHSA-rqjh-jp2r-59cj","Warn: Project is vulnerable to: GHSA-3c5c-7235-994j","Warn: Project is vulnerable to: GHSA-3f63-hfp8-52jq","Warn: Project is vulnerable to: PYSEC-2021-41 / GHSA-3wvg-mj6g-m9cv","Warn: Project is vulnerable to: PYSEC-2020-77 / GHSA-3xv8-3j54-hgrp","Warn: Project is vulnerable to: PYSEC-2020-80 / GHSA-43fq-w8qq-v88h","Warn: Project is vulnerable to: GHSA-44wm-f244-xhp3","Warn: Project is vulnerable to: GHSA-4fx9-vc88-q2xc","Warn: Project is vulnerable to: PYSEC-2021-35 / GHSA-57h3-9rgr-c24m","Warn: Project is vulnerable to: PYSEC-2020-172 / GHSA-5gm3-px64-rw72","Warn: Project is vulnerable to: PYSEC-2021-331 / GHSA-7534-mm45-c74v","Warn: Project is vulnerable to: PYSEC-2021-92 / GHSA-7r7m-5h27-29hp","Warn: Project is vulnerable to: PYSEC-2020-78 / GHSA-8843-m7mw-mxqm","Warn: Project is vulnerable to: PYSEC-2023-227 / GHSA-8ghj-p4vj-mr35","Warn: Project is vulnerable to: PYSEC-2014-87 / GHSA-8m9x-pxwq-j236","Warn: Project is vulnerable to: PYSEC-2022-10 / GHSA-8vj2-vxx3-667w","Warn: Project is vulnerable to: PYSEC-2021-36 / GHSA-8xjq-8fcg-g5hw","Warn: Project is vulnerable to: PYSEC-2016-6 / GHSA-8xjv-v9xq-m5h9","Warn: Project is vulnerable to: PYSEC-2021-42 / GHSA-95q3-8gr9-gm8w","Warn: Project is vulnerable to: PYSEC-2022-168 / GHSA-9j59-75qj-795w","Warn: Project is vulnerable to: PYSEC-2014-10 / GHSA-cfmr-38g9-f2h7","Warn: Project is vulnerable to: PYSEC-2020-76 / GHSA-cqhg-xjhh-p8hf","Warn: Project is vulnerable to: PYSEC-2021-40 / GHSA-f4w8-cv6p-x6r5","Warn: Project is vulnerable to: PYSEC-2021-69 / GHSA-f5g8-5qq7-938w","Warn: Project is vulnerable to: PYSEC-2021-139 / GHSA-g6rj-rv7j-xwp4","Warn: Project is vulnerable to: PYSEC-2015-16 / GHSA-h5rf-vgqx-wjv2","Warn: Project is vulnerable to: PYSEC-2016-5 / GHSA-hggx-3h72-49ww","Warn: Project is vulnerable to: PYSEC-2020-84 / GHSA-hj69-c76v-86wr","Warn: Project is vulnerable to: PYSEC-2016-7 / GHSA-hvr8-466p-75rh","Warn: 
Project is vulnerable to: PYSEC-2015-15 / GHSA-j6f7-g425-4gmx","Warn: Project is vulnerable to: GHSA-j7hp-h8jx-5ppr","Warn: Project is vulnerable to: PYSEC-2019-110 / GHSA-j7mj-748x-7p78","Warn: Project is vulnerable to: GHSA-jgpv-4h4c-xhw3","Warn: Project is vulnerable to: PYSEC-2022-42979 / GHSA-m2vv-5vj5-2hm7","Warn: Project is vulnerable to: PYSEC-2021-37 / GHSA-mvg9-xffr-p774","Warn: Project is vulnerable to: PYSEC-2020-83 / GHSA-p49h-hjvm-jg3h","Warn: Project is vulnerable to: PYSEC-2022-8 / GHSA-pw3c-h7wp-cvhx","Warn: Project is vulnerable to: PYSEC-2021-93 / GHSA-q5hq-fp76-qmrc","Warn: Project is vulnerable to: PYSEC-2020-82 / GHSA-r7rm-8j6h-r933","Warn: Project is vulnerable to: PYSEC-2014-23 / GHSA-r854-96gq-rfg3","Warn: Project is vulnerable to: PYSEC-2016-8 / GHSA-rwr3-c2q8-gm56","Warn: Project is vulnerable to: PYSEC-2020-81 / GHSA-vcqg-3p29-xw73","Warn: Project is vulnerable to: PYSEC-2020-79 / GHSA-vj42-xq3r-hr3r","Warn: Project is vulnerable to: PYSEC-2021-70 / GHSA-vqcj-wrf2-7v73","Warn: Project is vulnerable to: PYSEC-2016-9 / GHSA-w4vg-rf63-f3j3","Warn: Project is vulnerable to: PYSEC-2014-22 / GHSA-x895-2wrm-hvp7","Warn: Project is vulnerable to: PYSEC-2022-9 / GHSA-xrcv-f9gm-v42c","Warn: Project is vulnerable to: PYSEC-2021-137","Warn: Project is vulnerable to: PYSEC-2021-138","Warn: Project is vulnerable to: PYSEC-2021-317","Warn: Project is vulnerable to: PYSEC-2021-38","Warn: Project is vulnerable to: PYSEC-2021-39","Warn: Project is vulnerable to: PYSEC-2021-94","Warn: Project is vulnerable to: PYSEC-2023-175","Warn: Project is vulnerable to: GHSA-qq99-p57r-g3v7","Warn: Project is vulnerable to: GHSA-37mw-44qp-f5jm","Warn: Project is vulnerable to: GHSA-37q5-v5qm-c9v8","Warn: Project is vulnerable to: PYSEC-2023-300 / GHSA-3863-2447-669p","Warn: Project is vulnerable to: GHSA-6rvg-6v2m-4j46","Warn: Project is vulnerable to: GHSA-9356-575x-2w9m","Warn: Project is vulnerable to: GHSA-fpwr-67px-3qhx","Warn: Project is vulnerable to: 
PYSEC-2024-229 / GHSA-hxxf-235m-72v3","Warn: Project is vulnerable to: GHSA-jjph-296x-mrcr","Warn: Project is vulnerable to: GHSA-phhr-52qp-3mj4","Warn: Project is vulnerable to: GHSA-q2wp-rjmx-x6x9","Warn: Project is vulnerable to: PYSEC-2025-40 / GHSA-qq3j-4f4f-9583","Warn: Project is vulnerable to: PYSEC-2024-227 / GHSA-qxrp-vhvm-j765","Warn: Project is vulnerable to: PYSEC-2023-301 / GHSA-v68g-wm8c-6x7j","Warn: Project is vulnerable to: PYSEC-2024-228 / GHSA-wrfc-pvp9-mr9g"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-09-02T14:40:59.183Z","repository_id":183421097,"created_at":"2025-09-02T14:40:59.183Z","updated_at":"2025-09-02T14:40:59.183Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27738669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-14T02:00:11.348Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","nlp"],"created_at":"2024-08-03T09:01:49.493Z","updated_at":"2025-12-14T23:05:25.357Z","avatar_url":"https://github.com/web-arena-x.png","language":"Python","funding_links":[],"categories":["Python","Papers","A01_文本生成_文本对话","5. 
数据集","🔍 Observability and Evaluation","🛠️ Awesome Datasets \u0026 Benchmarks","Agent Harnessing and Evaluation","🔬 Web Agent Benchmarks","Benchmarks and Datasets","Benchmarks and Evaluation","Catalog","9. Evaluation, Benchmarks \u0026 Datasets","Web / Browser Environments"],"sub_categories":["Tools","大语言对话模型及数据","5.1 评测基准","Benchmarks","AI Agent Planning","Benchmark Reality Check (real-world tool use)","Paid Platforms","Embodied AI and Robotics","Evaluation Harnesses \u0026 Benchmarks"],"readme":"# WebArena: A Realistic Web Environment for Building Autonomous Agents\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"media/logo.png\" alt=\"Logo\" width=\"80px\"\u003e\n    \u003cbr\u003e\n    \u003cb\u003eWebArena is a standalone, self-hostable web environment for building autonomous agents\u003c/b\u003e\n\u003c/p\u003e\n\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://www.python.org/downloads/release/python-3109/\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.10-blue.svg\" alt=\"Python 3.10\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pre-commit.com/\"\u003e\u003cimg src=\"https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white\" alt=\"pre-commit\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/psf/black\"\u003e\u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000.svg\" alt=\"Code style: black\"\u003e\u003c/a\u003e\n\u003ca href=\"https://mypy-lang.org/\"\u003e\u003cimg src=\"https://www.mypy-lang.org/static/mypy_badge.svg\" alt=\"Checked with mypy\"\u003e\u003c/a\u003e\n\u003ca href=\"https://beartype.readthedocs.io\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/beartype/beartype-assets/main/badge/bear-ified.svg\" alt=\"bear-ified\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://webarena.dev/\"\u003eWebsite\u003c/a\u003e •\n\u003ca href=\"https://arxiv.org/abs/2307.13854\"\u003ePaper\u003c/a\u003e •\n\u003ca 
href=\"https://docs.google.com/spreadsheets/d/1M801lEpBbKSNwP-vDBkC_pF7LdyGU1f_ufZb_NWNBZQ/edit?usp=sharing\"\u003eLeaderboard\u003c/a\u003e •\n\u003ca href=\"https://the-agent-company.com\"\u003eTheAgentCompany\u003c/a\u003e\n\u003c/p\u003e\n\n![Overview](media/overview.png)\n\n## Update on 12/5/2024\n\u003e [!IMPORTANT]\n\u003e This repository hosts the *canonical* implementation of WebArena to reproduce the results reported in the paper. The web navigation infrastructure has been significantly enhanced by [AgentLab](https://github.com/ServiceNow/AgentLab/), introducing several key features: (1) support for parallel experiments using [BrowserGym](https://github.com/ServiceNow/BrowserGym), (2) integration of popular web navigation benchmarks (e.g., VisualWebArena) within a unified framework, (3) unified leaderboard reporting, and (4) improved handling of environment edge cases. We strongly recommend using this framework for your experiments.\n\n## News\n* [12/20/2024] Check out our new benchmark on even more consequential tasks, including terminal use and coding, [TheAgentCompany](https://the-agent-company.com).\n* [12/21/2023] We release the recording of trajectories performed by human annotators on ~170 tasks. Check out the [resource page](./resources/README.md#12212023-human-trajectories) for more details.\n* [11/3/2023] Multiple features!\n  * Uploaded newest [execution trajectories](./resources/README.md#1132023-execution-traces-from-our-experiments-v2)\n  * Added [Amazon Machine Image](./environment_docker/README.md#pre-installed-amazon-machine-image) that pre-installed all websites so that you don't have to!\n  * [Zeno](https://zenoml.com/) x WebArena which allows you to analyze your agents on WebArena without pain. 
Check out this [notebook](./scripts/webarena-zeno.ipynb) to upload your own data to Zeno, and [this](https://hub.zenoml.com/project/9db3e1cf-6e28-4cfc-aeec-1670cac01872/WebArena%20Tester/explore?params=eyJtb2RlbCI6ImdwdDM1LWRpcmVjdCIsIm1ldHJpYyI6eyJpZCI6NzQ5MiwibmFtZSI6InN1Y2Nlc3MiLCJ0eXBlIjoibWVhbiIsImNvbHVtbnMiOlsic3VjY2VzcyJdfSwiY29tcGFyaXNvbk1vZGVsIjoiZ3B0NC1jb3QiLCJjb21wYXJpc29uQ29sdW1uIjp7ImlkIjoiYTVlMDFiZDUtZTg0NS00M2I4LTllNDgtYTU4NzRiNDJjNjNhIiwibmFtZSI6ImNvbnRleHQiLCJjb2x1bW5UeXBlIjoiT1VUUFVUIiwiZGF0YVR5cGUiOiJOT01JTkFMIiwibW9kZWwiOiJncHQzNS1kaXJlY3QifSwiY29tcGFyZVNvcnQiOltudWxsLHRydWVdLCJtZXRyaWNSYW5nZSI6WzAsMV0sInNlbGVjdGlvbnMiOnsibWV0YWRhdGEiOnt9LCJzbGljZXMiOltdLCJ0YWdzIjpbXX19) page for browsing our existing results!\n* [10/24/2023] We re-examined the whole dataset and fixed the spotted annotation bugs. The current version ([v0.2.0](https://github.com/web-arena-x/webarena/releases/tag/v0.2.0)) is relatively stable and we don't expect major updates on the annotation in the future. The new results with better prompts and the comparison with human performance can be found in our [paper](https://arxiv.org/abs/2307.13854)\n* [8/4/2023] Added the instructions and the docker resources to host your own WebArena Environment. Check out [this page](environment_docker/README.md) for details.\n* [7/29/2023] Added [a well commented script](minimal_example.py) to walk through the environment setup.\n## Install\n```bash\n# Python 3.10+\nconda create -n webarena python=3.10; conda activate webarena\npip install -r requirements.txt\nplaywright install\npip install -e .\n\n# optional, dev only\npip install -e \".[dev]\"\nmypy --install-types --non-interactive browser_env agents evaluation_harness\npip install pre-commit\npre-commit install\n```\n## Quick Walkthrough\nCheck out [this script](minimal_example.py) for a quick walkthrough on how to set up the browser environment and interact with it using the demo sites we hosted. 
This script is for educational purposes only; to perform *reproducible* experiments, please check out the next section. In a nutshell, using WebArena is very similar to using OpenAI Gym. The following code snippet shows how to interact with the environment.\n```python\nimport random\n\nfrom browser_env import ScriptBrowserEnv, create_id_based_action\n# init the environment\nenv = ScriptBrowserEnv(\n    headless=False,\n    observation_type=\"accessibility_tree\",\n    current_viewport_only=True,\n    viewport_size={\"width\": 1280, \"height\": 720},\n)\n# prepare the environment for a configuration defined in a json file\nconfig_file = \"config_files/0.json\"\nobs, info = env.reset(options={\"config_file\": config_file})\n# get the text observation (e.g., html, accessibility tree) through obs[\"text\"]\n\n# create a random action\nid = random.randint(0, 1000)\naction = create_id_based_action(f\"click [{id}]\")\n\n# take the action\nobs, _, terminated, _, info = env.step(action)\n```\n## End-to-end Evaluation\n\u003e [!IMPORTANT]\n\u003e To ensure correct evaluation, please set up your own WebArena websites following step 1 and step 2. The demo sites are only for browsing purposes to help you better understand the content. After evaluating the 812 examples, reset the environment to the initial state following the instructions [here](./environment_docker/README.md#environment-reset).\n\n1. Set up the standalone environment.\nPlease check out [this page](environment_docker/README.md) for details.\n\n2. 
Configure the URLs for each website.\n```bash\nexport SHOPPING=\"\u003cyour_shopping_site_domain\u003e:7770\"\nexport SHOPPING_ADMIN=\"\u003cyour_e_commerce_cms_domain\u003e:7780/admin\"\nexport REDDIT=\"\u003cyour_reddit_domain\u003e:9999\"\nexport GITLAB=\"\u003cyour_gitlab_domain\u003e:8023\"\nexport MAP=\"\u003cyour_map_domain\u003e:3000\"\nexport WIKIPEDIA=\"\u003cyour_wikipedia_domain\u003e:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing\"\nexport HOMEPAGE=\"\u003cyour_homepage_domain\u003e:4399\" # this is a placeholder\n```\n\n\u003e You are encouraged to update the environment variables in [github workflow](.github/workflows/tests.yml#L7) to ensure the correctness of unit tests\n\n3. Generate the config file for each test example\n```bash\npython scripts/generate_test_data.py\n```\nYou will see `*.json` files generated in the [config_files](./config_files) folder. Each file contains the configuration for one test example.\n\n4. Obtain the auto-login cookies for all websites\n```bash\nmkdir -p ./.auth\npython browser_env/auto_login.py\n```\n5. Export your OpenAI API key: `export OPENAI_API_KEY=your_key`. A valid OpenAI API key starts with `sk-`.\n\n6. Launch the evaluation\n```bash\n# p_cot_id_actree_2s.json is the reasoning agent prompt we used in the paper\npython run.py \\\n  --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \\\n  --test_start_idx 0 \\\n  --test_end_idx 1 \\\n  --model gpt-3.5-turbo \\\n  --result_dir \u003cyour_result_dir\u003e\n```\nThis script will run the first example with the GPT-3.5 reasoning agent. The trajectory will be saved in `\u003cyour_result_dir\u003e/0.html`\n\n\n## Develop Your Prompt-based Agent\n1. Define the prompts. We provide two baseline agents whose corresponding prompts are listed [here](./agent/prompts/raw). 
Each prompt is a dictionary with the following keys:\n```python\nprompt = {\n  \"intro\": \u003cThe overall guideline which includes the task description, available action, hint and others\u003e,\n  \"examples\": [\n    (\n      example_1_observation,\n      example_1_response\n    ),\n    (\n      example_2_observation,\n      example_2_response\n    ),\n    ...\n  ],\n  \"template\": \u003cHow to organize different information such as observation, previous action, instruction, url\u003e,\n  \"meta_data\": {\n    \"observation\": \u003cWhich observation space the agent uses\u003e,\n    \"action_type\": \u003cWhich action space the agent uses\u003e,\n    \"keywords\": \u003cThe keywords used in the template; the program will later enumerate all keywords in the template to check that each one is correctly replaced with the content\u003e,\n    \"prompt_constructor\": \u003cWhich prompt constructor is in use; the prompt constructor will construct the input fed to an LLM and extract the action from the generation, more details below\u003e,\n    \"action_splitter\": \u003cThe splitter inside which the action can be extracted, used by the prompt constructor\u003e\n    }\n  }\n```\n\n2. Implement the prompt constructor. An example prompt constructor using Chain-of-thought/ReAct style reasoning is [here](./agent/prompts/prompt_constructor.py#L184). 
The prompt constructor is a class with the following methods:\n* `construct`: construct the input fed to an LLM\n* `_extract_action`: given the generation from an LLM, extract the phrase that corresponds to the action\n\n## Citation\nIf you use our environment or data, please cite our paper:\n```\n@article{zhou2023webarena,\n  title={WebArena: A Realistic Web Environment for Building Autonomous Agents},\n  author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},\n  journal={arXiv preprint arXiv:2307.13854},\n  year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweb-arena-x%2Fwebarena","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fweb-arena-x%2Fwebarena","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweb-arena-x%2Fwebarena/lists"}