{"id":19287257,"url":"https://github.com/princeton-nlp/WebShop","last_synced_at":"2025-04-22T04:32:05.926Z","repository":{"id":45812266,"uuid":"514715868","full_name":"princeton-nlp/WebShop","owner":"princeton-nlp","description":"[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents","archived":false,"fork":false,"pushed_at":"2024-08-28T19:40:12.000Z","size":47806,"stargazers_count":247,"open_issues_count":2,"forks_count":47,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-08-28T21:19:30.995Z","etag":null,"topics":["decision-making","language","language-grounding","ml","nlp","rl","rl-environment","shopping","sim-to-real","web-based"],"latest_commit_sha":null,"homepage":"https://webshop-pnlp.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/princeton-nlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-17T00:53:15.000Z","updated_at":"2024-08-28T19:40:16.000Z","dependencies_parsed_at":"2024-08-28T21:06:16.766Z","dependency_job_id":"a3d783ac-41e6-493f-8ceb-986e40ba166b","html_url":"https://github.com/princeton-nlp/WebShop","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/princeton-nlp%2FWebShop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/princeton-nlp%2FWebShop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/princeton-nlp%2FWebShop/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/princeton-nlp%2FWebShop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/princeton-nlp","download_url":"https://codeload.github.com/princeton-nlp/WebShop/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223888467,"owners_count":17220083,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decision-making","language","language-grounding","ml","nlp","rl","rl-environment","shopping","sim-to-real","web-based"],"created_at":"2024-11-09T22:05:39.115Z","updated_at":"2024-11-09T22:05:47.847Z","avatar_url":"https://github.com/princeton-nlp.png","language":"Python","funding_links":[],"categories":["Papers","Python","5. 
Datasets"],"sub_categories":["Datasets","Dataset","5.1 Evaluation Benchmarks"],"readme":"# 🛒 WebShop\n\n[![Python version](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org/downloads/release/python-3813/)\n[![License](https://img.shields.io/badge/License-Princeton-orange)](https://copyright.princeton.edu/policy)\n[![PyPI version](https://badge.fury.io/py/webshop.svg)](https://badge.fury.io/py/webshop)\n![Pytest workflow](https://github.com/princeton-nlp/webshop/actions/workflows/pytest.yml/badge.svg)\n\nImplementation of the WebShop environment and search agents for the paper:\n\n**[WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents](https://webshop-pnlp.github.io/)**  \n[Shunyu Yao*](https://ysymyth.github.io/), [Howard Chen*](https://howard50b.github.io/), [John Yang](https://john-b-yang.github.io/), [Karthik Narasimhan](https://www.cs.princeton.edu/~karthikn/)\n\n\u003cp float=\"left\"\u003e\n  \u003cimg src=\"assets/diagram.gif\"\u003e\n\u003c/p\u003e\n\nThis repository contains code for reproducing the paper's results. If you find this work useful in your research, please cite:\n\n```\n@inproceedings{yao2022webshop,\n  bibtex_show = {true},\n  title = {WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents},\n  author = {Yao, Shunyu and Chen, Howard and Yang, John and Narasimhan, Karthik},\n  booktitle = {ArXiv},\n  year = {preprint},\n  html = {https://arxiv.org/abs/2207.01206},\n  tag = {NLP}\n}\n```\n## 📖 Table of Contents \u003c!-- omit in toc --\u003e\n* [👋 Overview](#-overview)\n* [🚀 Setup](#-setup)\n* [🛠️ Usage](#-usage)\n* [💫 Contributions](#-contributions)\n* [🪪 License](#-license)\n## 👋 Overview\nWebShop is a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. In this environment, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase a product given an instruction. 
WebShop poses several challenges, including understanding compositional instructions, query (re-)formulation, dealing with noisy text in webpages, and performing strategic exploration.\n\n**Hugging Face Demo**: Devise your own natural language query for a product and ask an agent trained with WebShop to find it on Amazon or eBay. The demo is deployed as a 🤗 Hugging Face space [here](https://huggingface.co/spaces/webshop/amazon_shop)!\n\n## 🚀 Setup\nOur code is implemented in Python. To set up, do the following:\n1. Install [Python 3.8.13](https://www.python.org/downloads/release/python-3813/)\n2. Install [Java](https://www.java.com/en/download/)\n3. Download the source code:\n```sh\n\u003e git clone https://github.com/princeton-nlp/webshop.git webshop\n```\n4. Create a virtual environment using [Anaconda](https://anaconda.org/anaconda/python) and activate it:\n```sh\n\u003e conda create -n webshop python=3.8.13\n\u003e conda activate webshop\n```\n5. Install requirements into the `webshop` virtual environment via the `setup.sh` script:\n```sh\n\u003e ./setup.sh [-d small|all]\n```\nThe setup script performs the following actions, in order:\n* Installs the Python dependencies listed in `requirements.txt`\n* Downloads the product and instruction data used to populate WebShop\n* Downloads the spaCy `en_core_web_lg` model\n* Constructs the search engine index from the product and instruction data\n* Downloads 50 randomly chosen trajectories generated by MTurk workers\n\nThe `-d` flag specifies whether to pull the entire product and instruction dataset (`-d all`) or a subset of 1,000 random products (`-d small`).\n\n6. By default, WebShop loads only 1,000 products for a faster environment preview. 
To load all products, change `web_agent_site/utils.py` as follows:\n```python\n# Comment out the 1,000-product defaults...\n# DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2_1000.json')\n# DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle_1000.json')\n# ...and point to the full data files instead\nDEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2.json')\nDEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle.json')\n```\n\n7. (Optional) Download the ResNet image feature files [here](https://drive.google.com/drive/folders/1jglJDqNV2ryrlZzrS0yOEk-aRAcLAhNw?usp=sharing) and place them in `data/` to run models that require image features.\n\n8. (Optional) Human demonstration data can be downloaded [here](https://drive.google.com/file/d/1GWC8UlUzfT9PRTRxgYOwuKSJp4hyV1dp/view?usp=sharing).\n\n## 🛠️ Usage\nThe WebShop environment can be rendered in two modes, `html` and `simple`, each of which offers a different observation space. The `simple` mode strips away the extraneous metadata that the `html` mode includes, which makes model training and evaluation easier.\n### Webpage Environment (`html` mode)\nLaunch the `WebShop` webpage:\n```sh\n\u003e ./run_dev.sh\n```\nThe site should then be viewable in the browser. Go to http://localhost:3000/ABC, where you should land on the search home page with a random instruction.\n\nNavigating the website automatically generates a corresponding trajectory file in the `user_session_logs/mturk` folder. Each file corresponds to a single instruction/web session, and each step in the file corresponds to a single action (e.g., `search[...]`, `click[...]`).\n\nThe current WebShop build comes with two flags:\n* `--log`: Include this flag to create a trajectory `.jsonl` log file of actions on WebShop\n* `--attrs`: Include this flag to display an `Attributes` tab on the `item_page` of WebShop\n\n### Text Environment (`simple` mode)\nThe `simple` mode of the WebShop environment is packaged and readily available as an OpenAI gym environment. 
The OpenAI gym definitions of the text environment can be found in the `web_agent_site/envs` folder.\n\nTo start using the gym and building agents that interact with the WebShop environment, include the following statements in your Python file:\n```python\nimport gym\n# Importing web_agent_site.envs registers 'WebAgentTextEnv-v0' with gym\nfrom web_agent_site.envs import WebAgentTextEnv\n\n# Replace ... with the number of products to load\nenv = gym.make('WebAgentTextEnv-v0', observation_mode='text', num_products=...)\n```\nNow you can write your own agent that interacts with the environment through the standard OpenAI gym [interface](https://www.gymlibrary.ml/content/api/).\n\nExamples of a `RandomPolicy` agent interacting with the WebShop environment in both `html` and `simple` modes can be found in the `run_envs` folder. To run these examples locally, run the `run_web_agent_text_env.sh` or `run_web_agent_site_env.sh` script:\n```sh\n\u003e ./run_web_agent_text_env.sh\nProducts loaded.\nKeys Cleaned.\nAttributes Loaded.\n100%|██████████████████| 1000/1000\nLoaded 6910 goals.\nAmazon Shopping Game [SEP] Instruction: [SEP] Find me slim f...\nAvailable actions: {'has_search_bar': True, 'clickables': ['search']}\nTaking action \"search[shoes]\" -\u003e Reward = 0.0\n...\n```\nTo run the `run_web_agent_site_env.sh` script, you must download a version of [ChromeDriver](https://chromedriver.chromium.org/downloads) compatible with your Chrome browser version. 
Once you have downloaded and unzipped the executable, rename it `chromedriver` and place it in the `webshop/envs` folder.\n\n### Baseline Models\nTo run the baseline models from the paper (rule, IL, RL, IL+RL), please refer to the `README.md` in the [baseline_models](https://github.com/princeton-nlp/webshop/tree/master/baseline_models) folder.\n\n### Sim-to-real Transfer\nTo learn more about how agents trained on WebShop are transferred to other environments (sim-to-real transfer), please refer to the `README.md` in the [transfer](https://github.com/princeton-nlp/webshop/tree/master/transfer) folder.\n\n## 💫 Contributions\nWe would love to hear from the broader NLP and Machine Learning community, and we welcome any contributions, pull requests, or issues! To contribute, please file a new pull request or issue and fill in the corresponding template. We'll be sure to follow up shortly!\n\n## 🪪 License\nSee `LICENSE.md`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprinceton-nlp%2FWebShop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprinceton-nlp%2FWebShop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprinceton-nlp%2FWebShop/lists"}