{"id":13677362,"url":"https://github.com/amazon-science/esci-data","last_synced_at":"2025-04-12T06:11:44.901Z","repository":{"id":37006609,"uuid":"484050569","full_name":"amazon-science/esci-data","owner":"amazon-science","description":"Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search","archived":false,"fork":false,"pushed_at":"2024-10-07T15:52:12.000Z","size":383,"stargazers_count":281,"open_issues_count":4,"forks_count":59,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-12T06:11:36.344Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://amazonkddcup.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-21T12:53:16.000Z","updated_at":"2025-04-08T12:12:21.000Z","dependencies_parsed_at":"2024-04-18T09:59:14.861Z","dependency_job_id":"0cf000b3-c2e1-46de-9053-3337656e5a52","html_url":"https://github.com/amazon-science/esci-data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fesci-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fesci-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fesci-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science
%2Fesci-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/esci-data/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248525138,"owners_count":21118619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T13:00:41.010Z","updated_at":"2025-04-12T06:11:44.869Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":["Python","数据搜索引擎"],"sub_categories":["网络服务_其他"],"readme":"# Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search\n\n## Introduction\n\nWe introduce the “Shopping Queries Data Set”, a large dataset of difficult search queries, released with the aim of fostering research in the area of semantic matching of queries and products. For each query, the dataset provides a list of up to 40 potentially relevant results, together with ESCI relevance judgements (Exact, Substitute, Complement, Irrelevant) indicating the relevance of the product to the query. Each query-product pair is accompanied by additional information. The dataset is multilingual, as it contains queries in English, Japanese, and Spanish.\n\n\nThe primary objective of releasing this dataset is to create a benchmark for building new ranking strategies and simultaneously identifying interesting categories of results (i.e., substitutes) that can be used to improve the customer experience when searching for products. 
The three different tasks that are studied in the literature (see https://amazonkddcup.github.io/) using this Shopping Queries Dataset are:\n\n\n**Task 1 - Query-Product Ranking**: Given a user-specified query and a list of matched products, the goal of this task is to rank the products so that the relevant products are ranked above the non-relevant ones.\n\n\n**Task 2 - Multi-class Product Classification**: Given a query and a result list of products retrieved for this query, the goal of this task is to classify each product as being an Exact, Substitute, Complement, or Irrelevant match for the query.\n\n\n**Task 3 - Product Substitute Identification**: This task will measure the ability of the systems to identify the substitute products in the list of results for a given query.\n\n## Dataset\n\nWe provide two different versions of the data set: one for task 1, which is reduced in terms of the number of examples, and one for tasks 2 and 3, which is larger.\n\nThe training data set contains a list of query-result pairs with annotated E/S/C/I labels. The data is **multilingual** and it includes queries in **English**, **Japanese**, and **Spanish**. The examples in the data set have the following fields: `example_id`, `query`, `query_id`, `product_id`, `product_locale`, `esci_label`, `small_version`, `large_version`, `split`, `product_title`, `product_description`, `product_bullet_point`, `product_brand`, `product_color` and `source`.\n\nThe Shopping Queries Data Set is a large-scale manually annotated data set composed of challenging customer queries.\n\nThere are 2 versions of the dataset. The reduced version of the data set contains `48,300 unique queries` and `1,118,011 rows`, each corresponding to a `\u003cquery, item\u003e` judgement. The larger version of the data set contains `130,652 unique queries` and `2,621,738 judgements`. In the reduced version, queries that are deemed to be **“easy”** have been filtered out. 
The data is stratified by query into two splits: train and test.\n\nA summary of our Shopping Queries Data Set is given in the two tables below, which show the statistics of the reduced and larger versions, respectively. These tables include the number of unique queries, the number of judgements, and the average number of judgements per query (i.e., average depth) across the three different languages.\n\n|       | Total | Total | Total | Train | Train | Train | Test | Test | Test |\n| ------------- | ---------- | ------------- | ---------- | ---------- | ------------- | ---------- | ---------- | ------------- | ---------- |\n| Language      | \\# Queries | \\# Judgements | Avg. Depth | \\# Queries | \\# Judgements | Avg. Depth | \\# Queries | \\# Judgements | Avg. Depth |\n| English (US)  | 29,844     | 601,354       | 20.15      | 20,888     | 419,653       | 20.09      | 8,956      | 181,701       | 20.29      |\n| Spanish (ES)  | 8,049      | 218,774       | 27.18      | 5,632      | 152,891       | 27.15      | 2,417      | 65,883        | 27.26      |\n| Japanese (JP) | 10,407     | 297,883       | 28.62      | 7,284      | 209,094       | 28.71      | 3,123      | 88,789        | 28.43      |\n| Overall       | 48,300     | 1,118,011     | 23.15      | 33,804     | 781,638       | 23.12      | 14,496     | 336,373       | 23.20      |\n\n***Table 1**: Summary of the Shopping queries data set for task 1 (reduced version) - the number of unique queries, the number of judgements, and the average number of judgements per query.*\n\n|       | Total | Total | Total | Train | Train | Train | Test | Test | Test |\n| ------------- | ---------- | ------------- | ---------- | ---------- | ------------- | ---------- | ---------- | ------------- | ---------- |\n| Language      | \\# Queries | \\# Judgements | Avg. Depth | \\# Queries | \\# Judgements | Avg. Depth | \\# Queries | \\# Judgements | Avg. 
Depth |\n| English (US)  | 97,345     | 1,818,825     | 18.68      | 74,888     | 1,393,063     | 18.60      | 22,458     | 425,762       | 18.96      |\n| Spanish (ES)  | 15,180     | 356,410       | 23.48      | 11,336     | 263,063       | 23.21      | 3,844      | 93,347        | 24.28      |\n| Japanese (JP) | 18,127     | 446,053       | 24.61      | 13,460     | 327,146       | 24.31      | 4,667      | 118,907       | 25.48      |\n| Overall       | 130,652    | 2,621,288     | 20.06      | 99,684     | 1,983,272     | 19.90      | 30,969     | 638,016       | 20.60      |\n\n***Table 2**: Summary of the Shopping queries data set for tasks 2 and 3 (larger version) - the number of unique queries, the number of judgements, and the average number of judgements per query.*\n\n## Usage\n\nThe [dataset](https://github.com/amazon-research/esci-code/tree/main/shopping_queries_dataset) has the following files:\n- `shopping_queries_dataset_examples.parquet` contains the following columns : `example_id`, `query`, `query_id`, `product_id`, `product_locale`, `esci_label`, `small_version`, `large_version`, `split`\n- `shopping_queries_dataset_products.parquet` contains the following columns : `product_id`, `product_title`, `product_description`, `product_bullet_point`, `product_brand`, `product_color`, `product_locale`\n- `shopping_queries_dataset_sources.csv` contains the following columns : `query_id`, `source`\n\n### Load examples, products and sources\n\n```\nimport pandas as pd\ndf_examples = pd.read_parquet('shopping_queries_dataset_examples.parquet')\ndf_products = pd.read_parquet('shopping_queries_dataset_products.parquet')\ndf_sources = pd.read_csv(\"shopping_queries_dataset_sources.csv\")\n```\n\n### Merge examples with products\n```\ndf_examples_products = pd.merge(\n    df_examples,\n    df_products,\n    how='left',\n    left_on=['product_locale','product_id'],\n    right_on=['product_locale', 'product_id']\n)\n```\n### Filter and prepare for Task 
1\n\n```\ndf_task_1 = df_examples_products[df_examples_products[\"small_version\"] == 1]\ndf_task_1_train = df_task_1[df_task_1[\"split\"] == \"train\"]\ndf_task_1_test = df_task_1[df_task_1[\"split\"] == \"test\"]\n```\n\n### Filter and prepare data for Task 2\n```\ndf_task_2 = df_examples_products[df_examples_products[\"large_version\"] == 1]\ndf_task_2_train = df_task_2[df_task_2[\"split\"] == \"train\"]\ndf_task_2_test = df_task_2[df_task_2[\"split\"] == \"test\"]\n```\n\n### Filter and prepare data for Task 3\n```\n# .copy() avoids pandas' SettingWithCopyWarning when adding a column to a filtered frame\ndf_task_3 = df_examples_products[df_examples_products[\"large_version\"] == 1].copy()\n# Binary target: 1 for Substitute (\"S\"), 0 for every other ESCI label\ndf_task_3[\"subtitute_label\"] = df_task_3[\"esci_label\"].apply(lambda esci_label: 1 if esci_label == \"S\" else 0 )\ndel df_task_3[\"esci_label\"]\ndf_task_3_train = df_task_3[df_task_3[\"split\"] == \"train\"]\ndf_task_3_test = df_task_3[df_task_3[\"split\"] == \"test\"]\n```\n\n### Merge queries with sources (optional)\n```\ndf_examples_products_source = pd.merge(\n    df_examples_products,\n    df_sources,\n    how='left',\n    left_on=['query_id'],\n    right_on=['query_id']\n)\n```\n\n## Baselines\nIn order to ensure the feasibility of the proposed tasks, we will provide the results obtained by standard baseline models run on these data sets. For example, for the first task (ranking), we have run a BERT model. 
For the remaining two tasks (classification) we will provide the results of the multilingual BERT-based models as the initial baseline.\n\n\n### Requirements\nWe launched the baseline experiments in an environment with Python 3.6 and the package dependencies shown below:\n```\nnumpy==1.19.2\npandas==1.1.5\ntransformers==4.16.2\nscikit-learn==0.24.1\nsentence-transformers==2.1.0\n```\n\nTo install the dependencies, run the following command:\n```bash\npip install -r requirements.txt\n```\n\n### Reproduce published results\n\nFor a task **K**, we provide two scripts: one for training the model (and preprocessing the data for tasks 2 and 3), `launch-experiments-taskK.sh`, and a second one for getting the predictions for the public test set using the model trained in the previous step, `launch-predictions-taskK.sh`.\n\n#### Task 1 - Query Product Ranking\n\nFor task 1, we fine-tuned 3 models, one for each `product_locale`.\n\nFor the `us` locale we fine-tuned [MS MARCO Cross-Encoders](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2). For the `es` and `jp` locales we fine-tuned [multilingual MPNet](https://huggingface.co/sentence-transformers/all-mpnet-base-v1). 
We used the query and title of the product as input for these models.\n\nComputing the nDCG score of the ranking models requires the `terrier` source code (download version 5.5 [here](http://terrier.org/download/)).\n\n```bash\ncd ranking/\n./launch-experiments-task1.sh\n./launch-predictions-task1.sh $TERRIER_PATH\n```\n\n#### Task 2 - Multiclass Product Classification\n\nFor task 2, we trained a multilayer perceptron (MLP) classifier whose input is the concatenation of the representations provided by [BERT multilingual base](https://huggingface.co/bert-base-multilingual-uncased) for the query and title of the product.\n\n```bash\ncd classification_identification/\n./launch-experiments-task2.sh\n./launch-predictions-task2.sh\n```\n\n#### Task 3 - Product Substitute Identification\n\nFor task 3, we followed the same approach as in task 2.\n\n```bash\ncd classification_identification/\n./launch-experiments-task3.sh\n./launch-predictions-task3.sh\n```\n\n### Results\nThe following table shows the baseline results obtained on the public test sets of the three tasks.\n\n| Task |  Metrics  | Scores |\n|:----:|:--------:|:-----:|\n|    1 | nDCG     | 0.83 |\n|    2 | Macro F1, Micro F1 | 0.23, 0.62 |\n|    3 | Macro F1, Micro F1 | 0.44, 0.76 |\n\n## Security\n\nSee [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.\n\n## Cite\n\nPlease cite our paper if you use this dataset for your own research:\n\n```BibTeX\n@article{reddy2022shopping,\ntitle={Shopping Queries Dataset: A Large-Scale {ESCI} Benchmark for Improving Product Search},\nauthor={Chandan K. 
Reddy and Lluís Màrquez and Fran Valero and Nikhil Rao and Hugo Zaragoza and Sambaran Bandyopadhyay and Arnab Biswas and Anlu Xing and Karthik Subbian},\nyear={2022},\neprint={2206.06588},\narchivePrefix={arXiv}\n}\n```\n## License\n\nThis project is licensed under the Apache-2.0 License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fesci-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Fesci-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fesci-data/lists"}