{"id":34108116,"url":"https://github.com/st-tech/zozo-shift15m","last_synced_at":"2026-04-02T01:04:05.993Z","repository":{"id":43041730,"uuid":"377755799","full_name":"st-tech/zozo-shift15m","owner":"st-tech","description":"SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts","archived":false,"fork":false,"pushed_at":"2023-10-18T02:27:14.000Z","size":11352,"stargazers_count":175,"open_issues_count":9,"forks_count":16,"subscribers_count":63,"default_branch":"main","last_synced_at":"2025-12-17T01:30:06.602Z","etag":null,"topics":["covariate-shift","cvpr","cvpr2023","dataset","dataset-shifts","datasets","deep-learning","distributional-shift","fashion","fill-in-the-blank","fill-in-the-n-blank","machine-learning","research","set-matching","target-shift"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/st-tech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.CC","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-06-17T08:17:10.000Z","updated_at":"2025-12-15T07:02:15.000Z","dependencies_parsed_at":"2025-04-11T21:12:42.074Z","dependency_job_id":"0efebd2e-1e7b-4b6c-b900-f40f77134f24","html_url":"https://github.com/st-tech/zozo-shift15m","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/st-tech/zozo-shift15m","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/st-tech%2Fzozo-shift15m","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/st-tech%2Fzozo-shift1
5m/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/st-tech%2Fzozo-shift15m/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/st-tech%2Fzozo-shift15m/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/st-tech","download_url":"https://codeload.github.com/st-tech/zozo-shift15m/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/st-tech%2Fzozo-shift15m/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31293631,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T21:15:39.731Z","status":"ssl_error","status_checked_at":"2026-04-01T21:15:34.046Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["covariate-shift","cvpr","cvpr2023","dataset","dataset-shifts","datasets","deep-learning","distributional-shift","fashion","fill-in-the-blank","fill-in-the-n-blank","machine-learning","research","set-matching","target-shift"],"created_at":"2025-12-14T18:13:45.984Z","updated_at":"2026-04-02T01:04:05.983Z","avatar_url":"https://github.com/st-tech.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/shift15m.png\" width=\"50%\" style=\"display: block; margin: 0 auto\" /\u003e\n\u003c/p\u003e\n\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Python](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-blue)](https://www.python.org)\n![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/st-tech/zozo-shift15m)\n[![Downloads](https://static.pepy.tech/personalized-badge/shift15m?period=month\u0026units=international_system\u0026left_color=grey\u0026right_color=blue\u0026left_text=Downloads)](https://pepy.tech/project/shift15m)\n[![PyPI version](https://badge.fury.io/py/shift15m.svg)](https://badge.fury.io/py/shift15m)\n![GitHub issues](https://img.shields.io/github/issues/st-tech/zozo-shift15m)\n![GitHub commit activity](https://img.shields.io/github/commit-activity/m/st-tech/zozo-shift15m)\n![GitHub last commit](https://img.shields.io/github/last-commit/st-tech/zozo-shift15m)\n[![arXiv](https://img.shields.io/badge/arXiv-2108.12992-b31b1b.svg)](https://arxiv.org/abs/2108.12992)\n\n\n# SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts\n- [[arXiv]](https://arxiv.org/abs/2108.12992)\n- [[CVPRW2023]](https://openaccess.thecvf.com/content/CVPR2023W/CVFAD/papers/Kimura_SHIFT15M_Fashion-Specific_Dataset_for_Set-to-Set_Matching_With_Several_Distribution_Shifts_CVPRW_2023_paper.pdf)\n- Accepted as an oral paper at the CVPR 2023 workshop on [CVFAD](https://sites.google.com/view/cvfad2023/home?authuser=0) (acceptance rate: 18.5%)\n\nSet-to-set matching is the problem of matching two different sets of items according to some criterion. When each item in a set is high-dimensional, such as an image, set-to-set matching is typically addressed with neural networks. Most machine learning approaches to set-to-set matching assume that the training and test data follow the same distribution. However, this assumption is often violated in real-world applications. 
In this paper, we propose SHIFT15M, a dataset for properly evaluating set-to-set matching models when the data distribution changes between training and testing. Benchmark experiments show that the performance of naive methods drops under such distribution shifts. In addition, we provide software that makes the SHIFT15M dataset simple to use; the software is available in this repository.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/CVPRW2023_SHIFT15M_poster.png\" width=\"100%\" style=\"display: block; margin: 0 auto\" /\u003e\n\u003c/p\u003e\n\nWe provide a [Datasheet for SHIFT15M](./DATASHEET.md), based on the [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) [1] template.\n\nBuild status by platform and Python version:\n\n|      System       |                              Python 3.6                              |                              Python 3.7                              |                              Python 3.8                              |\n| :---------------: | :------------------------------------------------------------------: | :------------------------------------------------------------------: | :------------------------------------------------------------------: |\n|     Linux CPU     | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e |\n|     Linux GPU     | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e |\n| Windows CPU / GPU | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e |\n|    Mac OS CPU     | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e | \u003cimg src=\"https://img.shields.io/badge/build-success-brightgreen\" /\u003e |\n\nSHIFT15M is a large-scale dataset based on approximately 15 million items collected on the fashion search service IQON.\n\n## Installation\n\n### From PyPI\n\n```bash\n$ pip install shift15m\n```\n\n### From source\n\n```bash\n$ git clone https://github.com/st-tech/zozo-shift15m.git\n$ cd zozo-shift15m\n$ poetry build\n$ pip install dist/shift15m-xxxx-py3-none-any.whl\n```\n\n## Download the SHIFT15M dataset\n\n### Using the Dataset class\n\nYou can download the SHIFT15M dataset and load its train/test splits as follows:\n\n```python\nfrom shift15m.datasets import NumLikesRegression\n\ndataset = NumLikesRegression(root=\"./data\", download=True)\n(x_train, y_train), (x_test, y_test) = dataset.load_dataset(target_shift=True)\n```\n\n### Using the download script\n\nAlternatively, download the dataset directly with the provided script:\n\n```bash\n$ bash scripts/download_all.sh\n```\n\n## Tasks\n\nThe following tasks are currently available:\n\n| Tasks                                                                                                                  | Task type           | Shift type                    | Input shape         | Output shape    |\n| ---------------------------------------------------------------------------------------------------------------------- | ------------------- | ----------------------------- | ------------------- | --------------- |\n| [NumLikesRegression](https://github.com/st-tech/zozo-shift15m/tree/main/benchmarks#regression-for-the-number-of-likes) | regression          | target shift                  | (N, 25)             | (N, 1)          |\n| [SumPricesRegression](https://github.com/st-tech/zozo-shift15m/tree/main/benchmarks#regression-for-the-sum-of-prices)  | regression          | covariate shift, target shift | (N, 1)              | (N, 1)          |\n| ItemPriceRegression                                                                                                    | regression          | target shift                  | (N, 4096)           | (N, 1)          |\n| [ItemCategoryClassification](https://github.com/st-tech/zozo-shift15m/tree/main/benchmarks/item_category_prediction)   | classification      | target shift                  | (N, 4096)           | (N, 7)          |\n| [Set2SetMatching](https://github.com/st-tech/zozo-shift15m/tree/main/benchmarks/set_matching)                          | set-to-set matching | covariate shift               | (N, 4096)x(M, 4096) | (1)             |\n\n## Benchmarks\n\nAs templates for numerical experiments on SHIFT15M, we publish [experimental results for each task with several models](./benchmarks).\n\n## Original Dataset Structure\n\nThe original dataset is stored in JSON format; each row has the following structure:\n\n```\n{\n  \"user\":{\"user_id\":\"xxxx\", \"fav_brand_ids\":\"xxxx,xx,...\"},\n  \"like_num\":\"xx\",\n  \"set_id\":\"xxx\",\n  \"items\":[\n    {\"price\":\"xxxx\",\"item_id\":\"xxxxxx\",\"category_id1\":\"xx\",\"category_id2\":\"xxxxx\"},\n    ...\n  ],\n  \"publish_date\":\"yyyy-mm-dd\",\n  \"tags\": \"tag_a, tag_b, tag_c, ...\"\n}\n```\n\n## Contributing\n\nTo learn how to contribute to SHIFT15M, please see the following guides:\n\n- [Developers Guide](./DEVELOPMENT.md)\n- [Task Proposal Guide](./TASK_PROPOSAL.md)\n- [Benchmark Proposal Guide](./BENCHMARK.md)\n\n## License\n\nThe dataset itself is provided under a [CC BY-NC 4.0 license](./LICENSE.CC).\nThe software in this repository is provided under the [MIT license](./LICENSE.MIT).\n\n## Dataset metadata\n\nThe following 
table is necessary for this dataset to be indexed by search engines such as [Google Dataset Search](https://datasetsearch.research.google.com/).\n\n\u003cdiv itemscope itemtype=\"http://schema.org/Dataset\"\u003e\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eproperty\u003c/th\u003e\n    \u003cth\u003evalue\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ename\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"name\"\u003eSHIFT15M Dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ealternateName\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"alternateName\"\u003eSHIFT15M\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ealternateName\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"alternateName\"\u003eshift15m-dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eurl\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"url\"\u003ehttps://github.com/st-tech/zozo-shift15m\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003esameAs\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"sameAs\"\u003ehttps://github.com/st-tech/zozo-shift15m\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003edescription\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"description\"\u003eSHIFT15M is a multi-objective, multi-domain dataset which includes multiple dataset shifts.\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eprovider\u003c/td\u003e\n    \u003ctd\u003e\n      \u003cdiv itemscope itemtype=\"http://schema.org/Organization\" itemprop=\"provider\"\u003e\n        \u003ctable\u003e\n          \u003ctr\u003e\n            \u003cth\u003eproperty\u003c/th\u003e\n            \u003cth\u003evalue\u003c/th\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            
\u003ctd\u003ename\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"name\"\u003eZOZO Research\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003esameAs\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"sameAs\"\u003ehttps://ja.wikipedia.org/wiki/ZOZO\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n        \u003c/table\u003e\n      \u003c/div\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003elicense\u003c/td\u003e\n    \u003ctd\u003e\n      \u003cdiv itemscope itemtype=\"http://schema.org/CreativeWork\" itemprop=\"license\"\u003e\n        \u003ctable\u003e\n          \u003ctr\u003e\n            \u003cth\u003eproperty\u003c/th\u003e\n            \u003cth\u003evalue\u003c/th\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003ename\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"name\"\u003eCC BY-NC 4.0\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003eurl\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"url\"\u003ehttps://github.com/st-tech/zozo-shift15m/blob/main/LICENSE.CC\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n        \u003c/table\u003e\n      \u003c/div\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n## Errata\n\n- 01/08/2022: added tag information ([#187](https://github.com/st-tech/zozo-shift15m/issues/187))\n\n## Papers using this dataset\n\n- Papadopoulos, Stefanos I., et al. \"Multimodal Quasi-AutoRegression: Forecasting the visual popularity of new fashion products.\" arXiv preprint arXiv:2204.04014 (2022).\n- Papadopoulos, Stefanos, et al. \"Fashion Trend Analysis and Prediction Model.\" Version 1, Zenodo (2021). doi:10.5281/zenodo.5795089.\n\n## References\n\n- [1] Gebru, Timnit, et al. 
\"Datasheets for datasets.\" arXiv preprint arXiv:1803.09010 (2018).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fst-tech%2Fzozo-shift15m","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fst-tech%2Fzozo-shift15m","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fst-tech%2Fzozo-shift15m/lists"}