{"id":21664628,"url":"https://github.com/benediktalkin/imagenetsubsetgenerator","last_synced_at":"2025-09-11T23:43:26.227Z","repository":{"id":49766044,"uuid":"518004264","full_name":"BenediktAlkin/ImageNetSubsetGenerator","owner":"BenediktAlkin","description":"Creates subsets of ImageNet (e.g. ImageNet100)","archived":false,"fork":false,"pushed_at":"2024-02-28T23:12:36.000Z","size":3128,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-19T02:19:54.592Z","etag":null,"topics":["dataset-generation","imagenet","imagenet-100","imagenet-1k","imagenet-dataset","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BenediktAlkin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-26T09:53:52.000Z","updated_at":"2025-03-03T20:47:28.000Z","dependencies_parsed_at":"2023-01-28T02:31:08.945Z","dependency_job_id":"a640363c-331f-470c-8578-01f1e7cbe1ac","html_url":"https://github.com/BenediktAlkin/ImageNetSubsetGenerator","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/BenediktAlkin/ImageNetSubsetGenerator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FImageNetSubsetGenerator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FImageNetSubsetGenerator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlki
n%2FImageNetSubsetGenerator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FImageNetSubsetGenerator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BenediktAlkin","download_url":"https://codeload.github.com/BenediktAlkin/ImageNetSubsetGenerator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FImageNetSubsetGenerator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274727539,"owners_count":25338399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-11T02:00:13.660Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset-generation","imagenet","imagenet-100","imagenet-1k","imagenet-dataset","machine-learning"],"created_at":"2024-11-25T10:41:17.768Z","updated_at":"2025-09-11T23:43:26.190Z","avatar_url":"https://github.com/BenediktAlkin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ImageNet subset generator\n\nGenerate subsets of the original ImageNet1K dataset.\n\nSome commonly used subsets:\n- [SimclrV2 10% subset](https://github.com/google-research/simclr/blob/master/imagenet_subsets/10percent.txt)\n- [SemiViT 10% subset](https://github.com/amazon-science/semi-vit)\n- [SimclrV2 1% 
subset](https://github.com/google-research/simclr/blob/master/imagenet_subsets/1percent.txt)\n- [SemiViT 1% subset](https://github.com/amazon-science/semi-vit)\n- Extreme low-shot subsets from [MSN](https://github.com/facebookresearch/msn)\n\n\n# Usage\n- `git clone https://github.com/BenediktAlkin/ImageNetSubsetGenerator`\n- `cd ImageNetSubsetGenerator`\n\n\n## Generate subset\n\n- `python main_subset.py --in1k_path \u003cImageNet1K_path\u003e --out_path \u003cout_path\u003e --version in100_sololearn`\n- This copies the corresponding samples from `ImageNet1K_path` to `out_path`.\n- The result can then be used directly with, e.g., torchvision's `ImageFolder`: `subset = ImageFolder(root=\u003cout_path\u003e)`\n\nFor example: `python main_subset.py --in1k_path /data/imagenet1k --out_path /data/imagenet1k_10percent_simclrv2 --version in1k_10percent_simclrv2`\n\n\nYou can find all supported versions [here](https://github.com/BenediktAlkin/ImageNetSubsetGenerator/tree/main/imagenet_subset_generator/versions) or via `python main_subset.py --help`.\n\n\n\n## Check classes/samples of dataset\n\n`python main_statistics.py \u003cpath\u003e` prints the class counts, sample counts, and class names of each split, for example:\n\n```\ntrain n_classes: 1000\nvalid n_classes: 1000\ntrain n_samples: 1282169\nvalid n_samples: 50000\ntrain classes: ['n01440764', ...]\nvalid classes: ['n01440764', ...]\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenediktalkin%2Fimagenetsubsetgenerator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenediktalkin%2Fimagenetsubsetgenerator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenediktalkin%2Fimagenetsubsetgenerator/lists"}