https://github.com/benediktalkin/imagenetsubsetgenerator
Creates subsets of ImageNet (e.g. ImageNet100)
- Host: GitHub
- URL: https://github.com/benediktalkin/imagenetsubsetgenerator
- Owner: BenediktAlkin
- License: mit
- Created: 2022-07-26T09:53:52.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-28T23:12:36.000Z (10 months ago)
- Last Synced: 2024-02-29T00:25:18.587Z (10 months ago)
- Topics: dataset-generation, imagenet, imagenet-100, imagenet-1k, imagenet-dataset, machine-learning
- Language: Python
- Homepage:
- Size: 2.98 MB
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ImageNet subset generator
Generate subsets of the original ImageNet1K dataset.
Some commonly used subsets:
- [SimCLRv2 10% subset](https://github.com/google-research/simclr/blob/master/imagenet_subsets/10percent.txt)
- [SemiViT 10% subset](https://github.com/amazon-science/semi-vit)
- [SimCLRv2 1% subset](https://github.com/google-research/simclr/blob/master/imagenet_subsets/1percent.txt)
- [SemiViT 1% subset](https://github.com/amazon-science/semi-vit)
- Extreme low-shot subsets from [MSN](https://github.com/facebookresearch/msn)

# Usage
- `git clone https://github.com/BenediktAlkin/ImageNetSubsetGenerator`
- `cd ImageNetSubsetGenerator`

## Generate subset
- `python main_subset.py --in1k_path <ImageNet1K_path> --out_path <out_path> --version in100_sololearn`
  - This copies the samples that belong to the chosen version from `ImageNet1K_path` to `out_path`.
  - The result can then be used directly with, e.g., torchvision's `ImageFolder` (`subset = ImageFolder(root=...)`, with `root` pointing at the generated data).

For example: `python main_subset.py --in1k_path /data/imagenet1k --out_path /data/imagenet1k_10percent_simclrv2 --version in1k_10percent_simclrv2`
You can find all supported versions [here](https://github.com/BenediktAlkin/ImageNetSubsetGenerator/tree/main/imagenet_subset_generator/versions) or via `python main_subset.py --help`.
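
To illustrate what "copying the corresponding samples" can look like, here is a minimal sketch using only the standard library. It is not the code in `main_subset.py`. It assumes a subset is described by a list of training image file names (one per line, as in a listing file), that each file name starts with its class id followed by an underscore, and that images live in one folder per class. The helper name `copy_subset` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def copy_subset(in1k_train: Path, out_train: Path, file_names: list[str]) -> int:
    """Copy the listed images into out_train, keeping the class-folder layout.

    Returns the number of files copied.
    """
    copied = 0
    for name in file_names:
        name = name.strip()
        if not name:
            continue  # skip blank lines from the listing
        # Assumption: file names look like "<class id>_<number>.JPEG",
        # so the class folder is the part before the first underscore.
        wnid = name.split("_")[0]
        src = in1k_train / wnid / name
        dst = out_train / wnid / name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied += 1
    return copied
```

Because the class folders are preserved, a folder produced this way keeps the one-folder-per-class layout that `ImageFolder` expects.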
## Check classes/samples of dataset
`python main_statistics.py `
```
train n_classes: 1000
valid n_classes: 1000
train n_samples: 1282169
valid n_samples: 50000
train classes: ['n01440764', ...]
valid classes: ['n01440764', ...]
```
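
A similar check can be approximated with the standard library alone. The following is a minimal sketch, not the code in `main_statistics.py`: it assumes the dataset root contains one folder per split and one folder per class inside each split. The helper names `split_statistics` and `format_statistics` and the split folder names `train` and `val` are hypothetical.

```python
from pathlib import Path


def split_statistics(split_dir: Path) -> tuple[list[str], int]:
    """Return (sorted class-folder names, total file count) for one split."""
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    n_samples = sum(
        1 for name in classes for f in (split_dir / name).iterdir() if f.is_file()
    )
    return classes, n_samples


def format_statistics(root: Path, splits: tuple[str, ...] = ("train", "val")) -> list[str]:
    """Build one class-count line and one sample-count line per split."""
    lines = []
    for split in splits:  # assumption: split folders are named like this
        classes, n_samples = split_statistics(root / split)
        lines.append(f"{split} n_classes: {len(classes)}")
        lines.append(f"{split} n_samples: {n_samples}")
    return lines
```

Running `format_statistics` on a generated subset is a quick way to confirm that the number of classes and samples matches the chosen version.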