https://github.com/benediktalkin/imagenetsubsetgenerator

Creates subsets of ImageNet (e.g. ImageNet100)
https://github.com/benediktalkin/imagenetsubsetgenerator

dataset-generation imagenet imagenet-100 imagenet-1k imagenet-dataset machine-learning


# ImageNet subset generator

Generate subsets of the original ImageNet1K dataset.
Some commonly used subsets:
- [SimclrV2 10% subset](https://github.com/google-research/simclr/blob/master/imagenet_subsets/10percent.txt)
- [SemiViT 10% subset](https://github.com/amazon-science/semi-vit)
- [SimclrV2 1% subset](https://github.com/google-research/simclr/blob/master/imagenet_subsets/1percent.txt)
- [SemiViT 1% subset](https://github.com/amazon-science/semi-vit)
- Extreme low-shot subsets from [MSN](https://github.com/facebookresearch/msn)

# Usage
- `git clone https://github.com/BenediktAlkin/ImageNetSubsetGenerator`
- `cd ImageNetSubsetGenerator`

## Generate subset

- `python main_subset.py --in1k_path <ImageNet1K_path> --out_path <out_path> --version in100_sololearn`
- This copies the samples belonging to the chosen version from `<ImageNet1K_path>` to `<out_path>`.
- The result can then be used directly with e.g. torchvision's ImageFolder: `subset = ImageFolder(root=<out_path>)`

For example: `python main_subset.py --in1k_path /data/imagenet1k --out_path /data/imagenet1k_10percent_simclrv2 --version in1k_10percent_simclrv2`

You can find all supported versions [here](https://github.com/BenediktAlkin/ImageNetSubsetGenerator/tree/main/imagenet_subset_generator/versions) or via `python main_subset.py --help`.
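
To make the copy step more concrete, here is a minimal sketch of what generating a subset amounts to. It is not the repository's implementation. It assumes a subset is described by a list of file names of the form `<wnid>_<id>.JPEG` and that each split is stored in class folders (`<split>/<wnid>/<file>`). The function name `copy_subset` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def copy_subset(in1k_split_dir, out_split_dir, file_names):
    """Copy the listed samples into out_split_dir, keeping the class-folder layout.

    Assumption: every file name starts with its class id followed by an
    underscore (e.g. "n01440764_10026.JPEG"), so the class folder can be
    derived from the name itself.
    """
    in1k_split_dir = Path(in1k_split_dir)
    out_split_dir = Path(out_split_dir)
    n_copied = 0
    for name in file_names:
        wnid = name.split("_", 1)[0]  # class folder, e.g. "n01440764"
        src = in1k_split_dir / wnid / name
        dst = out_split_dir / wnid / name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        n_copied += 1
    return n_copied
```

Because the output keeps one folder per class, any loader that expects a class-folder layout can read it without further conversion.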

## Check classes/samples of dataset

`python main_statistics.py `
```
train n_classes: 1000
valid n_classes: 1000
train n_samples: 1282169
valid n_samples: 50000
train classes: ['n01440764', ...]
valid classes: ['n01440764', ...]
```
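
The numbers above can be reproduced for a single split with a few lines of standard-library Python. This is a sketch under the assumption that the split directory contains one folder per class, and it is not the code behind `main_statistics.py`. The function name `split_statistics` is hypothetical.

```python
from pathlib import Path


def split_statistics(split_dir):
    """Count classes and samples in a class-folder dataset split.

    Assumption: split_dir contains one sub-folder per class, and every
    regular file inside a class folder is one sample.
    """
    split_dir = Path(split_dir)
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    n_samples = sum(
        1
        for cls in classes
        for f in (split_dir / cls).iterdir()
        if f.is_file()
    )
    return {"n_classes": len(classes), "n_samples": n_samples, "classes": classes}
```

Running it on the train split of a generated subset is a quick check that the subset has the expected number of classes and samples.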