https://github.com/wellecks/mnist_multi
MNIST with Multiset Labels dataset generation
- Host: GitHub
- URL: https://github.com/wellecks/mnist_multi
- Owner: wellecks
- Created: 2017-11-23T13:30:02.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-09-29T13:03:59.000Z (over 4 years ago)
- Last Synced: 2024-10-29T20:13:01.012Z (2 months ago)
- Language: Python
- Size: 191 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# MNIST Multiset
This repo contains a script to generate MNIST Multiset datasets, such as those used in [Saliency-based Sequential Image Attention with Multiset Prediction](https://arxiv.org/abs/1711.05165) and [Loss Functions for Multiset Prediction](https://arxiv.org/abs/1711.05246).
Each dataset contains images with a fixed or variable number of digits. The digits can vary in size, and the images can include clutter. Each image is annotated with a multiset of digit labels and their bounding boxes.
Dataset variations can be created for multiset, set, and sequence prediction, with varying levels of difficulty. This class of datasets can also be useful for evaluating generalization to different sequence lengths.
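The following small Python sketch makes the distinction between the three target types concrete for a single image. The annotation format (a list of `(label, (x, y, w, h))` pairs) and the helper `multiset_equal` are assumptions made for illustration. They are not the output format of `mnist_multi.py`.

```python
from collections import Counter

# Hypothetical annotation for one image: (digit label, bounding box) pairs.
# The (x, y, w, h) box format is an assumption for illustration only.
annotations = [
    (3, (10, 12, 28, 28)),
    (7, (60, 5, 40, 40)),
    (3, (15, 70, 20, 20)),
]
labels = [label for label, _ in annotations]

# Multiset target: duplicate labels count, order does not.
multiset_target = Counter(labels)

# Set target: duplicates collapse (the `--set` flag avoids them at generation time).
set_target = set(labels)

# Sequence target: order matters, so an ordering rule must be fixed.
# Here: left-to-right by the box's x coordinate (one possible choice).
sequence_target = [label for label, _ in sorted(annotations, key=lambda a: a[1][0])]

def multiset_equal(predicted, target):
    """Order-invariant comparison that still respects duplicate counts."""
    return Counter(predicted) == Counter(target)

print(multiset_target)                     # Counter({3: 2, 7: 1})
print(set_target)                          # {3, 7}
print(sequence_target)                     # [3, 3, 7]
print(multiset_equal([7, 3, 3], labels))   # True
print(multiset_equal([7, 3], labels))      # False
```

A multiset prediction is correct regardless of order but must recover both 3s. A set prediction ignores the repeat, and a sequence prediction additionally depends on the chosen ordering rule.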
## Examples
#### 4 digits, 20-50px digit size, with clutter
```bash
python mnist_multi.py --min-digits 4 --max-digits 4 \
--min-digit-size 20 --max-digit-size 50 \
--tag min20_max50_4 --output-dir output/
```

#### 1-4 digits, 20-50px digit size, without clutter
```bash
python mnist_multi.py --min-digits 1 --max-digits 4 \
--min-digit-size 20 --max-digit-size 50 \
--min-num-clutter 0 --max-num-clutter 0 \
--tag min20_max50_1_4 --output-dir output/
```

#### 10 digits, 20px digit size, with clutter
```bash
python mnist_multi.py --min-digits 10 --max-digits 10 \
--min-digit-size 20 --max-digit-size 20 \
--tag min20_max20_10 --output-dir output/
```

Many other variations are possible (see the available flags with `python mnist_multi.py -h`), including:
* `--set`: no duplicate labels in an image
* `--img-width`: change the size of output images

## Loading
`util.py` contains a function to load MNIST Multi into a `PyTorch` `Dataset`.
The `label_order` flag controls how each image's labels are ordered. They can be shuffled randomly or, for sequence prediction, ordered spatially, by object area, or according to a fixed random ordering.
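The orderings listed above can be sketched as plain sort keys. This is a hypothetical illustration, not the code in `util.py`. Several details are assumptions: the function names, the `(label, (x, y, w, h))` annotation format, the top-to-bottom-then-left-to-right spatial rule, and the largest-first area rule. It also assumes that a "fixed random ordering" means one random permutation of the ten digit classes, shared by all images.

```python
import random

# Hypothetical sketch, not util.py's API. Annotations are assumed to be
# lists of (label, (x, y, w, h)) pairs.

def order_spatial(annotations):
    # Assumed spatial rule: top-to-bottom, then left-to-right.
    return sorted(annotations, key=lambda a: (a[1][1], a[1][0]))

def order_by_area(annotations):
    # Assumed area rule: largest bounding box first.
    return sorted(annotations, key=lambda a: a[1][2] * a[1][3], reverse=True)

def make_fixed_random_order(num_classes=10, seed=0):
    # Assumed meaning of "fixed random ordering": draw one random permutation
    # of the digit classes once and reuse it for every image.
    permutation = list(range(num_classes))
    random.Random(seed).shuffle(permutation)
    rank = {cls: i for i, cls in enumerate(permutation)}

    def order(annotations):
        return sorted(annotations, key=lambda a: rank[a[0]])

    return order

def order_random(annotations, rng=random):
    # Fresh random order on each call.
    shuffled = list(annotations)
    rng.shuffle(shuffled)
    return shuffled

def labels_of(annotations):
    return [label for label, _ in annotations]

annotations = [(3, (10, 12, 45, 45)), (7, (60, 5, 30, 30)), (1, (15, 70, 20, 20))]
print(labels_of(order_spatial(annotations)))   # [7, 3, 1]
print(labels_of(order_by_area(annotations)))   # [3, 7, 1]
fixed = make_fixed_random_order(seed=0)
print(labels_of(fixed(annotations)))           # same class order for every image
```

The first three rules are deterministic per image, which gives a sequence model a consistent target. `order_random` does not.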
To evaluate invariance to label ordering, use `randomize_dataset=True`, which will re-randomize the label order every time a minibatch is drawn.
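One way to get a fresh label order on every draw is to shuffle inside `__getitem__`. The class below is a hypothetical sketch, not the loader in `util.py`. Its name and constructor arguments are assumptions. It implements the map-style protocol (`__len__` and `__getitem__`), so it could be passed to `torch.utils.data.DataLoader`, but it does not itself import torch.

```python
import random

class ReorderingDataset:
    """Hypothetical map-style dataset that reshuffles label order on every fetch."""

    def __init__(self, images, label_lists, randomize=True, seed=None):
        if len(images) != len(label_lists):
            raise ValueError("images and label_lists must have the same length")
        self.images = images
        self.label_lists = label_lists
        self.randomize = randomize
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        labels = list(self.label_lists[idx])  # copy so the stored order is untouched
        if self.randomize:
            self.rng.shuffle(labels)  # new order on every access, hence every minibatch
        return self.images[idx], labels
```

With `num_workers > 0`, each worker process receives its own copy of `self.rng` in the same state, so workers may produce identical shuffles. To avoid this, reseed the generator per worker, for example through `DataLoader`'s `worker_init_fn`.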