An open API service indexing awesome lists of open source software.

https://github.com/epfml/cifar

MLO internal cifar 10 / 100 default implementation / reference implementation. single machine, variable batch sizes, allowing maybe gradient compression. need to have clear documentation to make it easy to use, and so that we don't loose time with looking for hyperparameters. we can later keep it in sync with mlbench too, but self-contained is even better
https://github.com/epfml/cifar

Last synced: about 1 year ago
JSON representation

MLO internal cifar 10 / 100 default implementation / reference implementation. single machine, variable batch sizes, allowing maybe gradient compression. need to have clear documentation to make it easy to use, and so that we don't loose time with looking for hyperparameters. we can later keep it in sync with mlbench too, but self-contained is even better

Awesome Lists containing this project

README

          

# Cifar 10/100 default implementation

MLO internal cifar 10 / 100 reference implementation.

- Single machine
- Variable batch sizes
- ...

## Getting started

- Install Python 3 and `pip`.
- Clone this repository and open it.
- `pip install -r requirements.txt`

## Code organization

### train.py
This file contains the training loop and it sets up the optimization task. It contains a global `config` dictionary that should contain all configurable parameters. This file can be run standalone (`python3 ./train.py`) or by a manager script (see below).

### experiments/
To do an experiment with specific settings for the `config` dictionary, you can import `train.py` as a module and overwrite its placeholder definitions for `config`, `log_metric` and `output_dir`.

A proper experiment could look like this:

```python
import train

train.output_dir = 'output/tuning/lr{}_mom{}'.format(lr, mom)
os.makedirs(train.output_dir)

# Configure the experiment
train.config = dict(
dataset='Cifar100',
model='resnet18',
optimizer='SGD',
optimizer_decay_at_epochs=[30, 60, 90, 120, 150, 180, 210, 240, 270],
optimizer_decay_with_factor=2.0,
optimizer_learning_rate=lr,
optimizer_momentum=mom,
optimizer_weight_decay=0.0005,
batch_size=128,
num_epochs=2,
seed=42,
)

# Save the config
with open(os.path.join(train.output_dir, 'config.json'), 'w') as fp:
json.dump(train.config, fp, indent=' ')

# Configure the logging of scalar measurements
logfile = utils.logging.JSONLogger(os.path.join(train.output_dir, 'metrics.json'))
train.log_metric = logfile.log_metric

# Train
best_accuracy = train.main()
```

The `experiments/` directory contains an example of a hyperparameter [grid search](experiments/grid_search_demo.py).

### models/
This directory contains model definitions for many popular computer vision networks. They were copied from [kuangliu/pytorch-cifar](https://github.com/kuangliu/pytorch-cifar) and slightly extended by Quentin, Praneeth and Thijs.

### hyperparameters/
This directory is supposed to contain reference settings for hyperparameters, together with the accuracy they are expected to achieve.

### utils/
Miscelaneous utilities. At the time of writing these docs, this contains accumulators for running averages and max, and a simple logging class.

## Runtime

| Model | Dataset | Epochs | Hardware | Time |
|-----------|----------|--------|----------------------|------|
| ResNet 18 | Cifar 10 | 300 | 1x Nvidia Tesla V100 | 2:11 |
| VGG 11 | Cifar 10 | 300 | 1x Nvidia Tesla V100 | 1:06 |

## job-monitor
This setup is compatible with the work-in-progress [epfml/job-monitor](https://github.com/epfml/job-monitor).