https://github.com/epfml/cifar

MLO internal Cifar 10/100 default (reference) implementation: single machine, variable batch sizes, possibly with gradient compression. It needs clear documentation so that it is easy to use and so that we don't lose time looking for hyperparameters. It can later be kept in sync with mlbench, but a self-contained implementation is even better.

Host: GitHub
URL: https://github.com/epfml/cifar
Owner: epfml
Created: 2019-02-26T12:43:46.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2023-02-08T18:56:19.000Z (about 3 years ago)
Last Synced: 2025-01-02T14:27:19.674Z (about 1 year ago)
Language: Python
Size: 166 KB
Stars: 0
Watchers: 14
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

# Cifar 10/100 default implementation

MLO internal cifar 10 / 100 reference implementation.

- Single machine

- Variable batch sizes

- ...

## Getting started

- Install Python 3 and `pip`.

- Clone this repository and open it.

- `pip install -r requirements.txt`

## Code organization

### train.py

This file contains the training loop and sets up the optimization task. All configurable parameters should live in its global `config` dictionary. It can be run standalone (`python3 ./train.py`) or driven by a manager script (see below).
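The pattern of a module with global placeholders that a manager script overrides before calling `main()` could look roughly like the sketch below. This is a hypothetical skeleton, not the contents of the actual `train.py`: the default values, the `log_metric` signature and the body of `main()` are assumptions for illustration.

```python
# Hypothetical skeleton of a train.py-style module (not the real file).

# Placeholders that an experiment script may overwrite after importing this module.
config = dict(
    dataset='Cifar10',      # assumed default
    model='resnet18',
    batch_size=128,
    num_epochs=2,
    seed=42,
)
output_dir = './output.tmp'  # assumed default


def log_metric(name, values, tags=None):
    """Default metric logger: print to stdout. Experiments can replace it."""
    print('{}: {} ({})'.format(name, values, tags or {}))


def main():
    """Stand-in for the training loop; reads the module-level globals at call time."""
    best_accuracy = 0.0
    for epoch in range(config['num_epochs']):
        accuracy = 0.0  # a real loop would train and evaluate here
        log_metric('accuracy', {'epoch': epoch, 'value': accuracy}, {'split': 'test'})
        best_accuracy = max(best_accuracy, accuracy)
    return best_accuracy


if __name__ == '__main__':
    main()
```

Because `main()` looks up `config` and `log_metric` when it is called, anything a manager script assigns after `import train` takes effect without changing this file.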

### experiments/

To do an experiment with specific settings for the `config` dictionary, you can import `train.py` as a module and overwrite its placeholder definitions for `config`, `log_metric` and `output_dir`.

A proper experiment could look like this:

```python
import json
import os

import train
import utils.logging

# Hyperparameters under study (example values; a grid search would loop over these)
lr = 0.1
mom = 0.9

train.output_dir = 'output/tuning/lr{}_mom{}'.format(lr, mom)
os.makedirs(train.output_dir)

# Configure the experiment
train.config = dict(
    dataset='Cifar100',
    model='resnet18',
    optimizer='SGD',
    optimizer_decay_at_epochs=[30, 60, 90, 120, 150, 180, 210, 240, 270],
    optimizer_decay_with_factor=2.0,
    optimizer_learning_rate=lr,
    optimizer_momentum=mom,
    optimizer_weight_decay=0.0005,
    batch_size=128,
    num_epochs=2,
    seed=42,
)

# Save the config
with open(os.path.join(train.output_dir, 'config.json'), 'w') as fp:
    json.dump(train.config, fp, indent=' ')

# Configure the logging of scalar measurements
logfile = utils.logging.JSONLogger(os.path.join(train.output_dir, 'metrics.json'))
train.log_metric = logfile.log_metric

# Train
best_accuracy = train.main()
```
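The `optimizer_decay_at_epochs` and `optimizer_decay_with_factor` entries suggest a step schedule for the learning rate. A minimal sketch, assuming the rate is divided by the factor each time a listed epoch is reached; the helper name `learning_rate_at` is hypothetical, and the repository may implement the schedule differently:

```python
def learning_rate_at(epoch, base_lr, decay_at_epochs, decay_factor):
    """Return the step-decayed learning rate for a given (0-based) epoch.

    Assumption: the rate is divided by `decay_factor` once for every
    milestone in `decay_at_epochs` that has been reached.
    """
    num_decays = sum(1 for milestone in decay_at_epochs if epoch >= milestone)
    return base_lr / (decay_factor ** num_decays)


milestones = [30, 60, 90]  # shortened list, for illustration only
for epoch in (0, 29, 30, 60, 90):
    print(epoch, learning_rate_at(epoch, 0.1, milestones, 2.0))
```

Under this assumption, a factor of 2.0 halves the learning rate at every milestone.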

The `experiments/` directory contains an example of a hyperparameter [grid search](experiments/grid_search_demo.py).
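The idea behind such a grid search can be sketched as follows. This is not the contents of `grid_search_demo.py`: the helper `grid_configs` and the value grids are assumptions for illustration. Each generated dictionary could be assigned to `train.config` before calling `train.main()`.

```python
import itertools


def grid_configs(base_config, grid):
    """Yield one config dict per point in the Cartesian product of `grid`.

    `grid` maps config keys to lists of candidate values.
    """
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        config = dict(base_config)  # copy, so the base stays untouched
        config.update(zip(keys, values))
        yield config


base = dict(dataset='Cifar100', model='resnet18', batch_size=128, num_epochs=2, seed=42)
grid = dict(
    optimizer_learning_rate=[0.01, 0.1],  # illustrative values
    optimizer_momentum=[0.0, 0.9],
)

for config in grid_configs(base, grid):
    run_name = 'lr{}_mom{}'.format(
        config['optimizer_learning_rate'], config['optimizer_momentum'])
    print(run_name)
```

Deriving the run name from the varied hyperparameters gives every grid point its own output directory.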

### models/

This directory contains model definitions for many popular computer vision networks. They were copied from [kuangliu/pytorch-cifar](https://github.com/kuangliu/pytorch-cifar) and slightly extended by Quentin, Praneeth and Thijs.

### hyperparameters/

This directory is supposed to contain reference settings for hyperparameters, together with the accuracy they are expected to achieve.

### utils/

Miscellaneous utilities. At the time of writing, this contains accumulators for running averages and maxima, and a simple logging class.
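To illustrate what such utilities typically look like, here is a minimal sketch of a running-average accumulator, a running maximum, and a JSON metrics logger. The class names `Mean`, `Max` and `SimpleJSONLogger` and their interfaces are assumptions; the actual code in `utils/` may differ.

```python
import json
import os
import tempfile


class Mean:
    """Weighted running average of the values added so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value, weight=1):
        self.total += value * weight
        self.count += weight

    def value(self):
        return self.total / self.count if self.count else 0.0


class Max:
    """Running maximum of the values added so far."""

    def __init__(self):
        self.best = None

    def add(self, value):
        if self.best is None or value > self.best:
            self.best = value

    def value(self):
        return self.best


class SimpleJSONLogger:
    """Keeps metric records in memory and rewrites them to a JSON file on each call."""

    def __init__(self, path):
        self.path = path
        self.records = []

    def log_metric(self, name, values, tags=None):
        self.records.append({'measurement': name, 'values': values, 'tags': tags or {}})
        with open(self.path, 'w') as fp:
            json.dump(self.records, fp, indent=1)


# Usage: average a per-batch loss weighted by batch size, then log it.
mean_loss = Mean()
for batch_loss, batch_size in [(2.0, 128), (1.0, 128)]:
    mean_loss.add(batch_loss, weight=batch_size)

logger = SimpleJSONLogger(os.path.join(tempfile.mkdtemp(), 'metrics.json'))
logger.log_metric('loss', {'epoch': 0, 'value': mean_loss.value()}, {'split': 'train'})
```

Weighting each batch by its size keeps the average correct when the last batch of an epoch is smaller than the others.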

## Runtime

| Model     | Dataset  | Epochs | Hardware             | Time (h:mm) |
|-----------|----------|--------|----------------------|-------------|
| ResNet 18 | Cifar 10 | 300    | 1x Nvidia Tesla V100 | 2:11        |
| VGG 11    | Cifar 10 | 300    | 1x Nvidia Tesla V100 | 1:06        |

## job-monitor

This setup is compatible with the work-in-progress [epfml/job-monitor](https://github.com/epfml/job-monitor).
