https://github.com/eigenfoo/batch-renorm
A TensorFlow re-implementation of batch renormalization, first introduced by Sergey Ioffe.
- Host: GitHub
- URL: https://github.com/eigenfoo/batch-renorm
- Owner: eigenfoo
- License: MIT
- Created: 2018-10-10T16:10:54.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2021-03-15T22:01:45.000Z (almost 5 years ago)
- Last Synced: 2025-03-25T21:14:47.960Z (11 months ago)
- Topics: batch-norm, batch-normalization, batch-renorm, batch-renormalization, deep-learning, sergey-ioffe, tensorflow
- Language: Python
- Homepage:
- Size: 752 KB
- Stars: 13
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Batch Renormalization
A TensorFlow implementation of batch renormalization, first introduced by Sergey
Ioffe.
**Paper:**
Batch Renormalization: Towards Reducing Minibatch Dependence in
Batch-Normalized Models, Sergey Ioffe
https://arxiv.org/abs/1702.03275
**GitHub repository:**
https://github.com/eigenfoo/batch-renorm
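As a refresher on the method being reproduced: batch renormalization normalizes
activations with minibatch statistics, as in batch norm, but then applies a
multiplicative correction `r` and an additive correction `d`, both computed from
the running (moving-average) statistics and clipped, so that the training-time
output stays consistent with what inference would produce. Below is a minimal
TensorFlow sketch of the training-time transform, written as our own
illustration of the paper's algorithm; the function and variable names are ours
and do not necessarily match this repository's code.

```python
import tensorflow as tf

def batch_renorm_train(x, gamma, beta, moving_mean, moving_std,
                       r_max=3.0, d_max=5.0, momentum=0.99, epsilon=1e-5):
    """Illustrative training-time batch renorm for a [batch, features] input.

    `moving_mean` and `moving_std` are tf.Variables holding running statistics;
    `gamma` and `beta` are the usual learned scale and shift.
    """
    batch_mean, batch_var = tf.nn.moments(x, axes=[0])
    batch_std = tf.sqrt(batch_var + epsilon)

    # r and d correct the minibatch statistics toward the running statistics.
    # They are clipped and treated as constants in backprop, as in the paper.
    r = tf.stop_gradient(
        tf.clip_by_value(batch_std / moving_std, 1.0 / r_max, r_max))
    d = tf.stop_gradient(
        tf.clip_by_value((batch_mean - moving_mean) / moving_std, -d_max, d_max))

    x_hat = (x - batch_mean) / batch_std * r + d
    y = gamma * x_hat + beta

    # Update the running statistics with an exponential moving average.
    moving_mean.assign(momentum * moving_mean + (1.0 - momentum) * batch_mean)
    moving_std.assign(momentum * moving_std + (1.0 - momentum) * batch_std)
    return y
```

With `r = 1` and `d = 0` (for example, by setting `r_max = 1` and `d_max = 0`),
this reduces to ordinary batch normalization, which is the baseline the figure
compares against.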
The goal of this project is to reproduce the following figure from the paper:
Below is our reproduction:
## Description
We did a few things differently from the paper:
- We used the CIFAR-100 dataset, instead of the ImageNet dataset.
- We used a plain convolutional network, instead of the Inception-v3
architecture.
- We used the Adam optimizer, instead of the RMSProp optimizer.
- We split minibatches into 800 microbatches of 2 examples each, instead of 400
  microbatches of 4 examples each. Note that each minibatch still consists of
  1600 examples (see the sketch after this list).
- We trained for a mere 8k training updates, instead of 160k training updates.
- We ran the training 5 separate times, and averaged the learning curves from
all runs. This was not explicitly stated in the paper.
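To make the microbatch arrangement concrete: normalization statistics are
computed per microbatch, while the optimizer still takes one step per
1600-example minibatch. The sketch below shows one way to reshape a minibatch
into microbatches; the constants and names are ours, chosen to match the numbers
listed above, and are not taken from this repository's code.

```python
import tensorflow as tf

MINIBATCH_SIZE = 1600
MICROBATCH_SIZE = 2   # we use 2 examples per microbatch; the paper uses 4

def to_microbatches(x):
    """Reshape a [1600, features] minibatch into [800, 2, features] so that
    per-microbatch statistics can be taken over axis 1."""
    num_micro = MINIBATCH_SIZE // MICROBATCH_SIZE  # 800 microbatches
    return tf.reshape(x, [num_micro, MICROBATCH_SIZE, -1])

# Example: per-microbatch means and variances have shape [800, features].
x = tf.random.normal([MINIBATCH_SIZE, 64])
micro = to_microbatches(x)
micro_mean, micro_var = tf.nn.moments(micro, axes=[1])
```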
The reproduced results do not exactly mirror the paper's results: for instance,
the learning curves for batch norm and batch renorm do not converge to the same
value, and the learning curve for batch norm even appears to be curving down
towards the end of training.
We suspect that these discrepancies are due to two factors:
1. Not training for long enough (8k training steps is nothing compared to
   160k), and
2. Using a different architecture and dataset to reproduce the same results.
   While the behavior should still be qualitatively the same, certain
   hyperparameters may be ill-chosen for our setup.