Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/weiaicunzai/Bag_of_Tricks_for_Image_Classification_with_Convolutional_Neural_Networks
experiments on Paper <Bag of Tricks for Image Classification with Convolutional Neural Networks> and other useful tricks to improve CNN acc
image-classification pytorch
- Host: GitHub
- URL: https://github.com/weiaicunzai/Bag_of_Tricks_for_Image_Classification_with_Convolutional_Neural_Networks
- Owner: weiaicunzai
- Created: 2017-01-11T09:46:47.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2019-03-05T07:58:52.000Z (over 5 years ago)
- Last Synced: 2024-08-01T13:31:47.451Z (3 months ago)
- Topics: image-classification, pytorch
- Language: Python
- Homepage:
- Size: 53.7 KB
- Stars: 711
- Watchers: 20
- Forks: 123
- Open Issues: 9
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Bag of Tricks for Image Classification with Convolutional Neural Networks
This repo was inspired by the paper [Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187).
I will test as many popular training tricks as I can for improving image classification accuracy. Feel free to leave a comment about the tricks you want me to test (please include the referenced paper along with the tricks).

## hardware

Using 4 Tesla P40 GPUs to run the experiments.
## dataset

I will use the [CUB_200_2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset instead of ImageNet, just for simplicity. It is a fine-grained image classification dataset containing 200 bird categories, 5K+ training images, and 5K+ test images. The state-of-the-art accuracy with VGG16 is around 73% (please correct me if I am wrong). You could easily change it to a dataset you like, e.g. [Stanford Dogs](http://vision.stanford.edu/aditya86/ImageNetDogs/), [Stanford Cars](https://ai.stanford.edu/~jkrause/cars/car_dataset.html), or even ImageNet.
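To make this concrete, here is a minimal loader sketch, assuming the standard CUB_200_2011 layout (an `images/` folder plus the `images.txt`, `image_class_labels.txt`, and `train_test_split.txt` files shipped with the dataset); the class name, transforms, and batch size are illustrative and not necessarily what this repo uses:

```python
import os

from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms


class CUB200(Dataset):
    """Minimal CUB_200_2011 loader built from the metadata files that ship
    with the dataset (images.txt, image_class_labels.txt, train_test_split.txt)."""

    def __init__(self, root, train=True, transform=None):
        self.root, self.transform = root, transform

        def read(name):
            with open(os.path.join(root, name)) as f:
                return dict(line.split() for line in f)

        paths = read('images.txt')
        labels = read('image_class_labels.txt')
        split = read('train_test_split.txt')
        wanted = '1' if train else '0'                    # 1 marks training images
        self.samples = [(paths[i], int(labels[i]) - 1)    # class ids are 1-based
                        for i in paths if split[i] == wanted]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(os.path.join(self.root, 'images', path)).convert('RGB')
        return (self.transform(img) if self.transform else img), label


# Illustrative augmentation; the repo's exact pipeline may differ.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_loader = DataLoader(CUB200('CUB_200_2011', train=True, transform=train_tf),
                          batch_size=64, shuffle=True, num_workers=4)
```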
## network

I use a VGG16 network to test the tricks, also for simplicity, since VGG16 is easy to implement. I'm considering switching to AlexNet to see how powerful these tricks are.
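A sketch of what that setup might look like, assuming torchvision's VGG16 stands in for the repo's own implementation and that the 4 GPUs mentioned above are used via `DataParallel`:

```python
import torch
import torch.nn as nn
from torchvision import models

# VGG16 with a 200-way classifier for CUB_200_2011, trained from scratch
# (no ImageNet weights). The repo likely defines its own VGG16; torchvision's
# version is used here only as a stand-in.
net = models.vgg16(num_classes=200)

if torch.cuda.is_available():
    if torch.cuda.device_count() > 1:       # e.g. 4 Tesla P40s
        net = nn.DataParallel(net)
    net = net.cuda()
```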
## tricks

Tricks I've tested; some of them are from the paper [Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187):
|trick|referenced paper|
|:---:|:---:|
|xavier init|[Understanding the difficulty of training deep feedforward neural networks](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)|
|warmup training|[Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677v2)|
|no bias decay|[Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes](https://arxiv.org/abs/1807.11205)|
|label smoothing|[Rethinking the inception architecture for computer vision](https://arxiv.org/abs/1512.00567v3)|
|random erasing|[Random Erasing Data Augmentation](https://arxiv.org/abs/1708.04896v2)|
|cutout|[Improved Regularization of Convolutional Neural Networks with Cutout](https://arxiv.org/abs/1708.04552v2)|
|linear scaling learning rate|[Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677v2)|
|cosine learning rate decay|[SGDR: Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983)|

**and more to come......**
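To show how the tricks in the table are commonly wired together in PyTorch, here is a hedged sketch covering xavier init, no bias decay, linear scaling of the learning rate, warmup followed by cosine decay, label smoothing, and random-erasing style augmentation; the hyperparameters (warmup length, total epochs, weight decay, smoothing factor) are illustrative rather than this repo's exact settings:

```python
import math

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR
from torchvision import models, transforms

net = models.vgg16(num_classes=200)

# xavier init: Glorot initialization for every conv / linear layer.
def xavier_init(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)
net.apply(xavier_init)

# no bias decay: apply weight decay only to weight matrices, not to biases
# (or other 1-D parameters such as batch-norm scales).
decay, no_decay = [], []
for name, param in net.named_parameters():
    (no_decay if param.ndim == 1 else decay).append(param)

# linear scaling learning rate: base lr 0.01 at batch size 64 -> 0.04 at 256.
batch_size = 256
lr = 0.01 * batch_size / 64

optimizer = torch.optim.SGD(
    [{'params': decay, 'weight_decay': 5e-4},
     {'params': no_decay, 'weight_decay': 0.0}],
    lr=lr, momentum=0.9)

# warmup training + cosine learning rate decay: linear warmup for a few
# epochs, then cosine annealing over the remaining epochs.
warmup_epochs, total_epochs = 5, 120        # illustrative values
def lr_lambda(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
scheduler = LambdaLR(optimizer, lr_lambda)   # call scheduler.step() once per epoch

# label smoothing: built into CrossEntropyLoss in recent PyTorch versions.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# random erasing (a cutout-style augmentation) applied to image tensors.
erase = transforms.RandomErasing(p=0.5)
```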
## result
Baseline (training from scratch, no ImageNet pretrained weights are used):

VGG16: 64.60% on the [CUB_200_2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset, lr=0.01, batch size 64.

Effects of stacking tricks:
|trick|accuracy|
|:---:|:---:|
|baseline|64.60%|
|+xavier init and warmup training|66.07%|
|+no bias decay|70.14%|
|+label smoothing|71.20%|
|+random erasing|does not work, drops about 4 points|
|+linear scaling learning rate(batchsize 256, lr 0.04)|71.21%|
|+cutout|does not work, drops about 1 point|
|+cosine learning rate decay|does not work, drops about 1 point|