Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/fangvv/awesome-edge-intelligence-collections

About DNN compression and acceleration on Edge Devices.

List: awesome-edge-intelligence-collections

awesome-list copyleft dnn edge-computing edge-intelligence just-for-learning

Last synced: about 1 month ago



README


This repository exists purely because I am lazy: I put this collection together for my own convenience, and was surprised when it kept accumulating stars. Our group ([homepage](https://fangvv.gitee.io/homepage/Edgecomp/)) is actively doing research in this area, and I will keep releasing code for related papers under my account fangvv. Discussion and exchange are very welcome.

Please note that I just want to collect these links from the original sites for research purposes. Welcome to join us to discuss interesting ideas on efficient DNN training/inference.

https://zhuanlan.zhihu.com/p/58705979

http://blog.csdn.net/wspba/article/details/75671573

https://www.ctolib.com/ZhishengWang-Embedded-Neural-Network.html

https://blog.csdn.net/touch_dream/article/details/78441332

https://zhuanlan.zhihu.com/p/28439056

https://blog.csdn.net/QcloudCommunity/article/details/77719498

https://www.cnblogs.com/zhonghuasong/p/7493475.html

https://blog.csdn.net/jackytintin/article/details/53445280

https://zhuanlan.zhihu.com/p/27747628

https://blog.csdn.net/shuzfan/article/category/6271575

https://blog.csdn.net/cookie_234

https://www.jianshu.com/u/f5c90c3856bb

https://github.com/sun254/awesome-model-compression-and-acceleration

# awesome-model-compression-and-acceleration
---
## Paper
#### Overview
- [Model compression as constrained optimization, with application to neural nets. Part I: general framework](https://arxiv.org/abs/1707.01209)
- [Model compression as constrained optimization, with application to neural nets. Part II: quantization](https://arxiv.org/abs/1707.04319)
- [A Survey of Model Compression and Acceleration for Deep Neural Networks](https://arxiv.org/pdf/1710.09282.pdf)

#### Structure
- [Dynamic Capacity Networks](https://arxiv.org/pdf/1511.07838.pdf)
- [ResNeXt: Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf)
- [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/pdf/1704.04861.pdf)
- [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/pdf/1610.02357.pdf)
- [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083)
- [ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression](https://arxiv.org/abs/1707.06342)
- [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360)
- [Residual Attention Network for Image Classification](https://arxiv.org/pdf/1704.06904.pdf)
- [SEP-Nets: Small and Effective Pattern Networks](https://arxiv.org/pdf/1706.03912.pdf)
- [Deep Networks with Stochastic Depth](https://arxiv.org/pdf/1603.09382.pdf)
- [Learning Infinite Layer Networks Without the Kernel Trick](https://arxiv.org/pdf/1606.05316v2.pdf)
- [Coordinating Filters for Faster Deep Neural Networks](https://arxiv.org/pdf/1703.09746v3.pdf)
- [ResBinNet: Residual Binary Neural Network](https://arxiv.org/abs/1711.01243)
- [SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks](https://arxiv.org/pdf/1612.01051)
- [Efficient Sparse-Winograd Convolutional Neural Networks](https://openreview.net/pdf?id=r1rqJyHKg)
- [DSD: Dense-Sparse-Dense Training for Deep Neural Networks](https://openreview.net/pdf?id=HyoST_9xl)
- [Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video](https://arxiv.org/abs/1709.05943)
- [Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation](https://arxiv.org/pdf/1801.04381.pdf)

#### Distillation
- [Dark knowledge](http://www.ttic.edu/dl/dark14.pdf)
- [FitNets: Hints for Thin Deep Nets](https://arxiv.org/pdf/1412.6550.pdf)
- [Net2Net: Accelerating Learning via Knowledge Transfer](https://arxiv.org/pdf/1511.05641.pdf)
- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [MobileID: Face Model Compression by Distilling Knowledge from Neurons](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11977)
- [DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer](https://arxiv.org/pdf/1707.01220.pdf)
- [Deep Model Compression: Distilling Knowledge from Noisy Teachers](https://arxiv.org/pdf/1610.09650.pdf)
- [Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer](https://arxiv.org/pdf/1612.03928.pdf)
- [Sequence-Level Knowledge Distillation](https://arxiv.org/pdf/1606.07947.pdf)
- [Like What You Like: Knowledge Distill via Neuron Selectivity Transfer](https://arxiv.org/pdf/1707.01219.pdf)
- [Learning Efficient Object Detection Models with Knowledge Distillation](http://papers.nips.cc/paper/6676-learning-efficient-object-detection-models-with-knowledge-distillation.pdf)
- [Data-Free Knowledge Distillation For Deep Neural Networks](https://arxiv.org/pdf/1710.07535.pdf)
- [Learning Loss for Knowledge Distillation with Conditional Adversarial Networks](https://arxiv.org/pdf/1709.00513.pdf)
- [Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks](https://arxiv.org/pdf/1710.09505.pdf)
- [Moonshine: Distilling with Cheap Convolutions](https://arxiv.org/pdf/1711.02613.pdf)
- [Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification](https://arxiv.org/pdf/1709.02929.pdf)
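The recurring recipe in the distillation papers above (notably "Distilling the Knowledge in a Neural Network") is a soft-target loss computed at a raised temperature, combined with the usual hard-label cross-entropy. A minimal NumPy sketch of that loss; the temperature, mixing weight, and shapes here are illustrative assumptions, not values from any one paper:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target cross-entropy at temperature T plus hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # T^2 factor keeps gradient magnitudes comparable across temperatures
    soft = -np.mean(np.sum(p_t * np.log(p_s + 1e-12), axis=-1)) * T * T
    p_hard = softmax(student_logits)
    hard = -np.mean(np.log(p_hard[np.arange(len(labels)), labels] + 1e-12))
    return alpha * soft + (1 - alpha) * hard

rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 10))
student = rng.standard_normal((8, 10))
labels = rng.integers(0, 10, size=8)
loss = kd_loss(student, teacher, labels)
```

In practice the teacher is frozen and only the student receives gradients; the papers above vary mainly in *what* is matched (logits, features, attention maps, sample similarities).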

#### Binarization
- [Local Binary Convolutional Neural Networks](https://arxiv.org/pdf/1608.06049.pdf)
- [Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration](https://arxiv.org/pdf/1707.04693.pdf)
- [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/pdf/1602.02830.pdf)
- [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/pdf/1603.05279.pdf)
- [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/pdf/1606.06160.pdf)
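Several of the binarization papers above (XNOR-Net in particular) approximate a weight tensor as W ≈ α·sign(W), where the scale α = mean(|W|) is the closed-form minimizer of the reconstruction error for a fixed sign pattern. A hedged NumPy sketch of just that approximation step (the surrounding training procedure, e.g. straight-through gradient estimation, is omitted):

```python
import numpy as np

def binarize_xnor(w):
    """XNOR-Net-style binarization: W ~ alpha * sign(W), alpha = mean(|W|)."""
    alpha = np.mean(np.abs(w))          # closed-form optimal scale for B = sign(W)
    b = np.where(w >= 0, 1.0, -1.0)     # binary weights in {-1, +1}
    return b, alpha

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))
b, alpha = binarize_xnor(w)
err_opt = np.linalg.norm(w - alpha * b)  # error with the fitted scale
err_one = np.linalg.norm(w - 1.0 * b)    # error with a naive scale of 1
```

Because α is the least-squares optimum, `err_opt` is never worse than any other fixed scale; the binary matrix itself enables the XNOR/popcount arithmetic these papers exploit.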

#### Quantization
- [Quantize weights and activations in Recurrent Neural Networks](https://arxiv.org/pdf/1611.10176.pdf)
- [The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning](https://arxiv.org/pdf/1611.05402.pdf)
- [Quantized Convolutional Neural Networks for Mobile Devices](https://arxiv.org/pdf/1512.06473.pdf)
- [Compressing Deep Convolutional Networks using Vector Quantization](https://arxiv.org/pdf/1412.6115.pdf)
- [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/pdf/1609.07061.pdf)
- [Fixed-Point Performance Analysis of Recurrent Neural Networks](https://arxiv.org/abs/1512.01322)
- [Loss-aware Binarization of Deep Networks](https://arxiv.org/pdf/1611.01600.pdf)
- [Towards the Limit of Network Quantization](https://arxiv.org/pdf/1612.01543.pdf)
- [Deep Learning with Low Precision by Half-wave Gaussian Quantization](https://arxiv.org/pdf/1702.00953.pdf)
- [ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks](https://arxiv.org/pdf/1706.02393.pdf)
- [Trained Ternary Quantization](https://arxiv.org/pdf/1612.01064.pdf)
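Most of the quantization papers above build on, or refine, per-tensor linear quantization. As a minimal illustrative NumPy sketch (8-bit symmetric with one scale per tensor; this is a common baseline, not the scheme of any single paper listed):

```python
import numpy as np

def quantize_uniform(w, n_bits=8):
    """Per-tensor symmetric linear quantization to n_bits signed integers."""
    qmax = 2 ** (n_bits - 1) - 1           # 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax       # one floating-point scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_uniform(w)
err = np.max(np.abs(w - dequantize(q, scale)))  # rounding error bounded by scale / 2
```

The papers above then diverge on the interesting parts: non-uniform codebooks, loss-aware rounding, activation quantization, and training with the quantizer in the loop.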

#### Pruning
- [Data-Driven Sparse Structure Selection for Deep Neural Networks](https://arxiv.org/pdf/1707.01213.pdf)
- [Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization](https://arxiv.org/pdf/1707.09102.pdf)
- [Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing](http://www.cs.jhu.edu/~jason/papers/vieira+eisner.tacl17.pdf)
- [Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning](https://arxiv.org/pdf/1611.05128.pdf)
- [Pruning Filters for Efficient ConvNets](https://arxiv.org/pdf/1608.08710.pdf)
- [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/pdf/1611.06440.pdf)
- [Soft Weight-Sharing for Neural Network Compression](https://arxiv.org/pdf/1702.04008.pdf)
- [Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](https://arxiv.org/pdf/1510.00149.pdf)
- [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/pdf/1506.02626.pdf)
- [Dynamic Network Surgery for Efficient DNNs](https://arxiv.org/pdf/1608.04493.pdf)
- [ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA](https://arxiv.org/pdf/1612.00694.pdf)
- [Faster CNNs with Direct Sparse Convolutions and Guided Pruning](https://arxiv.org/pdf/1608.01409.pdf)
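Magnitude-based pruning, the baseline that many of the papers above refine (e.g. with retraining loops, energy models, or filter-level structure), can be sketched in a few lines of NumPy; the 90% sparsity target below is an illustrative assumption:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights, keeping a (1 - sparsity) fraction."""
    k = int(w.size * sparsity)                          # number of weights to remove
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]  # k-th smallest |w|
    mask = np.abs(w) > threshold                        # survivors
    return w * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
pruned, mask = magnitude_prune(w, sparsity=0.9)
```

In the unstructured form shown here the mask is irregular, which is why several papers above move to filter- or channel-level pruning that hardware can exploit directly.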

#### Low Rank Approximation
- [Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation](https://arxiv.org/pdf/1404.0736.pdf)
- [Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications](https://arxiv.org/pdf/1511.06530.pdf)
- [Efficient and Accurate Approximations of Nonlinear Convolutional Networks](https://arxiv.org/pdf/1411.4229.pdf)
- [Accelerating Very Deep Convolutional Networks for Classification and Detection](https://arxiv.org/pdf/1505.06798.pdf)
- [Convolutional neural networks with low-rank regularization](https://arxiv.org/pdf/1511.06067.pdf)
- [Speeding up convolutional neural networks with low rank expansions](http://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14b/jaderberg14b.pdf)
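The low-rank papers above all factor a large weight matrix into two thinner ones; the simplest variant is truncated SVD, which is optimal in the Frobenius norm. An illustrative NumPy sketch (the layer size and rank are assumptions chosen to make the parameter saving visible):

```python
import numpy as np

def low_rank_factorize(w, rank):
    """Truncated SVD: replace an m x n layer with an m x r and an r x n layer."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]   # fold the singular values into the first factor
    v_r = vt[:rank, :]
    return u_r, v_r

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256))
u_r, v_r = low_rank_factorize(w, rank=32)
# parameters: 256*256 = 65536  ->  2 * 256*32 = 16384 (4x fewer)
```

By the Eckart–Young theorem the squared reconstruction error equals the sum of the squared discarded singular values; the tensor-decomposition papers above generalize this idea to 4-D convolution kernels.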

https://github.com/memoiry/Awesome-model-compression-and-acceleration

# Awesome-model-compression-and-acceleration

Some papers I have collected and consider well worth reading; they are also on my own reading list. Please raise a PR or issue if you have any suggestions for the list. Thank you.

### Survey

1. [A Survey of Model Compression and Acceleration for Deep Neural Networks](https://arxiv.org/abs/1710.09282) [arXiv '17]
2. [Recent Advances in Efficient Computation of Deep Convolutional Neural Networks](https://arxiv.org/abs/1802.00939) [arXiv '18]

### Model and structure

1. [MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation](https://arxiv.org/pdf/1801.04381.pdf) [arXiv '18, Google]
1. [NasNet: Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/pdf/1707.07012.pdf) [arXiv '17, Google]
1. [DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices](https://arxiv.org/abs/1708.04728) [AAAI'18, Samsung]
1. [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083) [arXiv '17, Megvii]
1. [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) [arXiv '17, Google]
1. [CondenseNet: An Efficient DenseNet using Learned Group Convolutions](https://arxiv.org/abs/1711.09224) [arXiv '17]
1. [Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video](https://arxiv.org/abs/1709.05943) [arXiv'17]
1. [Shift-based Primitives for Efficient Convolutional Neural Networks](https://arxiv.org/pdf/1809.08458) [WACV'18]

### Quantization

1. [The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning](https://arxiv.org/abs/1611.05402) [ICML'17]
1. [Compressing Deep Convolutional Networks using Vector Quantization](https://arxiv.org/abs/1412.6115) [arXiv'14]
1. [Quantized Convolutional Neural Networks for Mobile Devices](https://arxiv.org/abs/1512.06473) [CVPR '16]
1. [Fixed-Point Performance Analysis of Recurrent Neural Networks](https://arxiv.org/abs/1512.01322) [ICASSP'16]
1. [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061) [arXiv'16]
1. [Loss-aware Binarization of Deep Networks](https://arxiv.org/abs/1611.01600) [ICLR'17]
1. [Towards the Limit of Network Quantization](https://arxiv.org/abs/1612.01543) [ICLR'17]
1. [Deep Learning with Low Precision by Half-wave Gaussian Quantization](https://arxiv.org/abs/1702.00953) [CVPR'17]
1. [ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks](https://arxiv.org/abs/1706.02393) [arXiv'17]
1. [Training and Inference with Integers in Deep Neural Networks](https://openreview.net/forum?id=HJGXzmspb) [ICLR'18]
1. [Deep Learning with Limited Numerical Precision](https://arxiv.org/abs/1502.02551) [ICML'15]

### Pruning

1. [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626) [NIPS'15]
2. [Pruning Filters for Efficient ConvNets](https://arxiv.org/abs/1608.08710) [ICLR'17]
3. [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440) [ICLR'17]
4. [Soft Weight-Sharing for Neural Network Compression](https://arxiv.org/abs/1702.04008) [ICLR'17]
5. [Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](https://arxiv.org/abs/1510.00149) [ICLR'16]
6. [Dynamic Network Surgery for Efficient DNNs](https://arxiv.org/abs/1608.04493) [NIPS'16]
7. [Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning](https://arxiv.org/abs/1611.05128) [CVPR'17]
8. [ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression](https://arxiv.org/abs/1707.06342) [ICCV'17]
9. [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878) [ICLR'18]
10. [Data-Driven Sparse Structure Selection for Deep Neural Networks](https://arxiv.org/pdf/1707.01213.pdf)
11. [Learning Structured Sparsity in Deep Neural Networks](https://arxiv.org/pdf/1608.03665.pdf)
12. [Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism](http://www-personal.umich.edu/~jiecaoyu/papers/jiecaoyu-isca17.pdf)
13. [Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing](http://www.cs.jhu.edu/~jason/papers/vieira+eisner.tacl17.pdf)
14. [Channel pruning for accelerating very deep neural networks](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Channel_Pruning_for_ICCV_2017_paper.pdf) [ICCV'17]
15. [AMC: AutoML for model compression and acceleration on mobile devices](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yihui_He_AMC_Automated_Model_ECCV_2018_paper.pdf) [ECCV'18]
16. [RePr: Improved Training of Convolutional Filters](https://arxiv.org/pdf/1811.07275.pdf) [arXiv'18]

### Binarized neural network

1. [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/pdf/1602.02830.pdf)
2. [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/pdf/1603.05279.pdf)
3. [Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration](https://arxiv.org/pdf/1707.04693.pdf)

### Low-rank Approximation

1. [Efficient and Accurate Approximations of Nonlinear Convolutional Networks](https://arxiv.org/abs/1411.4229) [CVPR'15]
2. [Accelerating Very Deep Convolutional Networks for Classification and Detection](https://arxiv.org/abs/1505.06798) (Extended version of above one)
3. [Convolutional neural networks with low-rank regularization](https://arxiv.org/abs/1511.06067) [arXiv'15]
4. [Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation](https://arxiv.org/abs/1404.0736) [NIPS'14]
5. [Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications](https://arxiv.org/abs/1511.06530) [ICLR'16]
6. [High performance ultra-low-precision convolutions on mobile devices](https://arxiv.org/abs/1712.02427) [NIPS'17]
7. [Speeding up convolutional neural networks with low rank expansions](http://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14b/jaderberg14b.pdf)

### Distilling

1. [Dark knowledge](http://www.ttic.edu/dl/dark14.pdf)
2. [FitNets: Hints for Thin Deep Nets](https://arxiv.org/pdf/1412.6550.pdf)
3. [Net2Net: Accelerating Learning via Knowledge Transfer](https://arxiv.org/pdf/1511.05641.pdf)
4. [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
5. [MobileID: Face Model Compression by Distilling Knowledge from Neurons](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11977)
6. [DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer](https://arxiv.org/pdf/1707.01220.pdf)
7. [Deep Model Compression: Distilling Knowledge from Noisy Teachers](https://arxiv.org/pdf/1610.09650.pdf)
8. [Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer](https://arxiv.org/pdf/1612.03928.pdf)
9. [Sequence-Level Knowledge Distillation](https://arxiv.org/pdf/1606.07947.pdf)
10. [Like What You Like: Knowledge Distill via Neuron Selectivity Transfer](https://arxiv.org/pdf/1707.01219.pdf)
11. [Learning Efficient Object Detection Models with Knowledge Distillation](http://papers.nips.cc/paper/6676-learning-efficient-object-detection-models-with-knowledge-distillation.pdf)
12. [Data-Free Knowledge Distillation For Deep Neural Networks](https://arxiv.org/pdf/1710.07535.pdf)
13. [Learning Loss for Knowledge Distillation with Conditional Adversarial Networks](https://arxiv.org/pdf/1709.00513.pdf)
14. [Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks](https://arxiv.org/pdf/1710.09505.pdf)
15. [Moonshine: Distilling with Cheap Convolutions](https://arxiv.org/pdf/1711.02613.pdf)
16. [Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification](https://arxiv.org/pdf/1709.02929.pdf)

### System

1. [DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications](https://www.sigmobile.org/mobisys/2017/accepted.php) [MobiSys '17]
2. [DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware](http://fahim-kawsar.net/papers/Mathur.MobiSys2017-Camera.pdf) [MobiSys '17]
3. [MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU](https://arxiv.org/abs/1706.00878) [EMDL '17]
4. [DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices](http://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=4278&context=sis_research) [WearSys '16]
5. [DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices](http://niclane.org/pubs/deepx_ipsn.pdf) [IPSN '16]
6. [EIE: Efficient Inference Engine on Compressed Deep Neural Network](https://arxiv.org/abs/1602.01528) [ISCA '16]
7. [MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints](http://haneul.github.io/papers/mcdnn.pdf) [MobiSys '16]
8. [DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit](http://niclane.org/pubs/dxtk_mobicase.pdf) [MobiCASE '16]
9. [Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables](http://niclane.org/pubs/sparsesep_sensys.pdf) [SenSys ’16]
10. [An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices](http://niclane.org/pubs/iotapp15_early.pdf) [IoT-App ’15]
11. [CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android](https://arxiv.org/abs/1511.07376) [MM '16]
12. [fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs](https://arxiv.org/abs/1711.08740) [NIPS '17]

### Some optimization techniques

1. Eliminate redundant computation
2. Loop unrolling
3. Use SIMD instructions
4. OpenMP
5. Fixed-point arithmetic
6. Avoid non-contiguous memory reads and writes
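The fixed-point item above can be illustrated with a tiny Python sketch of Q-format arithmetic, where a real value is stored as an integer scaled by 2^frac_bits so that multiply-accumulate needs only integer operations; the Q8 fractional width here is an illustrative choice:

```python
FRAC_BITS = 8  # Qx.8 format: value ~ integer / 2**8

def to_fixed(x, frac_bits=FRAC_BITS):
    """Convert a float to fixed point by scaling and rounding."""
    return int(round(x * (1 << frac_bits)))

def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point numbers; the raw product carries 2*frac_bits, so shift back."""
    return (a * b) >> frac_bits

def to_float(a, frac_bits=FRAC_BITS):
    return a / (1 << frac_bits)

a = to_fixed(1.5)    # 1.5 * 256 = 384
b = to_fixed(2.0)    # 2.0 * 256 = 512
c = fixed_mul(a, b)  # (384 * 512) >> 8 = 768, i.e. 3.0 in Q8
```

Note that `>>` rounds toward negative infinity; production fixed-point kernels typically add a rounding bias before the shift and saturate on overflow.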

### References

- [Reading List](http://slazebni.cs.illinois.edu/spring17/reading_lists.html)
- [Reading List 2](https://github.com/jiecaoyu/reading_list)
- [Reading List 3](http://slazebni.cs.illinois.edu/spring17/cs598_topics.pdf)
- [Reading List 4](https://github.com/csarron/emdl)
- [Reading List 5](https://github.com/sun254/awesome-model-compression-and-acceleration)
* [An Overview of Lightweight CNNs: SqueezeNet, MobileNet, ShuffleNet, Xception](https://www.jiqizhixin.com/articles/2018-01-08-6)
* [An Introduction to different Types of Convolutions in Deep Learning](https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d)
* [A Round-up of the Many Exotic Convolution Variants in CNNs](https://zhuanlan.zhihu.com/p/29367273)

https://github.com/chester256/Model-Compression-Papers

# Model-Compression-Papers
Papers for neural network compression and acceleration. Partly based on [link](https://github.com/memoiry/Awesome-model-compression-and-acceleration/blob/master/README.md).

### Survey

- [Recent Advances in Efficient Computation of Deep Convolutional Neural Networks](https://arxiv.org/pdf/1802.00939.pdf), [arxiv '18]

- [A Survey of Model Compression and Acceleration for Deep Neural Networks](https://arxiv.org/abs/1710.09282) [arXiv '17]

### Quantization

- [The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning](https://arxiv.org/abs/1611.05402) [ICML'17]
- [Compressing Deep Convolutional Networks using Vector Quantization](https://arxiv.org/abs/1412.6115) [arXiv'14]
- [Quantized Convolutional Neural Networks for Mobile Devices](https://arxiv.org/abs/1512.06473) [CVPR '16]
- [Fixed-Point Performance Analysis of Recurrent Neural Networks](https://arxiv.org/abs/1512.01322) [ICASSP'16]
- [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061) [arXiv'16]
- [Loss-aware Binarization of Deep Networks](https://arxiv.org/abs/1611.01600) [ICLR'17]
- [Towards the Limit of Network Quantization](https://arxiv.org/abs/1612.01543) [ICLR'17]
- [Deep Learning with Low Precision by Half-wave Gaussian Quantization](https://arxiv.org/abs/1702.00953) [CVPR'17]
- [ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks](https://arxiv.org/abs/1706.02393) [arXiv'17]
- [Training and Inference with Integers in Deep Neural Networks](https://openreview.net/forum?id=HJGXzmspb) [ICLR'18]
- [Deep Learning with Limited Numerical Precision](https://arxiv.org/abs/1502.02551) [ICML'15]
- [Model compression via distillation and quantization](https://openreview.net/pdf?id=S1XolQbRW) [ICLR '18]
- [Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy](https://openreview.net/pdf?id=B1ae1lZRb) [ICLR '18]
- [On the Universal Approximability of Quantized ReLU Neural Networks](https://arxiv.org/pdf/1802.03646.pdf) [arXiv '18]
- [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/pdf/1712.05877.pdf) [CVPR '18]

### Pruning

- [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626) [NIPS'15]
- [Pruning Filters for Efficient ConvNets](https://arxiv.org/abs/1608.08710) [ICLR'17]
- [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440) [ICLR'17]
- [Soft Weight-Sharing for Neural Network Compression](https://arxiv.org/abs/1702.04008) [ICLR'17]
- [Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](https://arxiv.org/abs/1510.00149) [ICLR'16]
- [Dynamic Network Surgery for Efficient DNNs](https://arxiv.org/abs/1608.04493) [NIPS'16]
- [Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning](https://arxiv.org/abs/1611.05128) [CVPR'17]
- [ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression](https://arxiv.org/abs/1707.06342) [ICCV'17]
- [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878) [ICLR'18]
- [Data-Driven Sparse Structure Selection for Deep Neural Networks](https://arxiv.org/pdf/1707.01213.pdf) [arXiv '17]
- [Learning Structured Sparsity in Deep Neural Networks](https://arxiv.org/pdf/1608.03665.pdf) [NIPS '16]
- [Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism](http://www-personal.umich.edu/~jiecaoyu/papers/jiecaoyu-isca17.pdf) [ISCA '17]
- [Channel Pruning for Accelerating Very Deep Neural Networks](https://arxiv.org/pdf/1707.06168.pdf) [ICCV '17]
- [Learning Efficient Convolutional Networks through Network Slimming](https://arxiv.org/pdf/1708.06519.pdf) [ICCV '17]
- [NISP: Pruning Networks using Neuron Importance Score Propagation](https://arxiv.org/pdf/1711.05908.pdf) [CVPR '18]
- [Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers](https://openreview.net/pdf?id=HJ94fqApW) [ICLR '18]
- [MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks](https://arxiv.org/pdf/1711.06798.pdf) [arXiv '17]
- [Efficient Sparse-Winograd Convolutional Neural Networks](https://openreview.net/pdf?id=r1rqJyHKg) [ICLR '18]
- [“Learning-Compression” Algorithms for Neural Net Pruning](http://faculty.ucmerced.edu/mcarreira-perpinan/papers/cvpr18.pdf) [CVPR '18]

### Binarized Neural Network

- [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/pdf/1602.02830.pdf) [NIPS '16]
- [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/pdf/1603.05279.pdf) [ECCV '16]
- [Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration](https://arxiv.org/pdf/1707.04693.pdf) [CVPR '17]

### Low-rank Approximation

- [Efficient and Accurate Approximations of Nonlinear Convolutional Networks](https://arxiv.org/abs/1411.4229) [CVPR'15]
- [Accelerating Very Deep Convolutional Networks for Classification and Detection](https://arxiv.org/abs/1505.06798) (Extended version of above one)
- [Convolutional neural networks with low-rank regularization](https://arxiv.org/abs/1511.06067) [arXiv'15]
- [Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation](https://arxiv.org/abs/1404.0736) [NIPS'14]
- [Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications](https://arxiv.org/abs/1511.06530) [ICLR'16]
- [High performance ultra-low-precision convolutions on mobile devices](https://arxiv.org/abs/1712.02427) [NIPS'17]
- [Speeding up convolutional neural networks with low rank expansions](http://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14b/jaderberg14b.pdf)
- [Coordinating Filters for Faster Deep Neural Networks](https://arxiv.org/pdf/1703.09746.pdf) [ICCV '17]

### Knowledge Distillation

- [Dark knowledge](http://www.ttic.edu/dl/dark14.pdf)
- [FitNets: Hints for Thin Deep Nets](https://arxiv.org/pdf/1412.6550.pdf) [ICLR '15]
- [Net2net: Accelerating learning via knowledge transfer](https://arxiv.org/pdf/1511.05641.pdf) [ICLR '16]
- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) [NIPS '15]
- [MobileID: Face Model Compression by Distilling Knowledge from Neurons](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11977) [AAAI '16]
- [DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer](https://arxiv.org/pdf/1707.01220.pdf) [arXiv '17]
- [Deep Model Compression: Distilling Knowledge from Noisy Teachers](https://arxiv.org/pdf/1610.09650.pdf) [arXiv '16]
- [Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer](https://arxiv.org/pdf/1612.03928.pdf) [ICLR '17]
- [Like What You Like: Knowledge Distill via Neuron Selectivity Transfer](https://arxiv.org/pdf/1707.01219.pdf) [arXiv '17]
- [Learning Efficient Object Detection Models with Knowledge Distillation](http://papers.nips.cc/paper/6676-learning-efficient-object-detection-models-with-knowledge-distillation.pdf) [NIPS '17]
- [Data-Free Knowledge Distillation For Deep Neural Networks](https://arxiv.org/pdf/1710.07535.pdf) [NIPS '17]
- [A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning](https://pdfs.semanticscholar.org/0410/659b6a311b281d10e0e44abce9b1c06be462.pdf) [CVPR '17]
- [Moonshine: Distilling with Cheap Convolutions](https://arxiv.org/pdf/1711.02613.pdf) [arXiv '17]
- [Model compression via distillation and quantization](https://openreview.net/pdf?id=S1XolQbRW) [ICLR '18]
- [Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy](https://openreview.net/pdf?id=B1ae1lZRb) [ICLR '18]

### Miscellaneous
- [Beyond Filters: Compact Feature Map for Portable Deep Model](http://proceedings.mlr.press/v70/wang17m/wang17m.pdf) [ICML '17]
- [SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization](http://proceedings.mlr.press/v70/kim17b/kim17b.pdf) [ICML '17]

https://github.com/ZhishengWang/Embedded-Neural-Network

# **Papers Reading List.**
- This is a collection of papers on reducing model size or building ASIC/FPGA accelerators for machine learning, especially deep-neural-network applications. (Inspired by [Neural-Networks-on-Silicon](https://github.com/fengbintu/Neural-Networks-on-Silicon/blob/master/README.md))
- Tutorials:
- **Hardware Accelerator**: Efficient Processing of Deep Neural Networks. ([link](https://arxiv.org/abs/1703.09039))
- **Model Compression**: Model Compression and Acceleration for Deep Neural Networks. ([link](https://arxiv.org/abs/1710.09282))
## **Table of Contents**
- [Our Contributions](#our-contributions)
- [Network Compression](#network-compression)
- Parameter Sharing
- Teacher-Student Mechanism (Distilling)
- Fixed-precision training and storage
- Sparsity regularizers & Pruning
- Tensor Decomposition
- Conditional (Adaptive) Computing
- [Hardware Accelerator](#hardware-accelerator)
- Benchmark and Platform Analysis
- Recurrent Neural Networks
- [Conference Papers](#conference-papers)
- 2016: [NIPS](#nips-2016)
- 2017: [ICASSP](#icassp-2017)、[CVPR](#cvpr-2017)、[ICML](#icml-2017)、[ICCV](#iccv-2017)、[NIPS](#nips-2017)
- 2018: [ICLR](#iclr-2018)、[CVPR](#cvpr-2018)、[ECCV](#eccv-2018)、[ICML](#icml-2018)、[NIPS](#nips-2018)、[SysML](http://www.sysml.cc/2018/)
- 2019: [ICLR](#iclr-2019)、[CVPR](#cvpr-2019)、[SysML](https://www.sysml.cc/)
## **Our Contributions**
- **TODO**
## **Network Compression**
> **This field is changing rapidly; the entries below may be somewhat dated.**
### **Parameter Sharing**
- **structured matrices**
- Structured Convolution Matrices for Energy-efficient Deep learning. (IBM Research–Almaden)
- Structured Transforms for Small-Footprint Deep Learning. (Google Inc)
- An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections.
- Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank.
- **Hashing**
- Functional Hashing for Compressing Neural Networks. (Baidu Inc)
- Compressing Neural Networks with the Hashing Trick. (Washington University + NVIDIA)
- Learning compact recurrent neural networks. (University of Southern California + Google)

### **Teacher-Student Mechanism (Distilling)**
- Distilling the Knowledge in a Neural Network. (Google Inc)
- Sequence-Level Knowledge Distillation. (Harvard University)
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. (TuSimple)

### **Fixed-precision training and storage**
- Binary/Ternary Neural Networks
- XNOR-Net, Ternary Weight Networks (TWNs), Binary-net and their variants.
- Deep neural networks are robust to weight binarization and other non-linear distortions. (IBM Research–Almaden)
- Recurrent Neural Networks With Limited Numerical Precision. (ETH Zurich + Montréal@Yoshua Bengio)
- Neural Networks with Few Multiplications. (Montréal@Yoshua Bengio)
- 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs. (Tsinghua University + Microsoft)
- Towards the Limit of Network Quantization. (Samsung US R&D Center)
- Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights. (Intel Labs China)
- Loss-aware Binarization of Deep Networks. (Hong Kong University of Science and Technology)
- Trained Ternary Quantization. (Tsinghua University + Stanford University + NVIDIA)
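
As a rough illustration of the weight-binarization step used by BinaryConnect/XNOR-Net-style methods above (a hedged sketch, not any paper's exact code): each weight is replaced by its sign, scaled by the mean absolute value so the binary tensor is a good L2 approximation of the original.

```python
def binarize(weights):
    """Replace each weight by +/- alpha, where alpha = mean(|w|).
    alpha * sign(W) minimizes the L2 error among scaled sign vectors."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights], alpha

w = [0.7, -0.2, 0.5, -0.9, 0.1]
wb, alpha = binarize(w)
```

XNOR-Net additionally binarizes activations and uses per-filter scaling; ternary variants (TWN) add a zero level.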

### **Sparsity regularizers & Pruning**
- Learning both Weights and Connections for Efficient Neural Networks. (SongHan, Stanford University)
- Deep Compression, EIE. (SongHan, Stanford University)
- Dynamic Network Surgery for Efficient DNNs. (Intel)
- Compression of Neural Machine Translation Models via Pruning. (Stanford University)
- Accelerating Deep Convolutional Networks using low-precision and sparsity. (Intel)
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning. (Intel)
- Exploring Sparsity in Recurrent Neural Networks. (Baidu Research)
- Pruning Convolutional Neural Networks for Resource Efficient Inference. (NVIDIA)
- Pruning Filters for Efficient ConvNets. (University of Maryland + NEC Labs America)
- Soft Weight-Sharing for Neural Network Compression. (University of Amsterdam, [reddit discussion](https://www.reddit.com/r/MachineLearning/comments/5u7h3l/r_compressing_nn_with_shannons_blessing/))
- Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks. (McGill University)
- Training Compressed Fully-Connected Networks with a Density-Diversity Penalty. (University of Washington)
- **Bayesian Compression**
- Bayesian Sparsification of Recurrent Neural Networks
- Bayesian Compression for Deep Learning
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise
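
The simplest pruning baseline underlying several of the papers above (e.g. Han et al.'s weights-and-connections work) is magnitude pruning. A minimal illustrative sketch, with toy numbers of our own:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|.
    Ties at the threshold are also removed."""
    k = int(len(weights) * sparsity)          # number of weights to drop
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.05, -0.8, 0.3, -0.02, 0.6, 0.01, -0.4, 0.9]
pruned = magnitude_prune(w, 0.5)   # keep the largest half by magnitude
```

In the papers above this step is followed by retraining the surviving weights, and is often iterated; structured variants prune whole filters or channels instead of individual weights.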

### **Tensor Decomposition**
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications. (Samsung, etc)
- Learning compact recurrent neural networks. (University of Southern California + Google)
- Tensorizing Neural Networks. (Skolkovo Institute of Science and Technology, etc)
- Ultimate Tensorization: Compressing Convolutional and FC Layers Alike. (Moscow State University, etc)
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks. (@CVPR2015)
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. (New York University, etc.)
- Convolutional neural networks with low-rank regularization. (Princeton University, etc.)
- Learning with Tensors: Why Now and How? (Tensor-Learn Workshop @ NIPS'16)
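
The storage argument behind these decomposition papers is easy to see for the matrix (rank-r) case: a dense m×n layer W is replaced by two factors B (m×r) and A (r×n), cutting parameters from m·n to r·(m+n). A hedged pure-Python illustration; for simplicity the toy matrix is constructed to be exactly rank r, whereas in practice the factors come from a truncated SVD of a trained W or are learned directly:

```python
def matmul(X, Y):
    """Naive matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

m, n, r = 6, 8, 2  # layer size and target rank (toy numbers)
B = [[(i + j) % 3 - 1 for j in range(r)] for i in range(m)]   # m x r factor
A = [[(i * j) % 5 - 2 for j in range(n)] for i in range(r)]   # r x n factor
W = matmul(B, A)                                              # the "full" m x n layer

full_params = m * n               # storing W directly
low_rank_params = r * (m + n)     # storing the two factors instead
```

At inference time the layer becomes two smaller matrix multiplies, which also reduces FLOPs when r is small; tensor-train methods generalize this to more than two factors.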

### **Conditional (Adaptive) Computing**
- Adaptive Computation Time for Recurrent Neural Networks. (Google DeepMind@Alex Graves)
- Variable Computation in Recurrent Neural Networks. (New York University + Facebook AI Research)
- Spatially Adaptive Computation Time for Residual Networks. ([github link](https://github.com/mfigurnov/sact), Google, etc.)
- Hierarchical Multiscale Recurrent Neural Networks. (Montréal)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. (Google Brain, etc.)
- Adaptive Neural Networks for Fast Test-Time Prediction. (Boston University, etc)
- Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution. (University of Michigan)
- **Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation**. (@Yoshua Bengio)
- Multi-Scale Dense Convolutional Networks for Efficient Prediction. (Cornell University, etc)
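
The common pattern in these adaptive-computation papers is to spend compute only on hard inputs. A minimal early-exit cascade sketch (the threshold and the two stand-in "models" are hypothetical toys, not from any listed paper):

```python
def confidence(probs):
    """Use the top class probability as a crude confidence measure."""
    return max(probs)

def adaptive_predict(x, cheap_model, expensive_model, threshold=0.9):
    """Two-stage cascade: accept the cheap model's answer when it is
    confident; fall back to the expensive model only on hard inputs."""
    probs = cheap_model(x)
    if confidence(probs) >= threshold:
        return probs, "early-exit"
    return expensive_model(x), "full"

# Toy stand-ins: real cascades use small/large networks or early-exit branches.
cheap = lambda x: [0.95, 0.05] if x == "easy" else [0.55, 0.45]
expensive = lambda x: [0.10, 0.90]

p1, path1 = adaptive_predict("easy", cheap, expensive)
p2, path2 = adaptive_predict("hard", cheap, expensive)
```

Multi-scale dense networks attach such exits at several depths of one network rather than using two separate models.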

## **Hardware Accelerator**
### **Benchmark and Platform Analysis**
- Fathom: Reference Workloads for Modern Deep Learning Methods. (Harvard University)
- DeepBench: Open-Source Tool for Benchmarking DL Operations. (Baidu, svail.github.io)
- BENCHIP: Benchmarking Intelligence Processors.
- [DAWNBench](https://dawn.cs.stanford.edu//benchmark/): An End-to-End Deep Learning Benchmark and Competition. (Stanford)
- [MLPerf](https://mlperf.org/): A broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms.

### **Recurrent Neural Networks**
- FPGA-based Low-power Speech Recognition with Recurrent Neural Networks. (Seoul National University)
- Accelerating Recurrent Neural Networks in Analytics Servers: Comparison of FPGA, CPU, GPU, and ASIC. (Intel)
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA. (FPGA 2017, Best Paper Award)
- DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep Neural Networks. (KAIST, ISSCC 2017)
- Hardware Architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition. (University of Kaiserslautern, etc)
- Efficient Hardware Mapping of Long Short-Term Memory Neural Networks for Automatic Speech Recognition. (Master Thesis@Georgios N. Evangelopoulos)
- Hardware Accelerators for Recurrent Neural Networks on FPGA. (Purdue University, ISCAS 2017)
- Accelerating Recurrent Neural Networks: A Memory Efficient Approach. (Nanjing University)
- A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications.
- An Energy-Efficient Reconfigurable Architecture for RNNs Using Dynamically Adaptive Approximate Computing.
- A Systolically Scalable Accelerator for Near-Sensor Recurrent Neural Network Inference.
- A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications
- E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks
- C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs (FPGA 2018, Peking Univ, Syracuse Univ, CUNY)
- DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator. (FPGA 2018, ETHZ, BenevolentAI)
- Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs (MICRO 2018)
- E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs (HPCA 2019)

### **Convolutional Neural Networks**
- Please refer to [Neural-Networks-on-Silicon](https://github.com/fengbintu/Neural-Networks-on-Silicon/blob/master/README.md)
## **Conference Papers**

### **NIPS 2016**
- Dynamic Network Surgery for Efficient DNNs. (Intel Labs China)
- Memory-Efficient Backpropagation Through Time. (Google DeepMind)
- PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions. (Moscow State University, etc.)
- Learning Structured Sparsity in Deep Neural Networks. (University of Pittsburgh)
- LightRNN: Memory and Computation-Efficient Recurrent Neural Networks. (Nanjing University + Microsoft Research)

### **ICASSP 2017**
- LogNet: Energy-Efficient Neural Networks Using Logarithmic Computation. (Stanford University)
- Extended Low Rank Plus Diagonal Adaptation for Deep and Recurrent Neural Networks. (Microsoft)
- Fixed-Point Optimization of Deep Neural Networks with Adaptive Step Size Retraining. (Seoul National University)
- Implementation of Efficient, Low Power Deep Neural Networks on Next-Generation Intel Client Platforms (Demos). (Intel)
- Knowledge Distillation for Small-Footprint Highway Networks. (TTI-Chicago, etc)
- Automatic Node Selection for Deep Neural Networks Using Group Lasso Regularization. (Doshisha University, etc)
- Accelerating Deep Convolutional Networks Using Low-Precision and Sparsity. (Intel Labs)

### **CVPR 2017**
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. (MIT)
- Network Sketching: Exploiting Binary Structure in Deep CNNs. (Intel Labs China + Tsinghua University)
- Spatially Adaptive Computation Time for Residual Networks. (Google, etc)
- A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation. (University of Pittsburgh, etc)

### **ICML 2017**
- Deep Tensor Convolution on Multicores. (MIT)
- Beyond Filters: Compact Feature Map for Portable Deep Model. (Peking University + University of Sydney)
- Combined Group and Exclusive Sparsity for Deep Neural Networks. (UNIST)
- Delta Networks for Optimized Recurrent Network Computation. (Institute of Neuroinformatics, etc)
- MEC: Memory-efficient Convolution for Deep Neural Network. (IBM Research)
- Deciding How to Decide: Dynamic Routing in Artificial Neural Networks. (California Institute of Technology)
- Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning. (ETH Zurich, etc)
- Analytical Guarantees on Numerical Precision of Deep Neural Networks. (University of Illinois at Urbana-Champaign)
- Variational Dropout Sparsifies Deep Neural Networks. (Skoltech, etc)
- Adaptive Neural Networks for Fast Test-Time Prediction. (Boston University, etc)
- Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank. (The City University of New York, etc)

### **ICCV 2017**
- Channel Pruning for Accelerating Very Deep Neural Networks. (Xi’an Jiaotong University + Megvii Inc.)
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression. (Nanjing University, etc)
- Learning Efficient Convolutional Networks through Network Slimming. (Intel Labs China, etc)
- Performance Guaranteed Network Acceleration via High-Order Residual Quantization. (Shanghai Jiao Tong University + Peking University)
- Coordinating Filters for Faster Deep Neural Networks. (University of Pittsburgh + Duke University, etc, [github link](https://github.com/wenwei202/caffe))

### **NIPS 2017**
- Towards Accurate Binary Convolutional Neural Network. (DJI)
- Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations. (ETH Zurich)
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning. (Duke University, etc, [github link](https://github.com/wenwei202/terngrad))
- Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks. (Intel)
- Bayesian Compression for Deep Learning. (University of Amsterdam, etc)
- Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon. (Nanyang Technological Univ)
- Training Quantized Nets: A Deeper Understanding. (University of Maryland)
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise. (Yandex, etc)
- Runtime Neural Pruning. (Tsinghua University)
- The Reversible Residual Network: Backpropagation Without Storing Activations. (University of Toronto, [github link](https://github.com/renmengye/revnet-public))
- Compression-aware Training of Deep Networks. (Toyota Research Institute + EPFL)

### **ICLR 2018**
- Oral
- Training and Inference with Integers in Deep Neural Networks. (Tsinghua University)
- Poster
- Learning Sparse Neural Networks through L0 Regularization
- Learning Intrinsic Sparse Structures within Long Short-Term Memory
- Variational Network Quantization
- Alternating Multi-bit Quantization for Recurrent Neural Networks
- Mixed Precision Training
- Multi-Scale Dense Networks for Resource Efficient Image Classification
- Efficient Sparse-Winograd Convolutional Neural Networks
- Compressing Word Embeddings via Deep Compositional Code Learning
- Mixed Precision Training of Convolutional Neural Networks using Integer Operations
- Adaptive Quantization of Neural Networks
- Espresso: Efficient Forward Propagation for Binary Deep Neural Networks
- WRPN: Wide Reduced-Precision Networks
- Deep Rewiring: Training Very Sparse Deep Networks
- Loss-aware Weight Quantization of Deep Networks
- Learning to Share: Simultaneous Parameter Tying and Sparsification in Deep Learning
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
- Large Scale Distributed Neural Network Training through Online Distillation
- Learning Discrete Weights Using the Local Reparameterization Trick
- Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
- Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight
- The High-Dimensional Geometry of Binary Neural Networks
- Workshop
- To Prune or Not to Prune: Exploring the Efficacy of Pruning for Model Compression
### **CVPR 2018**
- Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- BlockDrop: Dynamic Inference Paths in Residual Networks
- SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks
- Two-Step Quantization for Low-Bit Neural Networks
- Towards Effective Low-Bitwidth Convolutional Neural Networks
- Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks
- CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization
- “Learning-Compression” Algorithms for Neural Net Pruning
- Wide Compression: Tensor Ring Nets
- NestedNet: Learning Nested Sparse Structures in Deep Neural Networks
- Interleaved Structured Sparse Convolutional Neural Networks
- NISP: Pruning Networks Using Neuron Importance Score Propagation
- Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition
- HydraNets: Specialized Dynamic Architectures for Efficient Inference
- Learning Time/Memory-Efficient Deep Architectures With Budgeted Super Networks
### **ECCV 2018**
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
- **Learning Compression from Limited Unlabeled Data**
- **AMC: AutoML for Model Compression and Acceleration on Mobile Devices**
- Training Binary Weight Networks via Semi-Binary Decomposition
- Clustering Convolutional Kernels to Compress Deep Neural Networks
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Coreset-Based Neural Network Compression
- Convolutional Networks with Adaptive Inference Graphs
- Value-aware Quantization for Training and Inference of Neural Networks
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
- Deep Expander Networks: Efficient Deep Networks from Graph Theory
- Extreme Network Compression via Filter Group Approximation
- Constraint-Aware Deep Neural Network Compression
### **ICML 2018**
- Compressing Neural Networks using the Variational Information Bottleneck
- DCFNet: Deep Neural Network with Decomposed Convolutional Filters
- Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
- Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
- High Performance Zero-Memory Overhead Direct Convolutions
- Kronecker Recurrent Units
- Learning Compact Neural Networks with Regularization
- StrassenNets: Deep Learning with a Multiplication Budget
- Weightless: Lossy Weight Encoding for Deep Neural Network Compression
- WSNet: Compact and Efficient Networks Through Weight Sampling

### **NIPS 2018**
- workshops
- [Systems for ML and Open Source Software](http://learningsys.org/nips18/schedule.html)
- [Compact Deep Neural Network Representation with Industrial Applications](https://openreview.net/group?id=NIPS.cc/2018/Workshop/CDNNRIA#accepted-papers)
- [2nd Workshop on Machine Learning on the Phone and other Consumer Devices (MLPCD 2)](https://sites.google.com/view/nips-2018-on-device-ml/call-for-papers)
- Scalable Methods for 8-bit Training of Neural Networks
- Frequency-Domain Dynamic Pruning for Convolutional Neural Networks
- Sparsified SGD with Memory
- Training Deep Neural Networks with 8-bit Floating Point Numbers
- KDGAN: Knowledge Distillation with Generative Adversarial Networks
- Knowledge Distillation by On-the-Fly Native Ensemble
- Multiple Instance Learning for Efficient Sequential Data Classification on Resource-Constrained Devices
- Moonshine: Distilling with Cheap Convolutions
- HitNet: Hybrid Ternary Recurrent Neural Network
- FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
- Training DNNs with Hybrid Block Floating Point
- Reversible Recurrent Neural Networks
- Norm Matters: Efficient and Accurate Normalization Schemes in Deep Networks
- Synaptic Strength for Convolutional Neural Network
- Tetris: Tile-matching the Tremendous Irregular Sparsity
- Learning Sparse Neural Networks via Sensitivity-Driven Regularization
- Pelee: A Real-Time Object Detection System on Mobile Devices
- Learning Versatile Filters for Efficient Convolutional Neural Networks
- Multi-Task Zipping via Layer-wise Neuron Sharing
- A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
- GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
- ATOMO: Communication-Efficient Learning via Atomic Sparsification
- Gradient Sparsification for Communication-Efficient Distributed Optimization

### **ICLR 2019**
- Poster:
- SNIP: Single-Shot Network Pruning Based on Connection Sensitivity
- Rethinking the Value of Network Pruning
- Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach
- Dynamic Channel Pruning: Feature Boosting and Suppression
- Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
- Slimmable Neural Networks
- RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks
- Dynamic Sparse Graph for Efficient Deep Learning
- Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
- Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
- Learning Recurrent Binary/Ternary Weights
- Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network
- Relaxed Quantization for Discretized Neural Networks
- Integer Networks for Data Compression with Latent-Variable Models
- Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
- A Systematic Study of Binary Neural Networks' Optimisation
- Analysis of Quantized Models
- Oral:
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
### **CVPR 2019**
- All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
- T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor
- Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
- others to be added

https://github.com/cedrickchee/awesome-ml-model-compression

# Awesome ML Model Compression [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

An awesome style list that curates the best machine learning model compression and acceleration research papers, articles, tutorials, libraries, tools and more. PRs are welcome!

# Contents

- [Papers](#papers)
- [General](#general)
- [Architecture](#architecture)
- [Quantization](#quantization)
- [Binarization](#binarization)
- [Pruning](#pruning)
- [Distillation](#distillation)
- [Low Rank Approximation](#low-rank-approximation)
- [Articles](#articles)
- [Howtos](#howtos)
- [Assorted](#assorted)
- [Reference](#reference)
- [Blogs](#blogs)
- [Tools](#tools)
- [Libraries](#libraries)
- [Frameworks](#frameworks)
- [Videos](#videos)
- [Talks](#talks)
- [Training & tutorials](#training--tutorials)

---

## Papers

### General

- [A Survey of Model Compression and Acceleration for Deep Neural Networks](https://arxiv.org/abs/1710.09282)
- [Model compression as constrained optimization, with application to neural nets. Part I: general framework](https://arxiv.org/abs/1707.01209)
- [Model compression as constrained optimization, with application to neural nets. Part II: quantization](https://arxiv.org/abs/1707.04319)

### Architecture

- [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
- [MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation](https://arxiv.org/abs/1801.04381)
- [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/abs/1610.02357)
- [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083)
- [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360)
- [Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video](https://arxiv.org/abs/1709.05943)
- [AddressNet: Shift-based Primitives for Efficient Convolutional Neural Networks](https://arxiv.org/abs/1809.08458)
- [ResNeXt: Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431)
- [ResBinNet: Residual Binary Neural Network](https://arxiv.org/abs/1711.01243)
- [Residual Attention Network for Image Classification](https://arxiv.org/abs/1704.06904)
- [Squeezedet: Unified, small, low power fully convolutional neural networks](https://arxiv.org/abs/1612.01051)
- [SEP-Nets: Small and Effective Pattern Networks](https://arxiv.org/abs/1706.03912)
- [Dynamic Capacity Networks](https://arxiv.org/abs/1511.07838)
- [Learning Infinite Layer Networks Without the Kernel Trick](https://arxiv.org/abs/1606.05316v2)
- [Efficient Sparse-Winograd Convolutional Neural Networks](https://openreview.net/pdf?id=r1rqJyHKg)
- [DSD: Dense-Sparse-Dense Training for Deep Neural Networks](https://openreview.net/pdf?id=HyoST_9xl)
- [Coordinating Filters for Faster Deep Neural Networks](https://arxiv.org/abs/1703.09746v3)
- [Deep Networks with Stochastic Depth](https://arxiv.org/abs/1603.09382)

### Quantization

- [Quantized Convolutional Neural Networks for Mobile Devices](https://arxiv.org/abs/1512.06473)
- [Towards the Limit of Network Quantization](https://arxiv.org/abs/1612.01543)
- [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061)
- [Compressing Deep Convolutional Networks using Vector Quantization](https://arxiv.org/abs/1412.6115)
- [Trained Ternary Quantization](https://arxiv.org/abs/1612.01064)
- [The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning](https://arxiv.org/abs/1611.05402)
- [ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks](https://arxiv.org/abs/1706.02393)
- [Deep Learning with Low Precision by Half-wave Gaussian Quantization](https://arxiv.org/abs/1702.00953)
- [Loss-aware Binarization of Deep Networks](https://arxiv.org/abs/1611.01600)
- [Quantize weights and activations in Recurrent Neural Networks](https://arxiv.org/abs/1611.10176)
- [Fixed-Point Performance Analysis of Recurrent Neural Networks](https://arxiv.org/abs/1512.01322)
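
A generic uniform affine quantization scheme, common to many of the papers above, maps floats onto b-bit integers plus a (scale, zero-point) pair. A hedged, pure-Python sketch (toy values; per-channel scales, calibration, and quantization-aware training are omitted, and details differ per paper):

```python
def quantize(xs, num_bits=8):
    """Uniform affine quantization: map floats in [min, max] onto integers
    in [0, 2^b - 1]; keep (scale, zero_point) to dequantize later."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / qmax or 1.0       # guard against a constant input
    zero_point = round(-lo / scale)       # integer that represents 0.0
    q = [min(qmax, max(0, round(x / scale) + zero_point)) for x in xs]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the stored integers."""
    return [(qi - zero_point) * scale for qi in q]

xs = [-1.0, -0.31, 0.0, 0.47, 1.5]
q, scale, zp = quantize(xs)
xhat = dequantize(q, scale, zp)           # each element within scale/2 of xs
```

The round-trip error is bounded by half the scale per element, which is why wider ranges or fewer bits cost accuracy.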

### Binarization

- [Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration](https://arxiv.org/abs/1707.04693)
- [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830)
- [Local Binary Convolutional Neural Networks](https://arxiv.org/abs/1608.06049)
- [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/abs/1603.05279)
- [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160)

### Pruning

- [Faster CNNs with Direct Sparse Convolutions and Guided Pruning](https://arxiv.org/abs/1608.01409)
- [Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](https://arxiv.org/abs/1510.00149)
- [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440)
- [Pruning Filters for Efficient ConvNets](https://arxiv.org/abs/1608.08710)
- [Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning](https://arxiv.org/abs/1611.05128)
- [Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing](http://www.cs.jhu.edu/~jason/papers/vieira+eisner.tacl17.pdf)
- [Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization](https://arxiv.org/abs/1707.09102)
- [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626)
- [ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression](https://arxiv.org/abs/1707.06342)
- [Data-Driven Sparse Structure Selection for Deep Neural Networks](https://arxiv.org/abs/1707.01213)
- [Soft Weight-Sharing for Neural Network Compression](https://arxiv.org/abs/1702.04008)
- [Dynamic Network Surgery for Efficient DNNs](https://arxiv.org/abs/1608.04493)
- [Channel pruning for accelerating very deep neural networks](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Channel_Pruning_for_ICCV_2017_paper.pdf)
- [AMC: AutoML for model compression and acceleration on mobile devices](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yihui_He_AMC_Automated_Model_ECCV_2018_paper.pdf)
- [ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA](https://arxiv.org/abs/1612.00694)

### Distillation

- [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
- [Deep Model Compression: Distilling Knowledge from Noisy Teachers](https://arxiv.org/abs/1610.09650)
- [Learning Efficient Object Detection Models with Knowledge Distillation](http://papers.nips.cc/paper/6676-learning-efficient-object-detection-models-with-knowledge-distillation.pdf)
- [Data-Free Knowledge Distillation For Deep Neural Networks](https://arxiv.org/abs/1710.07535)
- [Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks](https://arxiv.org/abs/1710.09505)
- [Moonshine: Distilling with Cheap Convolutions](https://arxiv.org/abs/1711.02613)
- [Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification](https://arxiv.org/abs/1709.02929)
- [Like What You Like: Knowledge Distill via Neuron Selectivity Transfer](https://arxiv.org/abs/1707.01219)
- [Sequence-Level Knowledge Distillation](https://arxiv.org/abs/1606.07947)
- [Learning Loss for Knowledge Distillation with Conditional Adversarial Networks](https://arxiv.org/abs/1709.00513)
- [Dark knowledge](http://www.ttic.edu/dl/dark14.pdf)
- [DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer](https://arxiv.org/abs/1707.01220)
- [FitNets: Hints for Thin Deep Nets](https://arxiv.org/abs/1412.6550)
- [MobileID: Face Model Compression by Distilling Knowledge from Neurons](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11977)
- [Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer](https://arxiv.org/abs/1612.03928)

### Low Rank Approximation

- [Speeding up convolutional neural networks with low rank expansions](http://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14b/jaderberg14b.pdf)
- [Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications](https://arxiv.org/abs/1511.06530)
- [Convolutional neural networks with low-rank regularization](https://arxiv.org/abs/1511.06067)
- [Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation](https://arxiv.org/abs/1404.0736)
- [Accelerating Very Deep Convolutional Networks for Classification and Detection](https://arxiv.org/abs/1505.06798)
- [Efficient and Accurate Approximations of Nonlinear Convolutional Networks](https://arxiv.org/abs/1411.4229)

## Articles

Content published on the Web.

### Howtos

- [How to Quantize Neural Networks with TensorFlow](https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/)

### Assorted

- [Why the Future of Machine Learning is Tiny](https://petewarden.com/2018/06/11/why-the-future-of-machine-learning-is-tiny/)
- [Deep Learning Model Compression for Image Analysis: Methods and Architectures](https://medium.com/comet-app/deep-learning-model-compression-for-image-analysis-methods-and-architectures-398f82b0c06f)

### Reference

### Blogs

- [TensorFlow Model Optimization Toolkit — Pruning API](https://medium.com/tensorflow/tensorflow-model-optimization-toolkit-pruning-api-42cac9157a6a?linkId=67380711)

## Tools

### Libraries

- [TensorFlow Model Optimization Toolkit](https://github.com/tensorflow/model-optimization). Accompanied blog post, [TensorFlow Model Optimization Toolkit — Pruning API](https://medium.com/tensorflow/tensorflow-model-optimization-toolkit-pruning-api-42cac9157a6a?linkId=67380711)

### Frameworks

## Videos

### Talks

### Training & tutorials

# License

[![CC0](http://i.creativecommons.org/p/zero/1.0/88x31.png)](http://creativecommons.org/publicdomain/zero/1.0/)

To the extent possible under law, [Cedric Chee](https://github.com/cedrickchee) has waived all copyright and related or neighboring rights to this work.

https://github.com/jnjaby/Model-Compression-Acceleration

# Model-Compression-Acceleration

# Papers

## Quantization
- Product Quantization for Nearest Neighbor Search, TPAMI, 2011 [[paper]](https://hal.inria.fr/inria-00514462v2/document)
- Compressing Deep Convolutional Networks using Vector Quantization, ICLR, 2015 [[paper]](https://arxiv.org/pdf/1412.6115.pdf)
- **Deep Learning with Limited Numerical Precision**, ICML, 2015 [[paper]](https://pdfs.semanticscholar.org/dec1/59bb0d83a506ec61fb8745388e585f48be44.pdf?_ga=2.16188209.660876135.1502713025-632431917.1498533020)
- Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks, ArXiv, 2016 [[paper]](https://arxiv.org/pdf/1604.03168.pdf)
- Fixed Point Quantization of Deep Convolutional Networks, ICML, 2016 [[paper]](https://pdfs.semanticscholar.org/d88d/3d8450f2032b3a59d0006693381877bfc1da.pdf?_ga=2.82169745.660876135.1502713025-632431917.1498533020)
- Quantized Convolutional Neural Networks for Mobile Devices, CVPR, 2016 [[paper]](https://pdfs.semanticscholar.org/2353/28f8bc8b62e04918f9b4f6afe3c64cfdb63d.pdf?_ga=2.115328896.660876135.1502713025-632431917.1498533020)
- Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights, ICLR, 2017 [[paper]](https://arxiv.org/pdf/1702.03044.pdf)
- **BinaryConnect: Training Deep Neural Networks with binary weights during propagations**, NIPS, 2015 [[paper]](https://pdfs.semanticscholar.org/a573/3ff08daff727af834345b9cfff1d0aa109ec.pdf?_ga=2.32656573.200323026.1503209786-632431917.1498533020)
- **BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1**, ArXiv, 2016 [[paper]](https://arxiv.org/pdf/1602.02830.pdf)
- **XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks**, ECCV, 2016 [[paper]](https://pdfs.semanticscholar.org/9e56/cc1142e71fad78d1423791f99a5d2d2e61d7.pdf?_ga=2.259605641.200323026.1503209786-632431917.1498533020)
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, ArXiv, 2016 [[paper]](https://arxiv.org/pdf/1609.07061.pdf)
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, ArXiv, 2016 [[paper]](https://arxiv.org/pdf/1606.06160.pdf)

## Pruning
- **Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding**, ICLR, 2016 [[paper]](https://arxiv.org/pdf/1510.00149.pdf)
- Optimal Brain Damage, NIPS, 1990 [[paper]](https://pdfs.semanticscholar.org/17c0/a7de3c17d31f79589d245852b57d083d386e.pdf?_ga=2.267651469.200323026.1503209786-632431917.1498533020)
- Learning both Weights and Connections for Efficient Neural Network, NIPS, 2015 [[paper]](http://papers.nips.cc/paper/5784-learning-both-weights-and-connections-for-efficient-neural-network.pdf)
- Pruning Filters for Efficient ConvNets, ICLR, 2017 [[paper]](https://arxiv.org/pdf/1608.08710.pdf)
- Sparsifying Neural Network Connections for Face Recognition, CVPR, 2016 [[paper]](https://pdfs.semanticscholar.org/d8e6/9677fe51836847f63e5ef84c8d3d68942d12.pdf?_ga=2.259031433.200323026.1503209786-632431917.1498533020)
- **Learning Structured Sparsity in Deep Neural Networks**, NIPS, 2016 [[paper]](https://pdfs.semanticscholar.org/35cd/36289610df4f221c309c4420036771fcb274.pdf?_ga=2.34365986.200323026.1503209786-632431917.1498533020)
- Pruning Convolutional Neural Networks for Resource Efficient Inference, ICLR, 2017 [[paper]](https://pdfs.semanticscholar.org/941b/12c4ad0e95faaea386626ee77c3a68792763.pdf?_ga=2.202066322.200323026.1503209786-632431917.1498533020)
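The simplest pruning criterion running through several of the papers above (notably Han et al.) is weight magnitude: small weights contribute little and can be zeroed. A toy pure-Python sketch of unstructured magnitude pruning, with illustrative names; real pipelines prune iteratively and retrain between steps:

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights.

    Ties at the threshold may prune slightly more than the requested
    fraction; structured variants (filter/channel pruning) instead
    remove whole rows so the dense layer genuinely shrinks.
    """
    k = int(len(weights) * sparsity)        # number of weights to zero
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.1, -0.9, 0.05, 0.7], sparsity=0.5)
```

Note that unstructured sparsity like this only saves time on hardware or kernels that exploit sparse formats, which is why filter-level pruning (e.g. Pruning Filters for Efficient ConvNets) matters for edge devices.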

## Knowledge Distillation
- **Distilling the Knowledge in a Neural Network**, ArXiv, 2015 [[paper]](https://arxiv.org/pdf/1503.02531.pdf)
- FitNets: Hints for Thin Deep Nets, ICLR, 2015 [[paper]](https://arxiv.org/pdf/1412.6550.pdf)
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR, 2017 [[paper]](https://arxiv.org/pdf/1612.03928.pdf)
- Face Model Compression by Distilling Knowledge from Neurons, AAAI, 2016 [[paper]](http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/11977/12130)
- In Teacher We Trust: Learning Compressed Models for Pedestrian Detection, ArXiv, 2016 [[paper]](https://arxiv.org/pdf/1612.00478.pdf)
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer, ArXiv, 2017 [[paper]](https://arxiv.org/pdf/1707.01219.pdf)
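The common thread in the distillation papers above is training the student against the teacher's temperature-softened output distribution rather than hard labels. A minimal pure-Python sketch of the soft-target loss from Hinton et al.; the helper names are illustrative:

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T gives a softer distribution."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between softened teacher and student distributions.

    The T**2 factor (as in Hinton et al.) keeps gradient magnitudes
    comparable when the temperature changes; in practice this term is
    combined with an ordinary hard-label loss.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q)) * T * T
```

The loss is minimized when the student reproduces the teacher's softened distribution, so the student learns the teacher's relative rankings over wrong classes ("dark knowledge"), not just the argmax.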

## Network Architecture
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, ArXiv, 2016 [[paper]](https://arxiv.org/pdf/1602.07360.pdf)
- Convolutional Neural Networks at Constrained Time Cost, CVPR, 2015 [[paper]](https://pdfs.semanticscholar.org/9a1b/08883a74b25f35f1df9553718899e2bdb944.pdf?_ga=2.268584178.200323026.1503209786-632431917.1498533020)
- Flattened Convolutional Neural Networks for Feedforward Acceleration, ArXiv, 2014 [[paper]](https://arxiv.org/pdf/1412.5474.pdf)
- Going deeper with convolutions, CVPR, 2015 [[paper]](https://arxiv.org/pdf/1409.4842.pdf)
- Rethinking the Inception Architecture for Computer Vision, CVPR, 2016 [[paper]](https://arxiv.org/pdf/1512.00567.pdf)
- Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial "Bottleneck" Structure, ArXiv, 2016 [[paper]](https://arxiv.org/pdf/1608.04337.pdf)
- **Xception: Deep Learning with Depthwise Separable Convolutions**, ArXiv, 2017 [[paper]](https://arxiv.org/pdf/1610.02357.pdf)
- **MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications**, ArXiv, 2017 [[paper]](https://arxiv.org/pdf/1704.04861.pdf)
- **ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices**, ArXiv, 2017 [[paper]](https://arxiv.org/pdf/1707.01083.pdf)
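The efficiency win behind Xception and MobileNets is the depthwise separable convolution: a standard k×k convolution is split into a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution. A short parameter-count sketch (biases ignored, illustrative function names):

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution: every output channel
    mixes all input channels through a k x k kernel."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one kernel per input channel) followed by
    a 1x1 pointwise conv that mixes channels, as in MobileNets."""
    return c_in * k * k + c_in * c_out

std = conv_params(256, 256, 3)        # 589,824 weights
sep = separable_params(256, 256, 3)   # 67,840 weights, ~8.7x fewer
```

For 3×3 kernels the reduction approaches 9x as the channel count grows, which is roughly the savings MobileNets reports per layer; FLOP counts shrink by the same ratio.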

## Matrix Factorization (Low-rank Approximation)
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation, NIPS, 2014 [[paper]](https://pdfs.semanticscholar.org/e5ae/8ab688051931b4814f6d32b18391f8d1fa8d.pdf?_ga=2.149358257.660876135.1502713025-632431917.1498533020)
- Speeding up Convolutional Neural Networks with Low Rank Expansions, BMVC, 2014 [[paper]](https://pdfs.semanticscholar.org/d1a8/f0d257d434add438867ffeca4f2a4b40e5ae.pdf?_ga=2.10872371.660876135.1502713025-632431917.1498533020)
- Deep Fried Convnets, ICCV, 2015 [[paper]](https://pdfs.semanticscholar.org/27a9/9c21a1324f087b2f144adc119f04137dfd87.pdf?_ga=2.269034738.200323026.1503209786-632431917.1498533020)
- **Accelerating Very Deep Convolutional Networks for Classification and Detection**, TPAMI, 2016 [[paper]](https://pdfs.semanticscholar.org/3259/b108d516f4700411f92e574a0f944462f0bc.pdf?_ga=2.215762068.200323026.1503209786-632431917.1498533020)
- Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition, ICLR, 2015 [[paper]](https://arxiv.org/pdf/1412.6553.pdf)
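The speedup these low-rank methods exploit is easiest to see in the rank-1 case: if a weight matrix factors as W = u·vᵀ, then W·x can be computed with m + n multiplies instead of m·n. A pure-Python sketch with illustrative names (real methods keep rank r > 1 and fine-tune after factorizing):

```python
def matvec_full(W, x):
    """Ordinary dense matrix-vector product: m * n multiplies."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matvec_rank1(u, v, x):
    """If W = outer(u, v), then W @ x = (v . x) * u:
    one dot product plus one scaling, i.e. m + n multiplies."""
    s = sum(vj * xj for vj, xj in zip(v, x))
    return [ui * s for ui in u]

u, v, x = [1.0, 2.0], [3.0, 4.0, 5.0], [0.5, -1.0, 2.0]
W = [[ui * vj for vj in v] for ui in u]   # explicit rank-1 matrix
# matvec_full(W, x) and matvec_rank1(u, v, x) agree
```

With rank r, the m×n layer becomes two layers of shapes m×r and r×n, so both parameters and multiplies drop from m·n to r·(m + n); the CP-decomposition paper above applies the same idea to the 4-D convolution tensor.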

https://github.com/mapleam/model-compression-and-acceleration-4-DNN (worth a look inside)

https://github.com/he-y/Awesome-Pruning — a comprehensive collection of pruning work