awesome-model-compression
papers about model compression
https://github.com/ChanChiChoi/awesome-model-compression
PAPERS
2001
- A simple neural network pruning algorithm with application to filter synthesis
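The 2001 entry above is an early example of the magnitude-style pruning theme that recurs throughout this list. As a rough, hedged illustration of that general idea (not the filter-synthesis algorithm from the paper itself), the PyTorch sketch below zeroes out the smallest-magnitude weights under a single global threshold; the `magnitude_prune` helper, the 0.8 sparsity target, and the toy model are assumptions made for the example.

```python
# Illustrative sketch only: global magnitude pruning, not the 2001 algorithm above.
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights so roughly `sparsity` of them become zero."""
    weights = [m.weight.data for m in model.modules()
               if isinstance(m, (nn.Linear, nn.Conv2d))]
    if not weights:
        return
    all_w = torch.cat([w.abs().flatten() for w in weights])
    threshold = torch.quantile(all_w, sparsity)   # one global threshold across layers
    for w in weights:
        w.mul_((w.abs() > threshold).float())     # mask small weights in place

# Toy model purely for demonstration.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
magnitude_prune(model, sparsity=0.8)
zeros = sum((m.weight == 0).sum().item() for m in model.modules()
            if isinstance(m, (nn.Linear, nn.Conv2d)))
total = sum(m.weight.numel() for m in model.modules()
            if isinstance(m, (nn.Linear, nn.Conv2d)))
print(f"pruned {zeros}/{total} weights")
```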
2017
- The power of sparsity in convolutional neural networks
- Fixed-point optimization of deep neural networks with adaptive step size retraining - 1207.
- Low-precision batch-normalized activations
- Enabling sparse winograd convolution by native pruning
- Sharesnet: reducing residual network parameter number by sharing weights
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
- Coordinating Filters for Faster Deep Neural Networks
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Residual Attention Network for Image Classification
- SEP-Nets: Small and Effective Pattern Networks
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- Ternary neural networks with fine-grained quantization
- Hardware-software codesign of accurate, multiplier-free deep neural networks
- Exploring the regularity of sparse structure in convolutional neural networks
- MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Model compression as constrained optimization, with application to neural nets. Part I: general framework
- Data-Driven Sparse Structure Selection for Deep Neural Networks<br>【code:[TuSimple/sparse-structure-selection](https://github.com/TuSimple/sparse-structure-selection)】
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Model compression as constrained optimization, with application to neural nets. Part II: quantization
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Channel Pruning for Accelerating Very Deep Neural Networks<br>【code:[yihui-he/channel-pruning](https://github.com/yihui-he/channel-pruning);[Eric-mingjie/rethinking-network-pruning](https://github.com/Eric-mingjie/rethinking-network-pruning/tree/master/imagenet)】
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression<br>【code:[Eric-mingjie/rethinking-network-pruning](https://github.com/Eric-mingjie/rethinking-network-pruning/tree/master/imagenet)】
- Learning Transferable Architectures for Scalable Image Recognition
- Tartan: Accelerating fully-connected and convolutional layers in deep learning networks by exploiting numerical precision variability
- Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- Scnn: An accelerator for compressed-sparse convolutional neural networks
- DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices
- Learning Efficient Convolutional Networks through Network Slimming<br>【code:[Eric-mingjie/network-slimming](https://github.com/Eric-mingjie/network-slimming)】
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- WRPN: wide reduced-precision networks
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
- N2n learning: Network to network compression via policy gradient reinforcement learning
- To prune, or not to prune: exploring the efficacy of pruning for model compression
- Data-Free Knowledge Distillation for Deep Neural Networks
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Knowledge Projection for Deep Neural Networks
- ReBNet: Residual Binarized Neural Network
- Moonshine: Distilling with Cheap Convolutions
- Weightless: Lossy weight encoding for deep neural network compression
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
- NISP: Pruning Networks using Neuron Importance Score Propagation
- MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks<br>【code:[google-research/morph-net](https://github.com/google-research/morph-net)】
- fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions
- High performance ultra-low-precision convolutions on mobile devices
- StrassenNets: Deep learning with a multiplication budget
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- Bmxnet: An open-source binary neural network implementation based on mxnet - 1212.<br>【code:[hpi-xnor/BMXNet](https://github.com/hpi-xnor/BMXNet)】
- Two-bit networks for deep learning on resource-constrained embedded devices
- Quicknet: Maximizing efficiency and efficacy in deep architectures
- Compression of deep neural networks for image instance retrieval - 309.
- Variational dropout sparsifies deep neural networks - Volume 70. JMLR. org, 2017: 2498-2507.<br>【code:[ars-ashuha/variational-dropout-sparsifies-dnn](https://github.com/ars-ashuha/variational-dropout-sparsifies-dnn)】
- Cp-decomposition with tensor power method for convolutional neural networks compression - 118.
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- Soft Weight-Sharing for Neural Network Compression
- Efficient methods and hardware for deep learning
- On-chip memory based binarized convolutional deep neural network applying batch normalization free technique on an fpga - 105.
- FP-BNN: Binarized neural network on FPGA - 1086.
- Towards accurate binary convolutional neural network - 353.
- Centered weight normalization in accelerating training of deep neural networks - 2811.
- Weighted-entropy-based quantization for deep neural networks - 5464.<br>【code:[EunhyeokPark/script_for_WQ](https://github.com/EunhyeokPark/script_for_WQ)】
- Time: A training-in-memory architecture for memristor-based deep neural networks
- In-datacenter performance analysis of a tensor processing unit
- Can fpgas beat gpus in accelerating nextgeneration deep neural networks? - Programmable Gate Arrays, FPGA ’17, 2017.
- Fixed-point factorized networks
- Automated systolic array architecture synthesis for high throughput cnn inference on fpgas
- How to train a compact binary neural network with high accuracy?
- Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on fpgas
- Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks - State Circuits, 52(1):127–138, 2017.
- Optimizing loop operation and dataflow in fpga acceleration of deep convolutional neural networks - Programmable Gate Arrays, FPGA ’17, 2017.
- Escher: A cnn accelerator with flexible buffering to minimize off-chip transfer - Programmable Custom Computing Machines, 2017.
- Local Affine Approximators for Improving Knowledge Transfer
- Beyond Filters: Compact Feature Map for Portable Deep Model - Volume 70. JMLR. org, 2017: 3703-3711.
- SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization - Volume 70. JMLR. org, 2017: 1866-1874.
- Structured bayesian pruning via log-normal multiplicative noise - 6784.<br>【code;[necludov/group-sparsity-sbp](https://github.com/necludov/group-sparsity-sbp)】
- DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware - 81.
- The incredible shrinking neural network: New perspectives on learning representations through the lens of pruning
- Incremental network quantization: Towards lossless cnns with low-precision weights.
- DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications - 95.
- Learning from Noisy Labels with Distillation - Jia Li, 2017
- Efficient processing of deep neural networks: A tutorial and survey - 2329.
- Sphereface: Deep hypersphere embedding for face recognition - 220.<br>【code;[isthatyoung/Sphereface-prune](https://github.com/isthatyoung/Sphereface-prune)】
- Bayesian compression for deep learning - 3298.
- Learning to prune deep neural networks via layer-wise optimal brain surgeon - 4867.<br>【code:[csyhhu/L-OBS](https://github.com/csyhhu/L-OBS)】
- Deep mutual learning - 4328.
- Interleaved Group Convolutions for Deep Neural Networks
- Domain-adaptive deep network compression - 4297.
- Extremely low bit neural network: Squeeze the last bit out with admm - Second AAAI Conference on Artificial Intelligence. 2018.
- Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net
- Revisiting knowledge transfer for training object class detectors
- Adaptive quantization for deep neural network - Second AAAI Conference on Artificial Intelligence. 2018.
- Learning Sparse Neural Networks through L0 Regularization
- Deep gradient compression: Reducing the communication bandwidth for distributed training
- Adacomp: Adaptive residual gradient compression for data-parallel distributed training - Second AAAI Conference on Artificial Intelligence. 2018.
- Data Distillation: Towards Omni-Supervised Learning
- Learning compact recurrent neural networks with block-term tensor decomposition - 9387.
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
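Many of the 2017 entries above are knowledge-distillation methods (e.g. the Apprentice, DarkRank, and data-free distillation papers). As a minimal sketch of the common soft-target formulation these works build on, the snippet below mixes a temperature-softened KL term against a teacher with the usual hard-label loss; the temperature `T=4.0` and mixing weight `alpha=0.9` are illustrative assumptions, not values taken from any listed paper.

```python
# Hedged sketch of Hinton-style knowledge distillation (soft targets + hard labels).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale so gradients stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Tiny usage example with random logits standing in for real networks.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```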
2018
- Transparent Model Distillation
- Kernel Distillation for Gaussian Processes
- Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks
- Model compression for faster structural separation of macromolecules captured by Cellular Electron Cryo-Tomography
- Alternating Multi-bit Quantization for Recurrent Neural Networks
- Universal Deep Neural Network Compression
- Effective Quantization Approaches for Recurrent Neural Networks
- Efficient Neural Architecture Search via Parameters Sharing
- On the Universal Approximability of Quantized ReLU Neural Networks
- Fd-mobilenet: Improved mobilenet with a fast downsampling strategy - 1367.
- Training and Inference with Integers in Deep Neural Networks
- Deep Net Triage: Analyzing the Importance of Network Layers via Structural Compression
- MobileNetV2: Inverted Residuals and Linear Bottlenecks
- Overpruning in Variational Bayesian Neural Networks
- Learning to Prune Filters in Convolutional Neural Networks
- Exploring hidden dimensions in parallelizing convolutional neural networks
- Paraphrasing Complex Network: Network Compression via Factor Transfer
- Stronger generalization bounds for deep nets via a compression approach
- Model compression via distillation and quantization
- Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers
- On the optimization of deep networks: Implicit acceleration by overparameterization
- Distilling Knowledge Using Parallel Data for Far-field Speech Recognition
- DeepThin: A Self-Compressing Library for Deep Neural Networks
- Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach
- Training wide residual networks for deployment using a single bit for each weight<br>【code:[szagoruyko/binary-wide-resnet](https://github.com/szagoruyko/binary-wide-resnet)】
- Loss-aware Weight Quantization of Deep Networks
- Wide Compression: Tensor Ring Nets
- Compressing Neural Networks using the Variational Information Bottleneck
- Tensor Decomposition for Compressing Recurrent Neural Network
- The Lottery Ticket Hypothesis: Training Pruned Neural Networks<br>【code:[google-research/lottery-ticket-hypothesis](https://github.com/google-research/lottery-ticket-hypothesis)】
- FeTa: A DCA Pruning Algorithm with Generalization Error Guarantees
- Deep Co-Training for Semi-Supervised Image Recognition
- Exploring Linear Relationship in Feature Map Subspace for ConvNets Compression
- Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples
- A Quantization-Friendly Separable Convolution for MobileNets
- Iterative Low-Rank Approximation for CNN Compression
- Context-aware Deep Feature Compression for High-speed Visual Tracking
- Adversarial Network Compression
- Distribution-Aware Binarization of Neural Networks for Sketch Recognition
- Compressibility and Generalization in Large-Scale Deep Learning
- Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving - 2384.
- Value-aware Quantization for Training and Inference of Neural Networks
- Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
- Structured Deep Neural Network Pruning by Varying Regularization Parameters
- A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers<br>【code:[KaiqiZhang/admm-pruning](https://github.com/KaiqiZhang/admm-pruning)】
- Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
- Competitive Learning Enriches Learning Representation and Accelerates the Fine-tuning of CNNs
- Accelerator-Aware Pruning for Convolutional Neural Networks
- UNIQ: Uniform Noise Injection for the Quantization of Neural Networks
- Accelerating Neural Transformer via an Average Attention Network
- Enhancing the Regularization Effect of Weight Pruning in Artificial Neural Networks
- Quantization Mimic: Towards Very Tiny CNN for Object Detection
- Towards Accurate and High-Speed Spiking Neuromorphic Systems with Data Quantization-Aware Deep Networks
- PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing
- Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
- Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
- Pact: Parameterized clipping activation for quantized neural networks
- Recurrent knowledge distillation
- Neural Network Compression using Transform Coding and Clustering
- Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints
- SqueezeJet: High-level Synthesis Accelerator Design for Deep Convolutional Neural Networks
- AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference
- Deploy Large-Scale Deep Neural Networks in Resource Constrained IoT Devices with Local Quantization Region
- Tensorized Spectrum Preserving Compression for Neural Networks
- Heterogeneous Bitwidth Binarization in Convolutional Neural Networks
- Understanding generalization and optimization performance of deep CNNs
- Convolutional neural network compression for natural language processing
- Visual Relationship Detection Based on Guided Proposals and Semantic Knowledge Distillation
- Distilling Knowledge for Search-based Structured Prediction
- Retraining-Based Iterative Weight Quantization for Deep Neural Networks
- A novel channel pruning method for deep neural network compression
- Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication
- Grow and Prune Compact, Fast, and Accurate LSTMs
- MPDCompress - Matrix Permutation Decomposition Algorithm for Deep Neural Network Compression
- Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices
- EasyConvPooling: Random Pooling with Easy Convolution for Accelerating Training and Testing
- Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
- Knowledge Distillation by On-the-Fly Native Ensemble
- Ensemble Pruning based on Objection Maximization with a General Distributed Framework
- SCSP: Spectral Clustering Filter Pruning with Soft Self-adaption Manners
- Scalable Neural Network Compression and Pruning Using Hard Clustering and L1 Regularization
- PCAS: Pruning Channels with Attention Statistics
- RAPIDNN: In-Memory Deep Neural Network Acceleration Framework
- Efficient Sparse-Winograd Convolutional Neural Networks
- Fast Convex Pruning of Deep Neural Networks
- DropBack: Continuous Pruning During Training
- Quantizing deep convolutional networks for efficient inference: A whitepaper
- Adversarial distillation of bayesian neural network posteriors
- SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks
- Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays
- FATE: Fast and Accurate Timing Error Prediction Framework for Low Power DNN Accelerator Design
- OCTen: Online Compression-based Tensor Decomposition
- Deep $k$-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
- On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation
- Auto Deep Compression by Reinforcement Learning Based Actor-Critic Structure
- Recent Advances in Convolutional Neural Network Acceleration
- PCNNA: A Photonic Convolutional Neural Network Accelerator
- Coreset-Based Neural Network Compression<br>【code:[metro-smiles/CNN_Compression](https://github.com/metro-smiles/CNN_Compression)】
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks<br>【code:[microsoft/LQ-Nets](https://github.com/microsoft/LQ-Nets)】
- Aggregated Learning: A Vector Quantization Approach to Learning with Neural Networks
- Principal Filter Analysis for Guided Network Compression
- An Acceleration Scheme for Memory Limited, Streaming PCA
- Self-supervised Knowledge Distillation Using Singular Value Decomposition
- Statistical Model Compression for Small-Footprint Natural Language Understanding
- Optimize deep convolutional neural network with ternarized weights and high accuracy - 921.
- FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software
- Crossbar-aware neural network pruning
- ADAM-ADMM: A Unified, Systematic Framework of Structured Weight Pruning for DNNs
- t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data
- SlimNets: An Exploration of Deep Model Compression and Acceleration
- Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks
- DNN Feature Map Compression using Learned Representation over GF(2)
- Joint Training of Low-Precision Neural Network with Quantization Interval Parameters
- Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks<br>【code:[he-y/soft-filter-pruning](https://github.com/he-y/soft-filter-pruning)】
- Progressive Deep Neural Networks Acceleration via Soft Filter Pruning
- An Overview of Datatype Quantization Techniques for Convolutional Neural Networks
- Approximation Trees: Statistical Stability in Model Distillation
- Spectral-Pruning: Compressing deep neural network via spectral analysis
- Accelerating Deep Neural Networks with Spatial Bottleneck Modules
- An FPGA-Accelerated Design for Deep Learning Pedestrian Detection in Self-Driving Vehicles
- Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System
- SoaAlloc: Accelerating Single-Method Multiple-Objects Applications on GPUs
- Shift-based Primitives for Efficient Convolutional Neural Networks
- Low Precision Policy Distillation with Application to Low-Power, Real-time Sensation-Cognition-Action Loop with Neuromorphic Computing
- Adaptive Pruning of Neural Language Models for Mobile Devices
- GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration
- NICE: Noise Injection and Clamping Estimation for Neural Network Quantization
- Pruned and Structurally Sparse Neural Networks
- Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
- Dynamic sparse graph for efficient deep learning<br>【code:[mtcrawshaw/dynamic-sparse-graph](https://github.com/mtcrawshaw/dynamic-sparse-graph)】
- Proxquant: Quantized neural networks via proximal operators
- Relaxed Quantization for Discretized Neural Networks
- Progressive Weight Pruning of Deep Neural Networks using ADMM
- Pruning Deep Neural Networks using Partial Least Squares
- CNN inference acceleration using dictionary of centroids
- To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference
- Convolutional Neural Network Pruning to Accelerate Membrane Segmentation in Electron Microscopy
- Differentiable Fine-grained Quantization for Deep Neural Network Compression
- HAKD: Hardware Aware Knowledge Distillation
- Towards fast and energy-efficient binarized neural network inference on fpga
- Learning Compressed Transforms with Low Displacement Rank
- SNIP: Single-shot Network Pruning based on Connection Sensitivity<br>【code:[namhoonlee/snip-public](https://github.com/namhoonlee/snip-public)】
- Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
- Deep Neural Network Compression for Aircraft Collision Avoidance Systems
- Pruning neural networks: is it time to nip it in the bud?
- Rethinking the Value of Network Pruning<br>【code:[Eric-mingjie/rethinking-network-pruning](https://github.com/Eric-mingjie/rethinking-network-pruning)】
- Dynamic Channel Pruning: Feature Boosting and Suppression<br>【code:[deep-fry/mayo](https://github.com/deep-fry/mayo)】
- FPGA-based Acceleration System for Visual Tracking
- Quantization for Rapid Deployment of Deep Neural Networks
- ACIQ: Analytical Clipping for Integer Quantization of neural networks
- Rate Distortion For Model Compression: From Theory To Practice
- Interpretable Convolutional Filter Pruning
- Bayesian Compression for Natural Language Processing
- Lossless (and Lossy) Compression of Random Forests
- Discrimination-aware Channel Pruning for Deep Neural Networks<br>【code:[SCUT-AILab/DCP](https://github.com/SCUT-AILab/DCP)】
- DeepTwist: Learning Model Compression via Occasional Weight Distortion
- A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
- Convolutional Neural Network Quantization using Generalized Gamma Distribution
- Pruning Filter via Geometric Median for Deep Convolutional Neural Networks Acceleration<br>【code:[he-y/filter-pruning-geometric-median](https://github.com/he-y/filter-pruning-geometric-median)】
- Hybrid Pruning: Thinner Sparse Networks for Fast Inference on Edge Devices
- Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
- ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks
- A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM
- Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators
- Demystifying Neural Network Filter Pruning
- Distilling Critical Paths in Convolutional Neural Networks
- Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
- GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
- Deep Compression of Sum-Product Networks on Tensor Networks
- Leveraging Filter Correlations for Deep Model Compression
- Effective, Fast, and Memory-Efficient Compressed Multi-function Convolutional Neural Networks for More Accurate Medical Image Classification
- A Framework for Fast and Efficient Neural Network Compression<br>【code:[Hyeji-Kim/ENC](https://github.com/Hyeji-Kim/ENC)】
- Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
- Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition
- Generalized Ternary Connect: End-to-End Learning and Compression of Multiplication-Free Deep Neural Networks
- Private Model Compression via Knowledge Distillation
- Iteratively Training Look-Up Tables for Network Quantization
- QUENN: QUantization Engine for low-power Neural Networks
- Tetris: Re-architecting Convolutional Neural Network Computation for Machine Learning Accelerators
- RePr: Improved Training of Convolutional Filters
- The core consistency of a compressed tensor
- Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition
- Three Dimensional Convolutional Neural Network Pruning with Regularization-Based Method
- Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
- Accelerating the Evolution of Convolutional Neural Networks with Node-Level Mutations and Epigenetic Weight Initialization
- Stability Based Filter Pruning for Accelerating Deep CNNs
- Multi-layer Pruning Framework for Compressing Single Shot MultiBox Detector
- Structured Pruning for Efficient ConvNets via Incremental Regularization
- Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks
- Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
- HAQ: Hardware-Aware Automated Quantization
- Joint Neural Architecture Search and Quantization
- Dataset Distillation
- Snapshot Distillation: Teacher-Student Optimization in One Generation
- Proxylessnas: Direct neural architecture search on target task and hardware
- Network Compression via Recursive Bayesian Pruning
- Knowledge Distillation with Feature Maps for Image Classification
- Hoard: A Distributed Data Caching System to Accelerate Deep Learning Training on the Cloud
- Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling
- Structure Learning Using Forced Pruning
- Prototype-based Neural Network Layers: Incorporating Vector Quantization
- Training for 'Unstable' CNN Accelerator:A Case Study on FPGA
- ECC: Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
- Knowledge Distillation from Few Samples
- DropPruning for Model Compression
- Model Compression with Generative Adversarial Networks
- DNQ: Dynamic Network Quantization
- Trained Rank Pruning for Efficient Deep Neural Networks
- Optimizing Speed/Accuracy Trade-Off for Person Re-identification via Knowledge Distillation
- Reliable Identification of Redundant Kernels for Convolutional Neural Network Compression
- Spatial Knowledge Distillation to aid Visual Reasoning
- Exploiting Processing in Non-Volatile Memory for Binary Neural Network Accelerators
- Proximal Mean-field for Neural Network Quantization
- Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression
- A Layer Decomposition-Recomposition Framework for Neuron Pruning towards Accurate Lightweight Networks
- Channel-wise pruning of neural networks with tapering resource constraint
- Distill-Net: Application-Specific Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms
- Fast Adjustable Threshold For Uniform Neural Network Quantization
- DAC: Data-free Automatic Acceleration of Convolutional Networks
- Slimmable neural networks
- Precision Highway for Ultra Low-Precision Quantization
- Dynamic Runtime Feature Map Pruning
- BNN+: Improved binary network training
- Studying the Plasticity in Deep Convolutional Neural Networks using Random Pruning
- Improving the Interpretability of Deep Neural Networks with Knowledge Distillation
- Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks
- Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm
- Algorithms for speeding up convolutional neural networks
- Model compression and acceleration for deep neural networks: The principles, progress, and challenges - 136.
- Shift: A zero flop, zero parameter alternative to spatial convolutions - 9135.<br>【code:[alvinwan/shiftresnet-cifar](https://github.com/alvinwan/shiftresnet-cifar)】
- Extremely low bit neural network: Squeeze the last bit out with admm - Second AAAI Conference on Artificial Intelligence. 2018.
- Learning a wavelet-like auto-encoder to accelerate deep neural networks - Second AAAI Conference on Artificial Intelligence. 2018.
- KDGAN: knowledge distillation with generative adversarial networks - 786.
- Designing by training: acceleration neural network for fast high-dimensional convolution - 1475.
- Saas: Speed as a supervisor for semi-supervised learning - 163.
- An efficient pruning algorithm for robust isotonic regression - 229.
- pruning in training: learning and ranking sparse connections in deep convolutional networks
- pruning with hints: an efficient framework for model acceleration
- Clip-q: Deep network compression learning by in-parallel pruning-quantization - 7882.
- MLPrune: Multi-Layer Pruning for Automated Neural Network Compression
- Clustering convolutional kernels to compress deep neural networks - 232.
- Product quantization network for fast image retrieval - 201.
- Variational network quantization
- LSQ++: Lower running time and higher recall in multi-codebook quantization - 506.
- Explicit loss-error-aware quantization for low-bit deep neural networks - 9435.
- Adaptive Sample-space & Adaptive Probability coding: a neural-network based approach for compression
- A Biresolution Spectral Framework for Product Quantization - 3338.
- Adaptive quantization for deep neural network - Second AAAI Conference on Artificial Intelligence. 2018.
- Deeprebirth: Accelerating deep neural network execution on mobile devices - Second AAAI Conference on Artificial Intelligence. 2018.
- True Gradient-Based Training of Deep Binary Activated Neural Networks Via Continuous Binarization - 2350.
- deep-trim: revisiting l1 regularization for connection pruning of deep network
- Learning to Search Efficient DenseNet with Layer-wise Pruning
- Mean Replacement Pruning
- In search of theoretically grounded pruning
- Constraint-aware deep neural network compression - 415.<br>【code:[ChanganVR/ConstraintAwareCompression](https://github.com/ChanganVR/ConstraintAwareCompression)】
- “Learning-Compression” Algorithms for Neural Net Pruning
- Integral Pruning on Activations and Weights for Efficient Neural Networks
- Exploiting Invariant Structures for Compression in Neural Networks
- FEED: Feature-level Ensemble Effect for knowledge Distillation
- knowledge distill via learning neuron manifold
- Exploration by random distillation
- Network compression using correlation analysis of layer responses
- Architecture Compression
- What Information Does a ResNet Compress?
- representation compression and generalization in deep neural networks
- N-Ary Quantization for CNN Model Compression and Inference Acceleration
- Deepsearch: A fast image search framework for mobile devices
- Deep Net Triage: Analyzing the Importance of Network Layers via Structural Compression
- Faster gaze prediction with dense networks and Fisher pruning
- Overpruning in Variational Bayesian Neural Networks
- Learning to Prune Filters in Convolutional Neural Networks
- Kernel Distillation for Gaussian Processes
- Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks
- Model compression for faster structural separation of macromolecules captured by Cellular Electron Cryo-Tomography
- Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers<br>【code:[jack-willturner/batchnorm-pruning](https://github.com/jack-willturner/batchnorm-pruning)】
- Alternating Multi-bit Quantization for Recurrent Neural Networks
- Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning
- Effective Quantization Approaches for Recurrent Neural Networks
- Universal Deep Neural Network Compression
- Fd-mobilenet: Improved mobilenet with a fast downsampling strategy - 1367.
- ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Neural Network Accelerators
- Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator
- Training and Inference with Integers in Deep Neural Networks
- Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept
- Exploring hidden dimensions in parallelizing convolutional neural networks
- Paraphrasing Complex Network: Network Compression via Factor Transfer
- Security Analysis and Enhancement of Model Compressed Deep Learning Systems under Adversarial Attacks
- Stronger generalization bounds for deep nets via a compression approach
- Model compression via distillation and quantization
- Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers
- On the optimization of deep networks: Implicit acceleration by overparameterization
- Distilling Knowledge Using Parallel Data for Far-field Speech Recognition
- DeepThin: A Self-Compressing Library for Deep Neural Networks
- Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach
- Building Efficient ConvNets using Redundant Feature Pruning
- Training wide residual networks for deployment using a single bit for each weight<br>【code:[szagoruyko/binary-wide-resnet](https://github.com/szagoruyko/binary-wide-resnet)】
- Loss-aware Weight Quantization of Deep Networks
- Wide Compression: Tensor Ring Nets
- PBGen: Partial Binarization of Deconvolution-Based Generators for Edge Intelligence
- Compressing Neural Networks using the Variational Information Bottleneck
- Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
- Compressibility and Generalization in Large-Scale Deep Learning
- Interleaved structured sparse convolutional neural networks - 8856.
- Tensor Decomposition for Compressing Recurrent Neural Network
- Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning
- Deep Neural Network Compression with Single and Multiple Level Quantization
- Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving - 2384.
- FeTa: A DCA Pruning Algorithm with Generalization Error Guarantees
- Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling - Pui Chau, Gang Wang, 2018
- Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation
- Exploring Linear Relationship in Feature Map Subspace for ConvNets Compression
- Value-aware Quantization for Training and Inference of Neural Networks
- Communication compression for decentralized training - 7662.
- C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs
- Fisher Pruning of Deep Nets for Facial Trait Classification
- A Quantization-Friendly Separable Convolution for MobileNets
- Iterative Low-Rank Approximation for CNN Compression
- Fast and Accurate Single Image Super-Resolution via Information Distillation Network
- Context-aware Deep Feature Compression for High-speed Visual Tracking
- Squeezenext: Hardware-aware neural network design - 1647.
- Adversarial Network Compression
- Distribution-Aware Binarization of Neural Networks for Sketch Recognition
- Large scale distributed neural network training through online distillation
- A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers<br>【code:[KaiqiZhang/admm-pruning](https://github.com/KaiqiZhang/admm-pruning)】
- Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
- Structured Deep Neural Network Pruning by Varying Regularization Parameters
- Competitive Learning Enriches Learning Representation and Accelerates the Fine-tuning of CNNs
- Accelerator-Aware Pruning for Convolutional Neural Networks
- Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification
- UNIQ: Uniform Noise Injection for the Quantization of Neural Networks
- Neural Compatibility Modeling with Attentive Knowledge Distillation
- Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications
- Accelerating Neural Transformer via an Average Attention Network
- Enhancing the Regularization Effect of Weight Pruning in Artificial Neural Networks
- Quantization Mimic: Towards Very Tiny CNN for Object Detection
- Towards Accurate and High-Speed Spiking Neuromorphic Systems with Data Quantization-Aware Deep Networks
- PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing
- Born Again Neural Networks
- Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
- Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
- Pact: Parameterized clipping activation for quantized neural networks
- Fully Convolutional Model for Variable Bit Length and Lossy High Density Compression of Mammograms
- Recurrent knowledge distillation
- Neural Network Compression using Transform Coding and Clustering
- Deploy Large-Scale Deep Neural Networks in Resource Constrained IoT Devices with Local Quantization Region
- Tensorized Spectrum Preserving Compression for Neural Networks
- Heterogeneous Bitwidth Binarization in Convolutional Neural Networks
- Understanding generalization and optimization performance of deep CNNs
- Convolutional neural network compression for natural language processing
- Scalable methods for 8-bit training of neural networks - 5153.
- Distilling Knowledge for Search-based Structured Prediction
- Retraining-Based Iterative Weight Quantization for Deep Neural Networks
- A novel channel pruning method for deep neural network compression
- Grow and Prune Compact, Fast, and Accurate LSTMs
- MPDCompress - Matrix Permutation Decomposition Algorithm for Deep Neural Network Compression
- Channel Gating Neural Networks
- Deep $k$-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
- EasyConvPooling: Random Pooling with Easy Convolution for Accelerating Training and Testing
- Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
- Knowledge Distillation by On-the-Fly Native Ensemble
- Ensemble Pruning based on Objection Maximization with a General Distributed Framework
- Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints
- SqueezeJet: High-level Synthesis Accelerator Design for Deep Convolutional Neural Networks
- Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication
- AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference
- SCSP: Spectral Clustering Filter Pruning with Soft Self-adaption Manners
- Scalable Neural Network Compression and Pruning Using Hard Clustering and L1 Regularization
- PCAS: Pruning Channels with Attention Statistics
- RAPIDNN: In-Memory Deep Neural Network Acceleration Framework
- Igcv3: Interleaved low-rank group convolutions for efficient deep neural networks
- Fast Convex Pruning of Deep Neural Networks
- DropBack: Continuous Pruning During Training
- Quantizing deep convolutional networks for efficient inference: A whitepaper
- On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation
- Adversarial distillation of bayesian neural network posteriors
- SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks
- FATE: Fast and Accurate Timing Error Prediction Framework for Low Power DNN Accelerator Design
- OCTen: Online Compression-based Tensor Decomposition
- Auto Deep Compression by Reinforcement Learning Based Actor-Critic Structure
- An Acceleration Scheme for Memory Limited, Streaming PCA
- Self-supervised Knowledge Distillation Using Singular Value Decomposition
- Statistical Model Compression for Small-Footprint Natural Language Understanding
- Optimize deep convolutional neural network with ternarized weights and high accuracy - 921.
- Recent Advances in Convolutional Neural Network Acceleration
- PCNNA: A Photonic Convolutional Neural Network Accelerator
- Coreset-Based Neural Network Compression<br>【code:[metro-smiles/CNN_Compression](https://github.com/metro-smiles/CNN_Compression)】
- Aggregated Learning: A Vector Quantization Approach to Learning with Neural Networks
- Principal Filter Analysis for Guided Network Compression
- FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software
- Shufflenet v2: Practical guidelines for efficient cnn architecture design - 131.
- Extreme Network Compression via Filter Group Approximation
- t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data
- SlimNets: An Exploration of Deep Model Compression and Acceleration
- A Comprehensive Survey for Low Rank Regularization
- Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks
- DNN Feature Map Compression using Learned Representation over GF(2)
- Joint Training of Low-Precision Neural Network with Quantization Interval Parameters
- Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks<br>【code:[he-y/soft-filter-pruning](https://github.com/he-y/soft-filter-pruning)】
- Progressive Deep Neural Networks Acceleration via Soft Filter Pruning
- An Overview of Datatype Quantization Techniques for Convolutional Neural Networks
- Approximation Trees: Statistical Stability in Model Distillation
- Spectral-Pruning: Compressing deep neural network via spectral analysis
- Accelerating Deep Neural Networks with Spatial Bottleneck Modules
- An FPGA-Accelerated Design for Deep Learning Pedestrian Detection in Self-Driving Vehicles
- Crossbar-aware neural network pruning
- ADAM-ADMM: A Unified, Systematic Framework of Structured Weight Pruning for DNNs
- Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection
- Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System
- SoaAlloc: Accelerating Single-Method Multiple-Objects Applications on GPUs
- Low Precision Policy Distillation with Application to Low-Power, Real-time Sensation-Cognition-Action Loop with Neuromorphic Computing
- Adaptive Pruning of Neural Language Models for Mobile Devices
- Compressing the Input for CNNs with the First-Order Scattering Transform - 316.
- GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration
- NICE: Noise Injection and Clamping Estimation for Neural Network Quantization
- Pruned and Structurally Sparse Neural Networks
- Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
- Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks
- Dynamic sparse graph for efficient deep learning<br>【code:[mtcrawshaw/dynamic-sparse-graph](https://github.com/mtcrawshaw/dynamic-sparse-graph)】
- Proxquant: Quantized neural networks via proximal operators
- Relaxed Quantization for Discretized Neural Networks
- LIT: Block-wise Intermediate Representation Training for Model Compression
- Towards fast and energy-efficient binarized neural network inference on fpga
- Learning Compressed Transforms with Low Displacement Rank
- SNIP: Single-shot Network Pruning based on Connection Sensitivity<br>【code:[namhoonlee/snip-public](https://github.com/namhoonlee/snip-public)】
- Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
- Deep Neural Network Compression for Aircraft Collision Avoidance Systems
- Pruning neural networks: is it time to nip it in the bud?
- Rethinking the Value of Network Pruning<br>【code:[Eric-mingjie/rethinking-network-pruning](https://github.com/Eric-mingjie/rethinking-network-pruning)】
- FPGA-based Acceleration System for Visual Tracking
- Quantization for Rapid Deployment of Deep Neural Networks
- ACIQ: Analytical Clipping for Integer Quantization of neural networks
- Rate Distortion For Model Compression: From Theory To Practice
- Interpretable Convolutional Filter Pruning
- Progressive Weight Pruning of Deep Neural Networks using ADMM
- Pruning Deep Neural Networks using Partial Least Squares
- CNN inference acceleration using dictionary of centroids
- To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference
- Convolutional Neural Network Pruning to Accelerate Membrane Segmentation in Electron Microscopy
- Differentiable Fine-grained Quantization for Deep Neural Network Compression
- HAKD: Hardware Aware Knowledge Distillation
- Bayesian Compression for Natural Language Processing
- Lossless (and Lossy) Compression of Random Forests
- Discrimination-aware Channel Pruning for Deep Neural Networks<br>【code:[SCUT-AILab/DCP](https://github.com/SCUT-AILab/DCP)】
- DeepTwist: Learning Model Compression via Occasional Weight Distortion
- A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
- Convolutional Neural Network Quantization using Generalized Gamma Distribution
- Pruning Filter via Geometric Median for Deep Convolutional Neural Networks Acceleration<br>【code:[he-y/filter-pruning-geometric-median](https://github.com/he-y/filter-pruning-geometric-median)】
- Hybrid Pruning: Thinner Sparse Networks for Fast Inference on Edge Devices
- Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
- ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks
- A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM
- Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators
- Demystifying Neural Network Filter Pruning
- Learning to Steer by Mimicking Features from Heterogeneous Auxiliary Networks
- YASENN: Explaining Neural Networks via Partitioning Activation Sequences
- Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
- GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
- Deep Compression of Sum-Product Networks on Tensor Networks
- Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition
- Generalized Ternary Connect: End-to-End Learning and Compression of Multiplication-Free Deep Neural Networks
- Private Model Compression via Knowledge Distillation
- Iteratively Training Look-Up Tables for Network Quantization
- Fast Human Pose Estimation
- QUENN: QUantization Engine for low-power Neural Networks
- Tetris: Re-architecting Convolutional Neural Network Computation for Machine Learning Accelerators
- The core consistency of a compressed tensor
- Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition
- Three Dimensional Convolutional Neural Network Pruning with Regularization-Based Method
- Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
- Accelerating the Evolution of Convolutional Neural Networks with Node-Level Mutations and Epigenetic Weight Initialization
- Stability Based Filter Pruning for Accelerating Deep CNNs
- Multi-layer Pruning Framework for Compressing Single Shot MultiBox Detector
- Structured Pruning for Efficient ConvNets via Incremental Regularization
- Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks
- Structured Pruning of Neural Networks with Budget-Aware Regularization
- Joint Neural Architecture Search and Quantization
- On Periodic Functions as Regularizers for Quantization of Neural Networks
- Spatial Knowledge Distillation to aid Visual Reasoning
- Dataset Distillation
- Leveraging Filter Correlations for Deep Model Compression
- Effective, Fast, and Memory-Efficient Compressed Multi-function Convolutional Neural Networks for More Accurate Medical Image Classification
- A Framework for Fast and Efficient Neural Network Compression<br>【code:[Hyeji-Kim/ENC](https://github.com/Hyeji-Kim/ENC)】
- Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
- Snapshot Distillation: Teacher-Student Optimization in One Generation
- Network Compression via Recursive Bayesian Pruning
- Knowledge Distillation with Feature Maps for Image Classification
- Hoard: A Distributed Data Caching System to Accelerate Deep Learning Training on the Cloud
- Accelerating Large Scale Knowledge Distillation via Dynamic Importance Sampling
- Structure Learning Using Forced Pruning
- Pre-Defined Sparse Neural Networks with Hardware Acceleration
- Prototype-based Neural Network Layers: Incorporating Vector Quantization
- Training for 'Unstable' CNN Accelerator:A Case Study on FPGA
- ECC: Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
- Knowledge Distillation from Few Samples
- DropPruning for Model Compression
- Exploiting Processing in Non-Volatile Memory for Binary Neural Network Accelerators
- DNQ: Dynamic Network Quantization
- Trained Rank Pruning for Efficient Deep Neural Networks
- Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
- MEAL: Multi-Model Ensemble via Adversarial Learning
- Online Model Distillation for Efficient Video Inference
- Optimizing Speed/Accuracy Trade-Off for Person Re-identification via Knowledge Distillation
- Reliable Identification of Redundant Kernels for Convolutional Neural Network Compression
- Accelerating Convolutional Neural Networks via Activation Map Compression
- Proximal Mean-field for Neural Network Quantization
- Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression
- A Layer Decomposition-Recomposition Framework for Neuron Pruning towards Accurate Lightweight Networks
- Channel-wise pruning of neural networks with tapering resource constraint
- Distill-Net: Application-Specific Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms
- RNNFast: An Accelerator for Recurrent Neural Networks Using Domain Wall Memory
- Fast Adjustable Threshold For Uniform Neural Network Quantization
- DAC: Data-free Automatic Acceleration of Convolutional Networks
- Slimmable neural networks
- Precision Highway for Ultra Low-Precision Quantization
- Dynamic Runtime Feature Map Pruning
- BNN+: Improved binary network training
- Studying the Plasticity in Deep Convolutional Neural Networks using Random Pruning
- Improving the Interpretability of Deep Neural Networks with Knowledge Distillation
- Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks
- Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm
- Cumulative Saliency based Globally Balanced Filter Pruning For Efficient Convolutional Neural Networks
- Online Model Distillation for Efficient Video Inference
- PBGen: Partial Binarization of Deconvolution-Based Generators for Edge Intelligence
- Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? - 4932.<br>【code:[XinDongol/BENN-PyTorch](https://github.com/XinDongol/BENN-PyTorch)】
- Multi-fiber networks for video recognition - 367.<br>【code:[cypw/PyTorch-MFNet](https://github.com/cypw/PyTorch-MFNet)】
- Mnasnet: Platform-aware neural architecture search for mobile - 2828.<br>【code:[tensorflow/tpu](https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet)】
- Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network - 9200.<br>【code:[sacmehta/ESPNetv2](https://github.com/sacmehta/ESPNetv2)】
- Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers<br>【code:[jack-willturner/batchnorm-pruning](https://github.com/jack-willturner/batchnorm-pruning)】
- Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
- Effective Quantization Approaches for Recurrent Neural Networks
- ADC: Automated Deep Compression and Acceleration with Reinforcement Learning<br>【code:[Tencent/PocketFlow#channel-pruning](https://github.com/Tencent/PocketFlow#channel-pruning);[mit-han-lab/amc-release](https://github.com/mit-han-lab/amc-release);[mit-han-lab/amc-compressed-models](https://github.com/mit-han-lab/amc-compressed-models)】
- ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Neural Network Accelerators
- Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator
- Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept
- Security Analysis and Enhancement of Model Compressed Deep Learning Systems under Adversarial Attacks
- Building Efficient ConvNets using Redundant Feature Pruning
- Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning
- Deep Neural Network Compression with Single and Multiple Level Quantization
- Interpreting Deep Classifier by Visual Distillation of Dark Knowledge
- Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation
- Defensive Collaborative Multi-task Training - Defending against Adversarial Attack towards Deep Neural Networks
- C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs
- Fisher Pruning of Deep Nets for Facial Trait Classification
- Fast and Accurate Single Image Super-Resolution via Information Distillation Network
- Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification
- Neural Compatibility Modeling with Attentive Knowledge Distillation
- Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications
- Fully Convolutional Model for Variable Bit Length and Lossy High Density Compression of Mammograms
- Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks
- Structured Pruning of Neural Networks with Budget-Aware Regularization
- On Periodic Functions as Regularizers for Quantization of Neural Networks
- Pre-Defined Sparse Neural Networks with Hardware Acceleration
- From hashing to cnns: Training binary weight networks via hashing
- Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling - Pui Chau, Gang Wang, 2018
- Beyond Trade-off: Accelerate FCN-based Face Detector with Higher Accuracy
- Low-resolution Face Recognition in the Wild via Selective Knowledge Distillation
- Faster gaze prediction with dense networks and Fisher pruning
- A Comprehensive Survey for Low Rank Regularization
-
2019
- PruneTrain: Gradual Structured Pruning from Scratch for Faster Neural Network Training
- Information-Theoretic Understanding of Population Risk Improvement with Model Compression - xichen/pytorch-playgroun](https://github.com/aaron-xichen/pytorch-playgroun)】
- FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review
- Learning Efficient Detector with Semi-supervised Adaptive Distillation
- A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Bandwidth Reduction using Importance Weighted Pruning on Ring AllReduce
- Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks
- Spatial-Winograd Pruning Enabling Sparse Winograd Convolution
- How Compact?: Assessing Compactness of Representations through Layer-Wise Pruning
- CodeX: Bit-Flexible Encoding for Streaming-based FPGA Acceleration of DNNs
- EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search - NAS](https://github.com/JaminFong/EAT-NAS)】
- Using Quantization to Deploy Heterogeneous Nodes in Two-Tier Wireless Sensor Networks
- AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Deep Neural Networks
- On Compression of Unsupervised Neural Nets by Pruning Weak Connections
- Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning
- Distillation Strategies for Proximal Policy Optimization
- Really should we pruning after model be totally trained? Pruning based on a small amount of training
- DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression
- Improving Neural Network Quantization without Retraining using Outlier Channel Splitting - zhang/dnn-quant-ocs](https://github.com/cornell-zhang/dnn-quant-ocs)】
- Tensorized Embedding Layers for Efficient Model Compression
- Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks
- Deep Triplet Quantization
- Learnable Embedding Space for Efficient Neural Architecture Compression
- MICIK: MIning Cross-Layer Inherent Similarity Knowledge for Deep Model Compression
- CapStore: Energy-Efficient Design and Management of the On-Chip Memory for CapsuleNet Inference Accelerators
- Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
- Distilling Policy Distillation
- Compression of Recurrent Neural Networks for Efficient Language Modeling
- Software-Defined FPGA Accelerator Design for Mobile Deep Learning Applications
- FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary
- Architecture Compression
- Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher - Assistant-Knowledge-Distillation](https://github.com/imirzadeh/Teacher-Assistant-Knowledge-Distillation)】
- Adversarially Trained Model Compression: When Robustness Meets Efficiency
- Effective Network Compression Using Simulation-Guided Iterative Pruning
- Structured Bayesian Compression for Deep models in mobile enabled devices for connected healthcare
- AutoQB: AutoML for Network Quantization and Binarization on Mobile Devices
- Evaluating Pruning Methods in Gene Network Inference
- Single-shot Channel Pruning Based on Alternating Direction Method of Multipliers
- Low-bit Quantization of Neural Networks for Efficient Inference
- Learned Step Size Quantization
- Ising-Dropout: A Regularization Method for Training and Compression of Deep Neural Networks
- Cluster Regularized Quantization for Deep Networks Compression
- TKD: Temporal Knowledge Distillation for Active Perception
- On the Quantization of Cellular Neural Networks for Cyber-Physical Systems
- Compressing complex convolutional neural network based on an improved deep compression algorithm
- Efficient and Effective Quantization for Sparse DNNs
- Everything old is new again: A multi-view learning approach to learning using privileged information and distillation
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search - Order-Pruning](https://github.com/lixincn2015/Partial-Order-Pruning)】
- Continual Learning via Neural Pruning
- Cascaded Projection: End-to-End Network Compression and Acceleration
- All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
- Low Power Inference for On-Device Visual Recognition with a Quantization-Friendly Solution
- Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM
- One time is not enough: iterative tensor decomposition for neural network compression
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
- Robustness of Neural Networks to Parameter Quantization
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Second Rethinking of Network Pruning in the Adversarial Setting
- Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
- M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning
- Correlation Congruence for Knowledge Distillation
- A Comprehensive Overhaul of Feature Distillation - heo/overhaul](https://sites.google.com/view/byeongho-heo/overhaul)】
- Progressive Stochastic Binarization of Deep Networks
- White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks
- Accelerating Deep Unsupervised Domain Adaptation with Transfer Channel Pruning
- Paying More Attention to Motion: Attention Distillation for Learning Video Representations
- Accelerated Neural Networks on OpenCL Devices Using SYCL-DNN
- Ultrafast Video Attention Prediction with Coupled Knowledge Distillation
- C2S2: Cost-aware Channel Sparse Selection for Progressive Network Pruning
- Long-Term Vehicle Localization by Recursive Knowledge Distillation
- Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure - SGD]( https://github.com/ShawnDing1994/Centripetal-SGD)】
- Meta Filter Pruning to Accelerate Deep Convolutional Neural Networks
- ASAP: Architecture Search, Anneal and Prune
- Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization
- Back to the Future: Knowledge Distillation for Human Action Anticipation
- Spatiotemporal Knowledge Distillation for Efficient Estimation of Aerial Video Saliency
- Relational Knowledge Distillation
- Knowledge Squeezed Adversarial Network Compression
- Variational Information Distillation for Knowledge Transfer
- Cramnet: Layer-wise Deep Neural Network Compression with Knowledge Transfer from a Teacher Network
- Matrix and tensor decompositions for training binary neural networks
- Processing-In-Memory Acceleration of Convolutional Neural Networks for Energy-Efficiency, and Power-Intermittency Resilience
- End-to-End Speech Translation with Knowledge Distillation
- Defensive Quantization: When Efficiency Meets Robustness
- Feature Fusion for Online Mutual Knowledge Distillation
- Knowledge Distillation via Route Constrained Optimization
- Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
- LeGR: Filter Pruning via Learned Global Ranking
- Neuromorphic Acceleration for Approximate Bayesian Inference on Neural Networks via Permanent Dropout
- Ensemble Distribution Distillation
- ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning
- Full-stack Optimization for Accelerating CNNs with FPGA Validation
- Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training
- Creating Lightweight Object Detectors with Model Compression for Deployment on Edge Devices
- Searching for mobilenetv3
- 2-bit Model Compression of Deep Convolutional Neural Network on ASIC Engine for Image Retrieval
- AutoAssist: A Framework to Accelerate Training of Deep Neural Networks
- Deep Learning Acceleration Techniques for Real Time Mobile Vision Applications
- HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
- Play and Prune: Adaptive Filter Pruning for Deep Model Compression
- Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System
- Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks
- Differentiable Pruning Method for Neural Networks
- Learning to Prune: Speeding up Repeated Computations
- Triplet Distillation for Deep Face Recognition
- Lightweight Monocular Depth Estimation Model by Joint End-to-End Filter pruning
- Network Pruning for Low-Rank Binary Indexing
- Diagonal Acceleration for Covariance Matrix Adaptation Evolution Strategies
- Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL
- Investigating Channel Pruning through Structural Redundancy Reduction - A Statistical Study
- Dream Distillation: A Data-Independent Model Compression Framework
- Combining Experience Replay with Exploration by Random Network Distillation
- Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
- Zero-Shot Knowledge Distillation in Deep Networks
- DARC: Differentiable ARchitecture Compression
- DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression
- Time-varying Autoregression with Low Rank Tensors
- Revisiting hard thresholding for DNN pruning
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
- Network Pruning via Transformable Architecture Search
- Adversarially Robust Distillation
- Structured Compression by Unstructured Pruning for Sparse Quantized Neural Networks
- X-TrainCaps: Accelerated Training of Capsule Nets through Lightweight Software Optimizations
- HadaNets: Flexible Quantization Strategies for Neural Networks
- Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks
- Natural Compression for Distributed Deep Learning
- Differentiable Quantization of Deep Neural Networks
- Learning In Practice: Reasoning About Quantization
- CGaP: Continuous Growth and Pruning for Efficient Deep Learning
- Accelerating Extreme Classification via Adaptive Feature Agglomeration
- Online Filter Clustering and Pruning for Efficient Convnets
- Instant Quantization of Neural Networks using Monte Carlo Methods
- Attention Based Pruning for Shift Networks
- Data-Free Quantization through Weight Equalization and Bias Correction
- Run-Time Efficient RNN Compression for Inference on Edge Devices
- Using Small Proxy Datasets to Accelerate Hyperparameter Search
- Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers
- Dynamic Distribution Pruning for Efficient Network Architecture Search
- Quantization Loss Re-Learning Method
- L0 Regularization Based Neural Network Design and Compression
- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
- The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning
- Multi-objective Pruning for CNNs using Genetic Algorithm
- Dimensionality compression and expansion in Deep Neural Networks
- Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
- Deep Face Recognition Model Compression via Knowledge Transfer and Distillation
- Parameterized Structured Pruning for Deep Neural Networks
- Optimal low rank tensor recovery
- A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
- Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques
- (Pen-) Ultimate DNN Pruning
- Fighting Quantization Bias With Bias
- Ensemble Pruning via Margin Maximization
- Distilling Object Detectors with Fine-grained Feature Imitation
- The Generalization-Stability Tradeoff in Neural Network Pruning
- Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining
- BlockSwap: Fisher-guided Block Substitution for Network Compression
- A Taxonomy of Channel Pruning Signals in CNNs
- Efficient Evaluation-Time Uncertainty Estimation by Improved Distillation
- Linear Distillation Learning
- A Signal Propagation Perspective for Pruning Neural Networks at Initialization
- Scalable Syntax-Aware Language Models Using Knowledge Distillation
- Model Compression by Entropy Penalized Reparameterization
- Deep Recurrent Quantization for Generating Sequential Binary Codes
- Structured Pruning of Recurrent Neural Networks through Neuron Selection
- A One-step Pruning-recovery Framework for Acceleration of Convolutional Neural Networks
- Prune and Replace NAS
- Joint Pruning on Activations and Weights for Efficient Neural Networks
- ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection
- PABO: Pseudo Agent-Based Multi-Objective Bayesian Hyperparameter Optimization for Efficient Neural Accelerator Design
- GAN-Knowledge Distillation for one-stage Object Detection
- Back to Simplicity: How to Train Accurate BNNs from Scratch?
- An Improved Trade-off Between Accuracy and Complexity with Progressive Gradient Pruning
- COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning
- Importance Estimation for Neural Network Pruning
- Deep Model Compression via Filter Auto-sampling
- And the Bit Goes Down: Revisiting the Quantization of Neural Networks - the-bits](https://github.com/facebookresearch/kill-the-bits)】
- Essence Knowledge Distillation for Speech Recognition
- Dissecting Pruned Neural Networks
- Weight Normalization based Quantization for Deep Neural Network Compression
- Compression of Acoustic Event Detection Models With Quantized Distillation
- Accelerating Deconvolution on Unmodified CNN Accelerators for Generative Adversarial Networks -- A Software Approach
- Non-structured DNN Weight Pruning Considered Harmful
- Graph-based Knowledge Distillation by Multi-head Attention Network
- A Survey of Pruning Methods for Efficient Person Re-identification Across Domains
- An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on a Smoothly Varying Weight Hypothesis
- Distilling with Residual Network for Single Image Super Resolution
- ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning
- On Activation Function Coresets for Network Pruning
- A Targeted Acceleration and Compression Framework for Low bit Neural Networks
- Light Multi-segment Activation for Model Compression
- ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining
- EnSyth: A Pruning Approach to Synthesis of Deep Learning Ensembles
- Highlight Every Step: Knowledge Distillation via Collaborative Teaching
- Similarity-Preserving Knowledge Distillation
- Adaptive Compression-based Lifelong Learning
- Real-Time Correlation Tracking via Joint Model Compression and Transfer
- Teacher-Students Knowledge Distillation for Siamese Trackers
- Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT
- Learning Instance-wise Sparsity for Accelerating Deep Models
- DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks
- Unsupervised Neural Quantization for Compressed-Domain Similarity Search
- UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation
- Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
- Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding
- Accelerated CNN Training Through Gradient Approximation
- Distill Knowledge from NRSfM for Weakly Supervised 3D Pose Learning
- Automatic Compiler Based FPGA Accelerator for CNN Training
- Compression and Acceleration of Neural Networks for Communications
- Accelerating CNN Training by Sparsifying Activation Gradients
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation
- Distilling Knowledge From a Deep Pose Regressor Network
- U-Net Fixed-Point Quantization for Medical Image Segmentation
- GDRQ: Group-based Distribution Reshaping for Quantization
- Architecture-aware Network Pruning for Vision Quality Applications
- Exploiting Channel Similarity for Accelerating Deep Convolutional Neural Networks
- Efficient Inference of CNNs via Channel Pruning
- Multivariate Convolutional Sparse Coding with Low Rank Tensor
- Group Pruning using a Bounded-Lp norm for Group Gating and Regularization
- A New Fast Weighted All-pairs Shortest Path Search Algorithm Based on Pruning by Shortest Path Trees
- A New Fast Unweighted All-pairs Shortest Path Search Algorithm Based on Pruning by Shortest Path Trees
- Knowledge distillation for semi-supervised domain adaptation
- Adversarial-Based Knowledge Distillation for Multi-Model Ensemble and Noisy Data Refinement
- Incremental Binarization On Recurrent Neural Networks For Single-Channel Source Separation
- MASR: A Modular Accelerator for Sparse RNNs
- Relation Distillation Networks for Video Object Detection
- Differentiable Product Quantization for End-to-End Embedding Compression
- Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation
- Online Sensor Hallucination via Knowledge Distillation for Multimodal Image Classification
- PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors
- Survey and Benchmarking of Machine Learning Accelerators
- AccD: A Compiler-based Framework for Accelerating Distance-related Algorithms on CPU-FPGA Platforms
- High Performance Scalable FPGA Accelerator for Deep Neural Networks
- Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks
- EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks
- Knowledge Distillation for End-to-End Person Search
- What Happens on the Edge, Stays on the Edge: Toward Compressive Deep Learning
- Empirical Analysis of Knowledge Distillation Technique for Optimization of Quantized Deep Neural Networks
- A Channel-Pruned and Weight-Binarized Convolutional Neural Network for Keyword Spotting
- Accelerating Training using Tensor Decomposition
- TiM-DNN: Ternary in-Memory accelerator for Deep Neural Networks
- HERALD: Optimizing Heterogeneous DNN Accelerators for Edge Devices
- A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks
- Ensemble Knowledge Distillation for Learning Improved and Efficient Networks
- Accelerating Transformer Decoding via a Hybrid of Self-attention and Recurrent Neural Network
- Transformer to CNN: Label-scarce distillation for efficient text classification
- LCSCNet: Linear Compressing Based Skip-Connecting Network for Image Super-Resolution
- VACL: Variance-Aware Cross-Layer Regularization for Pruning Deep Residual Networks
- Differentiable Mask Pruning for Neural Networks
- WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection
- PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices
- Learning Lightweight Pedestrian Detector with Hierarchical Knowledge Distillation
- TinyBERT: Distilling BERT for Natural Language Understanding
- Class-dependent Compression of Deep Neural Networks
- FEED: Feature-level Ensemble for Knowledge Distillation
- Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network
- FALCON: Fast and Lightweight Convolution for Compressing and Accelerating CNN
- Accurate and Compact Convolutional Neural Networks with Trained Binarization
- Revisit Knowledge Distillation: a Teacher-free Framework - free-Knowledge-Distillation](https://github.com/yuanli2333/Teacher-free-Knowledge-Distillation)】
- Lightweight Image Super-Resolution with Information Multi-distillation Network
- Smart Ternary Quantization
- Pruning from Scratch
- Global Sparse Momentum SGD for Pruning Very Deep Neural Networks - SGD](https://github.com/DingXiaoH/GSM-SGD)】
- Training convolutional neural networks with cheap convolutions and online distillation - cheap-convolution](https://github.com/EthanZhangYC/OD-cheap-convolution)】
- Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks
- REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs
- EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis - Pytorch](https://github.com/alecwangcq/EigenDamage-Pytorch)】
- Adversarial Neural Pruning
- Dynamic Kernel Distillation for Efficient Pose Estimation in Videos
- UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing
- CC-Net: Image Complexity Guided Network Compression for Biomedical Image Segmentation
- Adaptive mixture of low-rank factorizations for compact neural modeling
- Collaborative Channel Pruning for Deep Networks - 5122.
- Variational Convolutional Neural Network Pruning - 2789.
- OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks - 7055.
- An Empirical study of Binary Neural Networks' Optimisation
- Quantization Networks
- Learning to quantize deep networks by optimizing quantization intervals with task loss - 4359.
- Speeding up convolutional networks pruning with coarse ranking
- Adaptive Estimators Show Information Compression in Deep Neural Networks
- Multi-loss-aware Channel Pruning of Deep Networks
- Structured Knowledge Distillation for Semantic Segmentation
- Knowledge Adaptation for Efficient Semantic Segmentation
- MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning
- Approximated Oracle Filter Pruning for Destructive CNN Width Optimization
- Disentangling Redundancy for Multi-Task Pruning
- Quantization-Based Regularization for Autoencoders
- Accelerating Large-Kernel Convolution Using Summed-Area Tables
- AutoSlim: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates
- Learning Filter Basis for Convolutional Neural Network Compression
- EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators
- An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference
- Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks - decorator-pruning](https://github.com/youzhonghui/gate-decorator-pruning)】
- Model Pruning Enables Efficient Federated Learning on Edge Devices
- On implicit filter level sparsity in convolutional neural networks - 528.<br>【code:[mehtadushy/SelecSLS-Pytorch](https://github.com/mehtadushy/SelecSLS-Pytorch)】
- Cross-Resolution Face Recognition via Prior-Aided Face Hallucination and Residual Knowledge Distillation
- Compressing GANs using Knowledge Distillation
-
2013
-
2014
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Speeding up convolutional neural networks with low rank expansions
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- Flattened convolutional neural networks for feedforward acceleration
- Compressing Deep Convolutional Networks using Vector Quantization
- FitNets: Hints for Thin Deep Nets
- Speeding-up convolutional neural networks using fine-tuned cp-decomposition - v-lebedev/cp-decomposition](https://github.com/vadim-v-lebedev/cp-decomposition); [jacobgil/pytorch-tensor-decompositions](https://github.com/jacobgil/pytorch-tensor-decompositions); [medium.com/@keremturgutlu/tensor-decomposition-fast-cnn-in-your-pocket-f03e9b2a6788](https://medium.com/@keremturgutlu/tensor-decomposition-fast-cnn-in-your-pocket-f03e9b2a6788)】
- Dadiannao: A machinelearning supercomputer
- Learning with Pseudo-Ensembles
- Training deep neural networks with low precision multiplications
- 1.1 computing’s energy problem (and what we can do about it) - State Circuits Conference Digest of Technical Papers, pages 10–14, 2014.
-
2015
- Deep Learning with Limited Numerical Precision
- Distilling the Knowledge in a Neural Network
- Training binary multilayer neural networks for image classification using expectation backpropagation
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Learning both Weights and Connections for Efficient Neural Networks - willturner/DeepCompression-PyTorch](https://github.com/jack-willturner/DeepCompression-PyTorch)】
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding - Compression-AlexNet](https://github.com/songhan/Deep-Compression-AlexNet)】
- Neural networks with few multiplications
- Net2Net: Accelerating Learning via Knowledge Transfer
- Convolutional neural networks with low-rank regularization
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android
- Dynamic Capacity Networks
- Unifying distillation and privileged information - Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik, 2015
- 8-bit approximations for parallelism in deep learning
- Reduced-precision strategies for bounded memory in deep neural nets
- Fixed-Point Performance Analysis of Recurrent Neural Networks
- Quantized Convolutional Neural Networks for Mobile Devices - wu/quantized-cnn](https://github.com/jiaxiang-wu/quantized-cnn)】
- High-performance hardware for machine learning
- Sparse convolutional neural networks - 814.
- An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices
- Deep fried convnets - 1483.
- Convolutional neural networks at constrained time cost - 5360.
- Recurrent Neural Network Training with Dark Knowledge Transfer
- Data-free parameter pruning for deep neural networks
- Distilling Model Knowledge
- Binaryconnect: Training deep neural networks with binary weights during propagations - 3131.
- Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
- Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization - Gang Jiang, Boyang Li, Leonid Sigal, 2015
- Cross Modal Distillation for Supervision Transfer
-
2016
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks - Net](https://github.com/allenai/XNOR-Net)】
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Ternary weight networks - chris/caffe-twns](https://github.com/fengfu-chris/caffe-twns)】
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients - Net](https://github.com/tensorpack/tensorpack/tree/master/examples/DoReFa-Net)】
- Sequence-Level Knowledge Distillation
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks
- Knowledge Distillation for Small-footprint Highway Networks
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
- Learning Structured Sparsity in Deep Neural Networks
- Dynamic Network Surgery for Efficient DNNs - Network-Surgery](https://github.com/yiwenguo/Dynamic-Network-Surgery)】
- Xception: Deep Learning with Depthwise Separable Convolutions
- Ultimate tensorization: compressing convolutional and fc layers alike - TF](https://github.com/timgaripov/TensorNet-TF);[Bihaqo/TensorNet](https://github.com/Bihaqo/TensorNet)】
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- Local Binary Convolutional Neural Networks
- Sc-dcnn: Highly-scalable deep convolutional neural network using stochastic computing
- Pruning Convolutional Neural Networks for Resource Efficient Inference - pruning](https://github.com/Tencent/PocketFlow#channel-pruning)】
- Effective Quantization Methods for Recurrent Neural Networks
- Trained ternary quantization
- Towards the Limit of Network Quantization
- FastText.zip: Compressing text classification models
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- Understanding the impact of precision quantization on the accuracy and energy of neural networks - 1479.
- Finn: A framework for fast, scalable binarized neural network inference
- Deepburning: automatic generation of fpga-based learning accelerators for the neural network family
- DeepRebirth: A General Approach for Accelerating Deep Neural Network Execution on Mobile Devices
- Face model compression by distilling knowledge from neurons
- Fast algorithms for convolutional neural networks - 4021.
- Fast convnets using groupwise brain damage
- Sparsifying neural network connections for face recognition - 4864.
- A Simple yet Effective Method to Prune Dense Layers of Neural Networks
- Cambricon-x: An accelerator for sparse neural networks
- fpgaconvnet: A framework for mapping convolutional neural networks on fpgas - Programmable Custom Computing Machines, pages 40–47, 2016.
- Accelerating convolutional neural networks for mobile applications
- Switched by input: Power efficient structure for rram-based convolutional neural network
- Fused-layer CNN accelerators
- Lradnn: High-throughput and energy-efficient deep neural network accelerator using low rank approximation
- Quantized convolutional neural networks for mobile devices
- Cnvlutin: Ineffectual-neuron-free deep neural network computing
- From high-level deep neural models to FPGAs.
- Neurocube: A programmable digital neuromorphic architecture with high-density 3d memory
- Energy-efficient cnn implementation on a deeply pipelined fpga cluster
- Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks - Aided Design, page 12, 2016.
- Fixed point quantization of deep convolutional networks.
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
- DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit
- MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints
- Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables - ROM. ACM, 2016.
- Bitwise neural networks
- Convolutional neural networks using logarithmic data representation
- On the efficient representation and execution of deep acoustic models
- Stealing machine learning models via prediction apis - 618.
- Google's neural machine translation system: Bridging the gap between human and machine translation
- LightRNN: Memory and computation-efficient recurrent neural networks - 4393.
- Do deep convolutional nets really need to be deep and convolutional?
- Adapting Models to Signal Degradation using Distillation - Chyi Su, Subhransu Maji,2016
- Ristretto: Hardware-oriented approximation of convolutional neural networks.
- Learning Infinite-Layer Networks: Without the Kernel Trick
- Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices
- Lcnn: Lookup-based convolutional neural network - 7129.
- Pvanet: Lightweight deep neural networks for real-time object detection - faster-rcnn](https://github.com/sanghoon/pva-faster-rcnn)】
- In teacher we trust: Learning compressed models for pedestrian detection
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- EIE: Efficient Inference Engine on Compressed Deep Neural Network
- Design of efficient convolutional layers using single intra-channel convolution, topological subdivisioning and spatial "bottleneck" structure
- Pruning Filters for Efficient ConvNets - mingjie/rethinking-network-pruning](https://github.com/Eric-mingjie/rethinking-network-pruning/tree/master/imagenet/l1-norm-pruning)】
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Loss-aware Binarization of Deep Networks
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- Aggregated Residual Transformations for Deep Neural Networks
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
- Throughput-optimized opencl-based fpga accelerator for large-scale convolutional neural networks - Programmable Gate Arrays, FPGA ’16, 2016.
- Going deeper with embedded fpga platform for convolutional neural network. - Programmable Gate Arrays, FPGA ’16, 2016.
- Net-trim: Convex pruning of deep neural networks with performance guarantee - 3186.<br>【code:[DNNToolBox/Net-Trim-v1](https://github.com/DNNToolBox/Net-Trim-v1)】
- Deep Networks with Stochastic Depth
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
-
1990
-
1993
-
1997
-
1998
-
2000
-
2006
-
2011
-
-
PROJECTS
-
2023
- mxnet/quantization - quantizing an FP32 model with Intel MKL-DNN or CUDNN.
- TensoRT4-Example
- Tensorflow lite - on-device inference; see the sketch after this list.
- tensorflow/quantize
- Core ML
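The quantization tooling listed above is usually driven through a converter API. The minimal sketch below is illustrative only and is not taken from any entry above: it applies TensorFlow Lite post-training dynamic-range quantization, assuming TensorFlow 2.x and a trained SavedModel at the placeholder path `saved_model/`.

```python
# Minimal sketch: TensorFlow Lite post-training dynamic-range quantization.
# Assumptions (not from the list above): TensorFlow 2.x is installed and a
# trained SavedModel exists at the placeholder path "saved_model/".
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize weights to 8 bits
tflite_model = converter.convert()

# Write the compressed flatbuffer for on-device inference.
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```

Other toolchains in the list (tensorflow/quantize, Core ML, TensorRT) follow a broadly similar convert-then-deploy flow, though their APIs differ.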
-
-
BLOGS & ARTICLES
-
LIBRARIES
-
2023
-
-
REFERENCE