awesome-compression-papers
Paper collection about model compression and acceleration: Pruning, Quantization, Knowledge Distillation, Low-Rank Factorization, etc.
https://github.com/chenbong/awesome-compression-papers
2020
2020-NIPS
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks
- Universally Quantized Neural Compression
- WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
- Rotated Binary Neural Network
- Efficient Exact Verification of Binarized Neural Networks
- Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks
- Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot
- Directional Pruning of Deep Neural Networks
- Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning
- Movement Pruning: Adaptive Sparsity by Fine-Tuning
- The Generalization-Stability Tradeoff In Neural Network Pruning
- Pruning neural networks without any data by conserving synaptic flow
- HYDRA: Pruning Adversarially Robust Neural Networks
- Logarithmic Pruning is All You Need
- Pruning Filter in Filter
- Bayesian Bits: Unifying Quantization and Pruning
- Searching for Low-Bit Weights in Quantized Neural Networks
- Robust Quantization: One Model to Rule Them All
- Position-based Scaled Gradient for Model Quantization and Sparse Training
- FleXOR: Trainable Fractional Quantization
- Self-Distillation Amplifies Regularization in Hilbert Space
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- Self-Distillation as Instance-Specific Label Smoothing
- Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation
- Ensemble Distillation for Robust Model Fusion in Federated Learning
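Many of the NeurIPS 2020 entries above (SynFlow, Movement Pruning, WoodFisher) propose pruning criteria that improve on plain magnitude pruning. As a point of reference, here is a minimal sketch of the global magnitude-pruning baseline in PyTorch; the model, sparsity level, and layer selection are illustrative assumptions, not details from any of these papers.

```python
import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights across all conv/linear layers."""
    weights = [m.weight for m in model.modules()
               if isinstance(m, (nn.Conv2d, nn.Linear))]
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return  # nothing to prune at this sparsity
    threshold = scores.kthvalue(k).values  # k-th smallest magnitude
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())  # keep only weights above it

# Illustrative usage: prune 90% of the weights of a toy MLP.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
global_magnitude_prune(model, sparsity=0.9)
```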
2020-CVPR
- Regularizing Class-Wise Predictions via Self-Knowledge Distillation
- Explaining Knowledge Distillation by Quantifying the Knowledge
- Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion
- Multi-Dimensional Pruning: A Unified Framework for Model Compression
- Discrete Model Compression With Resource Constraint for Deep Neural Networks
- Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach
- Few Sample Knowledge Distillation for Efficient Network Compression
- Structured Multi-Hashing for Model Compression
- ZeroQ: A Novel Zero Shot Quantization Framework
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model
- HRank: Filter Pruning Using High-Rank Feature Map
- Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
- DMCP: Differentiable Markov Channel Pruning for Neural Networks
- Neural Network Pruning With Residual-Connections and Limited-Data
- GhostNet: More Features from Cheap Operations
- AdderNet: Do We Really Need Multiplications in Deep Learning?
- Online Knowledge Distillation via Collaborative Learning
- Low-rank Compression of Neural Nets: Learning the Rank of Each Layer
- Filter Grafting for Deep Neural Networks
- Structured Compression by Weight Encryption for Unstructured Pruning and Quantization
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
- Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression
- The Knowledge Within: Methods for Data-Free Model Compression
- GAN Compression: Efficient Architectures for Interactive Conditional GANs
- Towards Efficient Model Compression via Learned Global Ranking
- Training Quantized Neural Networks With a Full-Precision Auxiliary Module
- Adaptive Loss-aware Quantization for Multi-bit Networks
- BiDet: An Efficient Binarized Object Detector
- Forward and Backward Information Retention for Accurate Binary Neural Networks
- Binarizing MobileNet via Evolution-Based Searching
- Collaborative Distillation for Ultra-Resolution Universal Style Transfer
- Self-training with Noisy Student improves ImageNet classification
- Heterogeneous Knowledge Distillation Using Information Flow Modeling
- Revisiting Knowledge Distillation via Label Smoothing Regularization
- Distilling Knowledge From Graph Convolutional Networks
- MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images
- Distilling Cross-Task Knowledge via Relationship Matching
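A large share of the CVPR 2020 entries above are knowledge-distillation methods. For orientation, here is a minimal sketch of the classic soft-target distillation loss (Hinton et al., 2015) that most of them extend or revisit; the temperature and loss weighting below are illustrative defaults, not values taken from these papers.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Weighted sum of a soft-target KL term and the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```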
2020-ICML
- PENNI: Pruned Kernel Sharing for Efficient CNN Inference
- Operation-Aware Soft Channel Pruning using Differentiable Masks
- DropNet: Reducing Neural Network Complexity via Iterative Pruning
- Proving the Lottery Ticket Hypothesis: Pruning is All You Need
- Network Pruning by Greedy Subnetwork Selection
- AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks
- Adversarial Neural Pruning with Latent Vulnerability Suppression
- Feature-map-level Online Adversarial Knowledge Distillation
- Knowledge transfer with jacobian matching
- Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
- Training Binary Neural Networks through Learning with Noisy Supervision
- Multi-Precision Policy Enforced Training (MuPPET): A Precision-Switching Strategy for Quantised Fixed-Point Training of CNNs
- Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks
- Feature Quantization Improves GAN Training
- Towards Accurate Post-training Network Quantization via Bit-Split and Stitching
- Accelerating Large-Scale Inference with Anisotropic Vector Quantization
- Differentiable Product Quantization for Learning Compact Embedding Layers
- Up or Down? Adaptive Rounding for Post-Training Quantization
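Several ICML 2020 entries above ("Up or Down? Adaptive Rounding for Post-Training Quantization", "Towards Accurate Post-training Network Quantization via Bit-Split and Stitching") target post-training quantization. The sketch below shows the uniform affine quantize-dequantize baseline with round-to-nearest that such methods improve on; the bit width and min/max calibration rule are illustrative assumptions, not taken from these papers.

```python
import torch

def quantize_dequantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Uniform affine quantization with round-to-nearest, then dequantize."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = qmin - torch.round(w.min() / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale  # "fake-quantized" weights

w = torch.randn(256, 256)
w4 = quantize_dequantize(w, num_bits=4)
print((w - w4).abs().max())  # worst-case rounding error at 4 bits
```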
2020-ICLR
- Lookahead: A Far-sighted Alternative of Magnitude-based Pruning
- Dynamic Model Pruning with Feedback
- Provable Filter Pruning for Efficient Neural Networks
- Data-Independent Neural Pruning via Coresets
- FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary
- Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks
- BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
- Neural Epitome Search for Architecture-Agnostic Network Compression
- One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
- DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures
- Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers
- Scalable Model Compression by Entropy Penalized Reparameterization
- Mixed Precision DNNs: All you need is a good parametrization
- A Signal Propagation Perspective for Pruning Neural Networks at Initialization
- Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware
- AutoQ: Automated Kernel-Wise Neural Network Quantization
- Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks
- Learned Step Size Quantization
- Sampling-Free Learning of Bayesian Quantized Neural Networks
- Gradient $\ell_1$ Regularization for Quantization Robustness
- BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
- Training binary neural networks with real-to-binary convolutions
- Critical initialisation in continuous approximations of binary neural networks
- Comparing Rewinding and Fine-tuning in Neural Network Pruning
- ProxSGD: Training Structured Neural Networks under Regularization and Constraints
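Many ICLR 2020 entries above (Learned Step Size Quantization, BinaryDuo, real-to-binary convolutions) train networks through a non-differentiable quantizer. Below is a minimal sketch of the straight-through estimator (STE) for sign binarization, the standard workaround; the clipped-gradient window used here is a common convention, not a detail of any single paper.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; clipped identity gradient in the backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the gradient straight through, but only where |x| <= 1.
        return grad_output * (x.abs() <= 1).float()

x = torch.randn(8, requires_grad=True)
BinarizeSTE.apply(x).sum().backward()
print(x.grad)  # 1.0 inside the clip window, 0.0 outside
```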
2020-AAAI
2019
2019-CVPR
- HAQ: Hardware-Aware Automated Quantization with Mixed Precision
- Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
- All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
- Importance Estimation for Neural Network Pruning
- HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs
- Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
- Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
- MnasNet: Platform-Aware Neural Architecture Search for Mobile
- MFAS: Multimodal Fusion Architecture Search
- A Neurobiological Evaluation Metric for Neural Network Model Search
- Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells
- Efficient Neural Network Compression
- Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
- DSC: Dense-Sparse Convolution for Vectorized Inference of Convolutional Neural Networks
- DupNet: Towards Very Tiny Quantized CNN With Improved Accuracy for Face Detection
- ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
- Accelerating Convolutional Neural Networks via Activation Map Compression
- Compressing Convolutional Neural Networks via Factorized Convolutional Filters
- Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks
- Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression
- MBS: Macroblock Scaling for CNN Model Reduction
- On Implicit Filter Level Sparsity in Convolutional Neural Networks
- Structured Pruning of Neural Networks With Budget-Aware Regularization
- Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization
- Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation
- Knowledge Distillation via Instance Relationship Graph
- Variational Information Distillation for Knowledge Transfer
- Learning Metrics from Teachers: Compact Networks for Image Embedding
- OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks
2019-ICCV
- Rethinking ImageNet Pre-training
- Universally Slimmable Networks and Improved Training Techniques
- MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning
- Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation
- ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks
- A Comprehensive Overhaul of Feature Distillation
- Similarity-Preserving Knowledge Distillation
- Correlation Congruence for Knowledge Distillation
- Data-Free Learning of Student Networks
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation
- Attention bridging network for knowledge transfer
- Accelerate CNN via Recursive Bayesian Pruning
- Adversarial Robustness vs Model Compression, or Both?
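Several ICCV 2019 entries above (MetaPruning, Recursive Bayesian Pruning) learn which channels to remove. As a baseline for comparison, here is a minimal sketch of the simplest criterion, ranking conv filters by L1 norm and keeping the top fraction; the keep ratio is an illustrative assumption.

```python
import torch
import torch.nn as nn

def l1_filter_mask(conv: nn.Conv2d, keep_ratio: float = 0.5) -> torch.Tensor:
    """Boolean mask over output channels: True = keep this filter."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    k = max(1, int(keep_ratio * conv.out_channels))
    mask = torch.zeros(conv.out_channels, dtype=torch.bool)
    mask[torch.topk(scores, k).indices] = True
    return mask  # rebuild a slimmer conv (and the next layer) from this mask

conv = nn.Conv2d(64, 128, kernel_size=3)
mask = l1_filter_mask(conv, keep_ratio=0.25)  # keep 32 of 128 filters
```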
2019-NIPS
- Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
- Model Compression with Adversarial Robustness: A Unified Optimization Framework
- AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters
- Double Quantization for Communication-Efficient Distributed Optimization
- Focused Quantization for Sparse CNNs
- E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
- MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization
- Random Projections with Asymmetric Quantization
- Network Pruning via Transformable Architecture Search
- Point-Voxel CNN for Efficient 3D Deep Learning
- Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks
- A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
- Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations
- Post training 4-bit quantization of convolutional networks for rapid-deployment
- Zero-shot Knowledge Transfer via Adversarial Belief Matching
- Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks
- One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers
2019-ICML
- EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
- Zero-Shot Knowledge Distillation in Deep Networks
- LegoNet: Efficient Convolutional Neural Networks with Lego Filters
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Collaborative Channel Pruning for Deep Networks
- Training CNNs with Selective Allocation of Channels
- NAS-Bench-101: Towards Reproducible Neural Architecture Search
- Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks
- Approximated Oracle Filter Pruning for Destructive CNN Width Optimization
2019-ICLR
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Slimmable Neural Networks
- Defensive Quantization: When Efficiency Meets Robustness
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
- SNIP: Single-shot Network Pruning based on Connection Sensitivity
- Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach
- Dynamic Channel Pruning: Feature Boosting and Suppression
- Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
- RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks
- Dynamic Sparse Graph for Efficient Deep Learning
- Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
- Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
- Learning Recurrent Binary/Ternary Weights
- Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network
- Relaxed Quantization for Discretized Neural Networks
- Integer Networks for Data Compression with Latent-Variable Models
- Analysis of Quantized Models
- DARTS: Differentiable Architecture Search
- Graph HyperNetworks for Neural Architecture Search
- Learnable Embedding Space for Efficient Neural Architecture Compression
- Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution
- SNAS: stochastic neural architecture search
- Integral Pruning on Activations and Weights for Efficient Neural Networks
- Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
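The Lottery Ticket Hypothesis entry above trains a network, prunes by magnitude, and rewinds the surviving weights to their initial values. Below is a minimal sketch of one such round; `train` is a placeholder callable for whatever task-specific training loop you use, and the per-round prune ratio is an illustrative choice, not a value from the paper.

```python
import copy
import torch
import torch.nn as nn

def lottery_ticket_round(model: nn.Module, train, prune_ratio: float = 0.2):
    """Train, prune by magnitude, rewind survivors to their initial values."""
    init_state = copy.deepcopy(model.state_dict())   # theta_0
    train(model)                                     # placeholder training loop
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.dim() > 1:                          # prune weights, not biases
                k = int(prune_ratio * p.numel())
                thresh = p.abs().flatten().kthvalue(k).values if k > 0 else -1.0
                masks[name] = (p.abs() > thresh).float()
    model.load_state_dict(init_state)                # rewind to theta_0
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])                  # apply the winning-ticket mask
    return masks
```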
2018
2018-CVPR
- MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
- NISP: Pruning Networks using Neuron Importance Score Propagation
- PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning
- Shift: A zero flop, zero parameter alternative to spatial convolutions
- Interleaved structured sparse convolutional neural networks
- Blockdrop: Dynamic inference paths in residual networks
- Nestednet: Learning nested sparse structures in deep neural networks
- Stochastic downsampling for cost-adjustable inference and improved regularization in convolutional networks
- Learning Time/Memory-Efficient Deep Architectures With Budgeted Super Networks
- HydraNets: Specialized Dynamic Architectures for Efficient Inference
- SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks
- Towards Effective Low-Bitwidth Convolutional Neural Networks
- Two-Step Quantization for Low-Bit Neural Networks
- "Learning-Compression" Algorithms for Neural Net Pruning
2018-ECCV
- A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
- Coreset-Based Neural Network Compression
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Training Binary Weight Networks via Semi-Binary Decomposition
- Learning Compression from Limited Unlabeled Data
- Sparsely Aggregated Convolutional Networks
- Deep Expander Networks: Efficient Deep Networks from Graph Theory
- Ask, acquire, and attack: Data-free uap generation using class impressions
- Netadapt: Platform-aware neural network adaptation for mobile applications
- Clustering Convolutional Kernels to Compress Deep Neural Networks
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
- Extreme Network Compression via Filter Group Approximation
- Convolutional Networks with Adaptive Inference Graphs
- SkipNet: Learning Dynamic Routing in Convolutional Networks
- Value-aware Quantization for Training and Inference of Neural Networks
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices
2018-NIPS
- Discrimination-aware Channel Pruning for Deep Neural Networks
- Training deep neural networks with 8-bit floating point numbers
- ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions
- DropBlock: A regularization method for convolutional networks
- Constructing fast network through deconstruction of convolution
- Learning Versatile Filters for Efficient Convolutional Neural Networks
- Moonshine: Distilling with cheap convolutions
- HitNet: hybrid ternary recurrent neural network
- FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
- Training DNNs with Hybrid Block Floating Point
- Reversible Recurrent Neural Networks
- Synaptic Strength For Convolutional Neural Network
- Learning sparse neural networks via sensitivity-driven regularization
- Multi-Task Zipping via Layer-wise Neuron Sharing
- A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
- Gradient Sparsification for Communication-Efficient Distributed Optimization
- GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
- ATOMO: Communication-efficient Learning via Atomic Sparsification
- Norm matters: efficient and accurate normalization schemes in deep networks
- Sparsified SGD with memory
- Pelee: A Real-Time Object Detection System on Mobile Devices
- Scalable methods for 8-bit training of neural networks
- TETRIS: TilE-matching the TRemendous Irregular Sparsity
- Multiple instance learning for efficient sequential data classification on resource-constrained devices
- Pruning neural networks: is it time to nip it in the bud?
- Rethinking the Value of Network Pruning
- Structured Pruning for Efficient ConvNets via Incremental Regularization
- Adaptive Mixture of Low-Rank Factorizations for Compact Neural Modeling
- Learning Sparse Networks Using Targeted Dropout
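Low-rank factorization, one of the repo's headline techniques (see "Adaptive Mixture of Low-Rank Factorizations for Compact Neural Modeling" above), replaces a dense weight matrix with two thin factors. Below is a minimal truncated-SVD sketch for a linear layer; the rank is an illustrative choice, and real methods typically pick ranks per layer and fine-tune afterwards.

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace W (out x in) by U_r @ (S_r V_r^T) via truncated SVD."""
    U, S, Vh = torch.linalg.svd(layer.weight.detach(), full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = (torch.diag(S[:rank]) @ Vh[:rank]).contiguous()
    second.weight.data = U[:, :rank].contiguous()
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

dense = nn.Linear(512, 512)
compact = factorize_linear(dense, rank=64)  # ~4x fewer weight parameters
```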
2018-ICML
- Compressing Neural Networks using the Variational Information Bottleneck
- DCFNet: Deep Neural Network with Decomposed Convolutional Filters
- Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
- Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
- High Performance Zero-Memory Overhead Direct Convolutions
- Kronecker Recurrent Units
- Weightless: Lossy weight encoding for deep neural network compression
- StrassenNets: Deep learning with a multiplication budget
- Learning Compact Neural Networks with Regularization
- WSNet: Compact and Efficient Networks Through Weight Sampling
- Gradually Updated Neural Networks for Large-Scale Image Recognition
2018-ICLR
- Training and Inference with Integers in Deep Neural Networks
- Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
- N2N learning: Network to Network Compression via Policy Gradient Reinforcement Learning
- Model compression via distillation and quantization
- Towards Image Understanding from Deep Compression Without Decoding
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
- Mixed Precision Training of Convolutional Neural Networks using Integer Operations
- Mixed Precision Training
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
- Loss-aware Weight Quantization of Deep Networks
- Alternating Multi-bit Quantization for Recurrent Neural Networks
- Adaptive Quantization of Neural Networks
- Variational Network Quantization
- Espresso: Efficient Forward Propagation for Binary Deep Neural Networks
- Learning to share: Simultaneous parameter tying and sparsification in deep learning
- Learning Sparse Neural Networks through L0 Regularization
- WRPN: Wide Reduced-Precision Networks
- Deep rewiring: Training very sparse deep networks
- Efficient sparse-winograd convolutional neural networks
- Learning Intrinsic Sparse Structures within Long Short-term Memory
- Multi-scale dense networks for resource efficient image classification
- Compressing Word Embedding via Deep Compositional Code Learning
- Learning Discrete Weights Using the Local Reparameterization Trick
- Training wide residual networks for deployment using a single bit for each weight
- The High-Dimensional Geometry of Binary Neural Networks
- To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression (see also [2018-NIPSw-nip in the bud](https://openreview.net/forum?id=r1lbgwFj5m), [2018-NIPSw-rethink](https://openreview.net/forum?id=r1eLk2mKiX))
- Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers
- Weightless: Lossy weight encoding for deep neural network compression
- Variance-based Gradient Compression for Efficient Distributed Deep Learning
- Stacked Filters Stationary Flow For Hardware-Oriented Acceleration Of Deep Convolutional Neural Networks
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks
- Accelerating Neural Architecture Search using Performance Prediction
- Nonlinear Acceleration of CNNs
- Attention-Based Guided Structured Sparsity of Deep Neural Networks