An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with model-compression

A curated list of projects in awesome lists tagged with model-compression.

https://github.com/huawei-noah/pretrained-language-model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

knowledge-distillation large-scale-distributed model-compression pretrained-models quantization

Last synced: 14 May 2025

https://github.com/tencent/pocketflow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.

automl computer-vision deep-learning mobile-app model-compression

Last synced: 14 Apr 2025

https://github.com/666DZY666/micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT). (2) Pruning: normal, regular, and group convolutional channel pruning. (3) Group convolution structure. (4) Batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.

batch-normalization-fuse bnn convolutional-networks dorefa group-convolution integer-arithmetic-only model-compression network-in-network network-slimming neuromorphic-computing onnx post-training-quantization pruning pytorch quantization quantization-aware-training tensorrt tensorrt-int8-python twn xnor-net

Last synced: 20 Mar 2025
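
The quantization-aware-training idea behind tools like micronet can be illustrated with a generic fake-quantization sketch (not taken from the repo; `fake_quantize` is a hypothetical helper): forward values are rounded onto an integer grid and immediately mapped back to float, so training sees the quantization error while weights stay in floating point.

```python
import numpy as np

def fake_quantize(x, n_bits=8):
    # QAT-style fake quantization: quantize then immediately dequantize,
    # so the forward pass sees the quantization error while values stay
    # in float (the backward pass would use a straight-through estimator).
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale
```

At inference time the same scale lets the rounded integers be stored directly, which is where the memory savings come from.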

https://github.com/haitongli/knowledge-distillation-pytorch

A flexible PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments.

cifar10 computer-vision dark-knowledge deep-neural-networks knowledge-distillation model-compression pytorch

Last synced: 15 May 2025
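
The core of knowledge distillation explored in repositories like this one is a temperature-scaled soft-target loss; a minimal NumPy sketch of the standard formulation (generic, not this repo's API):

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-scaled softmax; a higher T softens the distribution.
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in the original Hinton et al. formulation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

In practice this term is mixed with the ordinary cross-entropy on hard labels via a weighting coefficient.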

https://github.com/tensorflow/model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

compression deep-learning keras machine-learning ml model-compression optimization pruning quantization quantized-networks quantized-neural-networks quantized-training sparsity tensorflow

Last synced: 12 May 2025

https://github.com/MingSun-Tse/EfficientDNNs

Collection of recent methods on (deep) neural network compression and acceleration.

deep-learning deep-neural-networks efficient-deep-learning knowledge-distillation model-compression network-pruning

Last synced: 27 Apr 2025

https://github.com/horseee/deepcache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

diffusion-models efficient-inference model-compression stable-diffusion training-free

Last synced: 15 May 2025

https://github.com/alibaba/tinyneuralnetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

deep-learning deep-neural-networks model-compression model-converter post-training-quantization pruning pytorch quantization-aware-training

Last synced: 14 Oct 2025

https://github.com/cnkuangshi/LightCTR

A lightweight and scalable framework that combines mainstream Click-Through-Rate prediction algorithms with a computational DAG, the Parameter Server philosophy, and Ring-AllReduce collective communication.

computational-graphs deep-learning distributed-systems factorization-machines machine-learning model-compression parameter-server

Last synced: 15 Mar 2025

https://github.com/sforaidl/kd_lib

A PyTorch knowledge distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

algorithm-implementations benchmarking data-science deep-learning-library knowledge-distillation machine-learning model-compression pruning pytorch quantization

Last synced: 16 May 2025

https://github.com/he-y/filter-pruning-geometric-median

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)

model-compression pruning pytorch

Last synced: 04 Apr 2025
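
The geometric-median criterion from the FPGM paper can be sketched roughly as follows (an illustrative approximation, not the repo's code): filters whose total distance to all other filters in the layer is smallest lie nearest the geometric median and are treated as the most redundant.

```python
import numpy as np

def fpgm_prune_indices(filters, n_prune):
    # filters: (num_filters, k*k*c_in) flattened filter weights.
    # Score each filter by its total Euclidean distance to all others;
    # the smallest totals sit closest to the geometric median, i.e.
    # they are best represented by the remaining filters.
    d = np.linalg.norm(filters[:, None, :] - filters[None, :, :], axis=-1)
    scores = d.sum(axis=1)
    return np.argsort(scores)[:n_prune]  # indices to prune
```

Unlike norm-based criteria, this keeps outlier filters even when their magnitudes are small.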

https://github.com/microsoft/archai

Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research.

automated-machine-learning automl darts deep-learning hyperparameter-optimization machine-learning model-compression nas neural-architecture-search petridish python pytorch

Last synced: 16 May 2025

https://github.com/mit-han-lab/amc

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices

automl automl-for-compression channel-pruning efficient-model model-compression on-device-ai

Last synced: 13 May 2025

https://github.com/pratyushasharma/laser

The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

gpt-j interpretability laser llama2 llm llms model-compression transformers

Last synced: 17 Nov 2025

https://github.com/he-y/soft-filter-pruning

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

model-compression pruning pytorch

Last synced: 06 Apr 2025

https://github.com/JetRunner/BERT-of-Theseus

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

bert glue model-compression nlp transformers

Last synced: 02 Apr 2025

https://github.com/czg1225/SlimSAM

[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim

knowledge-distillation model-compression model-pruning segment-anything-model

Last synced: 24 Jul 2025

https://github.com/Sharpiless/Yolov5-distillation-train-inference

YOLOv5 distillation training | YOLOv5 knowledge distillation training, with support for training on your own data.

distillation konwledge-distillation model-compression object-detection yolov5

Last synced: 20 Apr 2025

https://github.com/princeton-nlp/cofipruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408

bert model-compression nlp pruning

Last synced: 27 Apr 2025

https://github.com/mit-han-lab/amc-models

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices

automl efficient-model model-compression on-device-ai

Last synced: 13 May 2025

https://github.com/liyuanlucasliu/ld-net

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

contextualized-representation language-model model-compression named-entity-recognition ner pytorch sequence-labeling

Last synced: 15 Apr 2025

https://github.com/NVlabs/condensa

Programmable Neural Network Compression

deep-neural-networks model-compression model-pruning

Last synced: 03 May 2025

https://github.com/thu-nics/MoA

The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".

large-language-models model-compression sparse-attention

Last synced: 19 Jul 2025

https://github.com/vita-group/svite

[NeurIPS'21] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

dynamic-sparsity efficient-transformers model-compression pruning sparse-training token-slimming vision-transformers

Last synced: 19 Apr 2025

https://github.com/microsoft/moonlit

This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.

inference-efficiency model-compression neural-architecture-search token-pruning

Last synced: 07 Apr 2025

https://github.com/iamhankai/Versatile-Filters

PyTorch code for the paper "Learning Versatile Filters for Efficient Convolutional Neural Networks" (NeurIPS 2018).

convolutional-neural-networks model-compression

Last synced: 18 Nov 2025

https://github.com/bloomberg/minilmv2.bb

Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)

distillation language-model model-compression model-distillation python pytorch transformers

Last synced: 07 May 2025

https://github.com/CASE-Lab-UMD/Unified-MoE-Compression

The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".

deep-learning large-language-models machine-learning mixture-of-experts model-compression natural-language-processing

Last synced: 11 May 2025

https://github.com/vita-group/atmc

[NeurIPS'2019] Shupeng Gui, Haotao Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, Ji Liu, “Model Compression with Adversarial Robustness: A Unified Optimization Framework”

model-compression pruning quantization robustness unified-optimization-framework

Last synced: 07 Aug 2025

https://github.com/asahi417/lm-vocab-trimmer

Vocabulary Trimming (VT) is a model compression technique that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens. This repository contains a Python library, vocabtrimmer, which removes irrelevant tokens from a multilingual LM vocabulary for the target language.

bert gpt language-model model-compression nlp t5

Last synced: 17 Jul 2025
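
The trimming idea can be sketched generically (a hypothetical helper, not the vocabtrimmer API): keep only the embedding rows for tokens used by the target language and remap the surviving token ids.

```python
import numpy as np

def trim_embeddings(embedding, keep_token_ids):
    # Keep only the rows of the embedding matrix for tokens relevant
    # to the target language, and build an old-id -> new-id mapping
    # so the tokenizer can be rewritten consistently.
    keep = sorted(keep_token_ids)
    new_emb = embedding[keep]
    remap = {old: new for new, old in enumerate(keep)}
    return new_emb, remap
```

Since the embedding (and tied output) matrix dominates multilingual LM size, dropping unused rows alone can shrink the model substantially.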

https://github.com/esceptico/squeezer

Lightweight knowledge distillation pipeline

distillation knowledge-distillation model-compression pytorch

Last synced: 22 Apr 2025

https://github.com/vita-group/prac-lth

[ICML 2021] "Efficient Lottery Ticket Finding: Less Data is More" by Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang

coreset efficiency lottery-tickets-hypothesis model-compression pruning-aware-critical-set winning-tickets

Last synced: 11 Sep 2025

https://github.com/frankaging/causal-distill

The Codebase for Causal Distillation for Language Models (NAACL '22)

bert-model distilbert language-model model-compression model-distillation

Last synced: 29 Apr 2025

https://github.com/huangcongqing/model-compression-optimization

Model compression and optimization for PyTorch deployment, including knowledge distillation, quantization, and pruning.

knowledge-distillation model-compression nas pruning pytorch quantization quantized-networks sparsity sparsity-optimization

Last synced: 05 May 2025

https://github.com/linkedin/quantease

QuantEase, a layer-wise quantization framework, frames the problem as discrete-structured non-convex optimization. Our work leverages Coordinate Descent techniques, offering high-quality solutions without the need for matrix inversion or decomposition.

generative-ai large-language-models model-compression publication-code quantization

Last synced: 17 Aug 2025

https://github.com/iamhankai/full-stack-filters

PyTorch code for the paper "Full-Stack Filters to Build Minimum Viable CNNs".

convolutional-neural-networks model-compression

Last synced: 27 Jul 2025

https://github.com/changwoolee/blast

[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference

efficient-inference large-language-models llama matrix-factorization matrix-multiplication model-compression

Last synced: 11 Apr 2025

https://github.com/hkuds/lightgnn

[WSDM'25] "LightGNN: Simple Graph Neural Network for Recommendation"

graph-learning graph-neural-networks knowledge-distillation model-compression recommendation

Last synced: 04 Jul 2025

https://github.com/stonesjtu/basis-embedding

basis embedding: a product quantization based model compression method for language models.

language-model model-compression product-quantization pytorch

Last synced: 24 Jun 2025

https://github.com/anishacharya/online-embedding-compression-aaai-2019

Deep learning models have become state-of-the-art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency.

We propose a compression method that leverages low-rank matrix factorization during training to compress the word embedding layer, which represents the size bottleneck for most NLP models. Our models are trained, compressed, and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, and outperforms alternative methods such as fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space for our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain the accuracy loss without introducing additional latency compared to fixed-point quantization.

Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.

compression deep-learning low-rank-approximation low-rank-representaion matrix-decompositions model-compression tensorflow text-classification word-embedding word-embeddings

Last synced: 04 Mar 2026
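
The low-rank factorization step described in the abstract can be illustrated with a truncated SVD (an illustrative sketch only; the paper learns the factors during training rather than factoring a fixed matrix post hoc):

```python
import numpy as np

def low_rank_factorize(W, rank):
    # Factor W (vocab x dim) into A @ B with A (vocab x rank) and
    # B (rank x dim) via truncated SVD; the product approximates W
    # while storing rank*(vocab+dim) numbers instead of vocab*dim.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]
    B = Vt[:rank]
    return A, B
```

For an embedding layer, the lookup then becomes two small matrix products instead of one large table read, trading a little compute for a configurable compression ratio.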

https://github.com/labouteille/torchprune

Deep learning compression framework in Pytorch [WIP]

deep-learning knowledge-distillation model-compression pruning quantization

Last synced: 03 Feb 2026

https://github.com/shuai-xie/bisenet-compression

10 variants of the original BiSeNet with a performance comparison; the faster, the better.

model-compression resnet sematic-segmentation

Last synced: 28 Feb 2025

https://github.com/msadeqsirjani/adaptive_edge_ai

Optimizing deep learning models for edge devices through intelligent compression and knowledge distillation. Achieve up to 90% model size reduction while maintaining performance, enabling efficient AI deployment on resource-constrained devices.

deep-learning edge-ai knowledge-distillation model-compression onnx-optimization pytorch

Last synced: 30 Oct 2025

https://github.com/memgonzales/mirror-segmentation

Presented at the 2023 International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2023). Lightweight mirror segmentation CNN that uses an EfficientNet backbone, employs parallel convolutional layers to capture edge features, and applies filter pruning for model compression

cnn cnn-compression computer-vision convolutional-neural-networks deep-learning efficientnet model-compression model-pruning object-detection object-segmentation pruning pytorch segmentation

Last synced: 25 Dec 2025

https://github.com/mohd-faizy/hyperparameter-tuning-with-microsoft-network-intelligence-toolkit-nni

Hyperparameter tuning with Microsoft NNI for automated machine learning (AutoML) experiments. The tool dispatches and runs trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyperparameters in different environments, such as local machines, remote servers, and the cloud.

automl feature-engineering hyperparameter-optimization hyperparameter-tuning microsoft-nni model-compression neural-architecture-search neural-network-intelligence nni nnictl

Last synced: 23 Jul 2025

https://github.com/r-papso/torch-optimizer

PyTorch models optimization by neural network pruning

deep-learning model-compression neural-network-pruning optimization pruning pytorch

Last synced: 07 Oct 2025

https://github.com/18520339/unstructured-local-search-pruning

Applying simulated annealing and genetic algorithms to neural network pruning, without prior assumptions about weight importance.

artificial-intelligence genetic-algorithm local-search-algoirthms model-compression neural-network simulated-annealing unstructured-pruning

Last synced: 15 Jul 2025
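
A minimal sketch of simulated annealing over a binary pruning mask, in the spirit of this repo (the `anneal_mask` helper and its signature are hypothetical, not the repo's actual code): each step proposes swapping one kept weight for one pruned weight, keeping sparsity fixed, and accepts worse masks with a temperature-dependent probability.

```python
import numpy as np

def anneal_mask(weights, sparsity, val_loss, steps=200, T0=1.0, seed=0):
    # Simulated annealing over a binary pruning mask: propose swapping
    # one kept weight for one pruned weight, accept if the validation
    # loss improves, or with probability exp(-delta / T) otherwise.
    rng = np.random.default_rng(seed)
    n = weights.size
    k = int(n * (1 - sparsity))          # number of weights to keep
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, k, replace=False)] = True
    best, best_loss = mask.copy(), val_loss(weights * mask)
    cur, cur_loss = best.copy(), best_loss
    for t in range(steps):
        T = T0 * (1 - t / steps) + 1e-9  # linear cooling schedule
        cand = cur.copy()
        on = rng.choice(np.flatnonzero(cand))
        off = rng.choice(np.flatnonzero(~cand))
        cand[on], cand[off] = False, True  # swap keeps sparsity fixed
        loss = val_loss(weights * cand)
        if loss < cur_loss or rng.random() < np.exp((cur_loss - loss) / T):
            cur, cur_loss = cand, loss
            if loss < best_loss:
                best, best_loss = cand.copy(), loss
    return best
```

Unlike magnitude pruning, no assumption about weight importance enters: the mask is judged purely by the supplied validation loss.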

https://github.com/elphinkuo/distiller

The original experiment code for the AAAI 2020 paper "AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates".

auto-ml computer-vision deep-learning deep-reinforcement-learning model-compression model-pruning

Last synced: 17 Jul 2025

https://github.com/blue-no1/quantization-experiments

Experiments on quantization for open-weight LLMs — balancing memory footprint, speed, and accuracy.

inference llm model-compression quantization

Last synced: 22 Nov 2025

https://github.com/ksm26/quantization-fundamentals-with-hugging-face

Learn linear quantization techniques using the Quanto library and downcasting methods with the Transformers library to compress and optimize generative AI models effectively.

compression downcasting generative-ai hugging-face linear-quantization model-compression model-deployment model-optimization optimize quantization quantization-fundamentals quanto-library transformers-library

Last synced: 28 Mar 2025
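
Linear (asymmetric) quantization, as taught in this course, can be sketched in a few lines (a generic NumPy sketch, not the Quanto API): a float range is mapped onto integers via a scale and a zero point, and dequantization inverts the mapping.

```python
import numpy as np

def linear_quantize(x, n_bits=8):
    # Asymmetric linear quantization: map the float range [min, max]
    # onto the integers [0, 2^n - 1] using a scale and a zero point.
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def linear_dequantize(q, scale, zero_point):
    # Invert the mapping; the reconstruction error is at most ~scale/2.
    return (q.astype(np.float32) - zero_point) * scale
```

Storing uint8 values plus one scale and zero point per tensor is what yields the roughly 4x memory reduction over fp32.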

https://github.com/rakutentech/iterative_training

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

deep-learning machine-learning model-compression neural-network

Last synced: 04 Jul 2025