Projects in Awesome Lists tagged with model-compression
A curated list of projects in awesome lists tagged with model-compression.
https://github.com/microsoft/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
automated-machine-learning automl bayesian-optimization data-science deep-learning deep-neural-network distributed feature-engineering hyperparameter-optimization hyperparameter-tuning machine-learning machine-learning-algorithms mlops model-compression nas neural-architecture-search neural-network python pytorch tensorflow
Last synced: 05 Oct 2025
https://github.com/Microsoft/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
automated-machine-learning automl bayesian-optimization data-science deep-learning deep-neural-network distributed feature-engineering hyperparameter-optimization hyperparameter-tuning machine-learning machine-learning-algorithms mlops model-compression nas neural-architecture-search neural-network python pytorch tensorflow
Last synced: 18 Apr 2025
https://github.com/huawei-noah/efficient-ai-backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
convolutional-neural-networks efficient-inference ghostnet imagenet model-compression pretrained-models pytorch tensorflow transformer vision-transformer
Last synced: 13 May 2025
https://github.com/huawei-noah/Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
convolutional-neural-networks efficient-inference ghostnet imagenet model-compression pretrained-models pytorch tensorflow transformer vision-transformer
Last synced: 20 Mar 2025
https://github.com/huawei-noah/pretrained-language-model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
knowledge-distillation large-scale-distributed model-compression pretrained-models quantization
Last synced: 14 May 2025
https://github.com/huawei-noah/Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
knowledge-distillation large-scale-distributed model-compression pretrained-models quantization
Last synced: 16 Mar 2025
https://github.com/vainf/torch-pruning
[CVPR 2023] DepGraph: Towards Any Structural Pruning
channel-pruning cvpr2023 depgraph efficient-deep-learning model-compression network-pruning pruning structural-pruning structured-pruning
Last synced: 12 May 2025
https://github.com/VainF/Torch-Pruning
[CVPR 2023] DepGraph: Towards Any Structural Pruning
channel-pruning cvpr2023 depgraph efficient-deep-learning model-compression network-pruning pruning structural-pruning structured-pruning
Last synced: 20 Mar 2025
https://github.com/tencent/pocketflow
An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
automl computer-vision deep-learning mobile-app model-compression
Last synced: 14 Apr 2025
https://github.com/Tencent/PocketFlow
An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
automl computer-vision deep-learning mobile-app model-compression
Last synced: 20 Mar 2025
https://github.com/666DZY666/micronet
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), both high-bit (>2b: DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) ternary and binary (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ) at 8-bit (TensorRT); (2) pruning: normal, regular, and group-convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusing for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.
batch-normalization-fuse bnn convolutional-networks dorefa group-convolution integer-arithmetic-only model-compression network-in-network network-slimming neuromorphic-computing onnx post-training-quantization pruning pytorch quantization quantization-aware-training tensorrt tensorrt-int8-python twn xnor-net
Last synced: 20 Mar 2025
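The batch-normalization fuse that micronet applies before quantization folds each BatchNorm layer into the preceding convolution, so the quantizer sees a single conv operator. Below is a minimal PyTorch sketch of the standard folding arithmetic, not micronet's actual code; the helper name fuse_conv_bn is illustrative.
```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm statistics into the weights and bias of the preceding Conv2d."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    # per-output-channel scale = gamma / sqrt(running_var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias
    return fused

# Usage: replace conv+bn pairs before PTQ calibration / TensorRT export
conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16)
conv.eval(); bn.eval()
fused = fuse_conv_bn(conv, bn)
x = torch.randn(1, 3, 8, 8)
print(torch.allclose(fused(x), bn(conv(x)), atol=1e-5))  # True in eval mode
```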
https://github.com/haitongli/knowledge-distillation-pytorch
A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility
cifar10 computer-vision dark-knowledge deep-neural-networks knowledge-distillation model-compression pytorch
Last synced: 15 May 2025
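The deep-and-shallow KD experiments in this repository revolve around the classic distillation loss: a temperature-softened KL term against the teacher's logits blended with the usual hard-label cross-entropy. A minimal sketch of that loss, with illustrative values for the temperature T and mixing weight alpha rather than the repository's defaults:
```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend a soft-target KL term (weight alpha) with the hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)  # T^2 keeps gradient scale comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Dummy logits for a 10-class problem (e.g. CIFAR-10)
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```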
https://github.com/tensorflow/model-optimization
A toolkit for optimizing ML models for deployment with Keras and TensorFlow, including quantization and pruning.
compression deep-learning keras machine-learning ml model-compression optimization pruning quantization quantized-networks quantized-neural-networks quantized-training sparsity tensorflow
Last synced: 12 May 2025
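Magnitude pruning, one of the techniques this toolkit packages for Keras and TensorFlow, zeroes the smallest-magnitude weights until a target sparsity is reached. The sketch below illustrates the idea in plain PyTorch and does not use the toolkit's own API; magnitude_prune_ is a hypothetical helper.
```python
import torch
import torch.nn as nn

def magnitude_prune_(layer: nn.Linear, sparsity: float = 0.5) -> torch.Tensor:
    """Zero the smallest-magnitude weights in place; returns the binary keep mask."""
    weight = layer.weight.data
    k = int(sparsity * weight.numel())  # number of weights to remove
    threshold = weight.abs().flatten().kthvalue(k).values if k > 0 else weight.new_tensor(0.0)
    mask = (weight.abs() > threshold).float()
    weight.mul_(mask)                   # apply the sparsity mask
    return mask

layer = nn.Linear(512, 256)
mask = magnitude_prune_(layer, sparsity=0.8)
print(f"kept {mask.mean().item():.1%} of weights")  # roughly 20%
```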
https://github.com/microsoft/NeuronBlocks
NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego
artificial-intelligence deep-learning dnn knowledge-distillation model-compression natural-language-processing pytorch qna question-answering sequence-labeling text-classification text-matching
Last synced: 07 Apr 2025
https://github.com/huawei-noah/efficient-computing
Efficient computing methods developed by Huawei Noah's Ark Lab
binary-neural-networks knowledge-distillation model-compression pruning quantization self-supervised
Last synced: 14 May 2025
https://github.com/huawei-noah/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
binary-neural-networks knowledge-distillation model-compression pruning quantization self-supervised
Last synced: 20 Mar 2025
https://github.com/ethanhe42/channel-pruning
Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)
acceleration channel-pruning deep-neural-networks image-classification image-recognition model-compression object-detection
Last synced: 16 May 2025
https://github.com/MingSun-Tse/EfficientDNNs
Collection of recent methods on (deep) neural network compression and acceleration.
deep-learning deep-neural-networks efficient-deep-learning knowledge-distillation model-compression network-pruning
Last synced: 27 Apr 2025
https://github.com/MingSun-Tse/Efficient-Deep-Learning
Collection of recent methods on (deep) neural network compression and acceleration.
deep-learning deep-neural-networks efficient-deep-learning knowledge-distillation model-compression network-pruning
Last synced: 16 Mar 2025
https://github.com/horseee/deepcache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
diffusion-models efficient-inference model-compression stable-diffusion training-free
Last synced: 15 May 2025
https://github.com/alibaba/tinyneuralnetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
deep-learning deep-neural-networks model-compression model-converter post-training-quantization pruning pytorch quantization-aware-training
Last synced: 14 Oct 2025
https://github.com/lhyfst/knowledge-distillation-papers
knowledge distillation papers
dark-knowledge knowledge-distillation model-compression paper reading-list
Last synced: 05 May 2025
https://github.com/horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
diffusion-models efficient-inference model-compression stable-diffusion training-free
Last synced: 25 Aug 2025
https://github.com/squeezeailab/squeezellm
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
efficient-inference large-language-models llama llm localllm model-compression natural-language-processing post-training-quantization quantization small-models text-generation transformer
Last synced: 13 Apr 2025
https://github.com/cnkuangshi/LightCTR
A lightweight and scalable framework that combines mainstream click-through-rate prediction algorithms on a computational DAG with the Parameter Server philosophy and Ring-AllReduce collective communication.
computational-graphs deep-learning distributed-systems factorization-machines machine-learning model-compression parameter-server
Last synced: 15 Mar 2025
https://github.com/sforaidl/kd_lib
A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
algorithm-implementations benchmarking data-science deep-learning-library knowledge-distillation machine-learning model-compression pruning pytorch quantization
Last synced: 16 May 2025
https://github.com/he-y/filter-pruning-geometric-median
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)
model-compression pruning pytorch
Last synced: 04 Apr 2025
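The geometric-median criterion ranks a layer's filters by redundancy: filters closest to the geometric median of all filters in the layer carry the least distinct information and are pruned (or softly zeroed) first. A simplified sketch using the common sum-of-pairwise-distances approximation; fpgm_prune_indices is an illustrative helper, not the paper's code.
```python
import torch
import torch.nn as nn

def fpgm_prune_indices(conv: nn.Conv2d, prune_ratio: float = 0.3):
    """Return indices of the filters nearest the geometric median (most redundant)."""
    filters = conv.weight.data.flatten(start_dim=1)   # [out_channels, in*kh*kw]
    dist = torch.cdist(filters, filters, p=2)          # pairwise L2 distances
    redundancy = dist.sum(dim=1)                       # small sum -> close to the median
    n_prune = int(prune_ratio * filters.size(0))
    return torch.argsort(redundancy)[:n_prune]         # most redundant filters first

conv = nn.Conv2d(64, 128, kernel_size=3)
to_prune = fpgm_prune_indices(conv, prune_ratio=0.3)
print(to_prune.shape)  # ~38 filter indices to remove, or to zero as a soft mask
```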
https://github.com/microsoft/archai
Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research.
automated-machine-learning automl darts deep-learning hyperparameter-optimization machine-learning model-compression nas neural-architecture-search petridish python pytorch
Last synced: 16 May 2025
https://github.com/mit-han-lab/amc
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
automl automl-for-compression channel-pruning efficient-model model-compression on-device-ai
Last synced: 13 May 2025
https://github.com/pratyushasharma/laser
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
gpt-j interpretability laser llama2 llm llms model-compression transformers
Last synced: 17 Nov 2025
https://github.com/he-y/soft-filter-pruning
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
model-compression pruning pytorch
Last synced: 06 Apr 2025
https://github.com/xiuyu-li/q-diffusion
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
ddim diffusion-models model-compression post-training-quantization pytorch quantization stable-diffusion
Last synced: 06 Apr 2025
https://github.com/SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
compression efficient-inference efficient-model large-language-models llama llm localllama localllm mistral model-compression natural-language-processing quantization small-models text-generation transformer
Last synced: 08 May 2025
https://github.com/JetRunner/BERT-of-Theseus
⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).
bert glue model-compression nlp transformers
Last synced: 02 Apr 2025
https://github.com/czg1225/SlimSAM
[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim
knowledge-distillation model-compression model-pruning segment-anything-model
Last synced: 24 Jul 2025
https://github.com/squeezeailab/kvquant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
compression efficient-inference efficient-model large-language-models llama llm localllama localllm mistral model-compression natural-language-processing quantization small-models text-generation transformer
Last synced: 07 Apr 2025
https://github.com/vinhkhuc/JFastText
Java interface for fastText
java jni machine-learning model-compression nlp text-classification word-embeddings
Last synced: 05 Jan 2026
https://github.com/kssteven418/i-bert
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
bert efficient-model efficient-neural-networks model-compression natural-language-processing quantization transformer
Last synced: 06 Apr 2025
https://github.com/picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
compression efficient-inference gemma generative-ai language-model language-models large-language-model llama llama2 llama3 llm llm-inference llms mistral mixtral model-compression natural-language-processing quantization self-hosted
Last synced: 23 Oct 2025
https://github.com/Sharpiless/Yolov5-distillation-train-inference
Yolov5 distillation training | YOLOv5 knowledge distillation training, with support for training on your own data
distillation konwledge-distillation model-compression object-detection yolov5
Last synced: 20 Apr 2025
https://github.com/princeton-nlp/cofipruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
bert model-compression nlp pruning
Last synced: 27 Apr 2025
https://github.com/Peterisfar/YOLOV3
yolov3 by pytorch
mobilenetv2 model-compression object-detection pytorch voc yolov3
Last synced: 20 Apr 2025
https://github.com/vainf/diff-pruning
[NeurIPS 2023] Structural Pruning for Diffusion Models
diffusion-models efficient-deep-learning model-compression network-pruning pytorch structured-pruning
Last synced: 30 Jun 2025
https://github.com/mit-han-lab/amc-models
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
automl efficient-model model-compression on-device-ai
Last synced: 13 May 2025
https://github.com/DwangoMediaVillage/keras_compressor
Model Compression CLI Tool for Keras.
deep-learning keras machine-learning model-compression
Last synced: 22 Jul 2025
https://github.com/liyuanlucasliu/ld-net
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
contextualized-representation language-model model-compression named-entity-recognition ner pytorch sequence-labeling
Last synced: 15 Apr 2025
https://github.com/NVlabs/condensa
Programmable Neural Network Compression
deep-neural-networks model-compression model-pruning
Last synced: 03 May 2025
https://github.com/VainF/Diff-Pruning
[NeurIPS 2023] Structural Pruning for Diffusion Models
diffusion-models efficient-deep-learning model-compression network-pruning pytorch structured-pruning
Last synced: 13 Mar 2025
https://github.com/jim-schwoebel/allie
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
autokeras automl autopytorch data-augmentation data-cleaning data-cleaning-pipeline data-transformation data-visualization datasets deep-learning ludwig machine-learning machine-learning-api machine-learning-library machine-learning-models model-compression model-deployment tpot voice-computing
Last synced: 21 Aug 2025
https://datawhalechina.github.io/awesome-compression/
A beginner-friendly introduction to model compression
compression kd knowledge-distillation model-compression model-pruning model-quantization neural-architecture-search prune quantization tinyml
Last synced: 24 Sep 2025
https://github.com/kssteven418/ltp
[KDD'22] Learned Token Pruning for Transformers
bert efficient-model efficient-neural-networks model-compression natural-language-processing pruning transformer
Last synced: 03 Sep 2025
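Learned token pruning drops sequence positions whose attention importance falls below a per-layer threshold that is learned during training, so later transformer layers process shorter sequences. The sketch below is a simplified inference-time illustration with a fixed threshold standing in for the learned one; prune_tokens is a hypothetical helper.
```python
import torch

def prune_tokens(hidden, attn_probs, threshold=0.01):
    """Drop tokens whose average attention received falls below the threshold.

    hidden:     [batch, seq, dim] hidden states entering the next layer
    attn_probs: [batch, heads, seq, seq] attention probabilities of this layer
    """
    # importance of token j = average attention mass it receives across heads and queries
    importance = attn_probs.mean(dim=1).mean(dim=1)   # [batch, seq]
    keep = importance >= threshold
    keep[:, 0] = True                                  # always keep at least one token
    # for simplicity, assume batch size 1 so the result stays rectangular
    return hidden[:, keep[0], :]

hidden = torch.randn(1, 128, 768)
attn_probs = torch.softmax(torch.randn(1, 12, 128, 128), dim=-1)
pruned = prune_tokens(hidden, attn_probs, threshold=1.0 / 128)
print(pruned.shape)  # (1, <=128, 768)
```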
https://github.com/thu-nics/MoA
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".
large-language-models model-compression sparse-attention
Last synced: 19 Jul 2025
https://github.com/archsyscall/aquvitae
Knowledge Distillation Toolkit
deep-learning knowledge-distillation light-weight machine-learning model-compression pytorch tensorflow
Last synced: 27 Jul 2025
https://github.com/vita-group/svite
[NeurIPS'21] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
dynamic-sparsity efficient-transformers model-compression pruning sparse-training token-slimming vision-transformers
Last synced: 19 Apr 2025
https://github.com/microsoft/moonlit
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
inference-efficiency model-compression neural-architecture-search token-pruning
Last synced: 07 Apr 2025
https://github.com/iamhankai/Versatile-Filters
Pytorch code for paper: Learning Versatile Filters for Efficient Convolutional Neural Networks (NeurIPS 2018)
convolutional-neural-networks model-compression
Last synced: 18 Nov 2025
https://github.com/iamhankai/versatile-filters
Pytorch code for paper: Learning Versatile Filters for Efficient Convolutional Neural Networks (NeurIPS 2018)
convolutional-neural-networks model-compression
Last synced: 25 Mar 2025
https://github.com/onnx/neural-compressor
Model compression for ONNX
deep-learning model-compression model-pruning onnx onnxruntime quantization
Last synced: 29 Jul 2025
https://github.com/musco-ai/musco-pytorch
MUSCO: MUlti-Stage COmpression of neural networks
cp-decomposition deep-neural-networks low-rank model-acceleration model-compression network-acceleration network-compression pytorch tensor-decomposition truncated-svd tucker vbmf
Last synced: 15 Apr 2025
https://github.com/bloomberg/minilmv2.bb
Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)
distillation language-model model-compression model-distillation python pytorch transformers
Last synced: 07 May 2025
https://github.com/CASE-Lab-UMD/Unified-MoE-Compression
The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".
deep-learning large-language-models machine-learning mixture-of-experts model-compression natural-language-processing
Last synced: 11 May 2025
https://github.com/kxytechnologies/kxy-python
A toolkit to boost the productivity of machine learning engineers.
feature-engineering feature-selection information-theory machine-learning machine-learning-library model-compression python
Last synced: 30 Oct 2025
https://github.com/vita-group/atmc
[NeurIPS'2019] Shupeng Gui, Haotao Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, Ji Liu, “Model Compression with Adversarial Robustness: A Unified Optimization Framework”
model-compression pruning quantization robustness unified-optimization-framework
Last synced: 07 Aug 2025
https://github.com/asahi417/lm-vocab-trimmer
Vocabulary Trimming (VT) is a model compression technique that shrinks a multilingual LM's vocabulary to a target language by deleting irrelevant tokens. This repository contains vocabtrimmer, a Python library that removes tokens irrelevant to the target language from a multilingual LM's vocabulary.
bert gpt language-model model-compression nlp t5
Last synced: 17 Jul 2025
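Concretely, trimming keeps only the embedding rows (and LM-head rows) for tokens that occur in the target language and rebuilds the tokenizer around them. The vocabtrimmer library does this end to end; the sketch below only illustrates the embedding-matrix step, with kept_token_ids as a placeholder for ids collected from a target-language corpus.
```python
import torch
import torch.nn as nn

def trim_embedding(embedding: nn.Embedding, kept_token_ids: list[int]) -> nn.Embedding:
    """Build a smaller embedding containing only the rows for kept_token_ids."""
    kept = torch.tensor(sorted(set(kept_token_ids)), dtype=torch.long)
    trimmed = nn.Embedding(len(kept), embedding.embedding_dim)
    trimmed.weight.data.copy_(embedding.weight.data[kept])
    return trimmed  # token ids must be remapped old_id -> new_id downstream

full = nn.Embedding(250_000, 768)             # multilingual vocabulary
kept_token_ids = list(range(0, 250_000, 8))   # placeholder for ~31k target-language ids
small = trim_embedding(full, kept_token_ids)
print(full.weight.numel(), "->", small.weight.numel())
```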
https://github.com/kssteven418/q-asr
[ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition
automatic-speech-recognition deep-learning efficient-model efficient-neural-networks jasper model-compression quantization quartznet speech speech-recognition
Last synced: 31 Jul 2025
https://github.com/esceptico/squeezer
Lightweight knowledge distillation pipeline
distillation knowledge-distillation model-compression pytorch
Last synced: 22 Apr 2025
https://github.com/vita-group/prac-lth
[ICML 2021] "Efficient Lottery Ticket Finding: Less Data is More" by Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang
coreset efficiency lottery-tickets-hypothesis model-compression pruning-aware-critical-set winning-tickets
Last synced: 11 Sep 2025
https://github.com/frankaging/causal-distill
The Codebase for Causal Distillation for Language Models (NAACL '22)
bert-model distilbert language-model model-compression model-distillation
Last synced: 29 Apr 2025
https://github.com/z7zuqer/model-compression-and-acceleration-4-dnn
model-compression-and-acceleration-4-DNN
decomposition distillation model-compression pruning quantization
Last synced: 04 Jan 2026
https://github.com/huangcongqing/model-compression-optimization
Model compression and optimization for deployment with PyTorch, including knowledge distillation, quantization, and pruning.
knowledge-distillation model-compression nas pruning pytorch quantization quantized-networks sparsity sparsity-optimization
Last synced: 05 May 2025
https://github.com/linkedin/quantease
QuantEase, a layer-wise quantization framework, frames the problem as discrete-structured non-convex optimization. Our work leverages Coordinate Descent techniques, offering high-quality solutions without the need for matrix inversion or decomposition.
generative-ai large-language-models model-compression publication-code quantization
Last synced: 17 Aug 2025
https://github.com/iamhankai/full-stack-filters
Pytorch code for paper: Full-Stack Filters to Build Minimum Viable CNNs
convolutional-neural-networks model-compression
Last synced: 27 Jul 2025
https://github.com/changwoolee/blast
[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference
efficient-inference large-language-models llama matrix-factorization matrix-multiplication model-compression
Last synced: 11 Apr 2025
https://github.com/densechen/eve-mli
eve-mli: making learning interesting
deep-learning deep-reinforcement-learning eve-mli model-compression network-architecture pruning pypi pytorch quantization-efficient-network reinforcement-learning spiking-neural-networks
Last synced: 16 Jan 2026
https://github.com/hkuds/lightgnn
[WSDM'25] "LightGNN: Simple Graph Neural Network for Recommendation"
graph-learning graph-neural-networks knowledge-distillation model-compression recommendation
Last synced: 04 Jul 2025
https://github.com/stonesjtu/basis-embedding
basis embedding: a product quantization based model compression method for language models.
language-model model-compression product-quantization pytorch
Last synced: 24 Jun 2025
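Product quantization of the embedding table, the core of the basis-embedding idea, splits each embedding vector into sub-vectors and stores, per sub-vector, only the index of its nearest centroid in a small per-group codebook. A minimal sketch with scikit-learn's KMeans; group and codebook sizes are illustrative, not the repository's settings.
```python
import numpy as np
from sklearn.cluster import KMeans

def product_quantize(emb: np.ndarray, n_groups: int = 4, n_codes: int = 256):
    """Compress [vocab, dim] embeddings into per-group codebooks plus uint8 codes."""
    vocab, dim = emb.shape
    sub = emb.reshape(vocab, n_groups, dim // n_groups)
    codebooks, codes = [], []
    for g in range(n_groups):
        km = KMeans(n_clusters=n_codes, n_init=4, random_state=0).fit(sub[:, g, :])
        codebooks.append(km.cluster_centers_)           # [n_codes, dim // n_groups]
        codes.append(km.labels_.astype(np.uint8))       # [vocab]
    return np.stack(codebooks), np.stack(codes, axis=1)  # codes: [vocab, n_groups]

def reconstruct(codebooks, codes):
    """Look up each sub-vector's centroid and concatenate back to full embeddings."""
    return np.concatenate([codebooks[g][codes[:, g]] for g in range(codes.shape[1])], axis=1)

emb = np.random.randn(10_000, 128).astype(np.float32)
codebooks, codes = product_quantize(emb)
approx = reconstruct(codebooks, codes)
print(emb.nbytes, "->", codes.nbytes + codebooks.nbytes)  # far smaller than the dense table
```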
https://github.com/anishacharya/online-embedding-compression-aaai-2019
Deep learning models have become state of the art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency. We propose a compression method that leverages low-rank matrix factorization during training to compress the word embedding layer, which represents the size bottleneck for most NLP models. Our models are trained, compressed, and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, and that it outperforms alternative methods like fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space of our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain the accuracy loss without introducing additional latency compared to fixed-point quantization. Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.
compression deep-learning low-rank-approximation low-rank-representaion matrix-decompositions model-compression tensorflow text-classification word-embedding word-embeddings
Last synced: 04 Mar 2026
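The low-rank factorization described above replaces the [vocab, dim] embedding matrix with the product of a [vocab, rank] matrix and a [rank, dim] projection, which are then fine-tuned on the downstream task. A minimal sketch of the factorization step via truncated SVD; the rank is illustrative and the CALR schedule from the description is not shown.
```python
import torch
import torch.nn as nn

def factorize_embedding(embedding: nn.Embedding, rank: int = 64) -> nn.Sequential:
    """Replace one large embedding with a rank-sized embedding followed by a projection."""
    U, S, Vh = torch.linalg.svd(embedding.weight.data, full_matrices=False)
    low = nn.Embedding(embedding.num_embeddings, rank)
    low.weight.data.copy_(U[:, :rank] * S[:rank])          # [vocab, rank]
    proj = nn.Linear(rank, embedding.embedding_dim, bias=False)
    proj.weight.data.copy_(Vh[:rank].T)                    # projection back to [rank, dim]
    return nn.Sequential(low, proj)                        # fine-tune both pieces downstream

emb = nn.Embedding(30_000, 300)
compressed = factorize_embedding(emb, rank=64)
ids = torch.randint(0, 30_000, (4, 16))
print(compressed(ids).shape)  # (4, 16, 300), with ~5x fewer embedding parameters
```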
https://github.com/labouteille/torchprune
Deep learning compression framework in Pytorch [WIP]
deep-learning knowledge-distillation model-compression pruning quantization
Last synced: 03 Feb 2026
https://github.com/shuai-xie/bisenet-compression
10 variants of the original BiSeNet with a performance comparison; the faster, the better.
model-compression resnet sematic-segmentation
Last synced: 28 Feb 2025
https://github.com/m-pektas/bfas
Brute Force Architecture Search
deep-learning deep-neural-networks hyperparameter-optimization mlops model-comparison model-compression nas neural-architecture-search python pytorch random-search
Last synced: 08 Oct 2025
https://github.com/msadeqsirjani/adaptive_edge_ai
Optimizing deep learning models for edge devices through intelligent compression and knowledge distillation. Achieve up to 90% model size reduction while maintaining performance, enabling efficient AI deployment on resource-constrained devices.
deep-learning edge-ai knowledge-distillation model-compression onnx-optimization pytorch
Last synced: 30 Oct 2025
https://github.com/memgonzales/mirror-segmentation
Presented at the 2023 International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2023). Lightweight mirror segmentation CNN that uses an EfficientNet backbone, employs parallel convolutional layers to capture edge features, and applies filter pruning for model compression
cnn cnn-compression computer-vision convolutional-neural-networks deep-learning efficientnet model-compression model-pruning object-detection object-segmentation pruning pytorch segmentation
Last synced: 25 Dec 2025
https://github.com/mohd-faizy/hyperparameter-tuning-with-microsoft-network-intelligence-toolkit-nni
Hyperparameter tuning with Microsoft NNI for automated machine learning (AutoML) experiments. The tool dispatches and runs trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyperparameters in different environments such as local machines, remote servers, and the cloud.
automl feature-engineering hyperparameter-optimization hyperparameter-tuning microsoft-nni model-compression neural-architecture-search neural-network-intelligence nni nnictl
Last synced: 23 Jul 2025
https://github.com/chouaib-629/quantileregression
Quantile regression for delivery-time prediction and related scenarios.
data-science data-visualization delivery-time jupyter-notebook linear-regression machine-learning matplotlib model-compression numpy pandas predictive-modeling python python3 quantile-regression regression-models scikit-learn scipy statistical-analysis statsmodels
Last synced: 11 Feb 2026
https://github.com/r-papso/torch-optimizer
PyTorch models optimization by neural network pruning
deep-learning model-compression neural-network-pruning optimization pruning pytorch
Last synced: 07 Oct 2025
https://github.com/18520339/unstructured-local-search-pruning
Apply simulated annealing and a genetic algorithm to neural network pruning without prior assumptions about weight importance
artificial-intelligence genetic-algorithm local-search-algoirthms model-compression neural-network simulated-annealing unstructured-pruning
Last synced: 15 Jul 2025
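In this local-search formulation the binary pruning mask itself is the search state: a move flips mask bits, the objective trades accuracy against sparsity, and simulated annealing decides whether to accept worsening moves. A compact sketch of that loop, with a toy scoring function standing in for real validation accuracy:
```python
import math
import random

def simulated_annealing_prune(n_weights=100, target_sparsity=0.5, steps=2000):
    """Search for a binary keep/prune mask with SA; score() is a stand-in objective."""
    def score(mask):
        # toy objective: hit the target sparsity, keep the (arbitrary) first 20 "important" weights
        sparsity = 1.0 - sum(mask) / len(mask)
        kept_important = sum(mask[:20]) / 20
        return kept_important - abs(sparsity - target_sparsity)

    mask = [1] * n_weights
    best, best_score, temp = mask[:], score(mask), 1.0
    for _ in range(steps):
        candidate = mask[:]
        i = random.randrange(n_weights)
        candidate[i] = 1 - candidate[i]                    # flip one keep/prune bit
        delta = score(candidate) - score(mask)
        if delta > 0 or random.random() < math.exp(delta / temp):
            mask = candidate                               # accept improving or lucky worsening moves
        if score(mask) > best_score:
            best, best_score = mask[:], score(mask)
        temp *= 0.995                                      # cool down
    return best

mask = simulated_annealing_prune()
print(f"sparsity: {1 - sum(mask) / len(mask):.2f}")
```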
https://github.com/elphinkuo/distiller
The original experiment code for the AAAI 2020 paper "AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates".
auto-ml computer-vision deep-learning deep-reinforcement-learning model-compression model-pruning
Last synced: 17 Jul 2025
https://github.com/blue-no1/quantization-experiments
Experiments on quantization for open-weight LLMs — balancing memory footprint, speed, and accuracy.
inference llm model-compression quantization
Last synced: 22 Nov 2025
https://github.com/ksm26/quantization-fundamentals-with-hugging-face
Learn linear quantization techniques using the Quanto library and downcasting methods with the Transformers library to compress and optimize generative AI models effectively.
compression downcasting generative-ai hugging-face linear-quantization model-compression model-deployment model-optimization optimize quantization quantization-fundamentals quanto-library transformers-library
Last synced: 28 Mar 2025
https://github.com/rakutentech/iterative_training
Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization
deep-learning machine-learning model-compression neural-network
Last synced: 04 Jul 2025
https://github.com/jaketae/nn-svd
Neural network compression with SVD
model-compression neural-network neural-network-compression singular-value-decomposition svd
Last synced: 23 Mar 2025