An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with model-compression

A curated list of projects in awesome lists tagged with model-compression.

https://github.com/huawei-noah/pretrained-language-model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

knowledge-distillation large-scale-distributed model-compression pretrained-models quantization

Last synced: 14 May 2025

https://github.com/tencent/pocketflow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.

automl computer-vision deep-learning mobile-app model-compression

Last synced: 14 Apr 2025

https://github.com/666DZY666/micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT). (2) Pruning: normal, regular, and group convolutional channel pruning. (3) Group convolution structure. (4) Batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.

batch-normalization-fuse bnn convolutional-networks dorefa group-convolution integer-arithmetic-only model-compression network-in-network network-slimming neuromorphic-computing onnx post-training-quantization pruning pytorch quantization quantization-aware-training tensorrt tensorrt-int8-python twn xnor-net

Last synced: 20 Mar 2025
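
The quantization-aware-training idea behind tools like micronet can be illustrated with a generic fake-quantization sketch (not taken from the repo; `fake_quantize` is a hypothetical helper): forward values are rounded onto an integer grid and immediately mapped back to float, so training sees the quantization error while weights stay in floating point.

```python
import numpy as np

def fake_quantize(x, n_bits=8):
    # QAT-style fake quantization: quantize then immediately dequantize,
    # so the forward pass sees the quantization error while values stay
    # in float (the backward pass would use a straight-through estimator).
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale
```

At inference time the same scale lets the rounded integers be stored directly, which is where the memory savings come from.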

https://github.com/haitongli/knowledge-distillation-pytorch

A flexible PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments.

cifar10 computer-vision dark-knowledge deep-neural-networks knowledge-distillation model-compression pytorch

Last synced: 15 May 2025
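
The core of knowledge distillation explored in repositories like this one is a temperature-scaled soft-target loss; a minimal NumPy sketch of the standard formulation (generic, not this repo's API):

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-scaled softmax; a higher T softens the distribution.
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in the original Hinton et al. formulation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

In practice this term is mixed with the ordinary cross-entropy on hard labels via a weighting coefficient.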

https://github.com/tensorflow/model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

compression deep-learning keras machine-learning ml model-compression optimization pruning quantization quantized-networks quantized-neural-networks quantized-training sparsity tensorflow

Last synced: 12 May 2025

https://github.com/MingSun-Tse/EfficientDNNs

Collection of recent methods on (deep) neural network compression and acceleration.

deep-learning deep-neural-networks efficient-deep-learning knowledge-distillation model-compression network-pruning

Last synced: 27 Apr 2025

https://github.com/horseee/deepcache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

diffusion-models efficient-inference model-compression stable-diffusion training-free

Last synced: 15 May 2025

https://github.com/alibaba/tinyneuralnetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

deep-learning deep-neural-networks model-compression model-converter post-training-quantization pruning pytorch quantization-aware-training

Last synced: 14 Oct 2025

https://github.com/cnkuangshi/LightCTR

A lightweight and scalable framework that combines mainstream Click-Through-Rate prediction algorithms with a computational DAG, the Parameter Server philosophy, and Ring-AllReduce collective communication.

computational-graphs deep-learning distributed-systems factorization-machines machine-learning model-compression parameter-server

Last synced: 15 Mar 2025

https://github.com/sforaidl/kd_lib

A PyTorch knowledge distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

algorithm-implementations benchmarking data-science deep-learning-library knowledge-distillation machine-learning model-compression pruning pytorch quantization

Last synced: 16 May 2025

https://github.com/he-y/filter-pruning-geometric-median

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)

model-compression pruning pytorch

Last synced: 04 Apr 2025
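
The geometric-median criterion from the FPGM paper can be sketched roughly as follows (an illustrative approximation, not the repo's code): filters whose total distance to all other filters in the layer is smallest lie nearest the geometric median and are treated as the most redundant.

```python
import numpy as np

def fpgm_prune_indices(filters, n_prune):
    # filters: (num_filters, k*k*c_in) flattened filter weights.
    # Score each filter by its total Euclidean distance to all others;
    # the smallest totals sit closest to the geometric median, i.e.
    # they are best represented by the remaining filters.
    d = np.linalg.norm(filters[:, None, :] - filters[None, :, :], axis=-1)
    scores = d.sum(axis=1)
    return np.argsort(scores)[:n_prune]  # indices to prune
```

Unlike norm-based criteria, this keeps outlier filters even when their magnitudes are small.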

https://github.com/microsoft/archai

Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research.

automated-machine-learning automl darts deep-learning hyperparameter-optimization machine-learning model-compression nas neural-architecture-search petridish python pytorch

Last synced: 16 May 2025

https://github.com/mit-han-lab/amc

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices

automl automl-for-compression channel-pruning efficient-model model-compression on-device-ai

Last synced: 13 May 2025

https://github.com/pratyushasharma/laser

The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

gpt-j interpretability laser llama2 llm llms model-compression transformers

Last synced: 17 Nov 2025

https://github.com/he-y/soft-filter-pruning

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

model-compression pruning pytorch

Last synced: 06 Apr 2025

https://github.com/JetRunner/BERT-of-Theseus

⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

bert glue model-compression nlp transformers

Last synced: 02 Apr 2025

https://github.com/czg1225/SlimSAM

[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim

knowledge-distillation model-compression model-pruning segment-anything-model

Last synced: 24 Jul 2025

https://github.com/Sharpiless/Yolov5-distillation-train-inference

YOLOv5 distillation training | YOLOv5 knowledge distillation training, with support for training on your own data.

distillation konwledge-distillation model-compression object-detection yolov5

Last synced: 20 Apr 2025

https://github.com/princeton-nlp/cofipruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408

bert model-compression nlp pruning

Last synced: 27 Apr 2025

https://github.com/mit-han-lab/amc-models

[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices

automl efficient-model model-compression on-device-ai

Last synced: 13 May 2025

https://github.com/liyuanlucasliu/ld-net

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

contextualized-representation language-model model-compression named-entity-recognition ner pytorch sequence-labeling

Last synced: 15 Apr 2025

https://github.com/NVlabs/condensa

Programmable Neural Network Compression

deep-neural-networks model-compression model-pruning

Last synced: 03 May 2025

https://github.com/thu-nics/MoA

The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".

large-language-models model-compression sparse-attention

Last synced: 19 Jul 2025

https://github.com/vita-group/svite

[NeurIPS'21] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

dynamic-sparsity efficient-transformers model-compression pruning sparse-training token-slimming vision-transformers

Last synced: 19 Apr 2025

https://github.com/microsoft/moonlit

This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.

inference-efficiency model-compression neural-architecture-search token-pruning

Last synced: 07 Apr 2025

https://github.com/iamhankai/Versatile-Filters

PyTorch code for the paper "Learning Versatile Filters for Efficient Convolutional Neural Networks" (NeurIPS 2018).

convolutional-neural-networks model-compression

Last synced: 18 Nov 2025

https://github.com/bloomberg/minilmv2.bb

Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)

distillation language-model model-compression model-distillation python pytorch transformers

Last synced: 07 May 2025

https://github.com/CASE-Lab-UMD/Unified-MoE-Compression

The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".

deep-learning large-language-models machine-learning mixture-of-experts model-compression natural-language-processing

Last synced: 11 May 2025

https://github.com/vita-group/atmc

[NeurIPS'2019] Shupeng Gui, Haotao Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, Ji Liu, “Model Compression with Adversarial Robustness: A Unified Optimization Framework”

model-compression pruning quantization robustness unified-optimization-framework

Last synced: 07 Aug 2025

https://github.com/asahi417/lm-vocab-trimmer

Vocabulary Trimming (VT) is a model compression technique that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens. This repository contains a Python library, vocabtrimmer, which removes irrelevant tokens from a multilingual LM vocabulary for the target language.

bert gpt language-model model-compression nlp t5

Last synced: 17 Jul 2025
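
The trimming idea can be sketched generically (a hypothetical helper, not the vocabtrimmer API): keep only the embedding rows for tokens used by the target language and remap the surviving token ids.

```python
import numpy as np

def trim_embeddings(embedding, keep_token_ids):
    # Keep only the rows of the embedding matrix for tokens relevant
    # to the target language, and build an old-id -> new-id mapping
    # so the tokenizer can be rewritten consistently.
    keep = sorted(keep_token_ids)
    new_emb = embedding[keep]
    remap = {old: new for new, old in enumerate(keep)}
    return new_emb, remap
```

Since the embedding (and tied output) matrix dominates multilingual LM size, dropping unused rows alone can shrink the model substantially.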

https://github.com/esceptico/squeezer

Lightweight knowledge distillation pipeline

distillation knowledge-distillation model-compression pytorch

Last synced: 22 Apr 2025

https://github.com/vita-group/prac-lth

[ICML 2021] "Efficient Lottery Ticket Finding: Less Data is More" by Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang

coreset efficiency lottery-tickets-hypothesis model-compression pruning-aware-critical-set winning-tickets

Last synced: 11 Sep 2025

https://github.com/frankaging/causal-distill

The Codebase for Causal Distillation for Language Models (NAACL '22)

bert-model distilbert language-model model-compression model-distillation

Last synced: 29 Apr 2025

https://github.com/huangcongqing/model-compression-optimization

Model compression and optimization for PyTorch deployment, including knowledge distillation, quantization, and pruning.

knowledge-distillation model-compression nas pruning pytorch quantization quantized-networks sparsity sparsity-optimization

Last synced: 05 May 2025

https://github.com/linkedin/quantease

QuantEase, a layer-wise quantization framework, frames the problem as discrete-structured non-convex optimization. Our work leverages Coordinate Descent techniques, offering high-quality solutions without the need for matrix inversion or decomposition.

generative-ai large-language-models model-compression publication-code quantization

Last synced: 17 Aug 2025

https://github.com/iamhankai/full-stack-filters

PyTorch code for the paper "Full-Stack Filters to Build Minimum Viable CNNs".

convolutional-neural-networks model-compression

Last synced: 27 Jul 2025

https://github.com/changwoolee/blast

[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference

efficient-inference large-language-models llama matrix-factorization matrix-multiplication model-compression

Last synced: 11 Apr 2025

https://github.com/hkuds/lightgnn

[WSDM'25] "LightGNN: Simple Graph Neural Network for Recommendation"

graph-learning graph-neural-networks knowledge-distillation model-compression recommendation

Last synced: 04 Jul 2025

https://github.com/stonesjtu/basis-embedding

basis embedding: a product quantization based model compression method for language models.

language-model model-compression product-quantization pytorch

Last synced: 24 Jun 2025

https://github.com/anishacharya/online-embedding-compression-aaai-2019

Deep learning models have become state-of-the-art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency.

We propose a compression method that leverages low-rank matrix factorization during training to compress the word embedding layer, which represents the size bottleneck for most NLP models. Our models are trained, compressed, and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, and outperforms alternative methods such as fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space for our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain the accuracy loss without introducing additional latency compared to fixed-point quantization.

Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.

compression deep-learning low-rank-approximation low-rank-representaion matrix-decompositions model-compression tensorflow text-classification word-embedding word-embeddings

Last synced: 04 Mar 2026
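
The low-rank factorization step described in the abstract can be illustrated with a truncated SVD (an illustrative sketch only; the paper learns the factors during training rather than factoring a fixed matrix post hoc):

```python
import numpy as np

def low_rank_factorize(W, rank):
    # Factor W (vocab x dim) into A @ B with A (vocab x rank) and
    # B (rank x dim) via truncated SVD; the product approximates W
    # while storing rank*(vocab+dim) numbers instead of vocab*dim.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]
    B = Vt[:rank]
    return A, B
```

For an embedding layer, the lookup then becomes two small matrix products instead of one large table read, trading a little compute for a configurable compression ratio.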

https://github.com/labouteille/torchprune

Deep learning compression framework in Pytorch [WIP]

deep-learning knowledge-distillation model-compression pruning quantization

Last synced: 03 Feb 2026

https://github.com/shuai-xie/bisenet-compression

10 variants of the original BiSeNet with a performance comparison; the faster, the better.

model-compression resnet sematic-segmentation

Last synced: 28 Feb 2025

https://github.com/msadeqsirjani/adaptive_edge_ai

Optimizing deep learning models for edge devices through intelligent compression and knowledge distillation. Achieve up to 90% model size reduction while maintaining performance, enabling efficient AI deployment on resource-constrained devices.

deep-learning edge-ai knowledge-distillation model-compression onnx-optimization pytorch

Last synced: 30 Oct 2025

https://github.com/memgonzales/mirror-segmentation

Presented at the 2023 International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2023). Lightweight mirror segmentation CNN that uses an EfficientNet backbone, employs parallel convolutional layers to capture edge features, and applies filter pruning for model compression

cnn cnn-compression computer-vision convolutional-neural-networks deep-learning efficientnet model-compression model-pruning object-detection object-segmentation pruning pytorch segmentation

Last synced: 25 Dec 2025

https://github.com/mohd-faizy/hyperparameter-tuning-with-microsoft-network-intelligence-toolkit-nni

Hyperparameter tuning with Microsoft NNI for automated machine learning (AutoML) experiments. The tool dispatches and runs trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyperparameters in different environments, such as local machines, remote servers, and the cloud.

automl feature-engineering hyperparameter-optimization hyperparameter-tuning microsoft-nni model-compression neural-architecture-search neural-network-intelligence nni nnictl

Last synced: 23 Jul 2025

https://github.com/r-papso/torch-optimizer

PyTorch models optimization by neural network pruning

deep-learning model-compression neural-network-pruning optimization pruning pytorch

Last synced: 07 Oct 2025

https://github.com/18520339/unstructured-local-search-pruning

Applying simulated annealing and genetic algorithms to neural network pruning, without prior assumptions about weight importance.

artificial-intelligence genetic-algorithm local-search-algoirthms model-compression neural-network simulated-annealing unstructured-pruning

Last synced: 15 Jul 2025
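
A minimal sketch of simulated annealing over a binary pruning mask, in the spirit of this repo (the `anneal_mask` helper and its signature are hypothetical, not the repo's actual code): each step proposes swapping one kept weight for one pruned weight, keeping sparsity fixed, and accepts worse masks with a temperature-dependent probability.

```python
import numpy as np

def anneal_mask(weights, sparsity, val_loss, steps=200, T0=1.0, seed=0):
    # Simulated annealing over a binary pruning mask: propose swapping
    # one kept weight for one pruned weight, accept if the validation
    # loss improves, or with probability exp(-delta / T) otherwise.
    rng = np.random.default_rng(seed)
    n = weights.size
    k = int(n * (1 - sparsity))          # number of weights to keep
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, k, replace=False)] = True
    best, best_loss = mask.copy(), val_loss(weights * mask)
    cur, cur_loss = best.copy(), best_loss
    for t in range(steps):
        T = T0 * (1 - t / steps) + 1e-9  # linear cooling schedule
        cand = cur.copy()
        on = rng.choice(np.flatnonzero(cand))
        off = rng.choice(np.flatnonzero(~cand))
        cand[on], cand[off] = False, True  # swap keeps sparsity fixed
        loss = val_loss(weights * cand)
        if loss < cur_loss or rng.random() < np.exp((cur_loss - loss) / T):
            cur, cur_loss = cand, loss
            if loss < best_loss:
                best, best_loss = cand.copy(), loss
    return best
```

Unlike magnitude pruning, no assumption about weight importance enters: the mask is judged purely by the supplied validation loss.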

https://github.com/elphinkuo/distiller

The original experiment code for the AAAI 2020 paper "AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates".

auto-ml computer-vision deep-learning deep-reinforcement-learning model-compression model-pruning

Last synced: 17 Jul 2025

https://github.com/blue-no1/quantization-experiments

Experiments on quantization for open-weight LLMs — balancing memory footprint, speed, and accuracy.

inference llm model-compression quantization

Last synced: 22 Nov 2025

https://github.com/ksm26/quantization-fundamentals-with-hugging-face

Learn linear quantization techniques using the Quanto library and downcasting methods with the Transformers library to compress and optimize generative AI models effectively.

compression downcasting generative-ai hugging-face linear-quantization model-compression model-deployment model-optimization optimize quantization quantization-fundamentals quanto-library transformers-library

Last synced: 28 Mar 2025
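
Linear (asymmetric) quantization, as taught in this course, can be sketched in a few lines (a generic NumPy sketch, not the Quanto API): a float range is mapped onto integers via a scale and a zero point, and dequantization inverts the mapping.

```python
import numpy as np

def linear_quantize(x, n_bits=8):
    # Asymmetric linear quantization: map the float range [min, max]
    # onto the integers [0, 2^n - 1] using a scale and a zero point.
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def linear_dequantize(q, scale, zero_point):
    # Invert the mapping; the reconstruction error is at most ~scale/2.
    return (q.astype(np.float32) - zero_point) * scale
```

Storing uint8 values plus one scale and zero point per tensor is what yields the roughly 4x memory reduction over fp32.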

https://github.com/rakutentech/iterative_training

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

deep-learning machine-learning model-compression neural-network

Last synced: 04 Jul 2025