Projects in Awesome Lists tagged with efficient-inference
A curated list of projects in awesome lists tagged with efficient-inference.
https://github.com/huawei-noah/efficient-ai-backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
convolutional-neural-networks efficient-inference ghostnet imagenet model-compression pretrained-models pytorch tensorflow transformer vision-transformer
Last synced: 10 Apr 2025
https://github.com/huawei-noah/Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
convolutional-neural-networks efficient-inference ghostnet imagenet model-compression pretrained-models pytorch tensorflow transformer vision-transformer
Last synced: 20 Mar 2025
https://github.com/squeezeailab/llmcompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
efficient-inference function-calling large-language-models llama llama2 llm llm-agent llm-agents llm-framework llms natural-language-processing nlp parallel-function-call transformer
Last synced: 08 Apr 2025
https://github.com/SqueezeAILab/LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
efficient-inference function-calling large-language-models llama llama2 llm llm-agent llm-agents llm-framework llms natural-language-processing nlp parallel-function-call transformer
Last synced: 16 Apr 2025
https://github.com/snap-research/efficientformer
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
deep-learning detection efficient-inference efficient-neural-networks imagenet mobile-devices pytorch semantic-segmentation transformer transformers
Last synced: 12 Apr 2025
https://github.com/snap-research/EfficientFormer
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
deep-learning detection efficient-inference efficient-neural-networks imagenet mobile-devices pytorch semantic-segmentation transformer transformers
Last synced: 15 Nov 2024
https://github.com/huawei-noah/addernet
Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
convolutional-neural-networks cvpr2020 efficient-inference imagenet pytorch
Last synced: 13 Apr 2025
https://github.com/horseee/deepcache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
diffusion-models efficient-inference model-compression stable-diffusion training-free
Last synced: 11 Apr 2025
https://github.com/horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
diffusion-models efficient-inference model-compression stable-diffusion training-free
Last synced: 23 Dec 2024
https://github.com/squeezeailab/squeezellm
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
efficient-inference large-language-models llama llm localllm model-compression natural-language-processing post-training-quantization quantization small-models text-generation transformer
Last synced: 13 Apr 2025
https://github.com/vita-group/lightgaussian
[NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
3d-reconstruction efficient-inference gaussian-splatting neurips-2024 nurips
Last synced: 14 Apr 2025
https://github.com/liuzhuang13/slimming
Learning Efficient Convolutional Networks through Network Slimming, in ICCV 2017.
convolutional-neural-networks deep-learning efficient-inference
Last synced: 05 Apr 2025
https://github.com/squeezeailab/kvquant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
compression efficient-inference efficient-model large-language-models llama llm localllama localllm mistral model-compression natural-language-processing quantization small-models text-generation transformer
Last synced: 07 Apr 2025
https://github.com/lucidrains/speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
artificial-intelligence deep-learning efficient-inference transformers
Last synced: 08 Apr 2025
https://github.com/picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
compression efficient-inference gemma generative-ai language-model language-models large-language-model llama llama2 llama3 llm llm-inference llms mistral mixtral model-compression natural-language-processing quantization self-hosted
Last synced: 08 Apr 2025
https://github.com/cure-lab/deciwatch
[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"
2d-human-pose 3d-body-recovery 3d-pose-estimation body-reconstruction deep-learning eccv eccv2022 efficiency efficient-inference efficient-neural-networks human-pose-estimation pose-estimation pytorch
Last synced: 20 Dec 2024
https://github.com/czg1225/AsyncDiff
Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"
diffusion-models distributed-computing efficient-inference inference-acceleration stable-diffusion text-to-image text-to-video training-free
Last synced: 13 Mar 2025
https://github.com/snap-research/graphless-neural-networks
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
deep-learning distillation efficient-inference gnn graph-algorithm graph-neural-networks knowledge-distillation pytorch scalability
Last synced: 06 Apr 2025
https://github.com/horseee/learning-to-cache
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
diffusion-models efficient-inference
Last synced: 15 Jan 2025
https://github.com/kssteven418/biglittledecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
decoding efficient-inference fast-inference llm speculative-decoding speculative-execution
Last synced: 05 Dec 2024
https://github.com/Alpha-Innovator/AdaptiveDiffusion
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
adaptive-inference diffusion-models efficient-inference model-acceleration stable-diffusion training-free
Last synced: 13 Mar 2025
https://github.com/alpha-innovator/adaptivediffusion
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
adaptive-inference diffusion-models efficient-inference model-acceleration stable-diffusion training-free
Last synced: 09 Apr 2025
https://github.com/franxyao/partially-observed-treecrfs
Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs
crf efficient-inference named-entity-recognition nested-named-entity-recognition sum-product sum-product-algorithm tree-crf tree-structure
Last synced: 12 Nov 2024
https://github.com/bharathsudharsan/cnn_on_mcu
Code for paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'
c-code-generator cmsis-nn edge-computing efficient-inference graph-optimization neuralnetworks optimization quantization quantization-aware-training tflite tflite-conversion tinyml
Last synced: 17 Nov 2024
https://github.com/vita-group/triple-wins
[ICLR 2020] "Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference"
adversarial-attacks adversarial-robustness efficiency efficient-inference robustness triple-wins
Last synced: 19 Apr 2025
https://github.com/snap-research/linkless-link-prediction
[ICML 2023] Linkless Link Prediction via Relational Distillation
deep-learning distillation efficient-inference gnn graph-neural-networks knowledge-distillation link-prediction scalability
Last synced: 14 Apr 2025
https://github.com/bharathsudharsan/tinyml-benchmark-nns-on-mcus
Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'
arduinio armcortexm0 armcortexm4 armcortexm7 c-code-generator cmsis-nn efficient-inference machine-learning mcu-boards raspberry-pi-pico tflite tfmicro tinyml tinyml-benchmark
Last synced: 17 Nov 2024
https://github.com/unimodal4reasoning/adaptivediffusion
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
adaptive-inference diffusion-models efficient-inference model-acceleration stable-diffusion training-free
Last synced: 09 Dec 2024
https://github.com/franxyao/rdp
Randomized Dynamic Programming
dynamic-programming efficient-inference graphical-models randomized-algorithms
Last synced: 12 Nov 2024
https://github.com/bharathsudharsan/ml-classifiers-on-mcus
Supplementary material for IEEE Services Computing paper 'An SRAM Optimized Approach for Constant Memory Consumption and Ultra-fast Execution of ML Classifiers on TinyML Hardware'
adafruit-feather arduino arm-cortex-m0 code-generation decision-tree-classifier efficient-inference esp32 microcontroller optimization random-forest-classifier stm32 tinyml
Last synced: 17 Nov 2024
https://github.com/changwoolee/blast
[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference
efficient-inference large-language-models llama matrix-factorization matrix-multiplication model-compression
Last synced: 11 Apr 2025
https://github.com/deeplite/activ-sparse
Official PyTorch training code of Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity (ICCV2023-RCV)
deep-neural-networks efficient-deep-learning efficient-inference low-latency raspberry-pi sparsity tinyml
Last synced: 09 Apr 2025
https://github.com/bharathsudharsan/edge2train
Code for IoT paper 'Edge2Train: a framework to train machine learning models (SVMs) on resource-constrained IoT edge devices'
arm-cortex-m0 arm-cortex-m4 edge-computing efficient-inference iot-devices microcontroller online-learning optimization svm-training tinyml
Last synced: 11 Mar 2025
https://github.com/bharathsudharsan/ecml-tutorial-ml-meets-iot
Repository of the ECML PKDD 2021 tutorial titled 'Machine Learning Meets Internet of Things: From Theory to Practice'
arm-cortex-m0 arm-cortex-m4 cmsis-nn edge-computing efficient-inference graph-optimization iot-devices machine-learning optimization pruning quantization self-learning tflite tinyml training-algorithms
Last synced: 11 Mar 2025
https://github.com/ashrafhamied/ef
ef is a lightweight and efficient command-line tool that simplifies interacting with Ethereum smart contracts. It provides a user-friendly interface for deploying, testing, and interacting with smart contracts on the Ethereum blockchain.
database detection dotnet-standard efficient-inference entity-framework epoll feature-extraction keras-efficientdet kqueue neural-network orm pretrained-models pytorch vision-transformer
Last synced: 21 Feb 2025
https://github.com/edward62740/edgetpu-mot
edgetpu efficient-inference multi-object-tracking
Last synced: 17 Mar 2025