Projects in Awesome Lists tagged with efficient-inference

[NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang

3d-reconstruction efficient-inference gaussian-splatting neurips-2024 nurips

Last synced: 14 Apr 2025

https://github.com/liuzhuang13/slimming

Learning Efficient Convolutional Networks through Network Slimming, in ICCV 2017.

convolutional-neural-networks deep-learning efficient-inference

Last synced: 05 Apr 2025

https://github.com/squeezeailab/kvquant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

compression efficient-inference efficient-model large-language-models llama llm localllama localllm mistral model-compression natural-language-processing quantization small-models text-generation transformer

Last synced: 07 Apr 2025

https://github.com/lucidrains/speculative-decoding

Explorations into some recent techniques surrounding speculative decoding

artificial-intelligence deep-learning efficient-inference transformers

Last synced: 08 Apr 2025

https://github.com/picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

compression efficient-inference gemma generative-ai language-model language-models large-language-model llama llama2 llama3 llm llm-inference llms mistral mixtral model-compression natural-language-processing quantization self-hosted

Last synced: 08 Apr 2025

https://github.com/cure-lab/deciwatch

[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

2d-human-pose 3d-body-recovery 3d-pose-estimation body-reconstruction deep-learning eccv eccv2022 efficiency efficient-inference efficient-neural-networks human-pose-estimation pose-estimation pytorch

Last synced: 20 Dec 2024

https://github.com/czg1225/AsyncDiff

Official implementation of "AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising"

diffusion-models distributed-computing efficient-inference inference-acceleration stable-diffusion text-to-image text-to-video training-free

Last synced: 13 Mar 2025

https://github.com/snap-research/graphless-neural-networks

[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)

deep-learning distillation efficient-inference gnn graph-algorithm graph-neural-networks knowledge-distillation pytorch scalability

Last synced: 06 Apr 2025

https://github.com/horseee/learning-to-cache

[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

diffusion-models efficient-inference

Last synced: 15 Jan 2025

https://github.com/kssteven418/biglittledecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

decoding efficient-inference fast-inference llm speculative-decoding speculative-execution

Last synced: 05 Dec 2024

https://github.com/Alpha-Innovator/AdaptiveDiffusion

[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

adaptive-inference diffusion-models efficient-inference model-acceleration stable-diffusion training-free

Last synced: 13 Mar 2025

https://github.com/alpha-innovator/adaptivediffusion

[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

adaptive-inference diffusion-models efficient-inference model-acceleration stable-diffusion training-free

Last synced: 09 Apr 2025

https://github.com/franxyao/partially-observed-treecrfs

Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs

crf efficient-inference named-entity-recognition nested-named-entity-recognition sum-product sum-product-algorithm tree-crf tree-structure

Last synced: 12 Nov 2024

https://github.com/bharathsudharsan/cnn_on_mcu

Code for paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'

c-code-generator cmsis-nn edge-computing efficient-inference graph-optimization neuralnetworks optimization quantization quantization-aware-training tflite tflite-conversion tinyml

Last synced: 17 Nov 2024

https://github.com/vita-group/triple-wins

[ICLR 2020] "Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference"

adversarial-attacks adversarial-robustness efficiency efficient-inference robustness triple-wins

Last synced: 19 Apr 2025

https://github.com/snap-research/linkless-link-prediction

[ICML 2023] Linkless Link Prediction via Relational Distillation

deep-learning distillation efficient-inference gnn graph-neural-networks knowledge-distillation link-prediction scalability

Last synced: 14 Apr 2025

https://github.com/bharathsudharsan/tinyml-benchmark-nns-on-mcus

Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'

arduinio armcortexm0 armcortexm4 armcortexm7 c-code-generator cmsis-nn efficient-inference machine-learning mcu-boards raspberry-pi-pico tflite tfmicro tinyml tinyml-benchmark

Last synced: 17 Nov 2024

https://github.com/unimodal4reasoning/adaptivediffusion

[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

adaptive-inference diffusion-models efficient-inference model-acceleration stable-diffusion training-free

Last synced: 09 Dec 2024

https://github.com/franxyao/rdp

Randomized Dynamic Programming

dynamic-programming efficient-inference graphical-models randomized-algorithms

Last synced: 12 Nov 2024

https://github.com/bharathsudharsan/ml-classifiers-on-mcus

Supplementary material for IEEE Services Computing paper 'An SRAM Optimized Approach for Constant Memory Consumption and Ultra-fast Execution of ML Classifiers on TinyML Hardware'

adafruit-feather arduino arm-cortex-m0 code-generation decision-tree-classifier efficient-inference esp32 microcontroller optimization random-forest-classifier stm32 tinyml

Last synced: 17 Nov 2024

https://github.com/changwoolee/blast

[NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference

efficient-inference large-language-models llama matrix-factorization matrix-multiplication model-compression

Last synced: 11 Apr 2025

https://github.com/deeplite/activ-sparse

Official PyTorch training code of Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity (ICCV2023-RCV)

deep-neural-networks efficient-deep-learning efficient-inference low-latency raspberry-pi sparsity tinyml

Last synced: 09 Apr 2025

https://github.com/bharathsudharsan/edge2train

Code for IoT paper 'Edge2Train: a framework to train machine learning models (SVMs) on resource-constrained IoT edge devices'

arm-cortex-m0 arm-cortex-m4 edge-computing efficient-inference iot-devices microcontroller online-learning optimization svm-training tinyml

Last synced: 11 Mar 2025

https://github.com/bharathsudharsan/ecml-tutorial-ml-meets-iot

Repository of the ECML PKDD 2021 tutorial titled 'Machine Learning Meets Internet of Things: From Theory to Practice'

arm-cortex-m0 arm-cortex-m4 cmsis-nn edge-computing efficient-inference graph-optimization iot-devices machine-learning optimization pruning quantization self-learning tflite tinyml training-algorithms

Last synced: 11 Mar 2025

https://github.com/ashrafhamied/ef

ef is a lightweight and efficient command-line tool that simplifies interacting with Ethereum smart contracts. It provides a user-friendly interface for deploying, testing, and interacting with smart contracts on the Ethereum blockchain.

database detection dotnet-standard efficient-inference entity-framework epoll feature-extraction keras-efficientdet kqueue neural-network orm pretrained-models pytorch vision-transformer

Last synced: 21 Feb 2025

https://github.com/edward62740/edgetpu-mot

edgetpu efficient-inference multi-object-tracking

Last synced: 17 Mar 2025