awesome-emdl
Embedded and mobile deep learning research resources
https://github.com/csarron/awesome-emdl
Papers
Survey
- TinyML Platforms Benchmarking
- TinyML: A Systematic Review and Synthesis of Existing Research
- TinyML Meets IoT: A Comprehensive Survey
- A review on TinyML: State-of-the-art and prospects
- TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers
- Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better
- Benchmarking TinyML Systems: Challenges and Direction
- Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey
- The Deep Learning Compiler: A Comprehensive Survey
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Awesome ML Model Compression
- TinyML Papers and Projects
- EfficientDNNs
Model
- EtinyNet: Extremely Tiny Network for TinyML
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
- SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems
- Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets
- MCUNet: Tiny Deep Learning on IoT Devices
- GhostNet: More Features from Cheap Operations
- MicroNet for Efficient Language Modeling
- Searching for MobileNetV3
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
- DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices
- NasNet: Learning Transferable Architectures for Scalable Image Recognition
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions
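Several of the architectures above (MobileNets, MobileNetV3, ShuffleNet) are built around depthwise separable convolutions to cut multiply-accumulate counts. A minimal PyTorch sketch of that building block, for illustration only (the layer sizes are arbitrary assumptions, not taken from any of the papers):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a pointwise 1x1 conv (MobileNet-style block)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 conv mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# Example: a 32 -> 64 channel block on a 56x56 feature map.
block = DepthwiseSeparableConv(32, 64)
y = block(torch.randn(1, 32, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```

Ignoring batch norm, this block uses 3·3·32 + 32·64 = 2,336 weights where a standard 3x3 convolution would use 3·3·32·64 = 18,432, which is the efficiency argument behind the MobileNets family.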
System
- BSC: Block-based Stochastic Computing to Enable Accurate and Efficient TinyML (DAC '22)
- CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs
- UDC: Unified DNAS for Compressible TinyML Models
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator
- TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning
- Once for All: Train One Network and Specialize it for Efficient Deployment
- DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications
- DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware
- MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU
- fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs
- DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
- EIE: Efficient Inference Engine on Compressed Deep Neural Network
- MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints
- DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit
- Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables
- CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android
- An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices (IoT-App ’15)
Quantization
- Quantizing deep convolutional networks for efficient inference: A whitepaper
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
- Training and Inference with Integers in Deep Neural Networks
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- Loss-aware Binarization of Deep Networks
- Towards the Limit of Network Quantization
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Quantized Convolutional Neural Networks for Mobile Devices
- Fixed-Point Performance Analysis of Recurrent Neural Networks
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Compressing Deep Convolutional Networks using Vector Quantization
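Most of the papers above build on uniform affine quantization: map float values to low-bit integers with a scale and zero point, then dequantize when needed (this is the scheme laid out in the "Quantizing deep convolutional networks for efficient inference" whitepaper). A small NumPy sketch of that arithmetic for 8-bit asymmetric per-tensor quantization; the example tensor is made up:

```python
import numpy as np

def quantize_uint8(x):
    """Asymmetric per-tensor quantization of a float tensor to uint8."""
    xmin, xmax = float(x.min()), float(x.max())
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)   # keep 0.0 exactly representable
    scale = (xmax - xmin) / 255.0 or 1.0          # guard against a constant tensor
    zero_point = int(round(-xmin / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_uint8(w)
w_hat = dequantize(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by roughly scale / 2
```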
Pruning
- Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
- To prune, or not to prune: exploring the efficacy of pruning for model compression
- Pruning Filters for Efficient ConvNets
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- Soft Weight-Sharing for Neural Network Compression
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Dynamic Network Surgery for Efficient DNNs
- Learning both Weights and Connections for Efficient Neural Networks
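The pruning papers above mostly start from the same primitive: zero out the smallest-magnitude weights, then fine-tune with the resulting sparsity mask ("Learning both Weights and Connections...", "To prune, or not to prune..."). A minimal PyTorch sketch of unstructured magnitude pruning; the layer shape and the 80% sparsity target are illustrative assumptions:

```python
import torch
import torch.nn as nn

def magnitude_prune_(layer: nn.Linear, sparsity: float) -> torch.Tensor:
    """Zero the smallest-magnitude weights in place; return the binary mask."""
    w = layer.weight.data
    k = int(sparsity * w.numel())                     # number of weights to remove
    if k == 0:
        return torch.ones_like(w)
    threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    mask = (w.abs() > threshold).float()
    w.mul_(mask)                                      # apply the mask in place
    return mask

layer = nn.Linear(256, 128)
mask = magnitude_prune_(layer, sparsity=0.8)
print("remaining weights:", int(mask.sum()), "/", mask.numel())
# During fine-tuning, reapply the mask after each optimizer step so pruned
# weights stay at zero:  layer.weight.data.mul_(mask)
```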
Approximation
- High performance ultra-low-precision convolutions on mobile devices
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Convolutional neural networks with low-rank regularization
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
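The approximation papers above ("Exploiting Linear Structure Within Convolutional Networks...", "Convolutional neural networks with low-rank regularization") exploit low-rank structure: a weight matrix W is replaced by a product of two thinner matrices obtained from a truncated SVD. A NumPy sketch of the idea for a fully connected layer; the 512x512 size and rank 64 are arbitrary assumptions:

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A: m x rank, B: rank x n via truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

W = np.random.randn(512, 512).astype(np.float32)
A, B = low_rank_factorize(W, rank=64)
x = np.random.randn(512).astype(np.float32)

# One 512x512 matmul (262,144 MACs) becomes two thin matmuls (2 * 512 * 64 = 65,536 MACs).
y_full = W @ x
y_lowrank = A @ (B @ x)
print("relative error:", np.linalg.norm(y_full - y_lowrank) / np.linalg.norm(y_full))
```

A random matrix has no low-rank structure, so the printed error is large here; trained weight matrices are typically much closer to low rank, which is what these papers rely on.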
Characterization
Libraries
Inference Framework
- Apple - CoreML - Integrate machine learning models into your app. [BERT and GPT-2 on iPhone](https://github.com/huggingface/swift-coreml-transformers)
- Edge Impulse - Interactive platform to generate models that can run on microcontrollers. They are also quite active on social networks, covering recent EdgeAI/TinyML news.
- Google - TensorFlow Lite - An open source deep learning framework for on-device inference (see the conversion sketch after this list).
- Meta - PyTorch Mobile - A framework for helping mobile developers and machine learning engineers embed PyTorch ML models on-device.
- xmartlabs - Bender - Easily craft fast Neural Networks on iOS! Use TensorFlow models. Metal under the hood.
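As a concrete example of the on-device workflow these frameworks share, here is the standard TensorFlow Lite path: convert a trained model to a .tflite flatbuffer, then run it through the interpreter. A minimal sketch, assuming a trivial Keras model stands in for your trained network:

```python
import tensorflow as tf

# Stand-in model; in practice this would be your trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert to a TensorFlow Lite flatbuffer with default (dynamic-range) quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# On-device (or here, as a smoke test) the model runs through the interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]["shape"])  # e.g. [ 1 28 28  1]
```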
Optimization Tools
- Neural Network Distiller - Python package for neural network compression research.
General
Web
Edge / Tiny MLOps
- Tiny-MLOps: a framework for orchestrating ML applications at the far edge of IoT systems
- MLOps for TinyML: Challenges & Directions in Operationalizing TinyML at Scale
- TinyMLOps: Operational Challenges for Widespread Edge AI Adoption
- A TinyMLaaS Ecosystem for Machine Learning in IoT: Overview and Research Challenges (DAT '21)
- SOLIS: The MLOps journey from data acquisition to actionable insights
- Edge MLOps: An Automation Framework for AIoT Applications
- SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices
Tutorials
General
NEON
OpenCL
Courses
OpenCL
Tools
GPU
- Bifrost GPU architecture and ARM Mali-G71 GPU
- Midgard GPU architecture and ARM Mali-T880 GPU (https://www.hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday-Epub/HC27.25.50-GPU-Epub/HC27.25.531-Mali-T880-Bratt-ARM-2015_08_23.pdf)
- Mobile GPU market share
Driver
Keywords
machine-learning (3), deep-learning (2), deep-neural-networks (2), neural-networks (2), model-compression (2), apple (1), convolutional-neural-networks (1), ios (1), iphone (1), metal (1), residual-networks (1), swift (1), awesome-list (1), pruning (1), quantization (1), computer-vision (1), embedded-systems (1), neural-architecture-search (1), tinyml (1), wake-word (1), efficient-deep-learning (1), knowledge-distillation (1), network-pruning (1)