Awesome-LLMs-on-device
Awesome LLMs on Device: A Comprehensive Survey
https://github.com/NexaAI/Awesome-LLMs-on-device
Tutorials and Learning Resources
- Hardware Acceleration
- Machine Learning Systems
- Introduction to on-device AI
- Octopus v2: On-device language model for super agent<br>[Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent](https://arxiv.org/pdf/2404.11459.pdf)<br>[Octopus v4: Graph of language models](https://arxiv.org/pdf/2404.19296.pdf)<br>[Octopus: On-device language model for function calling of software APIs](https://arxiv.org/pdf/2404.01549.pdf)
Foundations and Preliminaries
- Evolution of On-Device LLMs
- LLM Architecture Foundations
- Limitations of Cloud-Based LLM Inference and Advantages of On-Device Inference
- The Performance Indicator of On-Device LLMs
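Common performance indicators for on-device LLMs are time-to-first-token (TTFT) and decode throughput in tokens per second. A minimal sketch of how both are measured, using a hypothetical `fake_generate_step` stand-in for one autoregressive decode step of a real model:

```python
# Hedged sketch of measuring two on-device LLM performance indicators:
# time-to-first-token (TTFT) and decode throughput (tokens/second).
# `fake_generate_step` is a hypothetical stub, not a real model call.
import time

def fake_generate_step():
    """Stand-in for one autoregressive decode step of a real model."""
    time.sleep(0.002)  # pretend each token takes ~2 ms to produce
    return 0           # dummy token id

def measure(num_tokens=50):
    start = time.perf_counter()
    fake_generate_step()                  # prefill + first token
    ttft = time.perf_counter() - start    # time-to-first-token
    for _ in range(num_tokens - 1):       # remaining decode steps
        fake_generate_step()
    total = time.perf_counter() - start
    throughput = num_tokens / total       # decode throughput, tok/s
    return ttft, throughput

ttft, tps = measure()
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.0f} tok/s")
```

On a real device the same two timestamps would bracket the framework's prefill and decode calls; memory footprint and energy per token are the other indicators the survey tracks.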
Efficient Architectures for On-Device LLMs
- Any-Precision LLM | Post-training quantization, memory-efficient design | Substantial memory savings with versatile model precisions |
- Breakthrough Memory
- JetMoE | Outperforms Llama2-7B-Chat with fewer parameters | Reduces inference computation by 70% using sparse activation | 8B total parameters, only 2B activated per input token |
- Pangu-$`\pi`$ Pro | …-level parameter models | Embedding sharing, tokenizer compression | Reduced model size via architecture tweaking |
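The Any-Precision LLM row above cites post-training quantization as its memory-saving technique. As a general illustration of the idea (not Any-Precision LLM's actual multi-precision scheme), a minimal symmetric int8 PTQ sketch:

```python
# Hedged sketch of post-training weight quantization: round float32
# weights to int8 with a single per-tensor scale, cutting weight
# memory 4x. Real methods use per-channel/group scales and calibration.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0  # symmetric per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # fake weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("memory ratio:", w.nbytes / q.nbytes)              # 4x smaller
print("max abs error:", float(np.abs(w - w_hat).max()))  # bounded by scale/2
```

The rounding error per weight is at most half the scale, which is why lower-bit formats trade accuracy for the memory savings the table describes.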
General Efficiency and Performance Improvements

Model Compression and Parameter Sharing

Collaborative and Hierarchical Model Approaches

Memory and Computational Efficiency
- MELT [Paper - experiments/MELT-public)
Mixture-of-Experts (MoE) Architectures
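The sparse activation behind MoE models such as JetMoE (8B total parameters, ~2B active per token) comes from routing each token to only the top-k experts. A toy sketch of that routing, with hypothetical sizes, not any model's actual code:

```python
# Illustrative sketch of sparse mixture-of-experts routing: a router
# scores all experts, but only the top-k expert matrices are used per
# token, so most parameters stay inactive on any given forward pass.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2  # toy sizes (real MoE layers are far larger)
experts = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) * 0.1

def moe_forward(x):
    logits = x @ router                  # one router score per expert
    top = np.argsort(logits)[-k:]        # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()             # softmax over the chosen experts
    # Only k of n_experts weight matrices are touched for this token.
    out = sum(w * (x @ experts[i]) for i, w in zip(top, weights))
    return out, top

y, used = moe_forward(rng.standard_normal(d))
print("experts used:", sorted(used.tolist()), "of", n_experts)
```

With k=2 of 8 experts active, only a quarter of the expert parameters participate per token, which is the mechanism behind MoE's compute savings at constant total parameter count.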
Hybrid Architectures
- Zamba2-2.7B<br>[Zamba2-mini 1.2B](https://www.zyphra.com/post/zamba2-mini)
Hardware Acceleration and Deployment Strategies

Applications
- Gboard smart reply
- BioMistral-7B
- Octopus v3 - nano-google-pixel/)
- DriveVLM
- LLMCad
Model Compression and Optimization Techniques for On-Device LLMs
- Quantization
- Pruning
- Knowledge Distillation
- Low-Rank Factorization
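Low-rank factorization, listed above as a compression technique, replaces a weight matrix with the product of two thin matrices. A minimal sketch using a truncated SVD on a synthetic matrix (sizes are hypothetical):

```python
# Hedged sketch of low-rank factorization for weight compression:
# approximate W (m x n) by A @ B with A (m x r) and B (r x n) from a
# truncated SVD, shrinking the parameter count from m*n to r*(m+n).
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 128, 128, 16
# Synthetic near-low-rank weight matrix: rank-r signal plus small noise.
W = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
W += 0.01 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]   # (m, r): left factors scaled by singular values
B = Vt[:r, :]          # (r, n): right factors

params_full = m * n
params_low = m * r + r * n
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params: {params_full} -> {params_low}, rel. error {err:.4f}")
```

Real LLM weights are not exactly low-rank, so the rank r is chosen per layer to balance the size reduction against the approximation error.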