Awesome-LLMs-on-device
Awesome LLMs on Device: A Comprehensive Survey
https://github.com/NexaAI/Awesome-LLMs-on-device
Model Compression and Optimization Techniques for On-Device LLMs
Foundations and Preliminaries
LLM Architecture Foundations
The Performance Indicator of On-Device LLMs
Evolution of On-Device LLMs
Limitations of Cloud-Based LLM Inference and Advantages of On-Device Inference
Applications
Hardware Acceleration
- LLMCad
- DriveVLM
- BioMistral-7B
- Gboard smart reply
- Octopus v3
Model Reference
Hardware Acceleration
- Qwen Technical Report
- Gemini: A Family of Highly Capable Multimodal Models
- OpenELM is a large language model integrated within iOS to enhance application functionalities.
- [Ferret-v2 significantly improves upon its predecessor, introducing enhanced visual processing capabilities and an advanced training regimen.](https://arxiv.org/abs/2404.07973)
- Gemma 2: Improving Open Language Models at a Practical Size
- A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
- A GPT-4V Level Multimodal LLM on Your Phone
- GLM-Edge Github Page
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
- Octopus v2: On-device language model for super agent
- [Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent](https://arxiv.org/pdf/2404.11459.pdf)
- [Octopus v4: Graph of language models](https://arxiv.org/pdf/2404.19296.pdf)
- [Octopus: On-device language model for function calling of software APIs](https://arxiv.org/pdf/2404.01549.pdf)
Efficient Architectures for On-Device LLMs
The Performance Indicator of On-Device LLMs
- Any-Precision LLM: …-training quantization, memory-efficient design; substantial memory savings with versatile model precisions
- JetMoE: …-Chat with fewer parameters; reduces inference computation by 70% using sparse activation; 8B total parameters, only 2B activated per input token
- Pangu-π Pro: …-level parameter models; embedding sharing, tokenizer compression; reduced model size via architecture tweaking
- Breakthrough Memory
Model Compression and Parameter Sharing
Memory and Computational Efficiency
- Paper (code: …experiments/MELT-public)
Mixture-of-Experts (MoE) Architectures
Collaborative and Hierarchical Model Approaches
Hybrid Architectures
- Zamba2-2.7B
- [Zamba2-mini (1.2B)](https://www.zyphra.com/post/zamba2-mini)
General Efficiency and Performance Improvements
Tutorials and Learning Resources
Hardware Acceleration
- Introduction to on-device AI
- Machine Learning Systems
Hardware Acceleration and Deployment Strategies
Sub Categories
- Hardware Acceleration (25)
- Popular On-Device LLMs Framework (8)
- The Performance Indicator of On-Device LLMs (6)
- Evolution of On-Device LLMs (5)
- Quantization (4)
- Mixture-of-Experts (MoE) Architectures (3)
- Limitations of Cloud-Based LLM Inference and Advantages of On-Device Inference (2)
- General Efficiency and Performance Improvements (2)
- Low-Rank Factorization (2)
- Pruning (1)
- Knowledge Distillation (1)
- Hybrid Architectures (1)
- Collaborative and Hierarchical Model Approaches (1)
- Memory and Computational Efficiency (1)
- LLM Architecture Foundations (1)
- Model Compression and Parameter Sharing (1)
Keywords
- deep-learning (4)
- machine-learning (4)
- llama (3)
- llm (3)
- large-language-models (2)
- inference (2)
- hpu (1)
- inferentia (1)
- llm-serving (1)
- llmops (1)
- mlops (1)
- model-serving (1)
- pytorch (1)
- qwen (1)
- rocm (1)
- tpu (1)
- trainium (1)
- transformer (1)
- xpu (1)
- android (1)
- gpt (1)
- deepseek (1)
- cuda (1)
- amd (1)
- tvm (1)
- machine-learning-compilation (1)
- language-model (1)
- winograd-algorithm (1)
- vulkan (1)
- mnn (1)
- ml (1)
- embedded-devices (1)
- deep-neural-networks (1)
- convolution (1)
- local-inference (1)
- embedded (1)
- gpu (1)
- mobile (1)
- neural-network (1)
- tensor (1)
- multimodal (1)
- artificial-intelligence (1)
- cloud-ml (1)
- computer-systems (1)
- courseware (1)
- edge-machine-learning (1)
- embedded-ml (1)
- machine-learning-systems (1)
- mobile-ml (1)
- textbook (1)