Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-LLMs-on-device
Awesome LLMs on Device: A Comprehensive Survey
https://github.com/NexaAI/Awesome-LLMs-on-device
Efficient Architectures for On-Device LLMs
Memory and Computational Efficiency
- Paper (…experiments/MELT-public)
Model Compression and Parameter Sharing
The Performance Indicator of On-Device LLMs
- Any-Precision LLM | Post-training quantization, memory-efficient design | Substantial memory savings with versatile model precisions
- Breakthrough Memory
- JetMoE | … Chat with fewer parameters | Reduces inference computation by 70% using sparse activation | 8B total parameters, only 2B activated per input token
- Pangu-π Pro | … level parameter models | Embedding sharing, tokenizer compression | Reduced model size via architecture tweaking
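The JetMoE entry above mentions sparse activation, where only part of the model runs for each input token. A minimal sketch of that idea follows, assuming a generic top-k router over a handful of toy experts. The function names, the number of experts, and the value of k are hypothetical illustrations and are not taken from JetMoE or any other listed model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs. Ties keep the
    lower index first because Python's sort is stable.
    """
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and combine their outputs."""
    out = 0.0
    for index, weight in top_k_route(router_logits, k):
        out += weight * experts[index](x)
    return out


def active_fraction(num_experts, k):
    """Share of experts that run per token (ignores shared layers)."""
    return k / num_experts


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]  # assumed router output for one token
result = moe_forward(1.0, experts, router_logits, k=2)
```

In this toy setup two of four experts run per token, so `active_fraction(4, 2)` is 0.5. Real models also have shared parameters such as embeddings and attention layers, so the ratio of active to total parameters generally differs from the ratio of active to total experts.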
General Efficiency and Performance Improvements
Mixture-of-Experts (MoE) Architectures
Collaborative and Hierarchical Model Approaches
Hybrid Architectures
- Zamba2-2.7B
- … 1.2B (https://www.zyphra.com/post/zamba2-mini)
Model Compression and Optimization Techniques for On-Device LLMs
Hardware Acceleration and Deployment Strategies
Model Reference
Hardware Acceleration
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
- OpenELM is a significant large language model integrated within iOS to enhance application functionalities.
- … v2 significantly improves upon its predecessor, introducing enhanced visual processing capabilities and an advanced training regimen. (https://arxiv.org/abs/2404.07973)
- A GPT-4V Level Multimodal LLM on Your Phone
- Gemma 2: Improving Open Language Models at a Practical Size
- Qwen Technical Report
- Gemini: A Family of Highly Capable Multimodal Models
- Octopus v2: On-device language model for super agent
- … device Sub-billion Multimodal AI Agent (https://arxiv.org/pdf/2404.11459.pdf)
- [Octopus v4: Graph of language models](https://arxiv.org/pdf/2404.19296.pdf)
- [Octopus: On-device language model for function calling of software APIs](https://arxiv.org/pdf/2404.01549.pdf)
Tutorials and Learning Resources
Hardware Acceleration
- Machine Learning Systems
- Introduction to on-device AI
Foundations and Preliminaries
Evolution of On-Device LLMs
LLM Architecture Foundations
Limitations of Cloud-Based LLM Inference and Advantages of On-Device Inference
The Performance Indicator of On-Device LLMs
Applications
Hardware Acceleration
- Gboard smart reply
- BioMistral-7B
- Octopus v3
- …nano-google-pixel/
- DriveVLM
- LLMCad
Categories
Sub Categories
- Hardware Acceleration: 21
- Popular On-Device LLMs Framework: 9
- The Performance Indicator of On-Device LLMs: 6
- Evolution of On-Device LLMs: 5
- Quantization: 4
- Mixture-of-Experts (MoE) Architectures: 3
- Limitations of Cloud-Based LLM Inference and Advantages of On-Device Inference: 2
- General Efficiency and Performance Improvements: 2
- Low-Rank Factorization: 2
- Pruning: 1
- Knowledge Distillation: 1
- Hybrid Architectures: 1
- Collaborative and Hierarchical Model Approaches: 1
- Memory and Computational Efficiency: 1
- LLM Architecture Foundations: 1
- Model Compression and Parameter Sharing: 1
Keywords
- llama: 4
- llm: 3
- deep-learning: 3
- machine-learning: 3
- large-language-models: 2
- inference: 2
- local-inference: 1
- embedded: 1
- gpu: 1
- mobile: 1
- neural-network: 1
- tensor: 1
- android: 1
- audio-processing: 1
- c-plus-plus: 1
- calculator: 1
- computer-vision: 1
- llm-inference: 1
- falcon: 1
- bamboo-7b: 1
- winograd-algorithm: 1
- vulkan: 1
- mnn: 1
- ml: 1
- embedded-devices: 1
- deep-neural-networks: 1
- convolution: 1
- arm: 1
- ggml: 1
- multimodal: 1
- xpu: 1
- transformer: 1
- trainium: 1
- tpu: 1
- rocm: 1
- pytorch: 1
- model-serving: 1
- mlops: 1
- llmops: 1
- llm-serving: 1
- inferentia: 1
- hpu: 1
- gpt: 1
- cuda: 1
- amd: 1
- tvm: 1
- machine-learning-compilation: 1
- language-model: 1
- video-processing: 1
- stream-processing: 1