Awesome-LLMs-on-device
Awesome LLMs on Device: A Comprehensive Survey
https://github.com/NexaAI/Awesome-LLMs-on-device
Tutorials and Learning Resources
- Hardware Acceleration
- Machine Learning Systems
- Introduction to on-device AI
- Octopus v2: On-device language model for super agent<br>[Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent](https://arxiv.org/pdf/2404.11459.pdf)<br>[Octopus v4: Graph of language models](https://arxiv.org/pdf/2404.19296.pdf)<br>[Octopus: On-device language model for function calling of software APIs](https://arxiv.org/pdf/2404.01549.pdf)
Foundations and Preliminaries
- Evolution of On-Device LLMs
- LLM Architecture Foundations
- Limitations of Cloud-Based LLM Inference and Advantages of On-Device Inference
- The Performance Indicator of On-Device LLMs
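Common performance indicators for on-device LLMs are time-to-first-token (TTFT) and decode throughput in tokens per second. A minimal sketch of how both are measured, using a hypothetical `fake_generate_step` stand-in for one autoregressive decode step of a real model:

```python
# Hedged sketch of measuring two on-device LLM performance indicators:
# time-to-first-token (TTFT) and decode throughput (tokens/second).
# `fake_generate_step` is a hypothetical stub, not a real model call.
import time

def fake_generate_step():
    """Stand-in for one autoregressive decode step of a real model."""
    time.sleep(0.002)  # pretend each token takes ~2 ms to produce
    return 0           # dummy token id

def measure(num_tokens=50):
    start = time.perf_counter()
    fake_generate_step()                  # prefill + first token
    ttft = time.perf_counter() - start    # time-to-first-token
    for _ in range(num_tokens - 1):       # remaining decode steps
        fake_generate_step()
    total = time.perf_counter() - start
    throughput = num_tokens / total       # decode throughput, tok/s
    return ttft, throughput

ttft, tps = measure()
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.0f} tok/s")
```

On a real device the same two timestamps would bracket the framework's prefill and decode calls; memory footprint and energy per token are the other indicators the survey tracks.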
Efficient Architectures for On-Device LLMs
- Any-Precision LLM | Post-training quantization, memory-efficient design | Substantial memory savings with versatile model precisions |
- Breakthrough Memory
- JetMoE | Outperforms Llama2-7B-Chat with fewer parameters | Reduces inference computation by 70% using sparse activation | 8B total parameters, only 2B activated per input token |
- Pangu-$`\pi`$ Pro | …-level parameter models | Embedding sharing, tokenizer compression | Reduced model size via architecture tweaking |
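The Any-Precision LLM row above cites post-training quantization as its memory-saving technique. As a general illustration of the idea (not Any-Precision LLM's actual multi-precision scheme), a minimal symmetric int8 PTQ sketch:

```python
# Hedged sketch of post-training weight quantization: round float32
# weights to int8 with a single per-tensor scale, cutting weight
# memory 4x. Real methods use per-channel/group scales and calibration.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0  # symmetric per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # fake weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("memory ratio:", w.nbytes / q.nbytes)              # 4x smaller
print("max abs error:", float(np.abs(w - w_hat).max()))  # bounded by scale/2
```

The rounding error per weight is at most half the scale, which is why lower-bit formats trade accuracy for the memory savings the table describes.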
General Efficiency and Performance Improvements

Model Compression and Parameter Sharing

Collaborative and Hierarchical Model Approaches

Memory and Computational Efficiency
- MELT [Paper - experiments/MELT-public)
Mixture-of-Experts (MoE) Architectures
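The sparse activation behind MoE models such as JetMoE (8B total parameters, ~2B active per token) comes from routing each token to only the top-k experts. A toy sketch of that routing, with hypothetical sizes, not any model's actual code:

```python
# Illustrative sketch of sparse mixture-of-experts routing: a router
# scores all experts, but only the top-k expert matrices are used per
# token, so most parameters stay inactive on any given forward pass.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2  # toy sizes (real MoE layers are far larger)
experts = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) * 0.1

def moe_forward(x):
    logits = x @ router                  # one router score per expert
    top = np.argsort(logits)[-k:]        # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()             # softmax over the chosen experts
    # Only k of n_experts weight matrices are touched for this token.
    out = sum(w * (x @ experts[i]) for i, w in zip(top, weights))
    return out, top

y, used = moe_forward(rng.standard_normal(d))
print("experts used:", sorted(used.tolist()), "of", n_experts)
```

With k=2 of 8 experts active, only a quarter of the expert parameters participate per token, which is the mechanism behind MoE's compute savings at constant total parameter count.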
Hybrid Architectures
- Zamba2-2.7B<br>[Zamba2-mini 1.2B](https://www.zyphra.com/post/zamba2-mini)
Hardware Acceleration and Deployment Strategies

Applications
- Gboard smart reply
- BioMistral-7B
- Octopus v3 - nano-google-pixel/)
- DriveVLM
- LLMCad
Model Compression and Optimization Techniques for On-Device LLMs
- Quantization
- Pruning
- Knowledge Distillation
- Low-Rank Factorization
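Low-rank factorization, listed above as a compression technique, replaces a weight matrix with the product of two thin matrices. A minimal sketch using a truncated SVD on a synthetic matrix (sizes are hypothetical):

```python
# Hedged sketch of low-rank factorization for weight compression:
# approximate W (m x n) by A @ B with A (m x r) and B (r x n) from a
# truncated SVD, shrinking the parameter count from m*n to r*(m+n).
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 128, 128, 16
# Synthetic near-low-rank weight matrix: rank-r signal plus small noise.
W = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
W += 0.01 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]   # (m, r): left factors scaled by singular values
B = Vt[:r, :]          # (r, n): right factors

params_full = m * n
params_low = m * r + r * n
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params: {params_full} -> {params_low}, rel. error {err:.4f}")
```

Real LLM weights are not exactly low-rank, so the rank r is chosen per layer to balance the size reduction against the approximation error.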