awesomemlsys

An ML Systems Onboarding list
https://github.com/gpu-mode/awesomemlsys

Last synced: 3 days ago

Attention Mechanism
Distributed
Linear attention
- Flash linear attention - state-of-the-art linear attention models (and their papers)
Long context length
Performance Optimizations
Quantization
Sparsity
Speculative decoding

Categories

- Distributed: 13
- Performance Optimizations: 10
- Attention Mechanism: 7
- Quantization: 5
- Long context length: 3
- Sparsity: 3
- Speculative decoding: 3
- Linear attention: 1

Keywords

- large-language-models: 2
- natural-language-processing: 1
- machine-learning-systems: 1
- speculative-decoding: 1
- llm-inference: 1