awesomemlsys
An ML Systems Onboarding list
https://github.com/gpu-mode/awesomemlsys
Distributed
- jit checkpointing
- Singularity
- torchtitan
- pipedream
- Breaking the computation and communication abstraction barrier
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Megatron-LM
- OpenDiloco
- Local SGD
- Reducing Activation Recomputation in Large Transformer Models
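Several of the entries above (Local SGD, OpenDiloco) revolve around the same communication-saving idea: each worker takes several optimizer steps on its own data, and parameters are only averaged across workers periodically, rather than after every step. A minimal NumPy sketch of that loop on a toy quadratic loss (the function names and the loss are illustrative, not taken from any linked codebase):

```python
import numpy as np

def local_sgd(workers_data, w0, lr=0.1, local_steps=4, rounds=5):
    """Each worker minimizes mean((w - x_i)^2) on its own shard, taking
    `local_steps` SGD steps locally, then all workers average parameters:
    one communication per round instead of one per step."""
    w = np.array(w0, dtype=float)
    for _ in range(rounds):
        local_params = []
        for data in workers_data:
            wi = w.copy()
            for _ in range(local_steps):
                grad = 2.0 * (wi - data.mean(axis=0))  # gradient of the toy loss
                wi -= lr * grad
            local_params.append(wi)
        w = np.mean(local_params, axis=0)  # the only cross-worker communication
    return w

# Two workers whose shards center on 1.0 and 3.0; the consensus optimum is 2.0.
shards = [np.full((8, 1), 1.0), np.full((8, 1), 3.0)]
w_final = local_sgd(shards, w0=[0.0])
```

With four local steps per round, each worker drifts toward its own shard's optimum before averaging pulls the iterates back to the consensus point; the trade-off the linked papers study is how large `local_steps` can grow before that drift hurts convergence.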
Attention Mechanism
- Llama 2 paper
- gpt-fast
- Attention is all you need
- Online normalizer calculation for softmax
- Self Attention does not need O(n^2) memory
- Flash Attention 2
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
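The "Online normalizer calculation for softmax" entry is the trick FlashAttention builds on: compute a numerically stable softmax in one streaming pass by tracking a running max and rescaling the running sum whenever the max grows. A small sketch of the idea (illustrative Python, not the paper's CUDA implementation):

```python
import math

def online_softmax(xs):
    """Single-pass softmax: maintain running max `m` and running sum `d`
    of exp(x - m); when the max grows, rescale `d` by exp(old_m - new_m).
    This lets softmax be fused into tiled attention without a second pass."""
    m = float("-inf")
    d = 0.0
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]

probs = online_softmax([1.0, 2.0, 3.0])
```

The output matches the two-pass max-then-sum softmax exactly, but each input is read once, which is what makes the blockwise accumulation in FlashAttention-style kernels possible.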
Performance Optimizations
- Efficiently Scaling Transformer Inference
- Making Deep Learning go Brrr from First Principles
- Group Query Attention
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Sarathi LLM
- Fast Inference from Transformers via Speculative Decoding
- Orca: A Distributed Serving System for Transformer-Based Generative Models (background reading for the PagedAttention paper)
- Colfax Research Blog
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
- Epilogue Visitor Tree
Quantization

Long context length
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- YaRN: Efficient Context Window Extension of Large Language Models
- Ring Attention with Blockwise Transformers for Near-Infinite Context
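The RoFormer and YaRN entries both build on rotary position embeddings (RoPE): each even/odd pair of query/key dimensions is rotated by an angle proportional to the token's position, so the attention score q·k depends only on the *relative* distance between tokens. A NumPy sketch demonstrating that shift-invariance (illustrative code, not any library's API):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate each (even, odd) pair of dims of x by angle pos * theta_i,
    where theta_i = base^(-2i/d) gives one frequency per pair."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.default_rng(0).standard_normal(64)
k = np.random.default_rng(1).standard_normal(64)
# Shifting both positions by the same offset leaves the score unchanged:
s1 = rope(q, 3) @ rope(k, 7)
s2 = rope(q, 103) @ rope(k, 107)
```

Context-extension methods like YaRN work by rescaling the `theta` frequencies so positions beyond the training length still fall in a rotation regime the model has seen.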
Sparsity
Programming Languages