awesome-adaptive-computation
A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).
https://github.com/koayon/awesome-adaptive-computation
Mixture of Experts (Sparse MoE)
- c-BTM
- Mixtral-8x7B
- DeMix
- Task-MoE
- HydraMoE
- MoE-Infinity
- Branch, Train, Mix (BTX)
- JetMoE
- MoEfication - improves dense-to-MoE converted models by: 1) showing that the efficiency of the resulting model can be significantly enhanced by enforcing activation sparsity in the base model; 2) proposing Expert Contribution Routing, a novel objective for training the gating networks, which are tasked with predicting the output norm of each expert for the given input, enabling approximation of each expert's relative contribution; 3) introducing dynamic-k gating, which allows the model to appropriately distribute its computational budget between easy and hard inputs; 4) extending the proposed conversion scheme to any linear layer, such as the multi-head attention projections. (A minimal sketch of dynamic-k gating follows this list.)
- ELMForest - Branch, Train, Merge (BTM)
- Multi-gate MoE
- Gemini 1.5 Pro - DBRX is another powerful MoE model, and it seems that MoE is now the go-to architecture for large models.
- Dynamic Routing in MoEs
- MoE review paper
- One Wide Feedforward paper
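
The dynamic-k gating idea above (routing each token to only as many experts as its difficulty warrants) is easy to sketch. Below is a minimal, hypothetical PyTorch sketch, not the official implementation: the gate's scores are softmax-normalized for simplicity and experts are consumed in score order until a mass threshold `tau` is covered, whereas the paper trains the gate to regress each expert's output norm.

```python
# Hypothetical sketch of dynamic-k expert gating; not the official code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicKMoE(nn.Module):
    """Routes each token to a variable number of experts: experts are taken
    in order of predicted contribution until probability mass `tau` is
    covered, so easy tokens use fewer experts than hard ones."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, tau: float = 0.9):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Gate predicts each expert's relative contribution for a token.
        self.gate = nn.Linear(d_model, n_experts)
        self.tau = tau

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)          # (tokens, n_experts)
        sorted_scores, order = scores.sort(dim=-1, descending=True)
        cum = sorted_scores.cumsum(dim=-1)
        # Keep the smallest prefix of ranked experts whose cumulative mass
        # reaches tau (the top-1 expert is always kept).
        keep = (cum - sorted_scores) < self.tau
        mask = torch.zeros_like(scores, dtype=torch.bool).scatter(-1, order, keep)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = mask[:, e]
            if sel.any():
                out[sel] += scores[sel, e, None] * expert(x[sel])
        return out

# Usage: a token whose score mass concentrates on one expert gets k = 1;
# a flat score distribution activates many experts.
moe = DynamicKMoE(d_model=64, d_ff=128, n_experts=8)
y = moe(torch.randn(10, 64))
```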
About
- System 2 - deliberate, effortful reasoning; Adaptive Computation aims to let models spend variable compute per input rather than a fixed forward pass.
- The Bitter Lesson - where pre-trained models focus on `learning` at _train time_, Adaptive Computation is about spending more compute at inference time with mechanisms similar to `Search`.
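
A concrete (if simple) instance of this Search-style inference-time compute is best-of-N sampling: draw several candidate outputs and keep the best one under a scoring function. A minimal sketch; `generate` and `score` are hypothetical stand-ins for a stochastic sampler and a verifier or reward model, not part of any library listed here.

```python
# Hypothetical sketch of best-of-N sampling: spend N generations of
# inference compute, return the highest-scoring candidate.
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],   # e.g. an LLM sampled at T > 0
              score: Callable[[str, str], float],  # verifier / reward model
              n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```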
Other Modular Architectures
Early Exit: End-to-End Adaptive Computation
Adaptive Computation for Black-box models
- Reflexion
- Debate
- Chain of Thought
- Tree of Thought
- Chain of Verification
- Online Speculative Decoding
- Recurrent Drafter
- large n-gram models
- REST - uses retrieval on tokens from the web, rather than a trained draft model, for the speculative decoding head (see the sketch after this list)
- Accelerated Speculative Sampling (ASpS) with Tree Monte Carlo
- inverse scaling
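
Several of the entries above (Online Speculative Decoding, Recurrent Drafter, REST, ASpS) are variants of the same accept/reject loop: a cheap drafter proposes a few tokens and the target model verifies them. A minimal greedy-verification sketch; `draft_next` and `target_next` are hypothetical callables returning a model's argmax next token. Real implementations verify all drafts in one batched target forward pass and use a probabilistic acceptance rule (Leviathan et al.; Chen et al.) to preserve the target distribution exactly.

```python
# Hypothetical sketch of greedy speculative decoding.
from typing import Callable, List

def speculative_decode(prefix: List[int],
                       draft_next: Callable[[List[int]], int],
                       target_next: Callable[[List[int]], int],
                       k: int = 4,
                       max_new: int = 64) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) Draft k tokens cheaply with the small model.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2) Verify drafts against the target model; keep the matching
        #    prefix of draft tokens.
        accepted = 0
        for i, tok in enumerate(draft):
            if target_next(out + draft[:i]) == tok:
                accepted += 1
            else:
                break
        out += draft[:accepted]
        # 3) On a mismatch, take the target's own token at that position,
        #    so at least one token is produced per iteration.
        if accepted < k:
            out.append(target_next(out))
    return out[:len(prefix) + max_new]
```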
Continual Learning

Tools & Agents

Games

Pre-cursors to Adaptive Computation

Open Source Libraries

AI Safety

Scaling Laws

More Compute Per Output Token

Other