awesome-adaptive-computation
A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).
https://github.com/koayon/awesome-adaptive-computation
Mixture of Experts (Sparse MoE)
- c-BTM
- Mixtral-8x7B
- DeMix
- Task-MoE
- HydraMoE
- MoE-Infinity
- Branch, Train, Mix (BTX)
- JetMoE
- MoEfication - improves dense-to-MoE converted models by: 1) showing that the efficiency of the resulting model can be significantly enhanced by enforcing activation sparsity in the base model; 2) proposing Expert Contribution Routing, a novel objective for training the gating networks, which are tasked with predicting the output norm of each expert for the given input, enabling approximation of each expert's relative contribution; 3) introducing dynamic-k gating, which allows the model to appropriately distribute its computational budget between easy and hard inputs; 4) extending the proposed conversion scheme to any linear layer, such as the multi-head attention projections. (A minimal sketch of dynamic-k gating follows this list.)
- ELMForest - Branch, Train, Merge (BTM)
- Multi-gate MoE
- Gemini 1.5 Pro - DBRX is another powerful MoE model, and it seems that MoE is now the go-to architecture for large models.
- Dynamic Routing in MoEs
- MoE review paper
- One Wide Feedforward paper
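
The dynamic-k gating idea above (routing each token to only as many experts as its difficulty warrants) is easy to sketch. Below is a minimal, hypothetical PyTorch sketch, not the official implementation: the gate's scores are softmax-normalized for simplicity and experts are consumed in score order until a mass threshold `tau` is covered, whereas the paper trains the gate to regress each expert's output norm.

```python
# Hypothetical sketch of dynamic-k expert gating; not the official code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicKMoE(nn.Module):
    """Routes each token to a variable number of experts: experts are taken
    in order of predicted contribution until probability mass `tau` is
    covered, so easy tokens use fewer experts than hard ones."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, tau: float = 0.9):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Gate predicts each expert's relative contribution for a token.
        self.gate = nn.Linear(d_model, n_experts)
        self.tau = tau

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)          # (tokens, n_experts)
        sorted_scores, order = scores.sort(dim=-1, descending=True)
        cum = sorted_scores.cumsum(dim=-1)
        # Keep the smallest prefix of ranked experts whose cumulative mass
        # reaches tau (the top-1 expert is always kept).
        keep = (cum - sorted_scores) < self.tau
        mask = torch.zeros_like(scores, dtype=torch.bool).scatter(-1, order, keep)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = mask[:, e]
            if sel.any():
                out[sel] += scores[sel, e, None] * expert(x[sel])
        return out

# Usage: a token whose score mass concentrates on one expert gets k = 1;
# a flat score distribution activates many experts.
moe = DynamicKMoE(d_model=64, d_ff=128, n_experts=8)
y = moe(torch.randn(10, 64))
```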
About
- System 2 - deliberate, effortful reasoning; Adaptive Computation aims to let models spend variable compute per input rather than a fixed forward pass.
- The Bitter Lesson - where pre-trained models focus on `learning` at _train time_, Adaptive Computation is about spending more compute at inference time with mechanisms similar to `Search`.
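
A concrete (if simple) instance of this Search-style inference-time compute is best-of-N sampling: draw several candidate outputs and keep the best one under a scoring function. A minimal sketch; `generate` and `score` are hypothetical stand-ins for a stochastic sampler and a verifier or reward model, not part of any library listed here.

```python
# Hypothetical sketch of best-of-N sampling: spend N generations of
# inference compute, return the highest-scoring candidate.
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],   # e.g. an LLM sampled at T > 0
              score: Callable[[str, str], float],  # verifier / reward model
              n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```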
Other Modular Architectures
Early Exit: End-to-End Adaptive Computation
Adaptive Computation for Black-box models
- Reflexion
- Debate
- Chain of Thought
- Tree of Thought
- Chain of Verification
- Online Speculative Decoding
- Recurrent Drafter
- large n-gram models
- REST - uses retrieval on tokens from the web, rather than a trained draft model, for the speculative decoding head (see the sketch after this list)
- Accelerated Speculative Sampling (ASpS) with Tree Monte Carlo
- inverse scaling
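
Several of the entries above (Online Speculative Decoding, Recurrent Drafter, REST, ASpS) are variants of the same accept/reject loop: a cheap drafter proposes a few tokens and the target model verifies them. A minimal greedy-verification sketch; `draft_next` and `target_next` are hypothetical callables returning a model's argmax next token. Real implementations verify all drafts in one batched target forward pass and use a probabilistic acceptance rule (Leviathan et al.; Chen et al.) to preserve the target distribution exactly.

```python
# Hypothetical sketch of greedy speculative decoding.
from typing import Callable, List

def speculative_decode(prefix: List[int],
                       draft_next: Callable[[List[int]], int],
                       target_next: Callable[[List[int]], int],
                       k: int = 4,
                       max_new: int = 64) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) Draft k tokens cheaply with the small model.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2) Verify drafts against the target model; keep the matching
        #    prefix of draft tokens.
        accepted = 0
        for i, tok in enumerate(draft):
            if target_next(out + draft[:i]) == tok:
                accepted += 1
            else:
                break
        out += draft[:accepted]
        # 3) On a mismatch, take the target's own token at that position,
        #    so at least one token is produced per iteration.
        if accepted < k:
            out.append(target_next(out))
    return out[:len(prefix) + max_new]
```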
Continual Learning

Tools & Agents

Games

Pre-cursors to Adaptive Computation

Open Source Libraries

AI Safety

Scaling Laws

More Compute Per Output Token

Other