awesome-adaptive-computation
  
  
    A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE). 
    https://github.com/koayon/awesome-adaptive-computation
  
    
Mixture of Experts (Sparse MoE)
- c-BTM
 - Mixtral-8x7B
 - PyTorch code
 - DeMix pdf
 - Task-MoE pdf
 - official PyTorch code
 - review paper
 - PyTorch code
 - model
 - One Wide Feedforward paper
 - code
 - code
 - official code
 - code
 - official Jax code
 - HydraMoE
 - official PyTorch code
 - c-BTM code
 - models
 - MoE-Infinity
 - Branch, Train, Mix (BTX)
 - JetMoE
 - Multi-gate
 - Gemini 1.5 Pro. DBRX is another powerful MoE model, and MoE now seems to be the go-to architecture for large models.
 - here
 - code
 - Dynamic Routing in MoEs
 
About
- System 2
 
Other Modular Architectures
Early Exit: End-to-End Adaptive Computation
Adaptive Computation for Black-box models
- Reflexion
 - Debate
 - Chain of Thought
 - Tree of Thought
 - Chain of Verification
 - pdf2
 - pdf2
 - blog
 - Online Speculative Decoding
 - inverse scaling
 - PyTorch code
 - pdf2
 - PyTorch code
 - Recurrent Drafter
 - large n-gram models
 - blog
 - model mapping
 - here
 - REST - on tokens from the web for the speculative decoding head.
 - PyTorch code
 - Accelerated Speculative Sampling (ASpS) with Tree Monte Carlo
 - blog
 - PyTorch blog
 
Continual Learning
Tools & Agents
Games
Pre-cursors to Adaptive Computation
Open Source Libraries
AI Safety
Scaling Laws
More Compute Per Output Token
Other
 