Awesome_Efficient_LRM_Reasoning
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
https://github.com/XiaoYee/Awesome_Efficient_LRM_Reasoning
🔔 News
🚀 Papers
💠 Efficient Reasoning during Inference
- How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach (2025.03)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching (2025.03)
- SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning (2025.04)
- Reasoning Models Can Be Effective Without Thinking (2025.04)
- Fast-Slow-Thinking: Complex Task Solving with Large Language Models (2025.04)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty (2025.02)
- Chain of Draft: Thinking Faster by Writing Less (2025.02)
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities (2025.02)
- s1: Simple test-time scaling (2025.01)
- Token-Budget-Aware LLM Reasoning (2024.12)
- Efficiently Serving LLM Reasoning Programs with Certaindex (2024.12)
- Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning (2024.08)
- Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters (2024.08)
- Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost (2024.07)
- The Impact of Reasoning Step Length on Large Language Models (2024.01)
- The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models (2024.01)
- Guiding Language Model Reasoning with Planning Tokens (2023.10)
- Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking (2025.01)
- Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces (2024.10)
- Visual Agents as Fast and Slow Thinkers (2024.08)
- System-1.x: Learning to Balance Fast and Slow Planning with Language Models (2024.07)
- DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models (2024.07)
- MixLLM: Dynamic Routing in Mixed Large Language Models (2025.02)
- Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding (2024.11)
- EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees (2024.06)
- RouteLLM: Learning to Route LLMs with Preference Data (2024.06)
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding (2024.04)
- EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty (2024.01)
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads (2024.01)
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models (2023.11)
- Speculative Decoding with Big Little Decoder (2023.02)
- Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging (2025.03)
- Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding (2025.03)
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models (2025.02)
- Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback (2025.01)
- Fast Best-of-N Decoding via Speculative Rejection (2024.10)
- TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling (2024.10)
💫 Efficient Reasoning with SFT
- Z1: Efficient Test-time Scaling with Code (2025.04)
- Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models (2025.04)
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step (2024.05)
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation (2025.02)
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning (2025.02)
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs (2025.02)
- LightThinker: Thinking Step-by-Step Compression (2025.02)
- Efficient Reasoning with Hidden Thinking (2025.01)
- Training Large Language Models to Reason in a Continuous Latent Space (2024.12)
- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations (2024.12)
- Self-Training Elicits Concise Reasoning in Large Language Models (2025.02)
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs (2025.02)
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models (2025.02)
- C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness (2024.12)
- Can Language Models Learn to Skip Steps? (2024.11)
- Distilling System 2 into System 1 (2024.07)
🧩 Efficient Reasoning with Reinforcement Learning
- Concise Reasoning via Reinforcement Learning (2025.04)
- HAWKEYE: Efficient Reasoning with Model Collaboration (2025.04)
- ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning (2025.04)
- Think When You Need: Self-Adaptive Chain-of-Thought Learning (2025.04)
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models (2025.03)
- Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization (2025.01)
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning (2025.03)
- Demystifying Long Chain-of-Thought Reasoning in LLMs (2025.02)
- Training Language Models to Reason Efficiently (2025.02)
- O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning (2025.01)
- Kimi k1.5: Scaling Reinforcement Learning with LLMs (2025.01)
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (2025.03)
💬 Efficient Reasoning during Pre-training
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (2025.04)
- LLM Pretraining with Continuous Concepts (2025.02)
- Scalable Language Models with Posterior Inference of Latent Thought Vectors (2025.02)
- Byte Latent Transformer: Patches Scale Better than Tokens (2024.12)
- Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models (2025.04)
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought (2025.03)
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning (2025.03)
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners (2025.02)
- Large Concept Models: Language Modeling in a Sentence Representation Space (2024.12)
- RWKV-7 "Goose" with Expressive Dynamic State Evolution (2025.03)
- LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid (2025.02)
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2025.02)
- MoBA: Mixture of Block Attention for Long-Context LLMs (2025.02)
- MoM: Linear Sequence Modeling with Mixture-of-Memories (2025.02)
- Gated Delta Networks: Improving Mamba2 with Delta Rule (2024.12)
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (2024.05)
- Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention (2024.05)
- Gated Linear Attention Transformers with Hardware-Efficient Training (2023.12)
- Liger: Linearizing Large Language Models to Gated Recurrent Structures (2025.03)
- Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing (2025.02)
- LoLCATs: On Low-Rank Linearizing of Large Language Models (2024.10)
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models (2024.08)
🔖 Future Directions
- Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems (2024.10)
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models (2025.04)
- MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency (2025.02)
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs (2024.12)
- Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? (2025.03)
- Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
- Efficient Test-Time Scaling via Self-Calibration (2025.03)
- Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling (2024.08)
- X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability (2025.02)
- Deliberative Alignment: Reasoning Enables Safer Language Models (2024.12)
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks (2025.02)
- Chain-of-Retrieval Augmented Generation (2025.01)
- DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs (2025.03)
Resources
🎉 Contribution
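Pull requests adding papers are welcome. As a minimal sketch of the entry format (inferred from the existing list entries; the link target and the shields.io badge style are assumptions, not taken from the repo's own guidelines):

```markdown
- [Paper Title](https://arxiv.org/abs/xxxx.xxxxx) ![](https://img.shields.io/badge/arXiv-2025.04-red)
```

The `arXiv-YYYY.MM-red` badge encodes the paper's release month, matching the dates shown next to each entry above.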
Contributors