Awesome_Efficient_LRM_Reasoning
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
https://github.com/XiaoYee/Awesome_Efficient_LRM_Reasoning
🔔 News
🚀 Papers
💠 Efficient Reasoning during Inference
- How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach (2025.03)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching (2025.03)
- SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning (2025.04)
- Reasoning Models Can Be Effective Without Thinking (2025.04)
- Fast-Slow-Thinking: Complex Task Solving with Large Language Models (2025.04)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty (2025.02)
- Chain of Draft: Thinking Faster by Writing Less (2025.02)
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities (2025.02)
- s1: Simple test-time scaling (2025.01)
- Token-Budget-Aware LLM Reasoning (2024.12)
- Efficiently Serving LLM Reasoning Programs with Certaindex (2024.12)
- Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning (2024.08)
- Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters (2024.08)
- Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost (2024.07)
- The Impact of Reasoning Step Length on Large Language Models (2024.01)
- The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models (2024.01)
- Guiding Language Model Reasoning with Planning Tokens (2023.10)
- Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking (2025.01)
- Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces (2024.10)
- Visual Agents as Fast and Slow Thinkers (2024.08)
- System-1.x: Learning to Balance Fast and Slow Planning with Language Models (2024.07)
- DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models (2024.07)
- MixLLM: Dynamic Routing in Mixed Large Language Models (2025.02)
- Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding (2024.11)
- EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees (2024.06)
- RouteLLM: Learning to Route LLMs with Preference Data (2024.06)
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding (2024.04)
- EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty (2024.01)
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads (2024.01)
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models (2023.11)
- Speculative Decoding with Big Little Decoder (2023.02)
- Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging (2025.03)
- Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding (2025.03)
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models (2025.02)
- Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback (2025.01)
- Fast Best-of-N Decoding via Speculative Rejection (2024.10)
- TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling (2024.10)
💫 Efficient Reasoning with SFT
- Z1: Efficient Test-time Scaling with Code (2025.04)
- Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models (2025.04)
- From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step (2024.05)
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation (2025.02)
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning (2025.02)
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs (2025.02)
- LightThinker: Thinking Step-by-Step Compression (2025.02)
- Efficient Reasoning with Hidden Thinking (2025.01)
- Training Large Language Models to Reason in a Continuous Latent Space (2024.12)
- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations (2024.12)
- Self-Training Elicits Concise Reasoning in Large Language Models (2025.02)
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs (2025.02)
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models (2025.02)
- C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness (2024.12)
- Can Language Models Learn to Skip Steps? (2024.11)
- Distilling System 2 into System 1 (2024.07)
🧩 Efficient Reasoning with Reinforcement Learning
- Concise Reasoning via Reinforcement Learning (2025.04)
- HAWKEYE: Efficient Reasoning with Model Collaboration (2025.04)
- ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning (2025.04)
- Think When You Need: Self-Adaptive Chain-of-Thought Learning (2025.04)
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models (2025.03)
- Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization (2025.01)
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning (2025.03)
- Demystifying Long Chain-of-Thought Reasoning in LLMs (2025.02)
- Training Language Models to Reason Efficiently (2025.02)
- O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning (2025.01)
- Kimi k1.5: Scaling Reinforcement Learning with LLMs (2025.01)
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (2025.03)
💬 Efficient Reasoning during Pre-training
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (2025.04)
- LLM Pretraining with Continuous Concepts (2025.02)
- Scalable Language Models with Posterior Inference of Latent Thought Vectors (2025.02)
- Byte Latent Transformer: Patches Scale Better than Tokens (2024.12)
- Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models (2025.04)
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought (2025.03)
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning (2025.03)
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners (2025.02)
- Large Concept Models: Language Modeling in a Sentence Representation Space (2024.12)
- RWKV-7 "Goose" with Expressive Dynamic State Evolution (2025.03)
- LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid (2025.02)
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2025.02)
- MoBA: Mixture of Block Attention for Long-Context LLMs (2025.02)
- MoM: Linear Sequence Modeling with Mixture-of-Memories (2025.02)
- Gated Delta Networks: Improving Mamba2 with Delta Rule (2024.12)
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (2024.05)
- Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention (2024.05)
- Gated Linear Attention Transformers with Hardware-Efficient Training (2023.12)
- Liger: Linearizing Large Language Models to Gated Recurrent Structures (2025.03)
- Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing (2025.02)
- LoLCATs: On Low-Rank Linearizing of Large Language Models (2024.10)
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models (2024.08)
🔖 Future Directions
- Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems (2024.10)
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models (2025.04)
- MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency (2025.02)
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs (2024.12)
- Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? (2025.03)
- Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
- Efficient Test-Time Scaling via Self-Calibration (2025.03)
- Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling (2024.08)
- X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability (2025.02)
- Deliberative Alignment: Reasoning Enables Safer Language Models (2024.12)
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks (2025.02)
- Chain-of-Retrieval Augmented Generation (2025.01)
- DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs (2025.03)
Resources
🎉 Contribution
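Pull requests adding papers are welcome. As a minimal sketch of the entry format (inferred from the existing list entries; the link target and the shields.io badge style are assumptions, not taken from the repo's own guidelines):

```markdown
- [Paper Title](https://arxiv.org/abs/xxxx.xxxxx) ![](https://img.shields.io/badge/arXiv-2025.04-red)
```

The `arXiv-YYYY.MM-red` badge encodes the paper's release month, matching the dates shown next to each entry above.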
Contributors