An open API service indexing awesome lists of open source software.

Awesome-Efficient-Reasoning-Models

[Arxiv 2025] Efficient Reasoning Models: A Survey
https://github.com/fscdc/Awesome-Efficient-Reasoning-Models

Last synced: 3 days ago
JSON representation

  • Full list

    • Build SLM with Strong Reasoning Ability

      • Llama-Nemotron: Efficient Reasoning Models
      • Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math - Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, Shuohang Wang, Weijian Xu, Jianfeng Gao, Weizhu Chen |<img width="1002" alt="image" src="figures/phi_4_mini_reasoning.png"> |[Paper](https://arxiv.org/abs/2504.21233)|[//]: #05/02
      • Phi-4-reasoning Technical Report
      • Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
      • Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
      • ![Publish
      • ![Publish - main.333/) <br> Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, Yin Zhang |<img width="1002" alt="image" src="figures/counterfactual_distillation.png"> |[Paper](https://aclanthology.org/2024.emnlp-main.333/)| [//]: #04/08
      • ![Star - Model-Gap/Small-Model-Learnability-Gap)<br>[Small Models Struggle to Learn from Strong Reasoners](https://arxiv.org/abs/2502.12143) <br> Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran |<img width="1002" alt="image" src="https://arxiv.org/html/2502.12143v2/x1.png"> |[Github](https://github.com/Small-Model-Gap/Small-Model-Learnability-Gap) <br> [Paper](https://arxiv.org/abs/2502.12143)| [//]: #04/08
      • Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
      • ![Star - z/SCORE) [![Publish](https://img.shields.io/badge/Conference-ACL_Findings_2024-blue)]()<br>[Small Language Models Need Strong Verifiers to Self-Correct Reasoning](https://arxiv.org/abs/2404.17140) <br> Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2404.17140v2/x1.png"> |[Github](https://github.com/yunx-z/SCORE) <br> [Paper](https://arxiv.org/abs/2404.17140)| [//]: #04/08
      • Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
      • ![Publish - main.1140.pdf) <br> Yichun Zhao, Shuheng Zhou, Huijia Zhu |<img width="1002" alt="image" src="figures/prr.png"> |[Paper](https://aclanthology.org/2024.lrec-main.1140.pdf)| [//]: #04/08
      • ![Publish
      • ![Publish
      • ![Star - Model-Gap/Small-Model-Learnability-Gap)<br>[Small Models Struggle to Learn from Strong Reasoners](https://arxiv.org/abs/2502.12143) <br> Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran |<img width="1002" alt="image" src="https://arxiv.org/html/2502.12143v2/x1.png"> |[Github](https://github.com/Small-Model-Gap/Small-Model-Learnability-Gap) <br> [Paper](https://arxiv.org/abs/2502.12143)| [//]: #04/08
      • Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
      • ![Star - z/SCORE) [![Publish](https://img.shields.io/badge/Conference-ACL_Findings_2024-blue)]()<br>[Small Language Models Need Strong Verifiers to Self-Correct Reasoning](https://arxiv.org/abs/2404.17140) <br> Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2404.17140v2/x1.png"> |[Github](https://github.com/yunx-z/SCORE) <br> [Paper](https://arxiv.org/abs/2404.17140)| [//]: #04/08
      • Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
      • ![Publish - main.1140.pdf) <br> Yichun Zhao, Shuheng Zhou, Huijia Zhu |<img width="1002" alt="image" src="figures/prr.png"> |[Paper](https://aclanthology.org/2024.lrec-main.1140.pdf)| [//]: #04/08
      • Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
      • Distilling Reasoning Ability from Large Language Models with Adaptive Thinking
      • Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
      • Distilling Reasoning Ability from Large Language Models with Adaptive Thinking
      • ![Star - NLP/Distilling-CoT-Reasoning)<br>[Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning](https://arxiv.org/abs/2502.18001) <br> Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen |<img width="1002" alt="image" src="https://arxiv.org/html/2502.18001v1/x1.png"> |[Github](https://github.com/EIT-NLP/Distilling-CoT-Reasoning) <br> [Paper](https://arxiv.org/abs/2502.18001)| [//]: #04/08
      • ![Star - NLP/Distilling-CoT-Reasoning)<br>[Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning](https://arxiv.org/abs/2502.18001) <br> Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen |<img width="1002" alt="image" src="https://arxiv.org/html/2502.18001v1/x1.png"> |[Github](https://github.com/EIT-NLP/Distilling-CoT-Reasoning) <br> [Paper](https://arxiv.org/abs/2502.18001)| [//]: #04/08
      • Towards Reasoning Ability of Small Language Models
      • ![Star - Reasoning-Models)<br>[Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models](https://arxiv.org/abs/2504.04823) <br> Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou |<img width="1002" alt="image" src="figures/quant_hurt.png"> |[Github](https://github.com/ruikangliu/Quantized-Reasoning-Models) <br> [Paper](https://arxiv.org/abs/2504.04823)| [//]: #04/14
      • When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
      • ![Star - rs)<br>[Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't](https://arxiv.org/abs/2503.16219) <br> Quy-Anh Dang, Chris Ngo |<img src="https://arxiv.org/html/2503.16219v1/extracted/6296504/images/pass1.png" width="45%"> <img src="https://arxiv.org/html/2503.16219v1/extracted/6296504/images/costs.png" width="45%"> |[Github](https://github.com/knoveleng/open-rs) <br> [Paper](https://arxiv.org/abs/2503.16219)| [//]: #04/08
      • ![Star - nlp/simpleRL-reason)<br>[SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild](https://arxiv.org/abs/2503.18892) <br> Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He |<img width="1002" alt="image" src="figures/simplerl_zoo.png"> |[Github](https://github.com/hkust-nlp/simpleRL-reason) <br> [Paper](https://arxiv.org/abs/2503.18892)| [//]: #04/08
      • DeepScaleR - project.com/)
      • Towards Reasoning Ability of Small Language Models
      • ![Star - Reasoning-Models)<br>[Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models](https://arxiv.org/abs/2504.04823) <br> Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou |<img width="1002" alt="image" src="figures/quant_hurt.png"> |[Github](https://github.com/ruikangliu/Quantized-Reasoning-Models) <br> [Paper](https://arxiv.org/abs/2504.04823)| [//]: #04/14
      • When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
      • ![Star - rs)<br>[Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't](https://arxiv.org/abs/2503.16219) <br> Quy-Anh Dang, Chris Ngo |<img src="https://arxiv.org/html/2503.16219v1/extracted/6296504/images/pass1.png" width="45%"> <img src="https://arxiv.org/html/2503.16219v1/extracted/6296504/images/costs.png" width="45%"> |[Github](https://github.com/knoveleng/open-rs) <br> [Paper](https://arxiv.org/abs/2503.16219)| [//]: #04/08
      • ![Star - nlp/simpleRL-reason)<br>[SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild](https://arxiv.org/abs/2503.18892) <br> Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He |<img width="1002" alt="image" src="figures/simplerl_zoo.png"> |[Github](https://github.com/hkust-nlp/simpleRL-reason) <br> [Paper](https://arxiv.org/abs/2503.18892)| [//]: #04/08
      • DeepScaleR - project.com/)
      • ![Star - wang/Tina)<br>[Tina: Tiny Reasoning Models via LoRA](https://arxiv.org/abs/2504.15777) <br> Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15777v1/x4.png"> |[Github](https://github.com/shangshang-wang/Tina) <br> [Paper](https://arxiv.org/abs/2504.15777)| [//]: #04/25
    • Make Long CoT Short

    • Background Papers

      • Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
      • Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
      • ![Star - NLPIR/WebThinker)<br>[WebThinker: Empowering Large Reasoning Models with Deep Research Capability](https://arxiv.org/abs/2504.21776) <br> Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou |<img width="1002" alt="image" src="figures/webthinker.png"> |[Github](https://github.com/RUC-NLPIR/WebThinker) <br> [Paper](https://arxiv.org/abs/2504.21776)|[//]: #05/02
      • ![Star
      • ![Star - Shot-RLVR)<br>[Reinforcement Learning for Reasoning in Large Language Models with One Training Example](https://arxiv.org/abs/2504.20571) <br> Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen |<img width="1002" alt="image" src="https://arxiv.org/html/2504.20571v1/x3.png"> |[Github](https://github.com/ypwang61/One-Shot-RLVR) <br> [Paper](https://arxiv.org/abs/2504.20571)|[//]: #04/30
      • Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision - Fu Yang, Zongyu Lin, Xinfeng Li, Hao Xu, Kai-Wei Chang, Ying Nian Wu |<img width="1002" alt="image" src="figures/eorm.png"> |[Paper](https://arxiv.org/abs/2505.14999)| [//]: #05/22
      • AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
      • ![Star - coai/BARREL)<br>[BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs](https://arxiv.org/abs/2505.13529) <br> Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang |<img width="1002" alt="image" src="figures/BARREL.png"> |[Github](https://github.com/thu-coai/BARREL) <br> [Paper](https://arxiv.org/abs/2505.13529)| [//]: #05/23
      • ![Star - tango)<br>[RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning](https://arxiv.org/abs/2505.15034) <br> Kaiwen Zha, Zhengqi Gao, Maohao Shen, Zhang-Wei Hong, Duane S. Boning, Dina Katabi |<img width="1002" alt="image" src="figures/Tango.png"> |[Github](https://github.com/kaiwenzha/rl-tango) <br> [Paper](https://arxiv.org/abs/2505.15034)| [//]: #05/23
      • Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
      • Reasoning Models Better Express Their Confidence
      • Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
      • ![Star - Enhanced Reinforcement Learning](https://arxiv.org/abs/2505.12996) <br> Jiaan Wang, Fandong Meng, Jie Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2505.12996v1/x4.png"> |[Github](https://github.com/krystalan/DRT) <br> [Paper](https://arxiv.org/abs/2505.12996)| [//]: #05/20
      • Absolute Zero: Reinforced Self-play Reasoning with Zero Data
      • ![Star
      • ![Star - Ability-Alignment)<br>[Beyond Aha!: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models](https://arxiv.org/abs/2505.10554) <br> Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2505.10554v1/x2.png"> |[Github](https://github.com/zhiyuanhubj/Meta-Ability-Alignment) <br> [Paper](https://arxiv.org/abs/2505.10554)| [//]: #05/19
      • ![Star
      • The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
      • Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models - Jun Qi |<img width="1002" alt="image" src="https://arxiv.org/html/2505.10446v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.10446)| [//]: #05/18
      • J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
      • INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
      • AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
      • ![Star - of-Thought Tokens are Computer Program Variables](https://arxiv.org/abs/2505.04955) <br> Fangwei Zhu, Peiyi Wang, Zhifang Sui |<img width="1002" alt="image" src="https://arxiv.org/html/2505.04955v1/x2.png"> |[Github](https://github.com/solitaryzero/CoTs_are_Variables) <br> [Paper](https://arxiv.org/abs/2505.04955)| [//]: #05/17
      • ![Star - - From Pretraining to Posttraining](https://arxiv.org/abs/2505.07608) <br> Xiaomi LLM-Core Team |<img width="1002" alt="image" src="https://arxiv.org/html/2505.07608v1/x1.png"> |[Github](https://github.com/xiaomimimo/MiMo) <br> [Paper](https://arxiv.org/abs/2505.07608)| [//]: #05/17
      • ![Star - of-RLVR)<br>[Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?](https://arxiv.org/abs/2504.13837) <br> Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2504.13837v1/x1.png"> |[Github](https://github.com/LeapLabTHU/limit-of-RLVR) <br> [Paper](https://arxiv.org/abs/2504.13837)| [//]: #04/22
      • ![Publish - of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903) <br> Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou |<img width="1002" alt="image" src="figures/cot_prompting.png"> |[Paper](https://arxiv.org/abs/2201.11903)| [//]: #04/08
      • ![Star - nlp/tree-of-thought-llm) [![Publish](https://img.shields.io/badge/Conference-NeurIPS_2023-blue)]()<br>[Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601) <br> Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan |<img width="1002" alt="image" src="https://arxiv.org/html/2305.10601v2/x1.png"> |[Github](https://github.com/princeton-nlp/tree-of-thought-llm) <br> [Paper](https://arxiv.org/abs/2305.10601)| [//]: #04/08
      • ![Star - of-thoughts) [![Publish](https://img.shields.io/badge/Conference-AAAI_2024-blue)]()<br>[Graph of Thoughts: Solving Elaborate Problems with Large Language Models](https://arxiv.org/abs/2308.09687) <br> Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler |<img width="1002" alt="image" src="figures/got.png"> |[Github](https://github.com/spcl/graph-of-thoughts) <br> [Paper](https://arxiv.org/abs/2308.09687)| [//]: #04/08
      • ![Publish - Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) <br> Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou |<img width="1002" alt="image" src="figures/sc.png"> |[Paper](https://arxiv.org/abs/2203.11171)| [//]: #04/08
      • ![Publish - of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903) <br> Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou |<img width="1002" alt="image" src="figures/cot_prompting.png"> |[Paper](https://arxiv.org/abs/2201.11903)| [//]: #04/08
      • ![Star - nlp/tree-of-thought-llm) [![Publish](https://img.shields.io/badge/Conference-NeurIPS_2023-blue)]()<br>[Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601) <br> Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan |<img width="1002" alt="image" src="https://arxiv.org/html/2305.10601v2/x1.png"> |[Github](https://github.com/princeton-nlp/tree-of-thought-llm) <br> [Paper](https://arxiv.org/abs/2305.10601)| [//]: #04/08
      • ![Star - of-thoughts) [![Publish](https://img.shields.io/badge/Conference-AAAI_2024-blue)]()<br>[Graph of Thoughts: Solving Elaborate Problems with Large Language Models](https://arxiv.org/abs/2308.09687) <br> Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler |<img width="1002" alt="image" src="figures/got.png"> |[Github](https://github.com/spcl/graph-of-thoughts) <br> [Paper](https://arxiv.org/abs/2308.09687)| [//]: #04/08
      • ![Star - of-symbol-planning) [![Publish](https://img.shields.io/badge/Conference-COLM_2024-blue)]()<br>[Chain-of-Symbol Prompting Elicits Planning in Large Langauge Models](https://arxiv.org/abs/2305.10276) <br> Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2305.10276v7/x1.png"> |[Github](https://github.com/hanxuhu/chain-of-symbol-planning) <br> [Paper](https://arxiv.org/abs/2305.10276)| [//]: #04/08
      • Thinking Machines: A Survey of LLM based Reasoning Strategies
      • ![Star - System2-Reasoning-LLM)<br>[From System 1 to System 2: A Survey of Reasoning Large Language Models](https://arxiv.org/abs/2502.17419) <br> Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu |<img width="1002" alt="image" src="https://arxiv.org/html/2502.17419v2/extracted/6232702/images/timeline.png"> |[Github](https://github.com/zzli2022/Awesome-System2-Reasoning-LLM) <br> [Paper](https://arxiv.org/abs/2502.17419)| [//]: #04/08
      • ![Star - AI-Lab/Program-of-Thoughts) [![Publish](https://img.shields.io/badge/Conference-TMLR_2023-blue)]()<br>[Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks](https://arxiv.org/abs/2211.12588) <br> Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen |<img width="1002" alt="image" src="figures/pot.png"> |[Github](https://github.com/TIGER-AI-Lab/Program-of-Thoughts) <br> [Paper](https://arxiv.org/abs/2211.12588)| [//]: #04/08
      • ![Star - of-symbol-planning) [![Publish](https://img.shields.io/badge/Conference-COLM_2024-blue)]()<br>[Chain-of-Symbol Prompting Elicits Planning in Large Langauge Models](https://arxiv.org/abs/2305.10276) <br> Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2305.10276v7/x1.png"> |[Github](https://github.com/hanxuhu/chain-of-symbol-planning) <br> [Paper](https://arxiv.org/abs/2305.10276)| [//]: #04/08
      • Thinking Machines: A Survey of LLM based Reasoning Strategies
      • Resa: Transparent Reasoning Models via SAEs
    • Let Decoding More Efficient

      • Reward Reasoning Model
      • Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
      • Control-R: Towards controllable test-time scaling
      • Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
      • First Finish Search: Efficient Test-Time Scaling in Large Language Models
      • LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
      • Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones - Adsera |<img width="1002" alt="image" src="https://arxiv.org/html/2505.21825v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.21825)| [//]: #06/11
      • Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
      • ![Star - guided-search)<br>[Value-Guided Search for Efficient Chain-of-Thought Reasoning](https://arxiv.org/abs/2505.17373) <br> Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun |<img width="1002" alt="image" src="https://arxiv.org/html/2505.17373v1/x2.png"> |[Github](https://github.com/kaiwenw/value-guided-search) <br> [Paper](https://arxiv.org/abs/2505.17373)| [//]: #06/11
      • Accelerated Test-Time Scaling with Model-Free Speculative Sampling
      • Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
      • Fractured Chain-of-Thought Reasoning
      • Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately
      • Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
      • ![Star - Group/AlphaOne)<br>[AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time](https://arxiv.org/abs/2505.24863) <br> Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.24863v1/x27.png"> |[Github](https://github.com/ASTRAL-Group/AlphaOne) <br> [Paper](https://arxiv.org/abs/2505.24863)| [//]: #06/13
      • ProxyThinker: Test-Time Guidance through Small Visual Reasoners
      • A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
      • Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
      • Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
      • ![Star - dev/ReasoningPathCompression)<br>[Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning](https://arxiv.org/abs/2505.13866) <br> Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim |<img width="1002" alt="image" src="figures/rpc_new.png"> |[Github](https://github.com/jiwonsong-dev/ReasoningPathCompression) <br> [Paper](https://arxiv.org/abs/2505.13866)| [//]: #05/22
      • RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
      • Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity - Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-shan Shiu |<img width="1002" alt="image" src="https://arxiv.org/html/2505.11107v1/extracted/6445446/figures/gt_main_new.png"> |[Paper](https://arxiv.org/abs/2505.11107)| [//]: #05/19
      • ![Star - ACL_main_2025-blue)]()<br>[Rethinking Repetition Problems of LLMs in Code Generation](https://arxiv.org/abs/2505.10402) <br> Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li |<img width="1002" alt="image" src="figures/code_repeat.png"> |[Github](https://github.com/LYC127/RPG) <br> [Paper](https://arxiv.org/abs/2505.10402)| [//]: #05/18
      • Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping
      • ![Star - IJCAI-blue)]()<br>[Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning](https://arxiv.org/abs/2505.06321) <br> Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu |<img width="1002" alt="image" src="figures/learn2think.png"> |[Github](https://github.com/zch65458525/L2T) <br> [Paper](https://arxiv.org/abs/2505.06321)| [//]: #05/17
      • ![Star - Shanghai/xVerify)<br>[xVerify: Efficient Answer Verifier for Reasoning Model Evaluations](https://arxiv.org/abs/2504.10481) <br> Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li |<img width="1002" alt="image" src="https://arxiv.org/html/2504.10481v1/x1.png"> |[Github](https://github.com/IAAR-Shanghai/xVerify) <br> [Paper](https://arxiv.org/abs/2504.10481)| [//]: #04/17
      • ![Star - Consistency for Efficient Reasoning and Coding with LLMs](https://arxiv.org/abs/2305.11860) <br> Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam |<img width="1002" alt="image" src="figures/asc.png"> |[Github](https://github.com/Pranjal2041/AdaptiveConsistency) <br> [Paper](https://arxiv.org/abs/2305.11860)| [//]: #04/08
      • ![Star - ICLR_2024-blue)]()<br>[Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning](https://arxiv.org/abs/2401.10480) <br> Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2401.10480v1/x1.png"> |[Github](https://github.com/Yiwei98/ESC) <br> [Paper](https://arxiv.org/abs/2401.10480)| [//]: #04/08
      • Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods - Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou |<img width="1002" alt="image" src="https://arxiv.org/html/2504.14047v1/x1.png"> |[Paper](https://arxiv.org/abs/2504.14047)| [//]: #04/23
      • ![Star - Shanghai/xVerify)<br>[xVerify: Efficient Answer Verifier for Reasoning Model Evaluations](https://arxiv.org/abs/2504.10481) <br> Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li |<img width="1002" alt="image" src="https://arxiv.org/html/2504.10481v1/x1.png"> |[Github](https://github.com/IAAR-Shanghai/xVerify) <br> [Paper](https://arxiv.org/abs/2504.10481)| [//]: #04/17
      • ![Star - Consistency for Efficient Reasoning and Coding with LLMs](https://arxiv.org/abs/2305.11860) <br> Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam |<img width="1002" alt="image" src="figures/asc.png"> |[Github](https://github.com/Pranjal2041/AdaptiveConsistency) <br> [Paper](https://arxiv.org/abs/2305.11860)| [//]: #04/08
      • ![Star - ICLR_2024-blue)]()<br>[Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning](https://arxiv.org/abs/2401.10480) <br> Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2401.10480v1/x1.png"> |[Github](https://github.com/Yiwei98/ESC) <br> [Paper](https://arxiv.org/abs/2401.10480)| [//]: #04/08
      • Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods - Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou |<img width="1002" alt="image" src="https://arxiv.org/html/2504.14047v1/x1.png"> |[Paper](https://arxiv.org/abs/2504.14047)| [//]: #04/23
      • ![Star - Guided Speculative Decoding for Efficient LLM Reasoning](https://arxiv.org/abs/2501.19324) <br> Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong |<img width="1002" alt="image" src="figures/rsd.png"> |[Github](https://github.com/BaohaoLiao/RSD) <br> [Paper](https://arxiv.org/abs/2501.19324)| [//]: #04/08
      • Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
      • ![Star - Time Scaling](https://arxiv.org/abs/2502.12018) <br> Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo |<img width="1002" alt="image" src="figures/aot.png"> |[Github](https://github.com/qixucen/atom) <br> [Paper](https://arxiv.org/abs/2502.12018)| [//]: #04/08
      • ![Star - NAACL_Findings_2025-blue)]()<br>[Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning](https://arxiv.org/abs/2408.13457) <br> Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2408.13457v3/x3.png"> |[Github](https://github.com/WangXinglin/DSC) <br> [Paper](https://arxiv.org/abs/2408.13457)| [//]: #04/08
      • Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
      • Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning - Zhe Guo, Xiaoxing Ma, Yu-Feng Li |<img width="1002" alt="image" src="https://arxiv.org/html/2502.00511v2/x3.png"> |[Paper](https://arxiv.org/abs/2502.00511)| [//]: #04/08
      • Confidence Improves Self-Consistency in LLMs
      • ![Star - Huang/Self-Calibration)<br>[Efficient Test-Time Scaling via Self-Calibration](https://arxiv.org/abs/2503.00031) <br> Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2503.00031v1/x2.png"> |[Github](https://github.com/Chengsong-Huang/Self-Calibration) <br> [Paper](https://arxiv.org/abs/2503.00031)| [//]: #04/08
      • ![Star - Labs/SpeculativeRejection) [![Publish](https://img.shields.io/badge/Conference-NeurIPS_2024-blue)]()<br>[Fast Best-of-N Decoding via Speculative Rejection](https://arxiv.org/abs/2410.20290) <br> Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette |<img width="1002" alt="image" src="https://arxiv.org/html/2410.20290v2/x1.png"> |[Github](https://github.com/Zanette-Labs/SpeculativeRejection) <br> [Paper](https://arxiv.org/abs/2410.20290)| [//]: #04/08
      • Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
      • FastMCTS: A Simple Sampling Strategy for Data Synthesis
      • ![Star - github-00/LLM-Predictive-Decoding) [![Publish](https://img.shields.io/badge/Conference-ICLR_2025-blue)]()<br>[Non-myopic Generation of Language Models for Reasoning and Planning](https://arxiv.org/abs/2410.17195) <br> Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong |<img width="1002" alt="image" src="figures/predictive_decoding.png"> |[Github](https://github.com/chang-github-00/LLM-Predictive-Decoding) <br> [Paper](https://arxiv.org/abs/2410.17195)| [//]: #04/08
      • ![Star - NAACL_Findings_2025-blue)]()<br>[Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning](https://arxiv.org/abs/2408.13457) <br> Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2408.13457v3/x3.png"> |[Github](https://github.com/WangXinglin/DSC) <br> [Paper](https://arxiv.org/abs/2408.13457)| [//]: #04/08
      • Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
      • Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning - Zhe Guo, Xiaoxing Ma, Yu-Feng Li |<img width="1002" alt="image" src="https://arxiv.org/html/2502.00511v2/x3.png"> |[Paper](https://arxiv.org/abs/2502.00511)| [//]: #04/08
      • Confidence Improves Self-Consistency in LLMs
      • ![Star - Huang/Self-Calibration)<br>[Efficient Test-Time Scaling via Self-Calibration](https://arxiv.org/abs/2503.00031) <br> Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2503.00031v1/x2.png"> |[Github](https://github.com/Chengsong-Huang/Self-Calibration) <br> [Paper](https://arxiv.org/abs/2503.00031)| [//]: #04/08
      • ![Star - Labs/SpeculativeRejection) [![Publish](https://img.shields.io/badge/Conference-NeurIPS_2024-blue)]()<br>[Fast Best-of-N Decoding via Speculative Rejection](https://arxiv.org/abs/2410.20290) <br> Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette |<img width="1002" alt="image" src="https://arxiv.org/html/2410.20290v2/x1.png"> |[Github](https://github.com/Zanette-Labs/SpeculativeRejection) <br> [Paper](https://arxiv.org/abs/2410.20290)| [//]: #04/08
      • Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
      • FastMCTS: A Simple Sampling Strategy for Data Synthesis
      • ![Star - github-00/LLM-Predictive-Decoding) [![Publish](https://img.shields.io/badge/Conference-ICLR_2025-blue)]()<br>[Non-myopic Generation of Language Models for Reasoning and Planning](https://arxiv.org/abs/2410.17195) <br> Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong |<img width="1002" alt="image" src="figures/predictive_decoding.png"> |[Github](https://github.com/chang-github-00/LLM-Predictive-Decoding) <br> [Paper](https://arxiv.org/abs/2410.17195)| [//]: #04/08
      • ![Star - taught-lookahead)<br>[Language Models can Self-Improve at State-Value Estimation for Better Search](https://arxiv.org/abs/2503.02878) <br> Ethan Mendes, Alan Ritter |<img width="1002" alt="image" src="https://arxiv.org/html/2503.02878v1/x1.png"> |[Github](https://github.com/ethanm88/self-taught-lookahead) <br> [Paper](https://arxiv.org/abs/2503.02878)| [//]: #04/08
      • ![Star - Decoding)<br>[ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation](https://arxiv.org/abs/2503.13288) <br> Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu |<img width="1002" alt="image" src="https://arxiv.org/html/2503.13288v1/x2.png"> |[Github](https://github.com/xufangzhi/phi-Decoding) <br> [Paper](https://arxiv.org/abs/2503.13288)| [//]: #04/08
      • Dynamic Parallel Tree Search for Efficient LLM Reasoning
      • ![Star - taught-lookahead)<br>[Language Models can Self-Improve at State-Value Estimation for Better Search](https://arxiv.org/abs/2503.02878) <br> Ethan Mendes, Alan Ritter |<img width="1002" alt="image" src="https://arxiv.org/html/2503.02878v1/x1.png"> |[Github](https://github.com/ethanm88/self-taught-lookahead) <br> [Paper](https://arxiv.org/abs/2503.02878)| [//]: #04/08
      • ![Star - Decoding)<br>[ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation](https://arxiv.org/abs/2503.13288) <br> Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu |<img width="1002" alt="image" src="https://arxiv.org/html/2503.13288v1/x2.png"> |[Github](https://github.com/xufangzhi/phi-Decoding) <br> [Paper](https://arxiv.org/abs/2503.13288)| [//]: #04/08
      • Dynamic Parallel Tree Search for Efficient LLM Reasoning
      • ![Star
      • ![Star - Reasoning/APR)<br>[Learning Adaptive Parallel Reasoning with Language Models](https://arxiv.org/abs/2504.15466) <br> Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15466v1/x2.png"> |[Github](https://github.com/Parallel-Reasoning/APR) <br> [Paper](https://arxiv.org/abs/2504.15466)| [//]: #04/23
      • ![Star - research/sot) [![Publish](https://img.shields.io/badge/Conference-ICLR_2024-blue)]()<br>[Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation](https://arxiv.org/abs/2307.15337) <br> Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang |<img width="1002" alt="image" src="figures/skeleton_ot.png"> |[Github](https://github.com/imagination-research/sot) <br> [Paper](https://arxiv.org/abs/2307.15337)| [//]: #04/08
      • Adaptive Skeleton Graph Decoding
      • ![Star
      • ![Star - Reasoning/APR)<br>[Learning Adaptive Parallel Reasoning with Language Models](https://arxiv.org/abs/2504.15466) <br> Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15466v1/x2.png"> |[Github](https://github.com/Parallel-Reasoning/APR) <br> [Paper](https://arxiv.org/abs/2504.15466)| [//]: #04/23
      • THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
      • ![Star - research/sot) [![Publish](https://img.shields.io/badge/Conference-ICLR_2024-blue)]()<br>[Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation](https://arxiv.org/abs/2307.15337) <br> Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang |<img width="1002" alt="image" src="figures/skeleton_ot.png"> |[Github](https://github.com/imagination-research/sot) <br> [Paper](https://arxiv.org/abs/2307.15337)| [//]: #04/08
      • Adaptive Skeleton Graph Decoding
      • ![Star - Guided Speculative Decoding for Efficient LLM Reasoning](https://arxiv.org/abs/2501.19324) <br> Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong |<img width="1002" alt="image" src="figures/rsd.png"> |[Github](https://github.com/BaohaoLiao/RSD) <br> [Paper](https://arxiv.org/abs/2501.19324)| [//]: #04/08
      • Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
      • ![Star - Time Scaling](https://arxiv.org/abs/2502.12018) <br> Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo |<img width="1002" alt="image" src="figures/aot.png"> |[Github](https://github.com/qixucen/atom) <br> [Paper](https://arxiv.org/abs/2502.12018)| [//]: #04/08
      • DISC: Dynamic Decomposition Improves LLM Inference Scaling
      • From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
      • DISC: Dynamic Decomposition Improves LLM Inference Scaling
      • From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
      • ![Star - structured Reasoning of Multimodal Large Models?](https://arxiv.org/abs/2503.06252) <br> Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang |<img width="1002" alt="image" src="figures/atom.png"> |[Github](https://github.com/Quinn777/AtomThink) <br> [Paper](https://arxiv.org/abs/2503.06252)| [//]: #04/08
      • ![Star - wyz/inference_scaling) [![Publish](https://img.shields.io/badge/Conference-ICLR_2025-blue)]()<br>[Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models](https://arxiv.org/abs/2408.00724) <br> Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang |<img width="1002" alt="image" src="figures/scaling_law.png"> |[Github](https://github.com/thu-wyz/inference_scaling) <br> [Paper](https://arxiv.org/abs/2408.00724)| [//]: #04/08
      • ![Star - AIRe/MRT)<br>[Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning](https://arxiv.org/abs/2503.07572) <br> Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar |<img width="1002" alt="image" src="figures/mrt.png"> |[Github](https://github.com/CMU-AIRe/MRT) <br> [Paper](https://arxiv.org/abs/2503.07572)| [//]: #04/08
      • ![Star - Time Compute via Speculative Reasoning](https://arxiv.org/abs/2504.07891) <br> Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali |<img width="1002" alt="image" src="figures/specreason.png"> |[Github](https://github.com/ruipeterpan/specreason) <br> [Paper](https://arxiv.org/abs/2504.07891)| [//]: #04/14
      • ![Star - structured Reasoning of Multimodal Large Models?](https://arxiv.org/abs/2503.06252) <br> Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang |<img width="1002" alt="image" src="figures/atom.png"> |[Github](https://github.com/Quinn777/AtomThink) <br> [Paper](https://arxiv.org/abs/2503.06252)| [//]: #04/08
      • ![Star - Time Compute via Speculative Reasoning](https://arxiv.org/abs/2504.07891) <br> Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali |<img width="1002" alt="image" src="figures/specreason.png"> |[Github](https://github.com/ruipeterpan/specreason) <br> [Paper](https://arxiv.org/abs/2504.07891)| [//]: #04/14
      • ![Star - wyz/inference_scaling) [![Publish](https://img.shields.io/badge/Conference-ICLR_2025-blue)]()<br>[Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models](https://arxiv.org/abs/2408.00724) <br> Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang |<img width="1002" alt="image" src="figures/scaling_law.png"> |[Github](https://github.com/thu-wyz/inference_scaling) <br> [Paper](https://arxiv.org/abs/2408.00724)| [//]: #04/08
      • ![Star - AIRe/MRT)<br>[Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning](https://arxiv.org/abs/2503.07572) <br> Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar |<img width="1002" alt="image" src="figures/mrt.png"> |[Github](https://github.com/CMU-AIRe/MRT) <br> [Paper](https://arxiv.org/abs/2503.07572)| [//]: #04/08
      • Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models
      • ![Star - of-Thought](https://arxiv.org/abs/2504.19095) <br> Jikai Wang, Juntao Li, Lijun Wu, Min Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2504.19095v1/extracted/6392438/images/scot.png"> |[Github](https://github.com/Jikai0Wang/Speculative_CoT) <br> [Paper](https://arxiv.org/abs/2504.19095)| [//]: #04/29
      • Dynamic Early Exit in Reasoning Models
      • ![Publish - Time Compute Optimally can be More Effective than Scaling Model Parameters](https://arxiv.org/abs/2408.03314) <br> Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar |<img width="1002" alt="image" src="figures/tts_effective.png"> |[Paper](https://arxiv.org/abs/2408.03314)| [//]: #04/08
      • Inference-Time Hyper-Scaling with KV Cache Compression
      • Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
      • ![Star - fib-lab/Token_Signature)<br>[Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models](https://arxiv.org/abs/2506.06008) <br> Peijie Liu, Fengli Xu, Yong Li |<img width="1002" alt="image" src="https://arxiv.org/html/2506.06008v1/x2.png"> |[Github](https://github.com/tsinghua-fib-lab/Token_Signature) <br> [Paper](https://arxiv.org/abs/2506.06008)| [//]: #06/16
    • Efficient Multimodal Reasoning

    • Evaluation and Benchmarks

      • ![Star - Grained Visual Reasoning from Transit Maps](https://arxiv.org/abs/2505.18675) <br> Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, Xinchao Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.18675v2/x1.png"> |[Github](https://github.com/fscdc/ReasonMap) <br> [Paper](https://arxiv.org/abs/2505.18675)| [//]: #06/11
      • ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
      • ![Star - Language Models](https://arxiv.org/abs/2505.13444) <br> Liyan Tang, Grace Kim, Xinyu Zhao, Thom Lake, Wenxuan Ding, Fangcong Yin, Prasann Singhal, Manya Wadhwa, Zeyu Leo Liu, Zayne Sprague, Ramya Namuduri, Bodun Hu, Juan Diego Rodriguez, Puyuan Peng, Greg Durrett |<img width="1002" alt="image" src="figures/chartmuseum.png"> |[Github](https://github.com/Liyan06/ChartMuseum) <br> [Paper](https://arxiv.org/abs/2505.13444)| [//]: #05/20
      • Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations - v2.png"> |[Paper](https://arxiv.org/abs/2505.10937)| [//]: #05/19
      • StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
      • THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
      • ![Star - stability)<br>[Non-Determinism of "Deterministic" LLM Settings](https://arxiv.org/abs/2408.04667) <br> Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, Breck Baldwin |<img width="1002" alt="image" src="https://arxiv.org/html/2408.04667v5/extracted/6331111/max_min_diff.png"> |[Github](https://github.com/breckbaldwin/llm-stability) <br> [Paper](https://arxiv.org/abs/2408.04667)| [//]: #04/08
      • The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
      • Evaluating Large Language Models Trained on Code
      • τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
      • ![Star - compass/GPassK)<br>[Are Your LLMs Capable of Stable Reasoning?](https://arxiv.org/abs/2412.13147) <br> Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen |<img width="1002" alt="image" src="https://arxiv.org/html/2412.13147v3/x1.png"> |[Github](https://github.com/open-compass/GPassK) <br> [Paper](https://arxiv.org/abs/2412.13147)| [//]: #04/08
      • LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception - Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, David Acuna |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15362v1/x1.png"> |[Paper](https://arxiv.org/abs/2504.15362)| [//]: #04/23
      • ![Star - Time Computations for LLM Reasoning and Planning: A Benchmark and Insights](https://arxiv.org/abs/2502.12521) <br> Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji |<img width="1002" alt="image" src="https://arxiv.org/html/2502.12521v1/x1.png"> |[Github](https://github.com/divelab/sys2bench) <br> [Paper](https://arxiv.org/abs/2502.12521)| [//]: #04/08
      • Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
      • ![Star - Valve)<br>[CoT-Valve: Length-Compressible Chain-of-Thought Tuning](https://arxiv.org/abs/2502.09601) <br> Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang |<img width="1002" alt="image" src="figures/cot_valve.png"> |[Github](https://github.com/horseee/CoT-Valve) <br> [Paper](https://arxiv.org/abs/2502.09601)|[//]: #03/16
      • ![Star - stability)<br>[Non-Determinism of "Deterministic" LLM Settings](https://arxiv.org/abs/2408.04667) <br> Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, Breck Baldwin |<img width="1002" alt="image" src="https://arxiv.org/html/2408.04667v5/extracted/6331111/max_min_diff.png"> |[Github](https://github.com/breckbaldwin/llm-stability) <br> [Paper](https://arxiv.org/abs/2408.04667)| [//]: #04/08
      • The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
      • Evaluating Large Language Models Trained on Code
      • τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
      • ![Star - compass/GPassK)<br>[Are Your LLMs Capable of Stable Reasoning?](https://arxiv.org/abs/2412.13147) <br> Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen |<img width="1002" alt="image" src="https://arxiv.org/html/2412.13147v3/x1.png"> |[Github](https://github.com/open-compass/GPassK) <br> [Paper](https://arxiv.org/abs/2412.13147)| [//]: #04/08
      • LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception - Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, David Acuna |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15362v1/x1.png"> |[Paper](https://arxiv.org/abs/2504.15362)| [//]: #04/23
      • ![Star - Time Computations for LLM Reasoning and Planning: A Benchmark and Insights](https://arxiv.org/abs/2502.12521) <br> Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji |<img width="1002" alt="image" src="https://arxiv.org/html/2502.12521v1/x1.png"> |[Github](https://github.com/divelab/sys2bench) <br> [Paper](https://arxiv.org/abs/2502.12521)| [//]: #04/08
      • ![Star - hkust/benchmark_inference_time_computation_LLM)<br>[Bag of Tricks for Inference-time Computation of LLM Reasoning](https://arxiv.org/abs/2502.07191) <br> Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu |<img width="1002" alt="image" src="https://arxiv.org/html/2502.07191v4/x1.png"> |[Github](https://github.com/usail-hkust/benchmark_inference_time_computation_LLM) <br> [Paper](https://arxiv.org/abs/2502.07191)| [//]: #04/08
      • ![Star - optimal-tts)<br>[Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling](https://arxiv.org/abs/2502.06703) <br> Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2502.06703v1/x2.png"> |[Github](https://github.com/RyanLiu112/compute-optimal-tts) <br> [Paper](https://arxiv.org/abs/2502.06703)| [//]: #04/08
      • DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
      • S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
      • ![Star - Bench)<br>[VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning](https://arxiv.org/abs/2504.07956) <br> Yukun Qi, Yiming Zhao, Yu Zeng, Xikun Bao, Wenxuan Huang, Lin Chen, Zehui Chen, Jie Zhao, Zhongang Qi, Feng Zhao |<img width="1002" alt="image" src="figures/video.png"> |[Github](https://github.com/zhishuifeiqian/VCR-Bench) <br> [Paper](https://arxiv.org/abs/2504.07956)| [//]: #04/16
      • S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
      • ![Star - Bench)<br>[VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning](https://arxiv.org/abs/2504.07956) <br> Yukun Qi, Yiming Zhao, Yu Zeng, Xikun Bao, Wenxuan Huang, Lin Chen, Zehui Chen, Jie Zhao, Zhongang Qi, Feng Zhao |<img width="1002" alt="image" src="figures/video.png"> |[Github](https://github.com/zhishuifeiqian/VCR-Bench) <br> [Paper](https://arxiv.org/abs/2504.07956)| [//]: #04/16
      • ![Star - optimal-tts)<br>[Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling](https://arxiv.org/abs/2502.06703) <br> Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2502.06703v1/x2.png"> |[Github](https://github.com/RyanLiu112/compute-optimal-tts) <br> [Paper](https://arxiv.org/abs/2502.06703)| [//]: #04/08
      • DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
      • ![Star - damo-academy/VCBench)<br>[Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency](https://arxiv.org/abs/2504.18589) <br> Zhikai Wang, Jiashuo Sun, Wenqi Zhang, Zhiqiang Hu, Xin Li, Fan Wang, Deli Zhao |<img width="1002" alt="image" src="https://arxiv.org/html/2504.18589v1/x1.png"> |[Github](https://github.com/alibaba-damo-academy/VCBench) <br> [Paper](https://arxiv.org/abs/2504.18589)| [//]: #04/29
      • ![Star - liyu/CipherBank)<br>[CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges](https://arxiv.org/abs/2504.19093) <br> Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu |<img width="1002" alt="image" src="https://arxiv.org/html/2504.19093v1/x2.png"> |[Github](https://github.com/Goodman-liyu/CipherBank) <br> [Paper](https://arxiv.org/abs/2504.19093)| [//]: #04/29
      • ![Star - Benchmark/VisuLogic-Eval)<br>[VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models](https://arxiv.org/abs/2504.15279) <br> Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15279v1/x1.png"> |[Github](https://github.com/VisuLogic-Benchmark/VisuLogic-Eval) <br> [Paper](https://arxiv.org/abs/2504.15279)| [//]: #04/25
    • Competition

      • ![Publish - Skills.svg?style=social&label=Star)](https://github.com/NVIDIA/NeMo-Skills) [AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset](https://arxiv.org/abs/2504.16891). Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, Igor Gitman. [[Paper]](https://arxiv.org/abs/2504.16891)[[Github]](https://github.com/NVIDIA/NeMo-Skills)
  • Updates

    • ReasonMap
    • VainF - Walker](https://github.com/ZhenyuSun-Walker), [xianzuwu](https://github.com/xianzuwu)!
    • arXiv