Awesome-Efficient-Reasoning-Models

[Arxiv 2025] Efficient Reasoning Models: A Survey
https://github.com/fscdc/Awesome-Efficient-Reasoning-Models

Full list
- Build SLM with Strong Reasoning Ability
 - Llama-Nemotron: Efficient Reasoning Models
 - Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math - Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, Shuohang Wang, Weijian Xu, Jianfeng Gao, Weizhu Chen | [Paper](https://arxiv.org/abs/2504.21233)
 - Phi-4-reasoning Technical Report
 - Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
 - Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
 - Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, Yin Zhang | [Paper](https://aclanthology.org/2024.emnlp-main.333/)
 - [Small Models Struggle to Learn from Strong Reasoners](https://arxiv.org/abs/2502.12143) - Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran | [Github](https://github.com/Small-Model-Gap/Small-Model-Learnability-Gap) [Paper](https://arxiv.org/abs/2502.12143)
 - Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
 - [Small Language Models Need Strong Verifiers to Self-Correct Reasoning](https://arxiv.org/abs/2404.17140) (ACL Findings 2024) - Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang | [Github](https://github.com/yunx-z/SCORE) [Paper](https://arxiv.org/abs/2404.17140)
 - Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
 - Yichun Zhao, Shuheng Zhou, Huijia Zhu | [Paper](https://aclanthology.org/2024.lrec-main.1140.pdf)
 - Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
 - Distilling Reasoning Ability from Large Language Models with Adaptive Thinking
 - [Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning](https://arxiv.org/abs/2502.18001) - Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen | [Github](https://github.com/EIT-NLP/Distilling-CoT-Reasoning) [Paper](https://arxiv.org/abs/2502.18001)
 - Towards Reasoning Ability of Small Language Models
 - [Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models](https://arxiv.org/abs/2504.04823) - Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou | [Github](https://github.com/ruikangliu/Quantized-Reasoning-Models) [Paper](https://arxiv.org/abs/2504.04823)
 - When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
 - [Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't](https://arxiv.org/abs/2503.16219) - Quy-Anh Dang, Chris Ngo | [Github](https://github.com/knoveleng/open-rs) [Paper](https://arxiv.org/abs/2503.16219)
 - [SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild](https://arxiv.org/abs/2503.18892) - Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He | [Github](https://github.com/hkust-nlp/simpleRL-reason) [Paper](https://arxiv.org/abs/2503.18892)
 - DeepScaleR
 - [Tina: Tiny Reasoning Models via LoRA](https://arxiv.org/abs/2504.15777) - Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger | [Github](https://github.com/shangshang-wang/Tina) [Paper](https://arxiv.org/abs/2504.15777)
- Make Long CoT Short
 - […CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization](https://arxiv.org/abs/2504.21659) - Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen | [Github](https://github.com/StarDewXXX/AdaR1) [Paper](https://arxiv.org/abs/2504.21659)
 - OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
 - [Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models](https://arxiv.org/abs/2505.03469) - Bin Yu, Hang Yuan, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen | [Github](https://github.com/ZGCA-AI4Edu/LS-Mixture) [Paper](https://arxiv.org/abs/2505.03469)
 - Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
 - Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition
 - Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN
 - […aware Step Decomposition for Efficient Large Reasoning Models](https://arxiv.org/abs/2505.13975) - Yuxuan Jiang, Dawei Li, Frank Ferraro | [Github](https://github.com/YuxuanJiang1/DRP) [Paper](https://arxiv.org/abs/2505.13975)
 - Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning
 - [Let LLMs Break Free from Overthinking via Self-Braking Tuning](https://arxiv.org/abs/2505.14604) - Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang | [Github](https://github.com/ZJU-REAL/Self-Braking-Tuning) [Paper](https://arxiv.org/abs/2505.14604)
 - [R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search](https://arxiv.org/abs/2505.16838) - Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, Dacheng Tao | [Github](https://github.com/w-yibo/R1-Compress) [Paper](https://arxiv.org/abs/2505.16838)
 - Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
 - PATS: Process-Level Adaptive Thinking Mode Switching
 - AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting
 - Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning
 - ARM: Adaptive Reasoning Model
 - When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
 - [Learn to Reason Efficiently with Adaptive Length-based Reward Shaping](https://arxiv.org/abs/2505.15612) - Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He | [Github](https://github.com/hkust-nlp/Laser) [Paper](https://arxiv.org/abs/2505.15612)
 - Think Only When You Need with Large Hybrid-Reasoning Models
 - [AdaptThink: Reasoning Models Can Learn When to Think](https://arxiv.org/abs/2505.13417) - Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li | [Github](https://github.com/THU-KEG/AdaptThink) [Paper](https://arxiv.org/abs/2505.13417)
 - Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
 - Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt
 - Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
 - ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
 - AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
 - Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
 - Scalable Chain of Thoughts via Elastic Reasoning
 - Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
 - Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers
 - System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
 - Hybrid Latent Reasoning via Reinforcement Learning
 - SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought
 - Continuous Chain of Thought Enables Parallel Exploration and Reasoning
 - [Efficient Reasoning via Chain of Unconscious Thought](https://arxiv.org/abs/2505.19756) - Ruihan Gong, Yue Liu, Wenjie Qu, Mingzhe Du, Yufei He, Yingwei Ma, Yulin Chen, Xiang Liu, Yi Wen, Xinfeng Li, Ruidong Wang, Xinzhong Zhu, Bryan Hooi, Jiaheng Zhang | [Github](https://github.com/Rohan-GRH/CoUT) [Paper](https://arxiv.org/abs/2505.19756)
 - Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
 - […Time Scaling with Soft Chain-of-Thought Reasoning](https://arxiv.org/abs/2505.11484) - Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao | [Github](https://github.com/xuyige/SoftCoT) [Paper](https://arxiv.org/abs/2505.11484)
 - [Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning](https://arxiv.org/abs/2505.16782) - Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen | [Github](https://github.com/EIT-NLP/Awesome-Latent-CoT) [Paper](https://arxiv.org/abs/2505.16782)
 - Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
 - [Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space](https://arxiv.org/abs/2505.15778) - Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang | [Github](https://github.com/eric-ai-lab/Soft-Thinking) [Paper](https://arxiv.org/abs/2505.15778)
 - Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
 - Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space - Chun Zhu, Zixia Jia, Ying Nian Wu, Zilong Zheng | [Paper](https://arxiv.org/abs/2505.13308)
 - Time's Up! An Empirical Study of LLM Reasoning Ability Under Output Length Constraint
 - CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models
 - Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
 - [The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models](https://arxiv.org/abs/2401.05618) (FLLM 2024) - Matthew Renze, Erhan Guven | [Github](https://github.com/matthewrenze/jhu-concise-cot) [Paper](https://arxiv.org/abs/2401.05618)
 - Break the Chain: Large Language Models Can be Shortcut Reasoners
 - […of-Thought without Compromising Effectiveness](https://arxiv.org/abs/2412.11664) - Yu Kang, Xianghui Sun, Liangyu Chen, Wei Zou | [Paper](https://arxiv.org/abs/2412.11664)
 - [Can Language Models Learn to Skip Steps?](https://arxiv.org/abs/2411.01855) (NeurIPS 2024) - Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, Zheng Zhang | [Github](https://github.com/tengxiaoliu/LM_skip) [Paper](https://arxiv.org/abs/2411.01855)
 - […Budget-Aware LLM Reasoning](https://arxiv.org/abs/2412.18547) - Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen | [Github](https://github.com/GeniusHTX/TALE) [Paper](https://arxiv.org/abs/2412.18547)
 - [O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning](https://arxiv.org/abs/2501.12570) - Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao | [Github](https://github.com/StarDewXXX/O1-Pruner) [Paper](https://arxiv.org/abs/2501.12570)
 - Kimi k1.5: Scaling Reinforcement Learning with LLMs
 - [Demystifying Long Chain-of-Thought Reasoning in LLMs](https://arxiv.org/abs/2502.03373) - Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue | [Github](https://github.com/eddycmu/demystify-long-cot) [Paper](https://arxiv.org/abs/2502.03373)
 - [Training Language Models to Reason Efficiently](https://arxiv.org/abs/2502.04463) - Daman Arora, Andrea Zanette | [Github](https://github.com/Zanette-Labs/efficient-reasoning) [Paper](https://arxiv.org/abs/2502.04463)
 - [L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning](https://www.arxiv.org/abs/2503.04697) - Pranjal Aggarwal, Sean Welleck | [Github](https://github.com/cmu-l3/l1) [Paper](https://www.arxiv.org/abs/2503.04697)
 - Distilling System 2 into System 1
 - […of-Thought Compression in LLMs](https://arxiv.org/abs/2502.12067) - Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li | [Github](https://github.com/hemingkx/TokenSkip) [Paper](https://arxiv.org/abs/2502.12067)
 - Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
 - Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
 - [Self-Training Elicits Concise Reasoning in Large Language Models](https://arxiv.org/abs/2502.20122) - Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun | [Github](https://github.com/TergelMunkhbat/concise-reasoning) [Paper](https://arxiv.org/abs/2502.20122)
 - [Chain of Draft: Thinking Faster by Writing Less](https://arxiv.org/abs/2502.18600) - Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He | [Github](https://github.com/sileix/chain-of-draft) [Paper](https://arxiv.org/abs/2502.18600)
 - [Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought](https://arxiv.org/abs/2410.05695) (NeurIPS 2024) - Qiguang Chen, Libo Qin, Jiaqi Wang, Jinxuan Zhou, Wanxiang Che | [Github](https://github.com/LightChen233/reasoning-boundary) [Paper](https://arxiv.org/abs/2410.05695)
 - How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach | [Paper](https://arxiv.org/abs/2503.01141)
 - […of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching](https://arxiv.org/abs/2503.05179) - Simon A. Aytes, Jinheon Baek, Sung Ju Hwang | [Github](https://github.com/SimonAytes/SoT) [Paper](https://arxiv.org/abs/2503.05179)
 - Learning to Route LLMs with Confidence Tokens - Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, Xia Hu | [Paper](https://arxiv.org/abs/2410.13284)
 - Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization - Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, Xia Hu | [Paper](https://arxiv.org/abs/2502.04428)
 - Claude 3.7 Sonnet
 - Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
 - Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
 - SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
 - […regressive Chain-of-Thought through Loop-Aligned Reasoning](https://arxiv.org/abs/2502.08482) - Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He | [Github](https://github.com/qifanyu/RELAY) [Paper](https://arxiv.org/abs/2502.08482)
 - [CoT-Valve: Length-Compressible Chain-of-Thought Tuning](https://arxiv.org/abs/2502.09601) - Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang | [Github](https://github.com/horseee/CoT-Valve) [Paper](https://arxiv.org/abs/2502.09601)
 - DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
 - Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
 - [ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning](https://arxiv.org/abs/2504.01296) - Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang | [Github](https://github.com/UCSB-NLP-Chang/ThinkPrune) [Paper](https://arxiv.org/abs/2504.01296)
 - Think When You Need: Self-Adaptive Chain-of-Thought Learning
 - [Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/abs/2502.05171) - Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein | [Github](https://github.com/seal-rg/recurrent-pretraining) [Paper](https://arxiv.org/abs/2502.05171)
 - Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning
 - [Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models](https://arxiv.org/abs/2402.07754) (NeurIPS 2024) - Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong | [Github](https://github.com/HKUNLP/diffusion-of-thoughts) [Paper](https://arxiv.org/abs/2402.07754)
 - CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
 - ![Star - by-Step Compression](https://arxiv.org/abs/2502.15589) Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2502.15589v1/x1.png"> |[Github](https://github.com/zjunlp/LightThinker) [Paper](https://arxiv.org/abs/2502.15589)| [//]: #04/08
 - ![Star - COLM_2024-blue)]() [Guiding Language Model Reasoning with Planning Tokens](https://arxiv.org/abs/2310.05707) Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni |<img width="1002" alt="image" src="https://arxiv.org/html/2310.05707v4/extracted/5777851/img/overview.png"> |[Github](https://github.com/WANGXinyiLinda/planning_tokens) [Paper](https://arxiv.org/abs/2310.05707)| [//]: #04/08
 - ![Star - COLM_2024-blue)]() [Let's Think Dot by Dot: Hidden Computation in Transformer Language Models](https://arxiv.org/abs/2404.15758) Jacob Pfau, William Merrill, Samuel R. Bowman |<img width="1002" alt="image" src="https://arxiv.org/html/2404.15758v1/extracted/2404.15758v1/figs/scale_len.png"> |[Github](https://github.com/JacobPfau/fillerTokens) [Paper](https://arxiv.org/abs/2404.15758)| [//]: #04/08
 - ![Star - Memory-and-Reasoning) [Disentangling Memory and Reasoning Ability in Large Language Models](https://arxiv.org/abs/2411.13504) Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2411.13504v2/x1.png"> |[Github](https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning) [Paper](https://arxiv.org/abs/2411.13504)| [//]: #04/08
 - Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
 - Training Large Language Models to Reason in a Continuous Latent Space
- Background Papers
 - Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
 - Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
 - ![Star - NLPIR/WebThinker) [WebThinker: Empowering Large Reasoning Models with Deep Research Capability](https://arxiv.org/abs/2504.21776) Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou |<img width="1002" alt="image" src="figures/webthinker.png"> |[Github](https://github.com/RUC-NLPIR/WebThinker) [Paper](https://arxiv.org/abs/2504.21776)|[//]: #05/02
 - ![Star - Shot-RLVR) [Reinforcement Learning for Reasoning in Large Language Models with One Training Example](https://arxiv.org/abs/2504.20571) Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen |<img width="1002" alt="image" src="https://arxiv.org/html/2504.20571v1/x3.png"> |[Github](https://github.com/ypwang61/One-Shot-RLVR) [Paper](https://arxiv.org/abs/2504.20571)|[//]: #04/30
 - Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision - Fu Yang, Zongyu Lin, Xinfeng Li, Hao Xu, Kai-Wei Chang, Ying Nian Wu |<img width="1002" alt="image" src="figures/eorm.png"> |[Paper](https://arxiv.org/abs/2505.14999)| [//]: #05/22
 - AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
 - ![Star - coai/BARREL) [BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs](https://arxiv.org/abs/2505.13529) Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang |<img width="1002" alt="image" src="figures/BARREL.png"> |[Github](https://github.com/thu-coai/BARREL) [Paper](https://arxiv.org/abs/2505.13529)| [//]: #05/23
 - ![Star - tango) [RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning](https://arxiv.org/abs/2505.15034) Kaiwen Zha, Zhengqi Gao, Maohao Shen, Zhang-Wei Hong, Duane S. Boning, Dina Katabi |<img width="1002" alt="image" src="figures/Tango.png"> |[Github](https://github.com/kaiwenzha/rl-tango) [Paper](https://arxiv.org/abs/2505.15034)| [//]: #05/23
 - Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
 - Reasoning Models Better Express Their Confidence
 - Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
 - ![Star - Enhanced Reinforcement Learning](https://arxiv.org/abs/2505.12996) Jiaan Wang, Fandong Meng, Jie Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2505.12996v1/x4.png"> |[Github](https://github.com/krystalan/DRT) [Paper](https://arxiv.org/abs/2505.12996)| [//]: #05/20
 - Absolute Zero: Reinforced Self-play Reasoning with Zero Data
 - ![Star - Ability-Alignment) [Beyond Aha!: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models](https://arxiv.org/abs/2505.10554) Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2505.10554v1/x2.png"> |[Github](https://github.com/zhiyuanhubj/Meta-Ability-Alignment) [Paper](https://arxiv.org/abs/2505.10554)| [//]: #05/19
 - The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
 - Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models - Jun Qi |<img width="1002" alt="image" src="https://arxiv.org/html/2505.10446v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.10446)| [//]: #05/18
 - J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
 - INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
 - AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
 - ![Star - of-Thought Tokens are Computer Program Variables](https://arxiv.org/abs/2505.04955) Fangwei Zhu, Peiyi Wang, Zhifang Sui |<img width="1002" alt="image" src="https://arxiv.org/html/2505.04955v1/x2.png"> |[Github](https://github.com/solitaryzero/CoTs_are_Variables) [Paper](https://arxiv.org/abs/2505.04955)| [//]: #05/17
 - ![Star - - From Pretraining to Posttraining](https://arxiv.org/abs/2505.07608) Xiaomi LLM-Core Team |<img width="1002" alt="image" src="https://arxiv.org/html/2505.07608v1/x1.png"> |[Github](https://github.com/xiaomimimo/MiMo) [Paper](https://arxiv.org/abs/2505.07608)| [//]: #05/17
 - ![Star - of-RLVR) [Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?](https://arxiv.org/abs/2504.13837) Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2504.13837v1/x1.png"> |[Github](https://github.com/LeapLabTHU/limit-of-RLVR) [Paper](https://arxiv.org/abs/2504.13837)| [//]: #04/22
 - ![Publish - of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903) Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou |<img width="1002" alt="image" src="figures/cot_prompting.png"> |[Paper](https://arxiv.org/abs/2201.11903)| [//]: #04/08
 - ![Star - nlp/tree-of-thought-llm) [![Publish](https://img.shields.io/badge/Conference-NeurIPS_2023-blue)]() [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601) Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan |<img width="1002" alt="image" src="https://arxiv.org/html/2305.10601v2/x1.png"> |[Github](https://github.com/princeton-nlp/tree-of-thought-llm) [Paper](https://arxiv.org/abs/2305.10601)| [//]: #04/08
 - ![Star - of-thoughts) [![Publish](https://img.shields.io/badge/Conference-AAAI_2024-blue)]() [Graph of Thoughts: Solving Elaborate Problems with Large Language Models](https://arxiv.org/abs/2308.09687) Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler |<img width="1002" alt="image" src="figures/got.png"> |[Github](https://github.com/spcl/graph-of-thoughts) [Paper](https://arxiv.org/abs/2308.09687)| [//]: #04/08
 - ![Publish - Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou |<img width="1002" alt="image" src="figures/sc.png"> |[Paper](https://arxiv.org/abs/2203.11171)| [//]: #04/08
 - ![Star - of-symbol-planning) [![Publish](https://img.shields.io/badge/Conference-COLM_2024-blue)]() [Chain-of-Symbol Prompting Elicits Planning in Large Langauge Models](https://arxiv.org/abs/2305.10276) Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2305.10276v7/x1.png"> |[Github](https://github.com/hanxuhu/chain-of-symbol-planning) [Paper](https://arxiv.org/abs/2305.10276)| [//]: #04/08
 - Thinking Machines: A Survey of LLM based Reasoning Strategies
 - ![Star - System2-Reasoning-LLM) [From System 1 to System 2: A Survey of Reasoning Large Language Models](https://arxiv.org/abs/2502.17419) Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu |<img width="1002" alt="image" src="https://arxiv.org/html/2502.17419v2/extracted/6232702/images/timeline.png"> |[Github](https://github.com/zzli2022/Awesome-System2-Reasoning-LLM) [Paper](https://arxiv.org/abs/2502.17419)| [//]: #04/08
 - ![Star - AI-Lab/Program-of-Thoughts) [![Publish](https://img.shields.io/badge/Conference-TMLR_2023-blue)]() [Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks](https://arxiv.org/abs/2211.12588) Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen |<img width="1002" alt="image" src="figures/pot.png"> |[Github](https://github.com/TIGER-AI-Lab/Program-of-Thoughts) [Paper](https://arxiv.org/abs/2211.12588)| [//]: #04/08
 - Resa: Transparent Reasoning Models via SAEs
- Make Decoding More Efficient
 - Reward Reasoning Model
 - Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
 - Control-R: Towards controllable test-time scaling
 - Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
 - First Finish Search: Efficient Test-Time Scaling in Large Language Models
 - LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
 - Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones - Adsera |<img width="1002" alt="image" src="https://arxiv.org/html/2505.21825v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.21825)| [//]: #06/11
 - Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
 - ![Star - guided-search) [Value-Guided Search for Efficient Chain-of-Thought Reasoning](https://arxiv.org/abs/2505.17373) Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun |<img width="1002" alt="image" src="https://arxiv.org/html/2505.17373v1/x2.png"> |[Github](https://github.com/kaiwenw/value-guided-search) [Paper](https://arxiv.org/abs/2505.17373)| [//]: #06/11
 - Accelerated Test-Time Scaling with Model-Free Speculative Sampling
 - Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
 - Fractured Chain-of-Thought Reasoning
 - Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately
 - Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
 - ![Star - Group/AlphaOne) [AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time](https://arxiv.org/abs/2505.24863) Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.24863v1/x27.png"> |[Github](https://github.com/ASTRAL-Group/AlphaOne) [Paper](https://arxiv.org/abs/2505.24863)| [//]: #06/13
 - ProxyThinker: Test-Time Guidance through Small Visual Reasoners
 - A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
 - Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
 - Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
 - ![Star - dev/ReasoningPathCompression) [Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning](https://arxiv.org/abs/2505.13866) Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim |<img width="1002" alt="image" src="figures/rpc_new.png"> |[Github](https://github.com/jiwonsong-dev/ReasoningPathCompression) [Paper](https://arxiv.org/abs/2505.13866)| [//]: #05/22
 - RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
 - Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity - Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-shan Shiu |<img width="1002" alt="image" src="https://arxiv.org/html/2505.11107v1/extracted/6445446/figures/gt_main_new.png"> |[Paper](https://arxiv.org/abs/2505.11107)| [//]: #05/19
 - ![Star - ACL_main_2025-blue)]() [Rethinking Repetition Problems of LLMs in Code Generation](https://arxiv.org/abs/2505.10402) Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li |<img width="1002" alt="image" src="figures/code_repeat.png"> |[Github](https://github.com/LYC127/RPG) [Paper](https://arxiv.org/abs/2505.10402)| [//]: #05/18
 - Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping
 - ![Star - IJCAI-blue)]() [Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning](https://arxiv.org/abs/2505.06321) Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu |<img width="1002" alt="image" src="figures/learn2think.png"> |[Github](https://github.com/zch65458525/L2T) [Paper](https://arxiv.org/abs/2505.06321)| [//]: #05/17
 - ![Star - Shanghai/xVerify) [xVerify: Efficient Answer Verifier for Reasoning Model Evaluations](https://arxiv.org/abs/2504.10481) Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li |<img width="1002" alt="image" src="https://arxiv.org/html/2504.10481v1/x1.png"> |[Github](https://github.com/IAAR-Shanghai/xVerify) [Paper](https://arxiv.org/abs/2504.10481)| [//]: #04/17
 - ![Star - Consistency for Efficient Reasoning and Coding with LLMs](https://arxiv.org/abs/2305.11860) Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam |<img width="1002" alt="image" src="figures/asc.png"> |[Github](https://github.com/Pranjal2041/AdaptiveConsistency) [Paper](https://arxiv.org/abs/2305.11860)| [//]: #04/08
 - ![Star - ICLR_2024-blue)]() [Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning](https://arxiv.org/abs/2401.10480) Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2401.10480v1/x1.png"> |[Github](https://github.com/Yiwei98/ESC) [Paper](https://arxiv.org/abs/2401.10480)| [//]: #04/08
 - Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods - Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou |<img width="1002" alt="image" src="https://arxiv.org/html/2504.14047v1/x1.png"> |[Paper](https://arxiv.org/abs/2504.14047)| [//]: #04/23
 - ![Star - Guided Speculative Decoding for Efficient LLM Reasoning](https://arxiv.org/abs/2501.19324) Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong |<img width="1002" alt="image" src="figures/rsd.png"> |[Github](https://github.com/BaohaoLiao/RSD) [Paper](https://arxiv.org/abs/2501.19324)| [//]: #04/08
 - Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
 - ![Star - Time Scaling](https://arxiv.org/abs/2502.12018) Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo |<img width="1002" alt="image" src="figures/aot.png"> |[Github](https://github.com/qixucen/atom) [Paper](https://arxiv.org/abs/2502.12018)| [//]: #04/08
 - ![Star - NAACL_Findings_2025-blue)]() [Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning](https://arxiv.org/abs/2408.13457) Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2408.13457v3/x3.png"> |[Github](https://github.com/WangXinglin/DSC) [Paper](https://arxiv.org/abs/2408.13457)| [//]: #04/08
 - Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
 - Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning - Zhe Guo, Xiaoxing Ma, Yu-Feng Li |<img width="1002" alt="image" src="https://arxiv.org/html/2502.00511v2/x3.png"> |[Paper](https://arxiv.org/abs/2502.00511)| [//]: #04/08
 - Confidence Improves Self-Consistency in LLMs
 - ![Star - Huang/Self-Calibration) [Efficient Test-Time Scaling via Self-Calibration](https://arxiv.org/abs/2503.00031) Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2503.00031v1/x2.png"> |[Github](https://github.com/Chengsong-Huang/Self-Calibration) [Paper](https://arxiv.org/abs/2503.00031)| [//]: #04/08
 - ![Star - Labs/SpeculativeRejection) [![Publish](https://img.shields.io/badge/Conference-NeurIPS_2024-blue)]() [Fast Best-of-N Decoding via Speculative Rejection](https://arxiv.org/abs/2410.20290) Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette |<img width="1002" alt="image" src="https://arxiv.org/html/2410.20290v2/x1.png"> |[Github](https://github.com/Zanette-Labs/SpeculativeRejection) [Paper](https://arxiv.org/abs/2410.20290)| [//]: #04/08
 - Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
 - FastMCTS: A Simple Sampling Strategy for Data Synthesis
 - ![Star - github-00/LLM-Predictive-Decoding) [![Publish](https://img.shields.io/badge/Conference-ICLR_2025-blue)]() [Non-myopic Generation of Language Models for Reasoning and Planning](https://arxiv.org/abs/2410.17195) Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong |<img width="1002" alt="image" src="figures/predictive_decoding.png"> |[Github](https://github.com/chang-github-00/LLM-Predictive-Decoding) [Paper](https://arxiv.org/abs/2410.17195)| [//]: #04/08
 - ![Star - taught-lookahead) [Language Models can Self-Improve at State-Value Estimation for Better Search](https://arxiv.org/abs/2503.02878) Ethan Mendes, Alan Ritter |<img width="1002" alt="image" src="https://arxiv.org/html/2503.02878v1/x1.png"> |[Github](https://github.com/ethanm88/self-taught-lookahead) [Paper](https://arxiv.org/abs/2503.02878)| [//]: #04/08
 - ![Star - Decoding) [ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation](https://arxiv.org/abs/2503.13288) Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu |<img width="1002" alt="image" src="https://arxiv.org/html/2503.13288v1/x2.png"> |[Github](https://github.com/xufangzhi/phi-Decoding) [Paper](https://arxiv.org/abs/2503.13288)| [//]: #04/08
 - Dynamic Parallel Tree Search for Efficient LLM Reasoning
 - ![Star - Reasoning/APR) [Learning Adaptive Parallel Reasoning with Language Models](https://arxiv.org/abs/2504.15466) Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15466v1/x2.png"> |[Github](https://github.com/Parallel-Reasoning/APR) [Paper](https://arxiv.org/abs/2504.15466)| [//]: #04/23
 - ![Star - research/sot) [![Publish](https://img.shields.io/badge/Conference-ICLR_2024-blue)]() [Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation](https://arxiv.org/abs/2307.15337) Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang |<img width="1002" alt="image" src="figures/skeleton_ot.png"> |[Github](https://github.com/imagination-research/sot) [Paper](https://arxiv.org/abs/2307.15337)| [//]: #04/08
 - Adaptive Skeleton Graph Decoding
 - THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
 - DISC: Dynamic Decomposition Improves LLM Inference Scaling
 - From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
 - ![Star - structured Reasoning of Multimodal Large Models?](https://arxiv.org/abs/2503.06252) Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang |<img width="1002" alt="image" src="figures/atom.png"> |[Github](https://github.com/Quinn777/AtomThink) [Paper](https://arxiv.org/abs/2503.06252)| [//]: #04/08
 - ![Star - wyz/inference_scaling) [![Publish](https://img.shields.io/badge/Conference-ICLR_2025-blue)]() [Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models](https://arxiv.org/abs/2408.00724) Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang |<img width="1002" alt="image" src="figures/scaling_law.png"> |[Github](https://github.com/thu-wyz/inference_scaling) [Paper](https://arxiv.org/abs/2408.00724)| [//]: #04/08
 - ![Star - AIRe/MRT) [Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning](https://arxiv.org/abs/2503.07572) Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar |<img width="1002" alt="image" src="figures/mrt.png"> |[Github](https://github.com/CMU-AIRe/MRT) [Paper](https://arxiv.org/abs/2503.07572)| [//]: #04/08
 - ![Star - Time Compute via Speculative Reasoning](https://arxiv.org/abs/2504.07891) Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali |<img width="1002" alt="image" src="figures/specreason.png"> |[Github](https://github.com/ruipeterpan/specreason) [Paper](https://arxiv.org/abs/2504.07891)| [//]: #04/14
 - Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models
 - ![Star - of-Thought](https://arxiv.org/abs/2504.19095) Jikai Wang, Juntao Li, Lijun Wu, Min Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2504.19095v1/extracted/6392438/images/scot.png"> |[Github](https://github.com/Jikai0Wang/Speculative_CoT) [Paper](https://arxiv.org/abs/2504.19095)| [//]: #04/29
 - Dynamic Early Exit in Reasoning Models
 - ![Publish - Time Compute Optimally can be More Effective than Scaling Model Parameters](https://arxiv.org/abs/2408.03314) Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar |<img width="1002" alt="image" src="figures/tts_effective.png"> |[Paper](https://arxiv.org/abs/2408.03314)| [//]: #04/08
 - Inference-Time Hyper-Scaling with KV Cache Compression
 - Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
 - ![Star - fib-lab/Token_Signature) [Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models](https://arxiv.org/abs/2506.06008) Peijie Liu, Fengli Xu, Yong Li |<img width="1002" alt="image" src="https://arxiv.org/html/2506.06008v1/x2.png"> |[Github](https://github.com/tsinghua-fib-lab/Token_Signature) [Paper](https://arxiv.org/abs/2506.06008)| [//]: #06/16
- Efficient Multimodal Reasoning
 - MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
 - ![Star - of-Thought Reward Model through Reinforcement Fine-Tuning](https://arxiv.org/abs/2505.03318) Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang |<img width="1002" alt="image" src="figures/umrf.png"> |[Github](https://github.com/CodeGoat24/UnifiedReward) [Paper](https://arxiv.org/abs/2505.03318)| [//]: #05/17
 - Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
 - Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
 - ![Star - zju/PixelThink) [PixelThink: Towards Efficient Chain-of-Pixel Reasoning](https://arxiv.org/abs/2505.23727) Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.23727v1/x2.png"> |[Github](https://github.com/songw-zju/PixelThink) [Paper](https://arxiv.org/abs/2505.23727)| [//]: #06/06
 - ![Star - Language Models](https://arxiv.org/abs/2505.16854) Jiaqi Wang, Kevin Qinghong Lin, James Cheng, Mike Zheng Shou |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16854v1/x1.png"> |[Github](https://github.com/kokolerk/TON) [Paper](https://arxiv.org/abs/2505.16854)| [//]: #05/24
 - Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
 - One RL to See Them All: Visual Triple Unified Reinforcement Learning
 - MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
 - GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking
 - Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
 - Grounded Reinforcement Learning for Visual Reasoning
 - Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models - Chi Cheung,Shengyu Zhang,Fei Wu,Hongxia Yang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.23091v2/extracted/6518453/images_folder/mmr1_framework_update.png"> |[Paper](https://arxiv.org/abs/2505.23091)| [//]: #06/11
 - ![Star - Language Reasoning Models to Re-attention Visual Information](https://arxiv.org/abs/2505.23558) Xu Chu, Xinrong Chen, Guanyu Wang, Zhijie Tan, Kui Huang, Wenyu Lv, Tong Mo, Weiping Li |<img width="1002" alt="image" src="https://arxiv.org/html/2505.23558v2/x5.png"> |[Github](https://github.com/Liar406/Look_Again) [Paper](https://arxiv.org/abs/2505.23558)| [//]: #06/11
 - Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought - An Huang,Guilin Liu,Shiwei Sheng,Shilong Liu,Liang-Yan Gui,Jan Kautz,Yu-Xiong Wang,Zhiding Yu |<img width="1002" alt="image" src="https://arxiv.org/html/2505.23766v1/x2.png"> |[Paper](https://arxiv.org/abs/2505.23766)| [//]: #06/11
 - Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
 - Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning - Ching Lin, Kevin Lin, Wangmeng Zuo, Lijuan Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.19702v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.19702)| [//]: #06/11
 - Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning
 - SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
 - Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
 - Visual Abstract Thinking Empowers Multimodal Reasoning
 - VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
 - DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
 - FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
 - Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
 - ![Star - R1) [Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning](https://arxiv.org/abs/2505.14677) Jiaer Xia, Yuhang Zang, Peng Gao, Yixuan Li, Kaiyang Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2505.14677v1/x3.png"> |[Github](https://github.com/maifoundations/Visionary-R1) [Paper](https://arxiv.org/abs/2505.14677)| [//]: #05/22
 - ![Star - research/VisionReasoner) [VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning](https://arxiv.org/abs/2505.12081) Yuqi Liu, Tianyuan Qu, Zhisheng Zhong, Bohao Peng, Shu Liu, Bei Yu, Jiaya Jia |<img width="1002" alt="image" src="https://arxiv.org/html/2505.12081v1/x1.png"> |[Github](https://github.com/dvlab-research/VisionReasoner) [Paper](https://arxiv.org/abs/2505.12081)| [//]: #05/20
 - ![Star - PRM) [MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision](https://arxiv.org/abs/2505.13427) Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao |<img width="1002" alt="image" src="figures/mmprm.png"> |[Github](https://github.com/ModalMinds/MM-PRM) [Paper](https://arxiv.org/abs/2505.13427)| [//]: #05/20
 - CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
 - Visual Planning: Let's Think Only with Images
 - ![Star - ZERO-Inference) [Training-Free Reasoning and Reflection in MLLMs](https://arxiv.org/abs/2505.16151) Hongchen Wei, Zhenzhong Chen |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16151v1/x1.png"> |[Github](https://github.com/hcwei13/FRANK-ZERO-Inference) [Paper](https://arxiv.org/abs/2505.16151)| [//]: #05/24
 - ![Star - ShareVL) [R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO](https://arxiv.org/abs/2505.16673) Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16673v1/x2.png"> |[Github](https://github.com/HJYao00/R1-ShareVL) [Paper](https://arxiv.org/abs/2505.16673)| [//]: #05/24
 - ![Star - R1) [SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward](https://arxiv.org/abs/2505.17018) Kaixuan Fan, Kaituo Feng, Haoming Lyu, Dongzhan Zhou, Xiangyu Yue |<img width="1002" alt="image" src="https://arxiv.org/html/2505.17018v1/x1.png"> |[Github](https://github.com/kxfan2002/SophiaVL-R1) [Paper](https://arxiv.org/abs/2505.17018)| [//]: #05/24
 - VLM-R3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
 - ![Star - Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning](https://arxiv.org/abs/2505.16579) Siqu Ou, Hongcheng Liu, Pingjie Wang, Yusheng Liao, Chuan Xuan, Yanfeng Wang, Yu Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16579v1/x1.png"> |[Github](https://github.com/Cratileo/D2R) [Paper](https://arxiv.org/abs/2505.16579)| [//]: #05/24
 - GRIT: Teaching MLLMs to Think with Images - Chen Kuo, Yuting Zheng, Sravana Jyothi Narayanaraju, Xinze Guan, Xin Eric Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.15879v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.15879)| [//]: #05/24
 - UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
 - ![Star - reasoner) [X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains](https://arxiv.org/abs/2505.03981) Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon |<img width="1002" alt="image" src="https://arxiv.org/html/2505.03981v1/x1.png"> |[Github](https://github.com/microsoft/x-reasoner) [Paper](https://arxiv.org/abs/2505.03981)| [//]: #05/18
 - Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
 - ![Publish - VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning](https://arxiv.org/abs/2505.10557) Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, Zimu Lu, Han Xiao, Yunqiao Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li |<img width="1002" alt="image" src="https://arxiv.org/html/2505.10557v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.10557)| [//]: #05/18
 - ![Star - ICML_2025-blue)]() [Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging](https://arxiv.org/abs/2505.05464) Shiqi Chen, Jinghan Zhang, Tongyao Zhu, Wei Liu, Siyang Gao, Miao Xiong, Manling Li, Junxian He |<img width="1002" alt="image" src="https://arxiv.org/html/2505.05464v1/x1.png"> |[Github](https://github.com/shiqichen17/VLM_Merging) [Paper](https://arxiv.org/abs/2505.05464)| [//]: #05/18
 - Seed1.5-VL Technical Report
 - ![Star - ssl/RAP) [Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning](https://arxiv.org/abs/2506.04755) Shenshen Li, Kaiyuan Deng, Lei Wang, Hao Yang, Chong Peng, Peng Yan, Fumin Shen, Heng Tao Shen, Xing Xu |<img width="1002" alt="image" src="https://arxiv.org/html/2506.04755v1/x3.png"> |[Github](https://github.com/Leo-ssl/RAP) [Paper](https://arxiv.org/abs/2506.04755)| [//]: #06/16
 - ![Star - Language Models with Interwoven Thinking and Visual Drawing](https://arxiv.org/abs/2506.09965) Junfei Wu, Jian Guan, Kaituo Feng, Qiang Liu, Shu Wu, Liang Wang, Wei Wu, Tieniu Tan |<img width="1002" alt="image" src="https://arxiv.org/html/2506.09965v1/x4.png"> |[Github](https://github.com/AntResearchNLP/ViLaSR) [Paper](https://arxiv.org/abs/2506.09965)| [//]: #06/16
 - Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
- Evaluation and Benchmarks
 - ![Star - Grained Visual Reasoning from Transit Maps](https://arxiv.org/abs/2505.18675) Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, Xinchao Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.18675v2/x1.png"> |[Github](https://github.com/fscdc/ReasonMap) [Paper](https://arxiv.org/abs/2505.18675)| [//]: #06/11
 - ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
 - ![Star - Language Models](https://arxiv.org/abs/2505.13444) Liyan Tang, Grace Kim, Xinyu Zhao, Thom Lake, Wenxuan Ding, Fangcong Yin, Prasann Singhal, Manya Wadhwa, Zeyu Leo Liu, Zayne Sprague, Ramya Namuduri, Bodun Hu, Juan Diego Rodriguez, Puyuan Peng, Greg Durrett |<img width="1002" alt="image" src="figures/chartmuseum.png"> |[Github](https://github.com/Liyan06/ChartMuseum) [Paper](https://arxiv.org/abs/2505.13444)| [//]: #05/20
 - Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations |[Paper](https://arxiv.org/abs/2505.10937)| [//]: #05/19
 - StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
 - THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
 - ![Star - stability) [Non-Determinism of "Deterministic" LLM Settings](https://arxiv.org/abs/2408.04667) Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, Breck Baldwin |<img width="1002" alt="image" src="https://arxiv.org/html/2408.04667v5/extracted/6331111/max_min_diff.png"> |[Github](https://github.com/breckbaldwin/llm-stability) [Paper](https://arxiv.org/abs/2408.04667)| [//]: #04/08
 - The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
 - Evaluating Large Language Models Trained on Code
 - τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
 - ![Star - compass/GPassK) [Are Your LLMs Capable of Stable Reasoning?](https://arxiv.org/abs/2412.13147) Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen |<img width="1002" alt="image" src="https://arxiv.org/html/2412.13147v3/x1.png"> |[Github](https://github.com/open-compass/GPassK) [Paper](https://arxiv.org/abs/2412.13147)| [//]: #04/08
 - LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception - Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, David Acuna |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15362v1/x1.png"> |[Paper](https://arxiv.org/abs/2504.15362)| [//]: #04/23
 - ![Star - Time Computations for LLM Reasoning and Planning: A Benchmark and Insights](https://arxiv.org/abs/2502.12521) Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji |<img width="1002" alt="image" src="https://arxiv.org/html/2502.12521v1/x1.png"> |[Github](https://github.com/divelab/sys2bench) [Paper](https://arxiv.org/abs/2502.12521)| [//]: #04/08
 - Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
 - ![Star - Valve) [CoT-Valve: Length-Compressible Chain-of-Thought Tuning](https://arxiv.org/abs/2502.09601) Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang |<img width="1002" alt="image" src="figures/cot_valve.png"> |[Github](https://github.com/horseee/CoT-Valve) [Paper](https://arxiv.org/abs/2502.09601)|[//]: #03/16
 - ![Star - hkust/benchmark_inference_time_computation_LLM) [Bag of Tricks for Inference-time Computation of LLM Reasoning](https://arxiv.org/abs/2502.07191) Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu |<img width="1002" alt="image" src="https://arxiv.org/html/2502.07191v4/x1.png"> |[Github](https://github.com/usail-hkust/benchmark_inference_time_computation_LLM) [Paper](https://arxiv.org/abs/2502.07191)| [//]: #04/08
 - ![Star - optimal-tts) [Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling](https://arxiv.org/abs/2502.06703) Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2502.06703v1/x2.png"> |[Github](https://github.com/RyanLiu112/compute-optimal-tts) [Paper](https://arxiv.org/abs/2502.06703)| [//]: #04/08
 - DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
 - S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
 - ![Star - Bench) [VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning](https://arxiv.org/abs/2504.07956) Yukun Qi, Yiming Zhao, Yu Zeng, Xikun Bao, Wenxuan Huang, Lin Chen, Zehui Chen, Jie Zhao, Zhongang Qi, Feng Zhao |<img width="1002" alt="image" src="figures/video.png"> |[Github](https://github.com/zhishuifeiqian/VCR-Bench) [Paper](https://arxiv.org/abs/2504.07956)| [//]: #04/16
 - ![Star - damo-academy/VCBench) [Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency](https://arxiv.org/abs/2504.18589) Zhikai Wang, Jiashuo Sun, Wenqi Zhang, Zhiqiang Hu, Xin Li, Fan Wang, Deli Zhao |<img width="1002" alt="image" src="https://arxiv.org/html/2504.18589v1/x1.png"> |[Github](https://github.com/alibaba-damo-academy/VCBench) [Paper](https://arxiv.org/abs/2504.18589)| [//]: #04/29
 - ![Star - liyu/CipherBank) [CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges](https://arxiv.org/abs/2504.19093) Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu |<img width="1002" alt="image" src="https://arxiv.org/html/2504.19093v1/x2.png"> |[Github](https://github.com/Goodman-liyu/CipherBank) [Paper](https://arxiv.org/abs/2504.19093)| [//]: #04/29
 - ![Star - Benchmark/VisuLogic-Eval) [VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models](https://arxiv.org/abs/2504.15279) Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15279v1/x1.png"> |[Github](https://github.com/VisuLogic-Benchmark/VisuLogic-Eval) [Paper](https://arxiv.org/abs/2504.15279)| [//]: #04/25
- Competition
 - ![Publish - Skills.svg?style=social&label=Star)](https://github.com/NVIDIA/NeMo-Skills) [AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset](https://arxiv.org/abs/2504.16891). Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, Igor Gitman. [[Paper]](https://arxiv.org/abs/2504.16891)[[Github]](https://github.com/NVIDIA/NeMo-Skills)
Updates
- ReasonMap
- VainF - Walker](https://github.com/ZhenyuSun-Walker), [xianzuwu](https://github.com/xianzuwu)!
- arXiv
