# Awesome-Efficient-Reasoning-Models

[arXiv 2025] Efficient Reasoning Models: A Survey

https://github.com/fscdc/Awesome-Efficient-Reasoning-Models
## Build SLM with Strong Reasoning Ability
- Llama-Nemotron: Efficient Reasoning Models
- [Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math](https://arxiv.org/abs/2504.21233) <br> Yen-Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, Shuohang Wang, Weijian Xu, Jianfeng Gao, Weizhu Chen |<img width="1002" alt="image" src="figures/phi_4_mini_reasoning.png"> |[Paper](https://arxiv.org/abs/2504.21233)|[//]: #05/02
- Phi-4-reasoning Technical Report
- Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
- Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
- [Small Models Struggle to Learn from Strong Reasoners](https://arxiv.org/abs/2502.12143) <br> Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran |<img width="1002" alt="image" src="https://arxiv.org/html/2502.12143v2/x1.png"> |[Github](https://github.com/Small-Model-Gap/Small-Model-Learnability-Gap) <br> [Paper](https://arxiv.org/abs/2502.12143)| [//]: #04/08
- Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
- [Small Language Models Need Strong Verifiers to Self-Correct Reasoning](https://arxiv.org/abs/2404.17140) <br> Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2404.17140v2/x1.png"> |[Github](https://github.com/yunx-z/SCORE) <br> [Paper](https://arxiv.org/abs/2404.17140)| [//]: #04/08
- Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
- Distilling Reasoning Ability from Large Language Models with Adaptive Thinking
- [Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning](https://arxiv.org/abs/2502.18001) <br> Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen |<img width="1002" alt="image" src="https://arxiv.org/html/2502.18001v1/x1.png"> |[Github](https://github.com/EIT-NLP/Distilling-CoT-Reasoning) <br> [Paper](https://arxiv.org/abs/2502.18001)| [//]: #04/08
- Towards Reasoning Ability of Small Language Models
- [Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models](https://arxiv.org/abs/2504.04823) <br> Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou |<img width="1002" alt="image" src="figures/quant_hurt.png"> |[Github](https://github.com/ruikangliu/Quantized-Reasoning-Models) <br> [Paper](https://arxiv.org/abs/2504.04823)| [//]: #04/14
- When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
- [Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't](https://arxiv.org/abs/2503.16219) <br> Quy-Anh Dang, Chris Ngo |<img src="https://arxiv.org/html/2503.16219v1/extracted/6296504/images/pass1.png" width="45%"> <img src="https://arxiv.org/html/2503.16219v1/extracted/6296504/images/costs.png" width="45%"> |[Github](https://github.com/knoveleng/open-rs) <br> [Paper](https://arxiv.org/abs/2503.16219)| [//]: #04/08
- [SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild](https://arxiv.org/abs/2503.18892) <br> Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He |<img width="1002" alt="image" src="figures/simplerl_zoo.png"> |[Github](https://github.com/hkust-nlp/simpleRL-reason) <br> [Paper](https://arxiv.org/abs/2503.18892)| [//]: #04/08
- [DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL](https://agentica-project.com/)
- [Tina: Tiny Reasoning Models via LoRA](https://arxiv.org/abs/2504.15777) <br> Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15777v1/x4.png"> |[Github](https://github.com/shangshang-wang/Tina) <br> [Paper](https://arxiv.org/abs/2504.15777)| [//]: #04/25
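Most of the distillation entries in this section share one recipe: sample chain-of-thought traces from a strong teacher, keep the correct (and sufficiently short) ones, and fine-tune the small model on the result. A minimal sketch of that data-preparation step; the trace format (`question`/`cot`/`answer`/`is_correct` fields) is an illustrative assumption, not taken from any specific paper:

```python
# Sketch: turn teacher chain-of-thought traces into SFT pairs for a small model.

def build_sft_examples(traces, max_tokens=512):
    """Keep correct, short-enough teacher traces and format them as
    (prompt, target) pairs for supervised fine-tuning."""
    examples = []
    for t in traces:
        if not t["is_correct"]:
            continue  # rejection sampling: discard wrong teacher answers
        target = f"<think>{t['cot']}</think>\n{t['answer']}"
        if len(target.split()) > max_tokens:  # crude whitespace length filter
            continue
        examples.append({"prompt": t["question"], "target": target})
    return examples

traces = [
    {"question": "2+2?", "cot": "2 plus 2 equals 4.", "answer": "4", "is_correct": True},
    {"question": "3*3?", "cot": "3 times 3 is 6.", "answer": "6", "is_correct": False},
]
print(build_sft_examples(traces))
```

The individual papers differ mainly in how the traces are generated, filtered, and restructured before this step.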
## Make Long CoT Short
- [AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning](https://arxiv.org/abs/2504.21659) <br> Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen |<img width="1002" alt="image" src="figures/AdaR1.png"> |[Github](https://github.com/StarDewXXX/AdaR1) <br> [Paper](https://arxiv.org/abs/2504.21659)|[//]: #05/02
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
- Bin Yu, Hang Yuan, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen |<img width="1002" alt="image" src="figures/mix-sft.png"> |[Github](https://github.com/ZGCA-AI4Edu/LS-Mixture) <br> [Paper](https://arxiv.org/abs/2505.03469)| [//]: #05/17
- Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
- Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition
- Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN
- Yuxuan Jiang, Dawei Li, Frank Ferraro |<img width="1002" alt="image" src="https://github.com/YuxuanJiang1/DRP/blob/main/resources/overview.png"> |[Github](https://github.com/YuxuanJiang1/DRP) <br> [Paper](https://arxiv.org/abs/2505.13975)| [//]: #05/26
- Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.14604v1/x2.png"> |[Github](https://github.com/ZJU-REAL/Self-Braking-Tuning) <br> [Paper](https://arxiv.org/abs/2505.14604)| [//]: #05/22
- Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, Dacheng Tao |<img width="1002" alt="image" src="figures/r1-compress.png"> |[Github](https://github.com/w-yibo/R1-Compress) <br> [Paper](https://arxiv.org/abs/2505.16838)| [//]: #05/24
- Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
- Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He |<img width="1002" alt="image" src="https://arxiv.org/html/2505.15612v1/x1.png"> |[Github](https://github.com/hkust-nlp/Laser) <br> [Paper](https://arxiv.org/abs/2505.15612)| [//]: #05/23
- Think Only When You Need with Large Hybrid-Reasoning Models
- [AdaptThink: Reasoning Models Can Learn When to Think](https://arxiv.org/abs/2505.13417) <br> Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li |<img width="1002" alt="image" src="https://arxiv.org/html/2505.13417v1/x1.png"> |[Github](https://github.com/THU-KEG/AdaptThink) <br> [Paper](https://arxiv.org/abs/2505.13417)| [//]: #05/20
- Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
- Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt
- Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
- ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
- AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
- Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
- Scalable Chain of Thoughts via Elastic Reasoning
- Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
- Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers
- System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
- Hybrid Latent Reasoning via Reinforcement Learning
- SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought
- Continuous Chain of Thought Enables Parallel Exploration and Reasoning
- Ruihan Gong, Yue Liu, Wenjie Qu, Mingzhe Du, Yufei He, Yingwei Ma, Yulin Chen, Xiang Liu, Yi Wen, Xinfeng Li, Ruidong Wang, Xinzhong Zhu, Bryan Hooi, Jiaheng Zhang |<img width="1002" alt="image" src="figures/cout.png"> |[Github](https://github.com/Rohan-GRH/CoUT) <br> [Paper](https://arxiv.org/abs/2505.19756)| [//]: #06/11
- Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
- Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao |<img width="1002" alt="image" src="https://arxiv.org/html/2505.11484v1/x1.png"> |[Github](https://github.com/xuyige/SoftCoT) <br> [Paper](https://arxiv.org/abs/2505.11484)| [//]: #05/19
- Xinghao Chen, Anhao Zhao, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16782v1/x1.png"> |[Github](https://github.com/EIT-NLP/Awesome-Latent-CoT) <br> [Paper](https://arxiv.org/abs/2505.16782)| [//]: #05/24
- Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
- [Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space](https://arxiv.org/abs/2505.15778) <br> Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.15778v1/x1.png"> |[Github](https://github.com/eric-ai-lab/Soft-Thinking) <br> [Paper](https://arxiv.org/abs/2505.15778)| [//]: #05/22
- Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
- [Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space](https://arxiv.org/abs/2505.13308) <br> Song-Chun Zhu, Zixia Jia, Ying Nian Wu, Zilong Zheng |<img width="1002" alt="image" src="https://arxiv.org/html/2505.13308v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.13308)| [//]: #05/20
- Time's Up! An Empirical Study of LLM Reasoning Ability Under Output Length Constraint
- CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models
- Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
- [The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models](https://arxiv.org/abs/2401.05618) <br> Matthew Renze, Erhan Guven |<img width="1002" alt="image" src="https://arxiv.org/html/2401.05618v3/x1.png"> |[Github](https://github.com/matthewrenze/jhu-concise-cot) <br> [Paper](https://arxiv.org/abs/2401.05618)| [//]: #04/08
- Break the Chain: Large Language Models Can be Shortcut Reasoners
- Yu Kang, Xianghui Sun, Liangyu Chen, Wei Zou |<img width="1002" alt="image" src="figures/co3t.png"> |[Paper](https://arxiv.org/abs/2412.11664)|[//]: #03/16
- [NeurIPS 2024] [Can Language Models Learn to Skip Steps?](https://arxiv.org/abs/2411.01855) <br> Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, Zheng Zhang |<img width="1002" alt="image" src="figures/skip_step.png"> |[Github](https://github.com/tengxiaoliu/LM_skip) <br> [Paper](https://arxiv.org/abs/2411.01855)|[//]: #03/16
- [Token-Budget-Aware LLM Reasoning](https://arxiv.org/abs/2412.18547) <br> Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen |<img width="1002" alt="image" src="https://arxiv.org/html/2412.18547v4/x10.png"> |[Github](https://github.com/GeniusHTX/TALE) <br> [Paper](https://arxiv.org/abs/2412.18547)| [//]: #04/08
- [O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning](https://arxiv.org/abs/2501.12570) <br> Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao |<img width="1002" alt="image" src="figures/o1_pruner.png"> |[Github](https://github.com/StarDewXXX/O1-Pruner) <br> [Paper](https://arxiv.org/abs/2501.12570)|[//]: #03/16
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
- [Demystifying Long Chain-of-Thought Reasoning in LLMs](https://arxiv.org/abs/2502.03373) <br> Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue |<img width="1002" alt="image" src="https://arxiv.org/html/2502.03373v1/x1.png"> |[Github](https://github.com/eddycmu/demystify-long-cot) <br> [Paper](https://arxiv.org/abs/2502.03373)| [//]: #04/08
- [Training Language Models to Reason Efficiently](https://arxiv.org/abs/2502.04463) <br> Daman Arora, Andrea Zanette |<img width="1002" alt="image" src="https://arxiv.org/html/2502.04463v2/x3.png"> |[Github](https://github.com/Zanette-Labs/efficient-reasoning) <br> [Paper](https://arxiv.org/abs/2502.04463)| [//]: #04/08
- [L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning](https://arxiv.org/abs/2503.04697) <br> Pranjal Aggarwal, Sean Welleck |<img width="1002" alt="image" src="https://arxiv.org/html/2503.04697v1/x2.png"> |[Github](https://github.com/cmu-l3/l1) <br> [Paper](https://www.arxiv.org/abs/2503.04697)| [//]: #04/08
- Distilling System 2 into System 1
- [TokenSkip: Controllable Chain-of-Thought Compression in LLMs](https://arxiv.org/abs/2502.12067) <br> Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li |<img width="1002" alt="image" src="figures/TokenSkip.png"> |[Github](https://github.com/hemingkx/TokenSkip) <br> [Paper](https://arxiv.org/abs/2502.12067)|[//]: #03/20
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
- [Self-Training Elicits Concise Reasoning in Large Language Models](https://arxiv.org/abs/2502.20122) <br> Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun |<img width="1002" alt="image" src="https://arxiv.org/html/2502.20122v2/x1.png"> |[Github](https://github.com/TergelMunkhbat/concise-reasoning) <br> [Paper](https://arxiv.org/abs/2502.20122)| [//]: #04/08
- [Chain of Draft: Thinking Faster by Writing Less](https://arxiv.org/abs/2502.18600) <br> Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He |<img width="1002" alt="image" src="https://arxiv.org/html/2502.18600v2/extracted/6244873/plot.png"> |[Github](https://github.com/sileix/chain-of-draft) <br> [Paper](https://arxiv.org/abs/2502.18600)| [//]: #04/08
- [Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought](https://arxiv.org/abs/2410.05695) <br> Qiguang Chen, Libo Qin, Jiaqi Wang, Jinxuan Zhou, Wanxiang Che |<img width="1002" alt="image" src="https://arxiv.org/html/2410.05695v2/x1.png"> |[Github](https://github.com/LightChen233/reasoning-boundary) <br> [Paper](https://arxiv.org/abs/2410.05695)| [//]: #04/08
- [How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach](https://arxiv.org/abs/2503.01141) |<img src="https://arxiv.org/html/2503.01141v2/extracted/6325669/plot/Anthropic/claude-3-5-sonnet-20241022-mmlu-main.png" width="45%"> |[Paper](https://arxiv.org/abs/2503.01141)| [//]: #04/08
- [Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching](https://arxiv.org/abs/2503.05179) <br> Simon A. Aytes, Jinheon Baek, Sung Ju Hwang |<img width="1002" alt="image" src="https://arxiv.org/html/2503.05179v1/x1.png"> |[Github](https://github.com/SimonAytes/SoT) <br> [Paper](https://arxiv.org/abs/2503.05179)| [//]: #04/08
- [Learning to Route LLMs with Confidence Tokens](https://arxiv.org/abs/2410.13284) <br> Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, Xia Hu |<img width="1002" alt="image" src="https://arxiv.org/html/2410.13284v2/x1.png"> |[Paper](https://arxiv.org/abs/2410.13284)| [//]: #04/08
- [Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization](https://arxiv.org/abs/2502.04428) <br> Yu-Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, Xia Hu |<img width="1002" alt="image" src="https://arxiv.org/html/2502.04428v1/x1.png"> |[Paper](https://arxiv.org/abs/2502.04428)| [//]: #04/08
- [Claude 3.7 Sonnet](https://www.anthropic.com/news/claude-3-7-sonnet)
- Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
- Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He |<img width="1002" alt="image" src="https://arxiv.org/html/2502.08482v1/x1.png"> |[Github](https://github.com/qifanyu/RELAY) <br> [Paper](https://arxiv.org/abs/2502.08482)| [//]: #04/08
- [CoT-Valve: Length-Compressible Chain-of-Thought Tuning](https://arxiv.org/abs/2502.09601) <br> Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang |<img width="1002" alt="image" src="figures/cot_valve.png"> |[Github](https://github.com/horseee/CoT-Valve) <br> [Paper](https://arxiv.org/abs/2502.09601)|[//]: #03/16
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
- Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
- [ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning](https://arxiv.org/abs/2504.01296) <br> Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang |<img width="1002" alt="image" src="https://arxiv.org/html/2504.01296v1/x1.png"> |[Github](https://github.com/UCSB-NLP-Chang/ThinkPrune) <br> [Paper](https://arxiv.org/abs/2504.01296)| [//]: #04/08
- Think When You Need: Self-Adaptive Chain-of-Thought Learning
- [Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/abs/2502.05171) <br> Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein |<img width="1002" alt="image" src="https://arxiv.org/html/2502.05171v2/x2.png"> |[Github](https://github.com/seal-rg/recurrent-pretraining) <br> [Paper](https://arxiv.org/abs/2502.05171)| [//]: #04/08
- Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning
- [Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models](https://arxiv.org/abs/2402.07754) <br> Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong |<img width="1002" alt="image" src="figures/diffusion_thought.png"> |[Github](https://github.com/HKUNLP/diffusion-of-thoughts) <br> [Paper](https://arxiv.org/abs/2402.07754)| [//]: #04/08
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
- [LightThinker: Thinking Step-by-Step Compression](https://arxiv.org/abs/2502.15589) <br> Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2502.15589v1/x1.png"> |[Github](https://github.com/zjunlp/LightThinker) <br> [Paper](https://arxiv.org/abs/2502.15589)| [//]: #04/08
- [COLM 2024] [Guiding Language Model Reasoning with Planning Tokens](https://arxiv.org/abs/2310.05707) <br> Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni |<img width="1002" alt="image" src="https://arxiv.org/html/2310.05707v4/extracted/5777851/img/overview.png"> |[Github](https://github.com/WANGXinyiLinda/planning_tokens) <br> [Paper](https://arxiv.org/abs/2310.05707)| [//]: #04/08
- [COLM 2024] [Let's Think Dot by Dot: Hidden Computation in Transformer Language Models](https://arxiv.org/abs/2404.15758) <br> Jacob Pfau, William Merrill, Samuel R. Bowman |<img width="1002" alt="image" src="https://arxiv.org/html/2404.15758v1/extracted/2404.15758v1/figs/scale_len.png"> |[Github](https://github.com/JacobPfau/fillerTokens) <br> [Paper](https://arxiv.org/abs/2404.15758)| [//]: #04/08
- [Disentangling Memory and Reasoning Ability in Large Language Models](https://arxiv.org/abs/2411.13504) <br> Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2411.13504v2/x1.png"> |[Github](https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning) <br> [Paper](https://arxiv.org/abs/2411.13504)| [//]: #04/08
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
- Training Large Language Models to Reason in a Continuous Latent Space
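Several entries in this section (Chain of Draft and the token-budget line of work, among others) boil down to asking the model for short reasoning drafts and enforcing a token budget. A minimal sketch of that idea; the prompt wording and the whitespace "tokenizer" are illustrative assumptions, not taken from any specific paper:

```python
# Sketch: budget-constrained prompting plus a hard truncation fallback.

BUDGET_PROMPT = (
    "Think step by step, but keep each step to a short draft of at most "
    "{per_step} words. Use no more than {budget} words of reasoning, then "
    "give the final answer after '####'."
)

def make_prompt(question: str, budget: int = 50, per_step: int = 5) -> str:
    """Prepend the budget instruction to the user question."""
    return BUDGET_PROMPT.format(per_step=per_step, budget=budget) + "\n\n" + question

def truncate_reasoning(generation: str, budget: int = 50) -> str:
    """Fallback enforcement: clip the reasoning to `budget` whitespace tokens,
    preserving the final answer after the '####' marker if present."""
    reasoning, sep, answer = generation.partition("####")
    clipped = " ".join(reasoning.split()[:budget])
    return clipped + ((" " + sep + answer) if sep else "")
```

The papers above replace the crude truncation with learned mechanisms (length-aware rewards, fine-tuning on compressed traces, adaptive budgets per problem difficulty), but the interface is the same: a length target in, a shorter trace out.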
## Background Papers
- Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
- Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
- [WebThinker: Empowering Large Reasoning Models with Deep Research Capability](https://arxiv.org/abs/2504.21776) <br> Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou |<img width="1002" alt="image" src="figures/webthinker.png"> |[Github](https://github.com/RUC-NLPIR/WebThinker) <br> [Paper](https://arxiv.org/abs/2504.21776)|[//]: #05/02
- [Reinforcement Learning for Reasoning in Large Language Models with One Training Example](https://arxiv.org/abs/2504.20571) <br> Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen |<img width="1002" alt="image" src="https://arxiv.org/html/2504.20571v1/x3.png"> |[Github](https://github.com/ypwang61/One-Shot-RLVR) <br> [Paper](https://arxiv.org/abs/2504.20571)|[//]: #04/30
- [Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision](https://arxiv.org/abs/2505.14999) <br> Cheng-Fu Yang, Zongyu Lin, Xinfeng Li, Hao Xu, Kai-Wei Chang, Ying Nian Wu |<img width="1002" alt="image" src="figures/eorm.png"> |[Paper](https://arxiv.org/abs/2505.14999)| [//]: #05/22
- AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
- Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang |<img width="1002" alt="image" src="figures/BARREL.png"> |[Github](https://github.com/thu-coai/BARREL) <br> [Paper](https://arxiv.org/abs/2505.13529)| [//]: #05/23
- Kaiwen Zha, Zhengqi Gao, Maohao Shen, Zhang-Wei Hong, Duane S. Boning, Dina Katabi |<img width="1002" alt="image" src="figures/Tango.png"> |[Github](https://github.com/kaiwenzha/rl-tango) <br> [Paper](https://arxiv.org/abs/2505.15034)| [//]: #05/23
- Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
- Reasoning Models Better Express Their Confidence
- Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
- Jiaan Wang, Fandong Meng, Jie Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2505.12996v1/x4.png"> |[Github](https://github.com/krystalan/DRT) <br> [Paper](https://arxiv.org/abs/2505.12996)| [//]: #05/20
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data
- [Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models](https://arxiv.org/abs/2505.10554) <br> Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2505.10554v1/x2.png"> |[Github](https://github.com/zhiyuanhubj/Meta-Ability-Alignment) <br> [Paper](https://arxiv.org/abs/2505.10554)| [//]: #05/19
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
- INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
- AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
- [Chain-of-Thought Tokens are Computer Program Variables](https://arxiv.org/abs/2505.04955) <br> Fangwei Zhu, Peiyi Wang, Zhifang Sui |<img width="1002" alt="image" src="https://arxiv.org/html/2505.04955v1/x2.png"> |[Github](https://github.com/solitaryzero/CoTs_are_Variables) <br> [Paper](https://arxiv.org/abs/2505.04955)| [//]: #05/17
- [MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining](https://arxiv.org/abs/2505.07608) <br> Xiaomi LLM-Core Team |<img width="1002" alt="image" src="https://arxiv.org/html/2505.07608v1/x1.png"> |[Github](https://github.com/xiaomimimo/MiMo) <br> [Paper](https://arxiv.org/abs/2505.07608)| [//]: #05/17
- [Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?](https://arxiv.org/abs/2504.13837) <br> Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2504.13837v1/x1.png"> |[Github](https://github.com/LeapLabTHU/limit-of-RLVR) <br> [Paper](https://arxiv.org/abs/2504.13837)| [//]: #04/22
- [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903) <br> Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou |<img width="1002" alt="image" src="figures/cot_prompting.png"> |[Paper](https://arxiv.org/abs/2201.11903)| [//]: #04/08
- [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601) <br> Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan |<img width="1002" alt="image" src="https://arxiv.org/html/2305.10601v2/x1.png"> |[Github](https://github.com/princeton-nlp/tree-of-thought-llm) <br> [Paper](https://arxiv.org/abs/2305.10601)| [//]: #04/08
- [Graph of Thoughts: Solving Elaborate Problems with Large Language Models](https://arxiv.org/abs/2308.09687) <br> Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler |<img width="1002" alt="image" src="figures/got.png"> |[Github](https://github.com/spcl/graph-of-thoughts) <br> [Paper](https://arxiv.org/abs/2308.09687)| [//]: #04/08
- [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) <br> Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou |<img width="1002" alt="image" src="figures/sc.png"> |[Paper](https://arxiv.org/abs/2203.11171)| [//]: #04/08
- [Chain-of-Symbol Prompting Elicits Planning in Large Language Models](https://arxiv.org/abs/2305.10276) <br> Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2305.10276v7/x1.png"> |[Github](https://github.com/hanxuhu/chain-of-symbol-planning) <br> [Paper](https://arxiv.org/abs/2305.10276)| [//]: #04/08
- Thinking Machines: A Survey of LLM based Reasoning Strategies
- [From System 1 to System 2: A Survey of Reasoning Large Language Models](https://arxiv.org/abs/2502.17419) <br> Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu |<img width="1002" alt="image" src="https://arxiv.org/html/2502.17419v2/extracted/6232702/images/timeline.png"> |[Github](https://github.com/zzli2022/Awesome-System2-Reasoning-LLM) <br> [Paper](https://arxiv.org/abs/2502.17419)| [//]: #04/08
- ]()<br>[Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks](https://arxiv.org/abs/2211.12588) <br> Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen |<img width="1002" alt="image" src="figures/pot.png"> |[Github](https://github.com/TIGER-AI-Lab/Program-of-Thoughts) <br> [Paper](https://arxiv.org/abs/2211.12588)| [//]: #04/08
- Resa: Transparent Reasoning Models via SAEs
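The prompting-strategy entries above (Chain-of-Thought, Tree of Thoughts, Graph of Thoughts) all frame generation as search over intermediate "thoughts". As a minimal, hypothetical sketch of the tree-style variant: `propose` and `evaluate` below are toy stand-ins for the LLM thought proposer and scorer (in the papers these are model calls), demonstrated on a number-picking task:

```python
def propose(state, choices):
    """Expand a partial solution by one step (stand-in for the LLM thought proposer)."""
    return [state + [c] for c in choices]

def evaluate(state, target):
    """Score a partial solution; here, negated distance of its sum from the target."""
    return -abs(sum(state) - target)

def tree_of_thoughts(target, choices, depth, beam_width=3):
    """Breadth-first search over thoughts, keeping only the best beam at each level."""
    frontier = [[]]
    for _ in range(depth):
        candidates = [nxt for state in frontier for nxt in propose(state, choices)]
        candidates.sort(key=lambda s: evaluate(s, target), reverse=True)
        frontier = candidates[:beam_width]  # prune to the most promising thoughts
    return frontier[0]

# Pick three numbers from {1, 3, 5} whose sum is as close to 10 as possible.
best = tree_of_thoughts(target=10, choices=[1, 3, 5], depth=3)
```

The beam pruning step is what bounds the token budget relative to exhaustive expansion; graph-of-thoughts variants additionally allow merging and refining states rather than only extending them.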
-
Make Decoding More Efficient
- Reward Reasoning Model
- Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
- Control-R: Towards controllable test-time scaling
- Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
- First Finish Search: Efficient Test-Time Scaling in Large Language Models
- LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
- Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones <br> Enric Boix-Adsera |<img width="1002" alt="image" src="https://arxiv.org/html/2505.21825v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.21825)| [//]: #06/11
- Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
- [Value-Guided Search for Efficient Chain-of-Thought Reasoning](https://arxiv.org/abs/2505.17373) <br> Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun |<img width="1002" alt="image" src="https://arxiv.org/html/2505.17373v1/x2.png"> |[Github](https://github.com/kaiwenw/value-guided-search) <br> [Paper](https://arxiv.org/abs/2505.17373)| [//]: #06/11
- Accelerated Test-Time Scaling with Model-Free Speculative Sampling
- Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
- Fractured Chain-of-Thought Reasoning
- Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately
- Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
- [AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time](https://arxiv.org/abs/2505.24863) <br> Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.24863v1/x27.png"> |[Github](https://github.com/ASTRAL-Group/AlphaOne) <br> [Paper](https://arxiv.org/abs/2505.24863)| [//]: #06/13
- ProxyThinker: Test-Time Guidance through Small Visual Reasoners
- A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
- Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
- Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
- [Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning](https://arxiv.org/abs/2505.13866) <br> Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim |<img width="1002" alt="image" src="figures/rpc_new.png"> |[Github](https://github.com/jiwonsong-dev/ReasoningPathCompression) <br> [Paper](https://arxiv.org/abs/2505.13866)| [//]: #05/22
- RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
- Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity <br> Chan-Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-shan Shiu |<img width="1002" alt="image" src="https://arxiv.org/html/2505.11107v1/extracted/6445446/figures/gt_main_new.png"> |[Paper](https://arxiv.org/abs/2505.11107)| [//]: #05/19
- [ACL Main 2025] [Rethinking Repetition Problems of LLMs in Code Generation](https://arxiv.org/abs/2505.10402) <br> Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li |<img width="1002" alt="image" src="figures/code_repeat.png"> |[Github](https://github.com/LYC127/RPG) <br> [Paper](https://arxiv.org/abs/2505.10402)| [//]: #05/18
- Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping
- [IJCAI] [Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning](https://arxiv.org/abs/2505.06321) <br> Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu |<img width="1002" alt="image" src="figures/learn2think.png"> |[Github](https://github.com/zch65458525/L2T) <br> [Paper](https://arxiv.org/abs/2505.06321)| [//]: #05/17
- [xVerify: Efficient Answer Verifier for Reasoning Model Evaluations](https://arxiv.org/abs/2504.10481) <br> Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li |<img width="1002" alt="image" src="https://arxiv.org/html/2504.10481v1/x1.png"> |[Github](https://github.com/IAAR-Shanghai/xVerify) <br> [Paper](https://arxiv.org/abs/2504.10481)| [//]: #04/17
- [Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs](https://arxiv.org/abs/2305.11860) <br> Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam |<img width="1002" alt="image" src="figures/asc.png"> |[Github](https://github.com/Pranjal2041/AdaptiveConsistency) <br> [Paper](https://arxiv.org/abs/2305.11860)| [//]: #04/08
- [ICLR 2024] [Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning](https://arxiv.org/abs/2401.10480) <br> Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2401.10480v1/x1.png"> |[Github](https://github.com/Yiwei98/ESC) <br> [Paper](https://arxiv.org/abs/2401.10480)| [//]: #04/08
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods <br> Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou |<img width="1002" alt="image" src="https://arxiv.org/html/2504.14047v1/x1.png"> |[Paper](https://arxiv.org/abs/2504.14047)| [//]: #04/23
- [Reward-Guided Speculative Decoding for Efficient LLM Reasoning](https://arxiv.org/abs/2501.19324) <br> Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong |<img width="1002" alt="image" src="figures/rsd.png"> |[Github](https://github.com/BaohaoLiao/RSD) <br> [Paper](https://arxiv.org/abs/2501.19324)| [//]: #04/08
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
- [Atom of Thoughts for Markov LLM Test-Time Scaling](https://arxiv.org/abs/2502.12018) <br> Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo |<img width="1002" alt="image" src="figures/aot.png"> |[Github](https://github.com/qixucen/atom) <br> [Paper](https://arxiv.org/abs/2502.12018)| [//]: #04/08
- [NAACL Findings 2025] [Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning](https://arxiv.org/abs/2408.13457) <br> Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li |<img width="1002" alt="image" src="https://arxiv.org/html/2408.13457v3/x3.png"> |[Github](https://github.com/WangXinglin/DSC) <br> [Paper](https://arxiv.org/abs/2408.13457)| [//]: #04/08
- Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
- Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning <br> Lan-Zhe Guo, Xiaoxing Ma, Yu-Feng Li |<img width="1002" alt="image" src="https://arxiv.org/html/2502.00511v2/x3.png"> |[Paper](https://arxiv.org/abs/2502.00511)| [//]: #04/08
- Confidence Improves Self-Consistency in LLMs
- [Efficient Test-Time Scaling via Self-Calibration](https://arxiv.org/abs/2503.00031) <br> Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2503.00031v1/x2.png"> |[Github](https://github.com/Chengsong-Huang/Self-Calibration) <br> [Paper](https://arxiv.org/abs/2503.00031)| [//]: #04/08
- [Fast Best-of-N Decoding via Speculative Rejection](https://arxiv.org/abs/2410.20290) <br> Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette |<img width="1002" alt="image" src="https://arxiv.org/html/2410.20290v2/x1.png"> |[Github](https://github.com/Zanette-Labs/SpeculativeRejection) <br> [Paper](https://arxiv.org/abs/2410.20290)| [//]: #04/08
- Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
- FastMCTS: A Simple Sampling Strategy for Data Synthesis
- [Non-myopic Generation of Language Models for Reasoning and Planning](https://arxiv.org/abs/2410.17195) <br> Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong |<img width="1002" alt="image" src="figures/predictive_decoding.png"> |[Github](https://github.com/chang-github-00/LLM-Predictive-Decoding) <br> [Paper](https://arxiv.org/abs/2410.17195)| [//]: #04/08
- [Language Models can Self-Improve at State-Value Estimation for Better Search](https://arxiv.org/abs/2503.02878) <br> Ethan Mendes, Alan Ritter |<img width="1002" alt="image" src="https://arxiv.org/html/2503.02878v1/x1.png"> |[Github](https://github.com/ethanm88/self-taught-lookahead) <br> [Paper](https://arxiv.org/abs/2503.02878)| [//]: #04/08
- [φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation](https://arxiv.org/abs/2503.13288) <br> Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu |<img width="1002" alt="image" src="https://arxiv.org/html/2503.13288v1/x2.png"> |[Github](https://github.com/xufangzhi/phi-Decoding) <br> [Paper](https://arxiv.org/abs/2503.13288)| [//]: #04/08
- Dynamic Parallel Tree Search for Efficient LLM Reasoning
- [Learning Adaptive Parallel Reasoning with Language Models](https://arxiv.org/abs/2504.15466) <br> Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15466v1/x2.png"> |[Github](https://github.com/Parallel-Reasoning/APR) <br> [Paper](https://arxiv.org/abs/2504.15466)| [//]: #04/23
- [Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation](https://arxiv.org/abs/2307.15337) <br> Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang |<img width="1002" alt="image" src="figures/skeleton_ot.png"> |[Github](https://github.com/imagination-research/sot) <br> [Paper](https://arxiv.org/abs/2307.15337)| [//]: #04/08
- Adaptive Skeleton Graph Decoding
- THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
- DISC: Dynamic Decomposition Improves LLM Inference Scaling
- From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
- [Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?](https://arxiv.org/abs/2503.06252) <br> Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang |<img width="1002" alt="image" src="figures/atom.png"> |[Github](https://github.com/Quinn777/AtomThink) <br> [Paper](https://arxiv.org/abs/2503.06252)| [//]: #04/08
- [Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models](https://arxiv.org/abs/2408.00724) <br> Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang |<img width="1002" alt="image" src="figures/scaling_law.png"> |[Github](https://github.com/thu-wyz/inference_scaling) <br> [Paper](https://arxiv.org/abs/2408.00724)| [//]: #04/08
- [Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning](https://arxiv.org/abs/2503.07572) <br> Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar |<img width="1002" alt="image" src="figures/mrt.png"> |[Github](https://github.com/CMU-AIRe/MRT) <br> [Paper](https://arxiv.org/abs/2503.07572)| [//]: #04/08
- [SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning](https://arxiv.org/abs/2504.07891) <br> Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali |<img width="1002" alt="image" src="figures/specreason.png"> |[Github](https://github.com/ruipeterpan/specreason) <br> [Paper](https://arxiv.org/abs/2504.07891)| [//]: #04/14
- Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models
- [Efficient Reasoning for LLMs through Speculative Chain-of-Thought](https://arxiv.org/abs/2504.19095) <br> Jikai Wang, Juntao Li, Lijun Wu, Min Zhang |<img width="1002" alt="image" src="https://arxiv.org/html/2504.19095v1/extracted/6392438/images/scot.png"> |[Github](https://github.com/Jikai0Wang/Speculative_CoT) <br> [Paper](https://arxiv.org/abs/2504.19095)| [//]: #04/29
- Dynamic Early Exit in Reasoning Models
- [Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters](https://arxiv.org/abs/2408.03314) <br> Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar |<img width="1002" alt="image" src="figures/tts_effective.png"> |[Paper](https://arxiv.org/abs/2408.03314)| [//]: #04/08
- Inference-Time Hyper-Scaling with KV Cache Compression
- Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
- [Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models](https://arxiv.org/abs/2506.06008) <br> Peijie Liu, Fengli Xu, Yong Li |<img width="1002" alt="image" src="https://arxiv.org/html/2506.06008v1/x2.png"> |[Github](https://github.com/tsinghua-fib-lab/Token_Signature) <br> [Paper](https://arxiv.org/abs/2506.06008)| [//]: #06/16
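Several of the sampling entries in this section (Adaptive-Consistency, Early-stopping Self-Consistency, Difficulty-Adaptive Self-Consistency) cut cost by drawing answers one at a time and stopping once they agree. A rough illustrative sketch of that shared idea, not any single paper's stopping rule; the hypothetical `sample_fn` stands in for one sampled reasoning chain returning its final answer:

```python
from collections import Counter

def early_stop_self_consistency(sample_fn, max_samples=16, window=4):
    """Majority-vote self-consistency that stops early once a recent
    window of sampled answers is unanimous."""
    answers = []
    for _ in range(max_samples):
        answers.append(sample_fn())
        if len(answers) >= window and len(set(answers[-window:])) == 1:
            break  # recent samples agree: confident enough to stop sampling
    majority, _count = Counter(answers).most_common(1)[0]
    return majority, len(answers)

# Toy sampler: one noisy answer, then consistent ones.
draws = iter(["42", "41", "42", "42", "42", "42", "42", "42"])
answer, used = early_stop_self_consistency(lambda: next(draws))
```

Here the unanimity window is reached at the sixth draw, so 6 samples are used instead of 16; the published variants replace the unanimity test with confidence-, probability-, or difficulty-dependent stopping criteria.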
-
Efficient Multimodal Reasoning
- MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
- [Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning](https://arxiv.org/abs/2505.03318) <br> Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang |<img width="1002" alt="image" src="figures/umrf.png"> |[Github](https://github.com/CodeGoat24/UnifiedReward) <br> [Paper](https://arxiv.org/abs/2505.03318)| [//]: #05/17
- Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
- Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
- [PixelThink: Towards Efficient Chain-of-Pixel Reasoning](https://arxiv.org/abs/2505.23727) <br> Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.23727v1/x2.png"> |[Github](https://github.com/songw-zju/PixelThink) <br> [Paper](https://arxiv.org/abs/2505.23727)| [//]: #06/06
- [Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models](https://arxiv.org/abs/2505.16854) <br> Jiaqi Wang, Kevin Qinghong Lin, James Cheng, Mike Zheng Shou |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16854v1/x1.png"> |[Github](https://github.com/kokolerk/TON) <br> [Paper](https://arxiv.org/abs/2505.16854)| [//]: #05/24
- Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
- One RL to See Them All: Visual Triple Unified Reinforcement Learning
- MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
- GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking
- Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
- Grounded Reinforcement Learning for Visual Reasoning
- Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models <br> Chi Cheung, Shengyu Zhang, Fei Wu, Hongxia Yang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.23091v2/extracted/6518453/images_folder/mmr1_framework_update.png"> |[Paper](https://arxiv.org/abs/2505.23091)| [//]: #06/11
-  <br> Xu Chu, Xinrong Chen, Guanyu Wang, Zhijie Tan, Kui Huang, Wenyu Lv, Tong Mo, Weiping Li |<img width="1002" alt="image" src="https://arxiv.org/html/2505.23558v2/x5.png"> |[Github](https://github.com/Liar406/Look_Again) <br> [Paper](https://arxiv.org/abs/2505.23558)| [//]: #06/11
- Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought <br> De-An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang, Zhiding Yu |<img width="1002" alt="image" src="https://arxiv.org/html/2505.23766v1/x2.png"> |[Paper](https://arxiv.org/abs/2505.23766)| [//]: #06/11
- Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
- Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning <br> Chung-Ching Lin, Kevin Lin, Wangmeng Zuo, Lijuan Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.19702v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.19702)| [//]: #06/11
- Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning
- SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
- Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
- Visual Abstract Thinking Empowers Multimodal Reasoning
- VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
- DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
- FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
- [Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning](https://arxiv.org/abs/2505.14677) <br> Jiaer Xia, Yuhang Zang, Peng Gao, Yixuan Li, Kaiyang Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2505.14677v1/x3.png"> |[Github](https://github.com/maifoundations/Visionary-R1) <br> [Paper](https://arxiv.org/abs/2505.14677)| [//]: #05/22
- [VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning](https://arxiv.org/abs/2505.12081) <br> Yuqi Liu, Tianyuan Qu, Zhisheng Zhong, Bohao Peng, Shu Liu, Bei Yu, Jiaya Jia |<img width="1002" alt="image" src="https://arxiv.org/html/2505.12081v1/x1.png"> |[Github](https://github.com/dvlab-research/VisionReasoner) <br> [Paper](https://arxiv.org/abs/2505.12081)| [//]: #05/20
- [MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision](https://arxiv.org/abs/2505.13427) <br> Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao |<img width="1002" alt="image" src="figures/mmprm.png"> |[Github](https://github.com/ModalMinds/MM-PRM) <br> [Paper](https://arxiv.org/abs/2505.13427)| [//]: #05/20
- CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
- Visual Planning: Let's Think Only with Images
-  <br> Hongchen Wei, Zhenzhong Chen |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16151v1/x1.png"> |[Github](https://github.com/hcwei13/FRANK-ZERO-Inference) <br> [Paper](https://arxiv.org/abs/2505.16151)| [//]: #05/24
- [R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO](https://arxiv.org/abs/2505.16673) <br> Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16673v1/x2.png"> |[Github](https://github.com/HJYao00/R1-ShareVL) <br> [Paper](https://arxiv.org/abs/2505.16673)| [//]: #05/24
- [SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward](https://arxiv.org/abs/2505.17018) <br> Kaixuan Fan, Kaituo Feng, Haoming Lyu, Dongzhan Zhou, Xiangyu Yue |<img width="1002" alt="image" src="https://arxiv.org/html/2505.17018v1/x1.png"> |[Github](https://github.com/kxfan2002/SophiaVL-R1) <br> [Paper](https://arxiv.org/abs/2505.17018)| [//]: #05/24
- VLM-R3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
-  <br> Siqu Ou, Hongcheng Liu, Pingjie Wang, Yusheng Liao, Chuan Xuan, Yanfeng Wang, Yu Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.16579v1/x1.png"> |[Github](https://github.com/Cratileo/D2R) <br> [Paper](https://arxiv.org/abs/2505.16579)| [//]: #05/24
- GRIT: Teaching MLLMs to Think with Images <br> Ching-Chen Kuo, Yuting Zheng, Sravana Jyothi Narayanaraju, Xinze Guan, Xin Eric Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.15879v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.15879)| [//]: #05/24
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
- [X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains](https://arxiv.org/abs/2505.03981) <br> Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon |<img width="1002" alt="image" src="https://arxiv.org/html/2505.03981v1/x1.png"> |[Github](https://github.com/microsoft/x-reasoner) <br> [Paper](https://arxiv.org/abs/2505.03981)| [//]: #05/18
- Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
-  <br> Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, Zimu Lu, Han Xiao, Yunqiao Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li |<img width="1002" alt="image" src="https://arxiv.org/html/2505.10557v1/x1.png"> |[Paper](https://arxiv.org/abs/2505.10557)| [//]: #05/18
- [ICML 2025] [Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging](https://arxiv.org/abs/2505.05464) <br> Shiqi Chen, Jinghan Zhang, Tongyao Zhu, Wei Liu, Siyang Gao, Miao Xiong, Manling Li, Junxian He |<img width="1002" alt="image" src="https://arxiv.org/html/2505.05464v1/x1.png"> |[Github](https://github.com/shiqichen17/VLM_Merging) <br> [Paper](https://arxiv.org/abs/2505.05464)| [//]: #05/18
- Seed1.5-VL Technical Report
-  <br> Shenshen Li, Kaiyuan Deng, Lei Wang, Hao Yang, Chong Peng, Peng Yan, Fumin Shen, Heng Tao Shen, Xing Xu |<img width="1002" alt="image" src="https://arxiv.org/html/2506.04755v1/x3.png"> |[Github](https://github.com/Leo-ssl/RAP) <br> [Paper](https://arxiv.org/abs/2506.04755)| [//]: #06/16
-  <br> Junfei Wu, Jian Guan, Kaituo Feng, Qiang Liu, Shu Wu, Liang Wang, Wei Wu, Tieniu Tan |<img width="1002" alt="image" src="https://arxiv.org/html/2506.09965v1/x4.png"> |[Github](https://github.com/AntResearchNLP/ViLaSR) <br> [Paper](https://arxiv.org/abs/2506.09965)| [//]: #06/16
- Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
-
Evaluation and Benchmarks
- [Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps](https://arxiv.org/abs/2505.18675) <br> Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, Xinchao Wang |<img width="1002" alt="image" src="https://arxiv.org/html/2505.18675v2/x1.png"> |[Github](https://github.com/fscdc/ReasonMap) <br> [Paper](https://arxiv.org/abs/2505.18675)| [//]: #06/11
- ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
- [ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models](https://arxiv.org/abs/2505.13444) <br> Liyan Tang, Grace Kim, Xinyu Zhao, Thom Lake, Wenxuan Ding, Fangcong Yin, Prasann Singhal, Manya Wadhwa, Zeyu Leo Liu, Zayne Sprague, Ramya Namuduri, Bodun Hu, Juan Diego Rodriguez, Puyuan Peng, Greg Durrett |<img width="1002" alt="image" src="figures/chartmuseum.png"> |[Github](https://github.com/Liyan06/ChartMuseum) <br> [Paper](https://arxiv.org/abs/2505.13444)| [//]: #05/20
- Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations |[Paper](https://arxiv.org/abs/2505.10937)| [//]: #05/19
- StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
- THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
-  <br> Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, Breck Baldwin |<img width="1002" alt="image" src="https://arxiv.org/html/2408.04667v5/extracted/6331111/max_min_diff.png"> |[Github](https://github.com/breckbaldwin/llm-stability) <br> [Paper](https://arxiv.org/abs/2408.04667)| [//]: #04/08
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
- Evaluating Large Language Models Trained on Code
- τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
- [Are Your LLMs Capable of Stable Reasoning?](https://arxiv.org/abs/2412.13147) <br> Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen |<img width="1002" alt="image" src="https://arxiv.org/html/2412.13147v3/x1.png"> |[Github](https://github.com/open-compass/GPassK) <br> [Paper](https://arxiv.org/abs/2412.13147)| [//]: #04/08
- LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception <br> Yuan-Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, David Acuna |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15362v1/x1.png"> |[Paper](https://arxiv.org/abs/2504.15362)| [//]: #04/23
- [Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights](https://arxiv.org/abs/2502.12521) <br> Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji |<img width="1002" alt="image" src="https://arxiv.org/html/2502.12521v1/x1.png"> |[Github](https://github.com/divelab/sys2bench) <br> [Paper](https://arxiv.org/abs/2502.12521)| [//]: #04/08
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- [CoT-Valve: Length-Compressible Chain-of-Thought Tuning](https://arxiv.org/abs/2502.09601) <br> Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang |<img width="1002" alt="image" src="figures/cot_valve.png"> |[Github](https://github.com/horseee/CoT-Valve) <br> [Paper](https://arxiv.org/abs/2502.09601)|[//]: #03/16
- [Bag of Tricks for Inference-time Computation of LLM Reasoning](https://arxiv.org/abs/2502.07191) <br> Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu |<img width="1002" alt="image" src="https://arxiv.org/html/2502.07191v4/x1.png"> |[Github](https://github.com/usail-hkust/benchmark_inference_time_computation_LLM) <br> [Paper](https://arxiv.org/abs/2502.07191)| [//]: #04/08
- [Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling](https://arxiv.org/abs/2502.06703) <br> Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou |<img width="1002" alt="image" src="https://arxiv.org/html/2502.06703v1/x2.png"> |[Github](https://github.com/RyanLiu112/compute-optimal-tts) <br> [Paper](https://arxiv.org/abs/2502.06703)| [//]: #04/08
- DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
- S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
- [VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning](https://arxiv.org/abs/2504.07956) <br> Yukun Qi, Yiming Zhao, Yu Zeng, Xikun Bao, Wenxuan Huang, Lin Chen, Zehui Chen, Jie Zhao, Zhongang Qi, Feng Zhao |<img width="1002" alt="image" src="figures/video.png"> |[Github](https://github.com/zhishuifeiqian/VCR-Bench) <br> [Paper](https://arxiv.org/abs/2504.07956)| [//]: #04/16
- [Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency](https://arxiv.org/abs/2504.18589) <br> Zhikai Wang, Jiashuo Sun, Wenqi Zhang, Zhiqiang Hu, Xin Li, Fan Wang, Deli Zhao |<img width="1002" alt="image" src="https://arxiv.org/html/2504.18589v1/x1.png"> |[Github](https://github.com/alibaba-damo-academy/VCBench) <br> [Paper](https://arxiv.org/abs/2504.18589)| [//]: #04/29
- [CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges](https://arxiv.org/abs/2504.19093) <br> Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu |<img width="1002" alt="image" src="https://arxiv.org/html/2504.19093v1/x2.png"> |[Github](https://github.com/Goodman-liyu/CipherBank) <br> [Paper](https://arxiv.org/abs/2504.19093)| [//]: #04/29
- [VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models](https://arxiv.org/abs/2504.15279) <br> Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu |<img width="1002" alt="image" src="https://arxiv.org/html/2504.15279v1/x1.png"> |[Github](https://github.com/VisuLogic-Benchmark/VisuLogic-Eval) <br> [Paper](https://arxiv.org/abs/2504.15279)| [//]: #04/25
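Several benchmarks above, including "Evaluating Large Language Models Trained on Code" and the G-Pass@k stability metric, build on the unbiased pass@k estimator: given n sampled generations of which c pass, estimate the probability that at least one of k draws is correct. A minimal sketch (the function name is illustrative, not from any of the listed repos):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n: total samples generated per problem
    c: number of samples that passed
    k: budget being estimated (k <= n)
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples of which 1 passes, pass@1 is 0.5; averaging this estimator over all problems in a benchmark gives the reported score.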
-
Competition
-  [AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset](https://arxiv.org/abs/2504.16891). Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, Igor Gitman. [[Paper]](https://arxiv.org/abs/2504.16891)[[Github]](https://github.com/NVIDIA/NeMo-Skills)