Awesome-Efficient-Reasoning-Models

[TMLR 2025] Efficient Reasoning Models: A Survey
https://github.com/fscdc/Awesome-Efficient-Reasoning-Models

  • Full list

    • Background Papers

      • [Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?](https://arxiv.org/abs/2504.13837). Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang. [[Paper]](https://arxiv.org/abs/2504.13837)[[Github]](https://github.com/LeapLabTHU/limit-of-RLVR)
      • [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903). Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou. [[Paper]](https://arxiv.org/abs/2201.11903)
      • [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601). Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan. NeurIPS 2023. [[Paper]](https://arxiv.org/abs/2305.10601)[[Github]](https://github.com/princeton-nlp/tree-of-thought-llm)
      • [Graph of Thoughts: Solving Elaborate Problems with Large Language Models](https://arxiv.org/abs/2308.09687). Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler. AAAI 2024. [[Paper]](https://arxiv.org/abs/2308.09687)[[Github]](https://github.com/spcl/graph-of-thoughts)
      • [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171). Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou. [[Paper]](https://arxiv.org/abs/2203.11171)
      • [Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks](https://arxiv.org/abs/2211.12588). Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen. TMLR 2023. [[Paper]](https://arxiv.org/abs/2211.12588)[[Github]](https://github.com/TIGER-AI-Lab/Program-of-Thoughts)
      • [Chain-of-Symbol Prompting Elicits Planning in Large Language Models](https://arxiv.org/abs/2305.10276). Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang. COLM 2024. [[Paper]](https://arxiv.org/abs/2305.10276)[[Github]](https://github.com/hanxuhu/chain-of-symbol-planning)
      • Thinking Machines: A Survey of LLM based Reasoning Strategies
      • [From System 1 to System 2: A Survey of Reasoning Large Language Models](https://arxiv.org/abs/2502.17419). Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu. [[Paper]](https://arxiv.org/abs/2502.17419)[[Github]](https://github.com/zzli2022/Awesome-System2-Reasoning-LLM)
      • Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
      • Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
      • [WebThinker: Empowering Large Reasoning Models with Deep Research Capability](https://arxiv.org/abs/2504.21776). Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou. [[Paper]](https://arxiv.org/abs/2504.21776)[[Github]](https://github.com/RUC-NLPIR/WebThinker)
      • [Reinforcement Learning for Reasoning in Large Language Models with One Training Example](https://arxiv.org/abs/2504.20571). Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen. [[Paper]](https://arxiv.org/abs/2504.20571)[[Github]](https://github.com/ypwang61/One-Shot-RLVR)
      • [Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision](https://arxiv.org/abs/2505.14999). …Fu Yang, Zongyu Lin, Xinfeng Li, Hao Xu, Kai-Wei Chang, Ying Nian Wu. [[Paper]](https://arxiv.org/abs/2505.14999)
      • AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
      • [BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs](https://arxiv.org/abs/2505.13529). Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang. [[Paper]](https://arxiv.org/abs/2505.13529)[[Github]](https://github.com/thu-coai/BARREL)
      • [RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning](https://arxiv.org/abs/2505.15034). Kaiwen Zha, Zhengqi Gao, Maohao Shen, Zhang-Wei Hong, Duane S. Boning, Dina Katabi. [[Paper]](https://arxiv.org/abs/2505.15034)[[Github]](https://github.com/kaiwenzha/rl-tango)
      • Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
      • Reasoning Models Better Express Their Confidence
      • Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
      • […-Enhanced Reinforcement Learning](https://arxiv.org/abs/2505.12996). Jiaan Wang, Fandong Meng, Jie Zhou. [[Paper]](https://arxiv.org/abs/2505.12996)[[Github]](https://github.com/krystalan/DRT)
      • Absolute Zero: Reinforced Self-play Reasoning with Zero Data
      • [Beyond Aha!: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models](https://arxiv.org/abs/2505.10554). Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li. [[Paper]](https://arxiv.org/abs/2505.10554)[[Github]](https://github.com/zhiyuanhubj/Meta-Ability-Alignment)
      • The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
      • [Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models](https://arxiv.org/abs/2505.10446). …Jun Qi. [[Paper]](https://arxiv.org/abs/2505.10446)
      • J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
      • INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
      • AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
      • [Chain-of-Thought Tokens are Computer Program Variables](https://arxiv.org/abs/2505.04955). Fangwei Zhu, Peiyi Wang, Zhifang Sui. [[Paper]](https://arxiv.org/abs/2505.04955)[[Github]](https://github.com/solitaryzero/CoTs_are_Variables)
      • [… From Pretraining to Posttraining](https://arxiv.org/abs/2505.07608). Xiaomi LLM-Core Team. [[Paper]](https://arxiv.org/abs/2505.07608)[[Github]](https://github.com/xiaomimimo/MiMo)
      • Resa: Transparent Reasoning Models via SAEs
      • [MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention](https://arxiv.org/abs/2506.13585). MiniMax Team. [[Paper]](https://arxiv.org/abs/2506.13585)[[Github]](https://github.com/MiniMax-AI/MiniMax-M1)
    • Build SLM with Strong Reasoning Ability

      • … Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, Yin Zhang. [[Paper]](https://aclanthology.org/2024.emnlp-main.333/)
      • [Small Models Struggle to Learn from Strong Reasoners](https://arxiv.org/abs/2502.12143). Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran. [[Paper]](https://arxiv.org/abs/2502.12143)[[Github]](https://github.com/Small-Model-Gap/Small-Model-Learnability-Gap)
      • Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
      • [Small Language Models Need Strong Verifiers to Self-Correct Reasoning](https://arxiv.org/abs/2404.17140). Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang. ACL Findings 2024. [[Paper]](https://arxiv.org/abs/2404.17140)[[Github]](https://github.com/yunx-z/SCORE)
      • Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
      • … Yichun Zhao, Shuheng Zhou, Huijia Zhu. [[Paper]](https://aclanthology.org/2024.lrec-main.1140.pdf)
      • Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
      • Distilling Reasoning Ability from Large Language Models with Adaptive Thinking
      • [Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning](https://arxiv.org/abs/2502.18001). Xinghao Chen, Zhijing Sun, Wenjin Guo, Miaoran Zhang, Yanjun Chen, Yirong Sun, Hui Su, Yijie Pan, Dietrich Klakow, Wenjie Li, Xiaoyu Shen. [[Paper]](https://arxiv.org/abs/2502.18001)[[Github]](https://github.com/EIT-NLP/Distilling-CoT-Reasoning)
      • Towards Reasoning Ability of Small Language Models
      • [Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models](https://arxiv.org/abs/2504.04823). Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou. [[Paper]](https://arxiv.org/abs/2504.04823)[[Github]](https://github.com/ruikangliu/Quantized-Reasoning-Models)
      • When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
      • [Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't](https://arxiv.org/abs/2503.16219). Quy-Anh Dang, Chris Ngo. [[Paper]](https://arxiv.org/abs/2503.16219)[[Github]](https://github.com/knoveleng/open-rs)
      • [SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild](https://arxiv.org/abs/2503.18892). Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He. [[Paper]](https://arxiv.org/abs/2503.18892)[[Github]](https://github.com/hkust-nlp/simpleRL-reason)
      • [Tina: Tiny Reasoning Models via LoRA](https://arxiv.org/abs/2504.15777). Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger. [[Paper]](https://arxiv.org/abs/2504.15777)[[Github]](https://github.com/shangshang-wang/Tina)
      • Llama-Nemotron: Efficient Reasoning Models
      • [Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math](https://arxiv.org/abs/2504.21233). …Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, Shuohang Wang, Weijian Xu, Jianfeng Gao, Weizhu Chen. [[Paper]](https://arxiv.org/abs/2504.21233)
      • Phi-4-reasoning Technical Report
      • Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
      • Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
      • Replacing thinking with tool usage enables reasoning in small language models
      • DeepScaleR - project.com/)
      • [Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data](https://arxiv.org/abs/2312.12832). Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Bin Sun, Xinglin Wang, Heda Wang, Kan Li. AAAI 2024. [[Paper]](https://arxiv.org/abs/2312.12832)[[Github]](https://github.com/Yiwei98/TDG)
      • [SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models](https://arxiv.org/abs/2409.13183). Huanxuan Liao, Shizhu He, Yupu Hao, Xiang Li, Yuanzhe Zhang, Jun Zhao, Kang Liu. COLING 2025. [[Paper]](https://arxiv.org/abs/2409.13183)[[Github]](https://github.com/Xnhyacinth/SKIntern)
    • Competition

      • [AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset](https://arxiv.org/abs/2504.16891). Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, Igor Gitman. [[Paper]](https://arxiv.org/abs/2504.16891)[[Github]](https://github.com/NVIDIA/NeMo-Skills)
    • Efficient Agentic Reasoning

    • Efficient Multimodal Reasoning

      • MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing
      • […-of-Thought Reward Model through Reinforcement Fine-Tuning](https://arxiv.org/abs/2505.03318). Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang. [[Paper]](https://arxiv.org/abs/2505.03318)[[Github]](https://github.com/CodeGoat24/UnifiedReward)
      • [PixelThink: Towards Efficient Chain-of-Pixel Reasoning](https://arxiv.org/abs/2505.23727). Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang. [[Paper]](https://arxiv.org/abs/2505.23727)[[Github]](https://github.com/songw-zju/PixelThink)
      • […-Language Models](https://arxiv.org/abs/2505.16854). Jiaqi Wang, Kevin Qinghong Lin, James Cheng, Mike Zheng Shou. [[Paper]](https://arxiv.org/abs/2505.16854)[[Github]](https://github.com/kokolerk/TON)
      • Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
      • One RL to See Them All: Visual Triple Unified Reinforcement Learning
      • MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
      • GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking
      • Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
      • Grounded Reinforcement Learning for Visual Reasoning
      • [Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models](https://arxiv.org/abs/2505.23091). …Chi Cheung, Shengyu Zhang, Fei Wu, Hongxia Yang. [[Paper]](https://arxiv.org/abs/2505.23091)
      • […-Language Reasoning Models to Re-attention Visual Information](https://arxiv.org/abs/2505.23558). Xu Chu, Xinrong Chen, Guanyu Wang, Zhijie Tan, Kui Huang, Wenyu Lv, Tong Mo, Weiping Li. [[Paper]](https://arxiv.org/abs/2505.23558)[[Github]](https://github.com/Liar406/Look_Again)
      • [Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought](https://arxiv.org/abs/2505.23766). …An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang, Zhiding Yu. [[Paper]](https://arxiv.org/abs/2505.23766)
      • Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
      • [Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning](https://arxiv.org/abs/2505.19702). …Ching Lin, Kevin Lin, Wangmeng Zuo, Lijuan Wang. [[Paper]](https://arxiv.org/abs/2505.19702)
      • Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning
      • SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
      • Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
      • Visual Abstract Thinking Empowers Multimodal Reasoning
      • VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
      • DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
      • FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
      • Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework
      • Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
      • [Training-Free Reasoning and Reflection in MLLMs](https://arxiv.org/abs/2505.16151). Hongchen Wei, Zhenzhong Chen. [[Paper]](https://arxiv.org/abs/2505.16151)[[Github]](https://github.com/hcwei13/FRANK-ZERO-Inference)
      • [R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO](https://arxiv.org/abs/2505.16673). Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang. [[Paper]](https://arxiv.org/abs/2505.16673)[[Github]](https://github.com/HJYao00/R1-ShareVL)
      • [SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward](https://arxiv.org/abs/2505.17018). Kaixuan Fan, Kaituo Feng, Haoming Lyu, Dongzhan Zhou, Xiangyu Yue. [[Paper]](https://arxiv.org/abs/2505.17018)[[Github]](https://github.com/kxfan2002/SophiaVL-R1)
      • VLM-R3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
      • […-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning](https://arxiv.org/abs/2505.16579). Siqu Ou, Hongcheng Liu, Pingjie Wang, Yusheng Liao, Chuan Xuan, Yanfeng Wang, Yu Wang. [[Paper]](https://arxiv.org/abs/2505.16579)[[Github]](https://github.com/Cratileo/D2R)
      • [GRIT: Teaching MLLMs to Think with Images](https://arxiv.org/abs/2505.15879). …Chen Kuo, Yuting Zheng, Sravana Jyothi Narayanaraju, Xinze Guan, Xin Eric Wang. [[Paper]](https://arxiv.org/abs/2505.15879)
      • UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
      • Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
      • [Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning](https://arxiv.org/abs/2505.14677). Jiaer Xia, Yuhang Zang, Peng Gao, Yixuan Li, Kaiyang Zhou. [[Paper]](https://arxiv.org/abs/2505.14677)[[Github]](https://github.com/maifoundations/Visionary-R1)
      • [VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning](https://arxiv.org/abs/2505.12081). Yuqi Liu, Tianyuan Qu, Zhisheng Zhong, Bohao Peng, Shu Liu, Bei Yu, Jiaya Jia. [[Paper]](https://arxiv.org/abs/2505.12081)[[Github]](https://github.com/dvlab-research/VisionReasoner)
      • [MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision](https://arxiv.org/abs/2505.13427). Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao. [[Paper]](https://arxiv.org/abs/2505.13427)[[Github]](https://github.com/ModalMinds/MM-PRM)
      • CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
      • Visual Planning: Let's Think Only with Images
      • [X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains](https://arxiv.org/abs/2505.03981). Qianchu Liu, Sheng Zhang, Guanghui Qin, Timothy Ossowski, Yu Gu, Ying Jin, Sid Kiblawi, Sam Preston, Mu Wei, Paul Vozila, Tristan Naumann, Hoifung Poon. [[Paper]](https://arxiv.org/abs/2505.03981)[[Github]](https://github.com/microsoft/x-reasoner)
      • Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
      • […-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning](https://arxiv.org/abs/2505.10557). Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, Zimu Lu, Han Xiao, Yunqiao Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li. [[Paper]](https://arxiv.org/abs/2505.10557)
      • [Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging](https://arxiv.org/abs/2505.05464). Shiqi Chen, Jinghan Zhang, Tongyao Zhu, Wei Liu, Siyang Gao, Miao Xiong, Manling Li, Junxian He. ICML 2025. [[Paper]](https://arxiv.org/abs/2505.05464)[[Github]](https://github.com/shiqichen17/VLM_Merging)
      • Seed1.5-VL Technical Report
      • [Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning](https://arxiv.org/abs/2506.04755). Shenshen Li, Kaiyuan Deng, Lei Wang, Hao Yang, Chong Peng, Peng Yan, Fumin Shen, Heng Tao Shen, Xing Xu. [[Paper]](https://arxiv.org/abs/2506.04755)[[Github]](https://github.com/Leo-ssl/RAP)
      • […-Language Models with Interwoven Thinking and Visual Drawing](https://arxiv.org/abs/2506.09965). Junfei Wu, Jian Guan, Kaituo Feng, Qiang Liu, Shu Wu, Liang Wang, Wei Wu, Tieniu Tan. [[Paper]](https://arxiv.org/abs/2506.09965)[[Github]](https://github.com/AntResearchNLP/ViLaSR)
      • Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
      • VGR: Visual Grounded Reasoning
      • Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
      • [Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback](https://arxiv.org/abs/2507.20766). Yang Chen, Yufan Shen, Wenxuan Huang, Sheng Zhou, Qunshu Lin, Xinyu Cai, Zhi Yu, Jiajun Bu, Botian Shi, Yu Qiao. [[Paper]](https://arxiv.org/abs/2507.20766)[[Github]](https://github.com/L-O-I/RRVF)
      • [R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning](https://arxiv.org/abs/2508.21113). Qi Yang, Bolin Ni, Shiming Xiang, Han Hu, Houwen Peng, Jie Jiang. [[Paper]](https://arxiv.org/abs/2508.21113)[[Github]](https://github.com/yannqi/R-4B)
      • Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models
      • […-grained Visual Reasoning via Multi-Stage Reinforcement Learning](https://arxiv.org/abs/2510.02240). Sicheng Feng, Kaiwen Tuo, Song Wang, Lingdong Kong, Jianke Zhu, Huan Wang. [[Paper]](https://arxiv.org/abs/2510.02240)[[Github]](https://github.com/fscdc/RewardMap)
      • [Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning](https://arxiv.org/abs/2504.16656). Chris, Yichen Wei, Yi Peng, Xiaokun Wang, Weijie Qiu, Wei Shen, Tianyidan Xie, Jiangbo Pei, Jianhao Zhang, Yunzhuo Hao, Xuchen Song, Yang Liu, Yahui Zhou. [[Paper]](https://arxiv.org/abs/2504.16656)[[Github]](https://github.com/SkyworkAI/Skywork-R1V)
      • ![Star - 4B)<br>[R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning](https://arxiv.org/abs/2508.21113) <br> Qi Yang, Bolin Ni, Shiming Xiang, Han Hu, Houwen Peng, Jie Jiang |<img width="1002" alt="image" src="https://arxiv.org/html/2508.21113v2/x5.png"> |[Github](https://github.com/yannqi/R-4B) <br> [Paper](https://arxiv.org/abs/2508.21113)| [//]: #09/10
      • PixelThink: Towards Efficient Chain-of-Pixel Reasoning. Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang. [Github](https://github.com/songw-zju/PixelThink) | [Paper](https://arxiv.org/abs/2505.23727)
      • …-Language Models. Jiaqi Wang, Kevin Qinghong Lin, James Cheng, Mike Zheng Shou. [Github](https://github.com/kokolerk/TON) | [Paper](https://arxiv.org/abs/2505.16854)
      • Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback. Yang Chen, Yufan Shen, Wenxuan Huang, Sheng Zhou, Qunshu Lin, Xinyu Cai, Zhi Yu, Jiajun Bu, Botian Shi, Yu Qiao. [Github](https://github.com/L-O-I/RRVF) | [Paper](https://arxiv.org/abs/2507.20766)
      • …-Language Models with Interwoven Thinking and Visual Drawing. Junfei Wu, Jian Guan, Kaituo Feng, Qiang Liu, Shu Wu, Liang Wang, Wei Wu, Tieniu Tan. [Github](https://github.com/AntResearchNLP/ViLaSR) | [Paper](https://arxiv.org/abs/2506.09965)
      • …-Language Reasoning Models to Re-attention Visual Information. Xu Chu, Xinrong Chen, Guanyu Wang, Zhijie Tan, Kui Huang, Wenyu Lv, Tong Mo, Weiping Li. [Github](https://github.com/Liar406/Look_Again) | [Paper](https://arxiv.org/abs/2505.23558)
      • Training-Free Reasoning and Reflection in MLLMs. Hongchen Wei, Zhenzhong Chen. [Github](https://github.com/hcwei13/FRANK-ZERO-Inference) | [Paper](https://arxiv.org/abs/2505.16151)
      • R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO. Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang. [Github](https://github.com/HJYao00/R1-ShareVL) | [Paper](https://arxiv.org/abs/2505.16673)
      • SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward. Kaixuan Fan, Kaituo Feng, Haoming Lyu, Dongzhan Zhou, Xiangyu Yue. [Github](https://github.com/kxfan2002/SophiaVL-R1) | [Paper](https://arxiv.org/abs/2505.17018)
      • …-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning. Siqu Ou, Hongcheng Liu, Pingjie Wang, Yusheng Liao, Chuan Xuan, Yanfeng Wang, Yu Wang. [Github](https://github.com/Cratileo/D2R) | [Paper](https://arxiv.org/abs/2505.16579)
      • MMaDA: Multimodal Large Diffusion Language Models. Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Mengdi Wang. [Github](https://github.com/Gen-Verse/MMaDA) | [Paper](https://arxiv.org/abs/2505.15809)
      • Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning. Jiaer Xia, Yuhang Zang, Peng Gao, Yixuan Li, Kaiyang Zhou. [Github](https://github.com/maifoundations/Visionary-R1) | [Paper](https://arxiv.org/abs/2505.14677)
      • MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision. Lingxiao Du, Fanqing Meng, Zongkai Liu, Zhixiang Zhou, Ping Luo, Qiaosheng Zhang, Wenqi Shao. [Github](https://github.com/ModalMinds/MM-PRM) | [Paper](https://arxiv.org/abs/2505.13427)
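Several entries above build on GRPO-style reinforcement learning; R1-ShareVL's Share-GRPO is one example. The sketch below shows the group-relative advantage at the core of that approach. It assumes one scalar reward per sampled response and normalizes with the population standard deviation. Implementations differ on that detail, and the `eps` value is an illustrative choice.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each response's reward against the other responses to the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps); `eps` avoids division by zero
    when every response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt, rewarded 1 if correct and 0 otherwise.
print([round(a, 3) for a in group_relative_advantages([1.0, 0.0, 0.0, 1.0])])
# [1.0, -1.0, -1.0, 1.0]
```

Correct responses receive positive advantages and incorrect ones negative advantages, without a learned value model. A group in which every response gets the same score contributes zero advantage.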
    • Evaluation and Benchmarks

      • CoT-Valve: Length-Compressible Chain-of-Thought Tuning. Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang. [Github](https://github.com/horseee/CoT-Valve) | [Paper](https://arxiv.org/abs/2502.09601)
      • THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
      • Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
      • Non-Determinism of "Deterministic" LLM Settings. Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, Breck Baldwin. [Github](https://github.com/breckbaldwin/llm-stability) | [Paper](https://arxiv.org/abs/2408.04667)
      • The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
      • Evaluating Large Language Models Trained on Code
      • τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
      • Are Your LLMs Capable of Stable Reasoning? Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen. [Github](https://github.com/open-compass/GPassK) | [Paper](https://arxiv.org/abs/2412.13147)
      • LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception. …-Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, David Acuna. [Paper](https://arxiv.org/abs/2504.15362)
      • …-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights. Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji. [Github](https://github.com/divelab/sys2bench) | [Paper](https://arxiv.org/abs/2502.12521)
      • Bag of Tricks for Inference-time Computation of LLM Reasoning. Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu. [Github](https://github.com/usail-hkust/benchmark_inference_time_computation_LLM) | [Paper](https://arxiv.org/abs/2502.07191)
      • Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou. [Github](https://github.com/RyanLiu112/compute-optimal-tts) | [Paper](https://arxiv.org/abs/2502.06703)
      • DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs
      • S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
      • VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning. Yukun Qi, Yiming Zhao, Yu Zeng, Xikun Bao, Wenxuan Huang, Lin Chen, Zehui Chen, Jie Zhao, Zhongang Qi, Feng Zhao. [Github](https://github.com/zhishuifeiqian/VCR-Bench) | [Paper](https://arxiv.org/abs/2504.07956)
      • Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency. Zhikai Wang, Jiashuo Sun, Wenqi Zhang, Zhiqiang Hu, Xin Li, Fan Wang, Deli Zhao. [Github](https://github.com/alibaba-damo-academy/VCBench) | [Paper](https://arxiv.org/abs/2504.18589)
      • CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges. Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu. [Github](https://github.com/Goodman-liyu/CipherBank) | [Paper](https://arxiv.org/abs/2504.19093)
      • VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models. Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu. [Github](https://github.com/VisuLogic-Benchmark/VisuLogic-Eval) | [Paper](https://arxiv.org/abs/2504.15279)
      • …-Grained Visual Reasoning from Transit Maps. Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, Xinchao Wang. [Github](https://github.com/fscdc/ReasonMap) | [Paper](https://arxiv.org/abs/2505.18675)
      • ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
      • …-Language Models. Liyan Tang, Grace Kim, Xinyu Zhao, Thom Lake, Wenxuan Ding, Fangcong Yin, Prasann Singhal, Manya Wadhwa, Zeyu Leo Liu, Zayne Sprague, Ramya Namuduri, Bodun Hu, Juan Diego Rodriguez, Puyuan Peng, Greg Durrett. [Github](https://github.com/Liyan06/ChartMuseum) | [Paper](https://arxiv.org/abs/2505.13444)
      • Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations. [Paper](https://arxiv.org/abs/2505.10937)
      • StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
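Several benchmarks above report sampling-based metrics. The sketch below implements the unbiased pass@k estimator from "Evaluating Large Language Models Trained on Code". It also implements the G-Pass@k_τ variant from "Are Your LLMs Capable of Stable Reasoning?", following our reading of that definition: the metric requires at least ⌈τ·k⌉ correct answers among k draws. Both functions assume n generations per problem, of which c are correct. Averaging over problems is omitted.

```python
from math import ceil, comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples drawn without
    replacement from n generations (c correct) is correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Chance that at least ceil(tau * k) of the k draws are correct
    (hypergeometric tail); any tau in (0, 1/k] recovers pass@k."""
    threshold = ceil(tau * k)
    hits = sum(comb(c, j) * comb(n - c, k - j) for j in range(threshold, min(c, k) + 1))
    return hits / comb(n, k)


print(round(pass_at_k(10, 3, 1), 3))             # 0.3
print(round(g_pass_at_k(10, 3, 4, tau=0.5), 3))  # 0.333
```

Counting subsets with `math.comb` keeps both estimators exact. A larger τ rewards a model only when its correct answers are consistent across samples, not merely present in one of them.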
    • Making Decoding More Efficient

      • …-Time Compute Optimally can be More Effective than Scaling Model Parameters. Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar. [Paper](https://arxiv.org/abs/2408.03314)
      • Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods. …-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou. [Paper](https://arxiv.org/abs/2504.14047)
      • xVerify: Efficient Answer Verifier for Reasoning Model Evaluations. Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li. [Github](https://github.com/IAAR-Shanghai/xVerify) | [Paper](https://arxiv.org/abs/2504.10481)
      • …-Consistency for Efficient Reasoning and Coding with LLMs. Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam. [Github](https://github.com/Pranjal2041/AdaptiveConsistency) | [Paper](https://arxiv.org/abs/2305.11860)
      • Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning (ICLR 2024). Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li. [Github](https://github.com/Yiwei98/ESC) | [Paper](https://arxiv.org/abs/2401.10480)
      • Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning (NAACL Findings 2025). Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li. [Github](https://github.com/WangXinglin/DSC) | [Paper](https://arxiv.org/abs/2408.13457)
      • Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
      • Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning. …-Zhe Guo, Xiaoxing Ma, Yu-Feng Li. [Paper](https://arxiv.org/abs/2502.00511)
      • Confidence Improves Self-Consistency in LLMs
      • Efficient Test-Time Scaling via Self-Calibration. Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang. [Github](https://github.com/Chengsong-Huang/Self-Calibration) | [Paper](https://arxiv.org/abs/2503.00031)
      • Fast Best-of-N Decoding via Speculative Rejection (NeurIPS 2024). Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette. [Github](https://github.com/Zanette-Labs/SpeculativeRejection) | [Paper](https://arxiv.org/abs/2410.20290)
      • Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
      • FastMCTS: A Simple Sampling Strategy for Data Synthesis
      • Non-myopic Generation of Language Models for Reasoning and Planning (ICLR 2025). Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong. [Github](https://github.com/chang-github-00/LLM-Predictive-Decoding) | [Paper](https://arxiv.org/abs/2410.17195)
      • Language Models can Self-Improve at State-Value Estimation for Better Search. Ethan Mendes, Alan Ritter. [Github](https://github.com/ethanm88/self-taught-lookahead) | [Paper](https://arxiv.org/abs/2503.02878)
      • ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation. Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Jun Liu, Qika Lin, Zhiyong Wu. [Github](https://github.com/xufangzhi/phi-Decoding) | [Paper](https://arxiv.org/abs/2503.13288)
      • Dynamic Parallel Tree Search for Efficient LLM Reasoning
      • Learning Adaptive Parallel Reasoning with Language Models. Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr. [Github](https://github.com/Parallel-Reasoning/APR) | [Paper](https://arxiv.org/abs/2504.15466)
      • Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation (ICLR 2024). Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang. [Github](https://github.com/imagination-research/sot) | [Paper](https://arxiv.org/abs/2307.15337)
      • Adaptive Skeleton Graph Decoding
      • …-Guided Speculative Decoding for Efficient LLM Reasoning. Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong. [Github](https://github.com/BaohaoLiao/RSD) | [Paper](https://arxiv.org/abs/2501.19324)
      • Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
      • …-Time Scaling. Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo. [Github](https://github.com/qixucen/atom) | [Paper](https://arxiv.org/abs/2502.12018)
      • DISC: Dynamic Decomposition Improves LLM Inference Scaling
      • From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
      • …-structured Reasoning of Multimodal Large Models? Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang. [Github](https://github.com/Quinn777/AtomThink) | [Paper](https://arxiv.org/abs/2503.06252)
      • Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models (ICLR 2025). Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang. [Github](https://github.com/thu-wyz/inference_scaling) | [Paper](https://arxiv.org/abs/2408.00724)
      • Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning. Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar. [Github](https://github.com/CMU-AIRe/MRT) | [Paper](https://arxiv.org/abs/2503.07572)
      • …-Time Compute via Speculative Reasoning. Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali. [Github](https://github.com/ruipeterpan/specreason) | [Paper](https://arxiv.org/abs/2504.07891)
      • Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models
      • …-of-Thought. Jikai Wang, Juntao Li, Lijun Wu, Min Zhang. [Github](https://github.com/Jikai0Wang/Speculative_CoT) | [Paper](https://arxiv.org/abs/2504.19095)
      • Dynamic Early Exit in Reasoning Models
      • Reward Reasoning Model
      • Control-R: Towards controllable test-time scaling
      • Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
      • First Finish Search: Efficient Test-Time Scaling in Large Language Models
      • LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
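Several entries in this section reduce the cost of self-consistency voting. The sketch below implements the windowed early-stopping rule described in "Escape Sky-high Cost" (ESC), as we understand it. `sample_answer` is a hypothetical callable that stands in for one LLM call plus answer extraction. `window_size` and `max_windows` are illustrative parameters, not the paper's settings.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple


def early_stopping_self_consistency(
    sample_answer: Callable[[], Hashable],
    window_size: int,
    max_windows: int,
) -> Tuple[Hashable, int]:
    """Self-consistency voting with windowed early stopping.

    Sample answers one window at a time. If every answer in a window agrees,
    stop and return it. Otherwise continue until `max_windows` windows have
    been drawn, then return the majority answer over all samples.
    Returns (answer, number_of_samples_used).
    """
    if window_size < 1 or max_windows < 1:
        raise ValueError("window_size and max_windows must be positive")
    answers: List[Hashable] = []
    for _ in range(max_windows):
        window = [sample_answer() for _ in range(window_size)]
        answers.extend(window)
        if len(set(window)) == 1:  # unanimous window: stop sampling early
            return window[0], len(answers)
    majority, _ = Counter(answers).most_common(1)[0]
    return majority, len(answers)


# Hypothetical sampler that replays fixed answers in place of real LLM calls.
replay = iter(["12", "15", "12", "12", "9", "12", "12", "12", "12", "12"])
print(early_stopping_self_consistency(lambda: next(replay), window_size=5, max_windows=4))
# ('12', 10)
```

A unanimous window is a cheap, zero-entropy signal. Easy questions therefore stop after one or two windows, while hard ones fall back to ordinary majority voting over the full budget.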