Awesome-RL-based-LLM-Reasoning

Awesome RL-based LLM Reasoning
https://github.com/bruno686/Awesome-RL-based-LLM-Reasoning


Papers
- Search algorithms (Monte Carlo Tree Search or Beam Search)
  - 2310
  - 2310
  - 2310
  - 2310
  - 2310
  - 2310
  - 2310
  - 2310
  - 2310
  - 2502
  - 2408
  - 2310
  - 2310
  - 2310
  - 2310
  - 2310
  - 2310
  - 2310
- Reinforcement learning
  - 2506
  - 2504
  - 2504
  - 2504
  - 2504
  - 2503
  - 2503
  - 2503
  - 2502 (verifier-based (VB) is better than verifier-free (VF))
  - 2502
  - 2502
  - 2502 - Play) (MIT)
  - 2502
  - 2409
- Outcome-based Reward Model
  - 2502
  - 2502 - scaling reward with repetition penalty for stable CoT length growth) (IN.AI)
  - 2501
- Process-based Reward Model
  - 2502
  - 2502
  - 2502 - Yizhou Sun)
  - 2312
  - 2305
  - 2211
  - 2504
- Surveys
  - 2503
  - 2503
  - 2503
  - 2503
  - 2502
  - 2407
  - 2504
  - 2503
- Question about LLM Reasoning Ability
  - 2504
  - 2504 - o1 and DeepSeek-R1 can suffer 60% performance loss on elementary school-level arithmetic and reasoning problems)
  - 2503
- Other Newest Interesting Papers about LLM Reasoning
  - 2504
  - 2504
  - 2503
  - 2504
  - 2503
  - 2503
  - 2503
  - 2503
  - 2502
  - 2502
  - 2502 - Yuandong Tian)
  - 2502
  - 2502
  - 2502 - Plank)
  - 2502
  - 2502
  - 2502 - play generate data) (LSE)
  - 2501
  - 2501
  - 2412
  - 2412
  - 2412
  - 2408
Slides and Discussion
- Surveys
  - Self-improvement of LLM agents through Reinforcement Learning at Scale
  - Understanding Reasoning LLMs Methods and Strategies for Building and Refining Reasoning Models
  - What is the difference between large reasoning model and LLM?
  - LLM Reasoning: Key Ideas and Limitations - DeepMind ([Video](https://www.google.com/search?q=llm+reasoning+key+ideas+and+limitations&oq=LLM+Reasoning+key+ideas&gs_lcrp=EgZjaHJvbWUqBwgAEAAYgAQyBwgAEAAYgAQyBggBEEUYOTINCAIQABiGAxiABBiKBTINCAMQABiGAxiABBiKBTINCAQQABiGAxiABBiKBTINCAUQABiGAxiABBiKBdIBCDQ5NjRqMGoxqAIAsAIA&sourceid=chrome&ie=UTF-8#fpstate=ive&vld=cid:22a2556e,vid:-SZAciVbswk,st:0))
  - Towards Reasoning in Large Language Models - UIUC
  - Can LLMs Reason & Plan? - ASU
  - Inference-Time Techniques for LLM Reasoning - DeepMind
  - Chain-of-Thought Reasoning In Language Models - SJTU
  - Learning to Self-Improve & Reason with LLMs - Meta & NYU
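  - Why, before Deepseek-R1-ZERO appeared, did no one try abandoning fine-tuning alignment and producing chain-of-thought reasoning models through reinforcement learning?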
  - Kimi Flood Sung
  - An overview of the Deepseek series of papers
  - ChatGPT and The Art of Post-Training - 25/02/18
Video
- Surveys
Open-Source Project
- Surveys
  - TinyZero (4×4090 is enough for a 0.5B LLM, but the aha moment can't be observed)
  - [Open-r1](https://github.com/huggingface/open-r1)
  - [Logic-RL](https://github.com/Unakar/Logic-RL)
  - [Unsloth-GRPO](https://colab.research.google.com/drive/11t4njE3c4Lxl-07OD8lJSMKkfyJml3Tn?usp=sharing) (simplest r1 implementation)
  - OpenR
  - DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
  - deepseek_r1_train
Introduction to Reinforcement Learning
- Surveys
X_PO
- Surveys
  - 1707
  - 1502
  - 1706
  - 2501
  - 2405
  - 2402
  - 2402
  - 2305
  - 2203
Cloud GPU
- Surveys
  - Compshare
Other Interesting RL-based Reasoning Repository
- Surveys

Categories

- Papers: 76
- Slides and Discussion: 13
- Video: 9
- X_PO: 9
- Open-Source Project: 7
- Other Interesting RL-based Reasoning Repository: 5
- Introduction to Reinforcement Learning: 3
- Cloud GPU: 1

Sub Categories

- Surveys: 55
- Other Newest Interesting Papers about LLM Reasoning: 23
- Search algorithms (Monte Carlo Tree Search or Beam Search): 18
- Reinforcement learning: 14
- Process-based Reward Model: 7
- Outcome-based Reward Model: 3
- Question about LLM Reasoning Ability: 3

Keywords

- zero-shot-learning: 1
- prompt-tuning: 1
- prompt-engineering: 1
- prompt: 1
- papers: 1
- llm-agent: 1
- llm: 1
- instruction-tuning: 1
- in-context-learning: 1
- few-shot-learning: 1
- demonstration: 1
- chatgpt: 1
- chain-of-thought: 1
- aigc: 1