Awesome-RL-based-LLM-Reasoning
Awesome RL-based LLM Reasoning
https://github.com/bruno686/Awesome-RL-based-LLM-Reasoning
Papers
- Search algorithms (Monte Carlo Tree Search or Beam Search)
- Reinforcement learning
- Outcome-based Reward Model
- Process-based Reward Model
- Surveys
- Question about LLM Reasoning Ability
- Other Newest Interesting Papers about LLM Reasoning
Slides and Discussion
- Self-improvement of LLM agents through Reinforcement Learning at Scale
- Understanding Reasoning LLMs Methods and Strategies for Building and Refining Reasoning Models
- What is the difference between large reasoning model and LLM?
- LLM Reasoning: Key Ideas and Limitations - DeepMind ([Video](https://www.google.com/search?q=llm+reasoning+key+ideas+and+limitations&oq=LLM+Reasoning+key+ideas&gs_lcrp=EgZjaHJvbWUqBwgAEAAYgAQyBwgAEAAYgAQyBggBEEUYOTINCAIQABiGAxiABBiKBTINCAMQABiGAxiABBiKBTINCAQQABiGAxiABBiKBTINCAUQABiGAxiABBiKBdIBCDQ5NjRqMGoxqAIAsAIA&sourceid=chrome&ie=UTF-8#fpstate=ive&vld=cid:22a2556e,vid:-SZAciVbswk,st:0))
- Towards Reasoning in Large Language Models - UIUC
- Can LLMs Reason & Plan? - ASU
- Inference-Time Techniques for LLM Reasoning - DeepMind
- Chain-of-Thought Reasoning In Language Models - SJTU
- Learning to Self-Improve & Reason with LLMs - Meta & NYU
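- Why, before DeepSeek-R1-Zero appeared, did no one try abandoning fine-tuning alignment and using reinforcement learning to produce chain-of-thought reasoning models?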
- Kimi Flood Sung
- An overview of the DeepSeek series of papers
- ChatGPT and The Art of Post-Training - 25/02/18
Video
- LLM+RL
- LLM+RL
- EZ撸paper: DeepSeek-R1 Paper Explained, Part 2: What Is AGI? | A Quick Introduction to Reinforcement Learning | Introduction to AlphaGo
- LLM-Based Reasoning: Opportunities and Pitfalls (LAVA Workshop in ACCV 2024)
- Reinforcement Learning in DeepSeek r1 Visualized
- EZ撸paper: DeepSeek-R1 Paper Explained, Part 1: On Par with OpenAI-o1, How Was It Done?
- GRPO Explained
- EZ撸paper: DeepSeek-R1 Paper Explained, Part 3: History of GPT | scaling law | Training Paradigms | emergent ability
- DeepSeek R1 Explained to your grandma
Open-Source Project
- TinyZero (4×4090 GPUs are enough for a 0.5B LLM, but the aha moment could not be observed)
- [Open-r1](https://github.com/huggingface/open-r1)
- [Logic-RL](https://github.com/Unakar/Logic-RL)
- [Unsloth-GRPO](https://colab.research.google.com/drive/11t4njE3c4Lxl-07OD8lJSMKkfyJml3Tn?usp=sharing) (simplest r1 implementation)
- OpenR
- DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
- deepseek_r1_train
Introduction to Reinforcement Learning
X_PO
Cloud GPU
Other Interesting RL-based Reasoning Repository