# Awesome-deep-reasoning
A collection of awesome works around reasoning models such as O1 and R1. You can also browse the curated collections: [ModelScope-r1-collection](https://www.modelscope.cn/collections/R1-gongzuoheji-3cfe79822e894a) | [HuggingFace-r1-collection](https://huggingface.co/spaces/modelscope/awesome-o1-r1)

## Table of Contents
- [News](#news)
- [Highlights](#highlights)
- [Papers](#papers)
- [Blogs](#blogs)
- [Models](#models)
- [Infra](#infra)
- [Datasets](#datasets)
- [Evaluation](#evaluation)
- [RelatedRepos](#relatedrepos)

## News
- 🔥 **[2025.04.23]** Add section "Advanced Reasoning for Agent", including `Search-R1`, `ReSearch`, `R1-Searcher`, ...
- 🔥 **[2025.03.21]** Add [DAPO](https://github.com/BytedTsinghua-SIA/DAPO) - DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- 🔥 **[2025.03.18]** Add [Skywork-R1V](https://github.com/SkyworkAI/Skywork-R1V) - Pioneering Multimodal Reasoning with CoT
- 🔥 **[2025.03.17]** Add START: Self-taught Reasoner with Tools from Qwen Team - [START](http://arxiv.org/abs/2503.04625)
- 🔥 **[2025.03.12]** Add Multi-modal Reasoning datasets: [LLaVA-R1-100k](https://www.modelscope.cn/datasets/modelscope/LLaVA-R1-100k) and [MMMU-Reasoning-R1-Distill-Validation](https://www.modelscope.cn/datasets/modelscope/MMMU-Reasoning-Distill-Validation)
- 🔥 **[2025.03.04]** Add [Visual-RFT](https://arxiv.org/abs/2503.01785) - Visual Reinforcement Fine-Tuning
- 🔥 **[2025.03.01]** DeepSeek has released the [smallpond](https://github.com/deepseek-ai/smallpond) - A lightweight data processing framework built on DuckDB and 3FS.
- 🔥 **[2025.02.28]** DeepSeek has released the [3FS](https://github.com/deepseek-ai/3FS) - A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
- 🔥 **[2025.02.27]** DeepSeek has released the [DualPipe](https://github.com/deepseek-ai/DualPipe) - DualPipe achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles.
- 🔥 **[2025.02.27]** DeepSeek has released the [ProfileData](https://github.com/deepseek-ai/profile-data) - The communication-computation overlap profiling strategies and low-level implementation details based on PyTorch.
- 🔥 **[2025.02.26]** DeepSeek has released the [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) - Clean and efficient FP8 GEMM kernels with fine-grained scaling
- OpenAI has published a [deep-research](https://openai.com/index/introducing-deep-research/) capability.
- OpenAI has launched the latest o3 models: [o3-mini & o3-mini-high](https://openai.com/index/openai-o3-mini/), which are geared toward science, math, and coding. Both models are available in the ChatGPT app, Poe, and other clients.
- NVIDIA NIM now supports [the DeepSeek-R1 model](https://blogs.nvidia.com/blog/deepseek-r1-nim-microservice/).
- Qwen has launched Qwen2.5-Max, a powerful multi-modal MoE model; it is available on the [Bailian](https://bailian.console.aliyun.com/) platform.
- CodeGPT, a [VS Code copilot extension](https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt), now supports R1.

## Highlights

### DeepSeek repos:

[DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) ![Stars](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-R1?style=social) - DeepSeek-R1 official repository.

### Qwen repos:

[Qwen-QwQ](https://github.com/QwenLM/Qwen2.5) ![Stars](https://img.shields.io/github/stars/QwenLM/Qwen2.5?style=social) - Qwen 2.5 official repository, with QwQ.

[s1 from Stanford](https://github.com/simplescaling/s1) - From Fei-Fei Li's team, a distillation and test-time-compute (budget forcing) implementation that can match the performance of O1 and R1; a minimal sketch of the idea follows.
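
The core trick in s1 is "budget forcing": when the model tries to close its reasoning early, the end-of-thinking delimiter is suppressed and a continuation token such as "Wait" is appended so the model keeps thinking. Below is a minimal, simplified sketch of that control loop; it assumes a generic `transformers` chat model that wraps its reasoning in `<think>...</think>` tags (the DeepSeek-R1 distill checkpoint is used purely for illustration and is not the official s1 code or model).

```python
# Hedged sketch of "budget forcing" test-time scaling (the idea behind s1).
# Assumptions: a reasoning model that ends its chain of thought with "</think>";
# a real implementation would manipulate token ids rather than decoded text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # illustrative choice only
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

def think_longer(question: str, num_waits: int = 2, max_new_tokens: int = 512) -> str:
    text = tok.apply_chat_template(
        [{"role": "user", "content": question}],
        tokenize=False, add_generation_prompt=True,
    )
    for step in range(num_waits + 1):
        inputs = tok(text, return_tensors="pt", add_special_tokens=False).to(model.device)
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        text = tok.decode(out[0], skip_special_tokens=False)
        if step < num_waits and "</think>" in text:
            # Budget forcing: drop the end-of-thinking marker and append "Wait"
            # so the next pass extends the reasoning trace instead of stopping.
            text = text.split("</think>")[0].rstrip() + "\nWait,"
    return text

print(think_longer("How many prime numbers are there below 30?"))
```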

## Papers

### 2025.04
* [ReSearch](https://arxiv.org/abs/2503.19470) - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
* [Search-R1](https://arxiv.org/abs/2503.09516) - Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
* [R1-Searcher](https://arxiv.org/pdf/2503.05592) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

### 2025.03
* [Visual-RFT](https://arxiv.org/abs/2503.01785) - Visual Reinforcement Fine-Tuning
* [LLaVE](https://arxiv.org/pdf/2503.04812) - LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
* [VisualPRM](https://arxiv.org/pdf/2503.10291) - VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
* [START](http://arxiv.org/abs/2503.04625) - START: Self-taught Reasoner with Tools
* [DAPO](https://arxiv.org/pdf/2503.14476) - DAPO: An Open-Source LLM Reinforcement Learning System at Scale
* [What’s Behind PPO’s Collapse in Long-CoT? Value Optimization Holds the Secret](https://arxiv.org/pdf/2503.01491v1)
* [OThink-MR1](https://arxiv.org/abs/2503.16081) - Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
* [Embodied Reasoner](https://arxiv.org/abs/2503.21696) - Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

### 2025.02
* [Visual Perception Token](https://arxiv.org/abs/2502.17425) - Enhancing visual reasoning by enabling the LLM to control its perception process.
* [DeepSeek-V3 Tech-Report](https://arxiv.org/pdf/2412.19437)
* [LIMO](https://arxiv.org/pdf/2502.03387) - Less is More for Reasoning: uses only 817 training samples to train a model that surpasses o1-level models.
* [Underthinking of Reasoning models](https://arxiv.org/abs/2501.18585) - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
* [Competitive Programming with Large Reasoning Models](https://arxiv.org/html/2502.06807v1) - OpenAI: Competitive Programming with Large Reasoning Models
* [The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks](https://www.arxiv.org/pdf/2502.08235)
* [OverThink: Slowdown Attacks on Reasoning LLMs](https://arxiv.org/abs/2502.02542)
* [Think Less, Achieve More: Cut Reasoning Costs by 50% Without Sacrificing Accuracy](https://novasky-ai.github.io/posts/reduce-overthinking/) - Sky-T1-32B-Flash, reasoning language model that significantly reduces overthinking
* [Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention](https://arxiv.org/abs/2502.11089) - (DeepSeek) NSA: A natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling.
* [MM-RLHF](https://arxiv.org/pdf/2502.10391) - MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

### 2025.01
* [Imagine while Reasoning in Space: Multimodal Visualization-of-Thought](https://arxiv.org/pdf/2501.07542) - Multimodal Visualization-of-Thought (MVoT)
* [DeepSeek-R1-Tech-Report](https://arxiv.org/pdf/2501.12948)
* [Qwen2.5 Tech-Report](https://arxiv.org/pdf/2412.15115)
* [Kimi K1.5 Tech-Report](https://arxiv.org/pdf/2501.12599)
* [Qwen-Math-PRM](https://arxiv.org/pdf/2501.07301) - The Lessons of Developing Process Reward Models in Mathematical Reasoning
* [LlamaV-o1](https://arxiv.org/abs/2501.06186) - Rethinking Step-by-step Visual Reasoning in LLMs
* [rStar-Math](https://arxiv.org/abs/2501.04519) - Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
* [LLMs Can Plan Only If We Tell Them](https://arxiv.org/pdf/2501.13545) - A new CoT method: AoT+
* [SFT Memorizes, RL Generalizes](https://arxiv.org/pdf/2501.17161) - A study from DeepMind on the differing effects of SFT and RL.

### 2024
* [Qwen QwQ Technical blog](https://qwenlm.github.io/blog/qwq-32b-preview/) - QwQ: Reflect Deeply on the Boundaries of the Unknown
* [OpenAI-o1 Announcement](https://openai.com/index/learning-to-reason-with-llms/) - Learning to Reason with Large Language Models
* [DeepSeek Math Tech-Report (GRPO)](https://arxiv.org/pdf/2402.03300) - Introduces Group Relative Policy Optimization (GRPO); a minimal sketch of the group-relative advantage follows this list.
* [Large Language Models for Mathematical Reasoning: Progresses and Challenges](https://arxiv.org/abs/2402.00157) (EACL 2024)
* [Large Language Models Cannot Self-Correct Reasoning Yet](https://arxiv.org/abs/2310.01798) (ICLR 2024)
* [At Which Training Stage Does Code Data Help LLM Reasoning?](https://arxiv.org/pdf/2309.16298) (ICLR 2024)
* [DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought](https://arxiv.org/abs/2412.17498) [ [code](https://github.com/krystalan/DRT-o1) ]
* [MathScale](https://arxiv.org/abs/2403.02884) - Scaling Instruction Tuning for Mathematical Reasoning
* [Frontier AI systems have surpassed the self-replicating red line](https://arxiv.org/pdf/2412.12140) - A paper from Fudan University arguing that frontier LLMs have crossed the self-replicating red line.
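
For reference, the GRPO algorithm introduced in the DeepSeek Math report above replaces PPO's learned value baseline with a group-relative one: several completions are sampled per prompt, each is scored, and every reward is normalized by the mean and standard deviation of its own group. A minimal NumPy sketch of just that advantage computation (not the full clipped policy-gradient loss) is shown below.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages as described in the GRPO paper.

    `rewards` has shape (num_prompts, group_size): one row per prompt and one
    scored completion per column. Normalizing each reward by its own group's
    statistics removes the need for a separate critic/value model.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each, with binary correctness rewards.
rewards = np.array([[1.0, 0.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```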

## Blogs
* [Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs](https://arxiv.org/abs/2412.21187)
* [Reproducing DeepSeek-R1-Zero with 1/30 of the training steps (Chinese)](https://zhuanlan.zhihu.com/p/25616750613)

## Models

DeepSeek series:

| Model ID | ModelScope | Hugging Face |
|-------------------------------|------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| DeepSeek R1 | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
| DeepSeek V3 | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-V3) |
| DeepSeek-R1-Distill-Qwen-32B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) |
| DeepSeek-R1-Distill-Qwen-14B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) |
| DeepSeek-R1-Distill-Llama-8B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
| DeepSeek-R1-Distill-Qwen-7B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) |
| DeepSeek-R1-Distill-Qwen-1.5B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) |
| DeepSeek-R1-GGUF | [Model Link](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-GGUF) | [Model Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF) |
| DeepSeek-R1-Distill-Qwen-32B-GGUF | [Model Link](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF) | [Model Link](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF) |
| DeepSeek-R1-Distill-Llama-8B-GGUF | [Model Link](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF) | [Model Link](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF) |
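
The smaller distilled checkpoints above can be tried with standard open-source tooling. Below is a minimal `transformers` sketch using the DeepSeek-R1-Distill-Qwen-1.5B ID from the table; the generation settings are illustrative rather than DeepSeek's official recommendation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # Hugging Face ID from the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# R1-distilled models emit their reasoning inside <think>...</think> before the
# final answer, so leave a generous new-token budget.
output_ids = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```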

Qwen series:

| Model ID | ModelScope | Hugging Face |
|-------------|------------------------------------------------------------------------|--------------------------------------------------------------|
| QwQ-32B-Preview | [Model Link](https://www.modelscope.cn/models/Qwen/QwQ-32B-Preview) | [Model Link](https://huggingface.co/Qwen/QwQ-32B-Preview) |
| QVQ-72B-Preview | [Model Link](https://www.modelscope.cn/models/Qwen/QVQ-72B-Preview) | [Model Link](https://huggingface.co/Qwen/QVQ-72B-Preview) |
| QwQ-32B-Preview-GGUF | [Model Link](https://www.modelscope.cn/models/unsloth/QwQ-32B-Preview-GGUF) | [Model Link](https://huggingface.co/unsloth/QwQ-32B-Preview-GGUF) |
| QVQ-72B-Preview-bnb-4bit | [Model Link](https://www.modelscope.cn/models/unsloth/QVQ-72B-Preview-bnb-4bit) | [Model Link](https://huggingface.co/unsloth/QVQ-72B-Preview-bnb-4bit) |

Others:

| Model ID | ModelScope | Hugging Face |
|-------------|------------------------------------------------------------------------|--------------------------------------------------------------|
| Qwen2-VL-2B-GRPO-8k | - | [Model Link](https://huggingface.co/lmms-lab/Qwen2-VL-2B-GRPO-8k) |

## Infra

- Flash MLA [DeepSeek]: https://github.com/deepseek-ai/FlashMLA
  - FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequence serving.
- Open R1 by Hugging Face: https://github.com/huggingface/open-r1
  - Hugging Face's official repository for reproducing the DeepSeek-R1 training pipeline.
- TinyZero: https://github.com/Jiayi-Pan/TinyZero
- Clean, minimal, accessible reproduction of DeepSeek R1-Zero
- SimpleRL-Reason: https://github.com/hkust-nlp/simpleRL-reason
  - Uses OpenRLHF to reproduce DeepSeek-R1.
- Ragen: https://github.com/ZihanWang314/RAGEN
  - A general-purpose reasoning-agent training framework that also reproduces DeepSeek-R1.
- TRL: https://github.com/huggingface/trl
  - Hugging Face's official training library, with support for GRPO and other RL algorithms (a minimal GRPO usage sketch follows this list).
- OpenRLHF: https://github.com/OpenRLHF/OpenRLHF
  - An RL training framework supporting multiple algorithms, including REINFORCE++.
- Logic-RL: https://github.com/Unakar/Logic-RL
  - Reproduces R1-Zero on logic puzzles.
- Align-Anything: https://github.com/PKU-Alignment/align-anything
- Training All-modality Model with Feedback
- R-Chain: A lightweight toolkit for distilling reasoning models
- https://github.com/modelscope/r-chain
- Math Verify: A robust mathematical expression evaluation system designed for assessing Large Language Model outputs in mathematical tasks.
- https://github.com/huggingface/Math-Verify
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
- https://github.com/hiyouga/EasyR1
- [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) - [DeepSeek] Clean and efficient FP8 GEMM kernels with fine-grained scaling
- [DualPipe](https://github.com/deepseek-ai/DualPipe) - [DeepSeek] DualPipe achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles.
- [ProfileData](https://github.com/deepseek-ai/profile-data) - [DeepSeek] The communication-computation overlap profiling strategies and low-level implementation details based on PyTorch.
- [3FS](https://github.com/deepseek-ai/3FS) - [DeepSeek] A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
- [smallpond](https://github.com/deepseek-ai/smallpond) - [DeepSeek] A lightweight data processing framework built on DuckDB and 3FS.
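
As a concrete example of the frameworks above, here is a minimal GRPO fine-tuning sketch with TRL. It assumes a recent TRL release that ships `GRPOTrainer`/`GRPOConfig` and uses a toy rule-based reward plus a small public dataset; treat it as a starting point and check the TRL documentation for the exact API of your installed version.

```python
# Hedged sketch: GRPO fine-tuning with TRL's GRPOTrainer (API may differ by version).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any dataset exposing a "prompt" column works; the reasoning datasets listed in
# the Datasets section below would need their columns mapped to that format first.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_brevity(completions, **kwargs):
    # Toy rule-based reward: prefer completions close to 100 characters.
    # Real reasoning setups swap in correctness/format rewards instead.
    return [-abs(100 - len(completion)) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",          # small model for illustration
    reward_funcs=reward_brevity,
    args=GRPOConfig(output_dir="qwen-grpo-demo", num_generations=4),
    train_dataset=dataset,
)
trainer.train()
```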

## Datasets

* OpenR1-Math-220k [ModelScope](https://modelscope.cn/datasets/open-r1/OpenR1-Math-220k) | [HuggingFace](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)
* OpenR1-Math-Raw [ModelScope](https://modelscope.cn/datasets/open-r1/OpenR1-Math-Raw) | [HuggingFace](https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw)
* [MathR](https://modelscope.cn/datasets/modelscope/MathR/summary) - A dataset distilled from DeepSeek-R1 for NuminaMath hard-level problems.
* Dolphin-R1 ([HuggingFace](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) | [ModelScope](https://modelscope.cn/datasets/AI-ModelScope/dolphin-r1)) - An 800k-sample dataset for training DeepSeek-R1-Distill-style models.
* R1-Distill-SFT ([HuggingFace](https://huggingface.co/datasets/ServiceNow-AI/R1-Distill-SFT) | [ModelScope](https://modelscope.cn/datasets/ServiceNow-AI/R1-Distill-SFT))
* [NuminaMath-TIR](https://www.modelscope.cn/datasets/AI-MO/NuminaMath-TIR) - Tool-integrated reasoning (TIR) plays a crucial role in this competition.
* [NuminaMath-CoT](https://www.modelscope.cn/datasets/AI-MO/NuminaMath-CoT) - Approximately 860k math problems, where each solution is formatted in a Chain of Thought (CoT) manner.
* [BAAI-TACO](https://modelscope.cn/datasets/BAAI/TACO) - TACO is a benchmark for code generation with 26443 problems.
* [OpenThoughts-114k](https://modelscope.cn/datasets/open-thoughts/OpenThoughts-114k) - Open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles!
* [Bespoke-Stratos-17k](https://modelscope.cn/datasets/bespokelabs/Bespoke-Stratos-17k) - A reasoning dataset of questions, reasoning traces, and answers.
* [Clevr_CoGenT_TrainA_R1](https://huggingface.co/datasets/MMInstruction/Clevr_CoGenT_TrainA_R1) - A multi-modal dataset for training MM R1 model.
* [clevr_cogen_a_train](https://huggingface.co/datasets/leonardPKU/clevr_cogen_a_train) - A R1-distilled visual reasoning dataset.
* [S1k](https://huggingface.co/datasets/simplescaling/s1K) - A dataset for training S1 model.
* Chinese-DeepSeek-R1-Distill-data-110k - A 110k-sample Chinese dataset distilled from the full-size DeepSeek-R1 [ModelScope](https://modelscope.cn/datasets/liucong/Chinese-DeepSeek-R1-Distill-data-110k) | [HuggingFace](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k)
* LLaVA-R1-100k - A LLaVA-style multi-modal reasoning dataset [ModelScope](https://www.modelscope.cn/datasets/modelscope/LLaVA-R1-100k)
* MMMU-Reasoning-Distill-Validation - A multi-modal reasoning validation set for MMMU distilled from the full-size R1 [ModelScope](https://www.modelscope.cn/datasets/modelscope/MMMU-Reasoning-Distill-Validation)
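
Most of these datasets can be pulled directly with the Hugging Face `datasets` library (or with `modelscope` for the ModelScope copies). A minimal sketch using OpenR1-Math-220k from the list above:

```python
# Quick inspection of one of the reasoning datasets above; field names vary per
# dataset, so print the schema before assuming any column layout.
from datasets import load_dataset

ds = load_dataset("open-r1/OpenR1-Math-220k", split="train")
print(ds)                    # number of rows and column names
print(sorted(ds[0].keys()))  # inspect the schema of a single example
```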

## Evaluation

* [Best practice for evaluating R1/o1-like reasoning models](https://evalscope.readthedocs.io/zh-cn/latest/best_practice/deepseek_r1_distill.html)
* [MATH-500](https://www.modelscope.cn/datasets/AI-ModelScope/MATH-500) - A subset of 500 problems from the MATH benchmark that OpenAI created in their Let's Verify Step by Step paper
* [AIME-2024](https://modelscope.cn/datasets/AI-ModelScope/AIME_2024) - This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2024.
* AIME-2025: [ModelScope](https://modelscope.cn/datasets/TIGER-Lab/AIME25) | [HuggingFace](https://huggingface.co/datasets/opencompass/AIME2025) - Problems from the American Invitational Mathematics Examination (AIME) 2025-I, held on February 6, 2025.
* [AIME-VALIDATION](https://www.modelscope.cn/datasets/AI-MO/aimo-validation-aime) - All 90 problems come from AIME 22, AIME 23, and AIME 24
* [MATH-LEVEL-4](https://www.modelscope.cn/datasets/AI-MO/aimo-validation-math-level-4) - A subset of level 4 problems from the MATH benchmark.
* [MATH-LEVEL-5](https://www.modelscope.cn/datasets/AI-MO/aimo-validation-math-level-5) - A subset of level 5 problems from the MATH benchmark.
* [aimo-validation-amc](https://www.modelscope.cn/datasets/AI-MO/aimo-validation-amc) - All 83 samples come from AMC12 2022, AMC12 2023
* [GPQA-Diamond](https://modelscope.cn/datasets/AI-ModelScope/gpqa_diamond/summary) - Diamond subset from GPQA benchmark.
* [Codeforces-Python-Submissions](https://modelscope.cn/datasets/AI-ModelScope/Codeforces-Python-Submissions) - A dataset of Python submissions from Codeforces.
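
Most of the math benchmarks above are scored by checking a model's final answer against a gold answer. A minimal sketch using the Math Verify toolkit listed under Infra is below; full harnesses such as EvalScope (see the best-practice link above) wrap this kind of check with prompting, sampling, and aggregation. The exact `parse`/`verify` API may differ across Math-Verify versions, so treat this as an assumption to check against its docs.

```python
# Hedged sketch: answer checking with Math-Verify (see the Infra section above).
from math_verify import parse, verify

gold = parse("$\\frac{1}{3}$")
prediction = parse("The probability is 1/3, so the answer is $\\boxed{\\frac{1}{3}}$.")

# verify() returns True when the two extracted expressions are mathematically equal,
# which is how MATH-500 / AIME style answers are typically graded.
print(verify(gold, prediction))
```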

## RelatedRepos

### Reproductions of DeepSeek-R1 and DeepSeek-R1-Zero

1. [HuggingFace Open R1](https://github.com/huggingface/open-r1)
2. [Simple Reinforcement Learning for Reasoning](https://github.com/hkust-nlp/simpleRL-reason)
3. [oatllm](https://oatllm.notion.site/oat-zero)
4. [TinyZero](https://github.com/Jiayi-Pan/TinyZero)
5. [32B-DeepSeek-R1-Zero](https://zhuanlan.zhihu.com/p/24078459991)
6. [X-R1](https://github.com/dhcode-cpp/X-R1)
7. [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero)
8. [Logic-RL](https://github.com/Unakar/Logic-RL) - Reproduces R1-Zero on logic puzzles

### Advanced Reasoning for Coding

1. [SWE-RL](https://github.com/facebookresearch/swe-rl) - Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

### Advanced Reasoning for Multi-Modal

1. [R1-V](https://github.com/Deep-Agent/R1-V) - Multi-modal R1
2. [Open-R1-Multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal) - A multimodal reasoning model based on OpenR1
3. [R1-Multimodal-Journey](https://github.com/FanqingM/R1-Multimodal-Journey) - A journey to replicate multimodal reasoning model based on Open-R1-Multimodal
4. [VLM-R1](https://github.com/om-ai-lab/VLM-R1) | [DEMO](https://huggingface.co/spaces/omlab/VLM-R1-Referral-Expression) - A stable and generalizable R1-style Large Vision-Language Model
5. [Video-R1](https://github.com/tulerfeng/Video-R1) - Towards Super Reasoning Ability in Video Understanding MLLMs
6. [VL-Thinking](https://github.com/UCSC-VLAA/VL-Thinking) - An R1-Derived Visual Instruction Tuning Dataset for Thinkable LVLMs
7. [Visual-RFT](https://github.com/Liuziyu77/Visual-RFT) - Visual Reinforcement Fine-Tuning
8. [Skywork-R1V](https://github.com/SkyworkAI/Skywork-R1V)
9. [R1-Omni](https://github.com/HumanMLLM/R1-Omni) - Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning
10. [R1-OneVision](https://github.com/Fancy-MLLM/R1-Onevision) - A visual language model capable of deep CoT reasoning

### Advanced Reasoning for Agent
1. [Search-R1](https://github.com/PeterGriffinJin/Search-R1) - An efficient, scalable RL training framework, based on veRL, for LLMs that interleave reasoning with search-engine calls
2. [ReSearch](https://github.com/Agent-RL/ReSearch) - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
3. [R1-Searcher](https://arxiv.org/pdf/2503.05592) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
4. [UI-TARS](https://github.com/bytedance/UI-TARS) - Pioneering Automated GUI Interaction with Native Agents

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=modelscope/awesome-deep-reasoning&type=Date)](https://star-history.com/#modelscope/awesome-deep-reasoning&Date)