{"id":28702271,"url":"https://github.com/modelscope/awesome-deep-reasoning","last_synced_at":"2025-06-14T12:32:17.937Z","repository":{"id":274912628,"uuid":"924469257","full_name":"modelscope/awesome-deep-reasoning","owner":"modelscope","description":"Collect every awesome work about r1!","archived":false,"fork":false,"pushed_at":"2025-05-02T03:10:49.000Z","size":78,"stargazers_count":353,"open_issues_count":0,"forks_count":10,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-02T04:23:15.634Z","etag":null,"topics":["collection","deepseek","grpo","o1","qwen","r1","reasoning","rl"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/modelscope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-30T04:05:33.000Z","updated_at":"2025-05-02T03:10:53.000Z","dependencies_parsed_at":"2025-02-07T16:23:49.792Z","dependency_job_id":"25efeb88-afa8-4acd-bf7d-f018aea2b224","html_url":"https://github.com/modelscope/awesome-deep-reasoning","commit_stats":null,"previous_names":["modelscope-lab/awesome-r1","modelscope-lab/awesome-reasoning","modelscope-lab/awesome-deep-reasoning","modelscope/awesome-deep-reasoning"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/modelscope/awesome-deep-reasoning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fawesome-deep-reasoning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fawesome-deep-reasoning/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fawesome-deep-reasoning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fawesome-deep-reasoning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/modelscope","download_url":"https://codeload.github.com/modelscope/awesome-deep-reasoning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fawesome-deep-reasoning/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259816183,"owners_count":22915828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collection","deepseek","grpo","o1","qwen","r1","reasoning","rl"],"created_at":"2025-06-14T12:30:57.211Z","updated_at":"2025-06-14T12:32:17.924Z","avatar_url":"https://github.com/modelscope.png","language":"Python","readme":"# Awesome-deep-reasoning\nCollect the awesome works evolved around reasoning models like O1/R1! 
You can also find the collection [ModelScope-r1-collection](https://www.modelscope.cn/collections/R1-gongzuoheji-3cfe79822e894a) | [HuggingFace-r1-collection](https://huggingface.co/spaces/modelscope/awesome-o1-r1)\n\n\n## Table of Contents\n- [News](#news)\n- [Highlights](#highlights)\n- [Papers](#papers)\n- [Models](#models)\n- [Infra](#infra)\n- [Datasets](#datasets)\n- [Evaluation](#evaluation)\n- [RelatedRepos](#relatedrepos)\n\n\n\n## News\n- 🔥 **[2025.04.23]** Add section \"Advanced Reasoning for Agent\", including `Search-R1`, `Re-Search`, `R1-Searcher`, ...\n- 🔥 **[2025.03.21]** Add [DAPO](https://github.com/BytedTsinghua-SIA/DAPO) - DAPO: An Open-Source LLM Reinforcement Learning System at Scale\n- 🔥 **[2025.03.18]** Add [Skywork-R1V](https://github.com/SkyworkAI/Skywork-R1V) - Pioneering Multimodal Reasoning with CoT\n- 🔥 **[2025.03.17]** Add START: Self-taught Reasoner with Tools from Qwen Team - [START](http://arxiv.org/abs/2503.04625)\n- 🔥 **[2025.03.12]** Add Multi-modal Reasoning datasets: [LLaVA-R1-100k](https://www.modelscope.cn/datasets/modelscope/LLaVA-R1-100k) and  [MMMU-Reasoning-R1-Distill-Validation](https://www.modelscope.cn/datasets/modelscope/MMMU-Reasoning-Distill-Validation)\n- 🔥 **[2025.03.04]** Add the [Visual-RFT](https://arxiv.org/abs/2503.01785) - Visual Reinforcement Fine-Tuning\n- 🔥 **[2025.03.01]** DeepSeek has released the [smallpond](https://github.com/deepseek-ai/smallpond) - A lightweight data processing framework built on DuckDB and 3FS.\n- 🔥 **[2025.02.28]** DeepSeek has released the [3FS](https://github.com/deepseek-ai/3FS) - A high-performance distributed file system designed to address the challenges of AI training and inference workloads.\n- 🔥 **[2025.02.27]** DeepSeek has released the [DualPipe](https://github.com/deepseek-ai/DualPipe) - DualPipe achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles.\n- 🔥 **[2025.02.27]** DeepSeek has released the 
[ProfileData](https://github.com/deepseek-ai/profile-data) -The communication-computation overlap profiling strategies and low-level implementation details based on PyTorch.\n- 🔥 **[2025.02.26]** DeepSeek has released the [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) - Clean and efficient FP8 GEMM kernels with fine-grained scaling\n- OpenAI publishes a [deep-research](https://openai.com/index/introducing-deep-research/) capability.\n- OpenAI has launched the latest o3 model: [o3-mini \u0026 o3-mini-high](https://openai.com/index/openai-o3-mini/), which specifically support science, math and coding. These two models are available in ChatGPT App, Poe, etc.\n- NVIDIA-NIM has supported [the DeepSeek-R1 model](https://blogs.nvidia.com/blog/deepseek-r1-nim-microservice/).\n- Qwen has launched a powerful multi-modal MoE model: Qwen2.5-Max, this model is available in the [Bailian](https://bailian.console.aliyun.com/) platform.\n- CodeGPT: [VSCode co-pilot](https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt) now supports R1.\n\n## Highlights\n\n### DeepSeek repos:\n\n[DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) ![Stars](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-R1?style=social) - DeepSeek-R1 official repository.\n\n### Qwen repos:\n\n[Qwen-QwQ](https://github.com/QwenLM/Qwen2.5) ![Stars](https://img.shields.io/github/stars/QwenLM/Qwen2.5?style=social) - Qwen 2.5 official repository, with QwQ.\n\n[S1 from stanford](https://github.com/simplescaling/s1) - From Feifei Li team, a distillation and test-time compute impl which can match the performance of O1 and R1.\n\n\n\n## Papers\n\n### 2025.04\n* [ReSearch](https://arxiv.org/abs/2503.19470) - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning\n* [Search-R1](https://arxiv.org/abs/2503.09516) - Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning\n* [R1-Searcher](https://arxiv.org/pdf/2503.05592) - R1-Searcher: 
Incentivizing the Search Capability in LLMs via Reinforcement Learning\n\n\n### 2025.03\n* [Visual-RFT](https://arxiv.org/abs/2503.01785) - Visual Reinforcement Fine-Tuning\n* [LLaVE](https://arxiv.org/pdf/2503.04812) - LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning\n* [VisualPRM](https://arxiv.org/pdf/2503.10291) - VisualPRM: An Effective Process Reward Model for Multimodal Reasoning\n* [START](http://arxiv.org/abs/2503.04625) - START: Self-taught Reasoner with Tools\n* [DAPO](https://arxiv.org/pdf/2503.14476) - DAPO: An Open-Source LLM Reinforcement Learning System at Scale\n* [What’s Behind PPO’s Collapse in Long-CoT? Value Optimization Holds the Secret](https://arxiv.org/pdf/2503.01491v1)\n* [OThink-MR1](https://arxiv.org/abs/2503.16081) - Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning\n* [Embodied Reasoner](https://arxiv.org/abs/2503.21696) - Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks\n\n### 2025.02\n* [Visual Perception Token](https://arxiv.org/abs/2502.17425) - Enhancing visual reasoning by enabling the LLM to control its perception process.\n* [DeepSeek-V3 Tech-Report](https://arxiv.org/pdf/2412.19437)\n* [LIMO](https://arxiv.org/pdf/2502.03387) - Less is More for Reasoning: Use 817 samples to train a model that surpasses the o1 level models.\n* [Underthinking of Reasoning models](https://arxiv.org/abs/2501.18585) - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs\n* [Competitive Programming with Large Reasoning Models](https://arxiv.org/html/2502.06807v1) - OpenAI: Competitive Programming with Large Reasoning Models\n* [The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks](https://www.arxiv.org/pdf/2502.08235)\n* [OverThink: Slowdown Attacks on Reasoning LLMs](https://arxiv.org/abs/2502.02542)\n* [Think Less, Achieve More: Cut Reasoning Costs by 50% Without 
Sacrificing Accuracy](https://novasky-ai.github.io/posts/reduce-overthinking/) - Sky-T1-32B-Flash, reasoning language model that significantly reduces overthinking\n* [Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention](https://arxiv.org/abs/2502.11089) - (DeepSeek) NSA: A natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling.\n* [MM-RLHF](https://arxiv.org/pdf/2502.10391) - MM-RLHF:The Next Step Forward in Multimodal LLM Alignment\n\n\n### 2025.01\n* [Imagine while Reasoning in Space:\nMultimodal Visualization-of-Thought](https://arxiv.org/pdf/2501.07542) - Multimodal Visualization-of-Thought (MVoT)\n* [DeepSeek-R1-Tech-Report](https://arxiv.org/pdf/2501.12948)\n* [Qwen-math-PRM-Tech-Report(MCTS/PRM)](https://arxiv.org/pdf/2501.07301)\n* [Qwen2.5 Tech-Report](https://arxiv.org/pdf/2412.15115)\n* [Kimi K1.5 Tech-Report](https://arxiv.org/pdf/2501.12599)\n* [Qwen-Math-PRM](https://arxiv.org/pdf/2501.07301) - The Lessons of Developing Process Reward Models in Mathematical Reasoning\n* [LlamaV-o1](https://arxiv.org/abs/2501.06186) - Rethinking Step-by-step Visual Reasoning in LLMs\n* [rStar-Math](https://arxiv.org/abs/2501.04519) - Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking\n* [LLMS CAN PLAN ONLY IF WE TELL THEM](https://arxiv.org/pdf/2501.13545) - A new CoT method: AoT+\n* [SFT Memorizes, RL Generalizes](https://arxiv.org/pdf/2501.17161) - A research from DeepMind shows the effect of SFT and RL.\n\n\n### 2024\n* [Qwen QwQ Technical blog](https://qwenlm.github.io/blog/qwq-32b-preview/) - QwQ: Reflect Deeply on the Boundaries of the Unknown\n* [OpenAI-o1 Announcement](https://openai.com/index/learning-to-reason-with-llms/) - Learning to Reason with Large Language Models\n* [DeepSeek Math Tech-Report(GRPO)](https://arxiv.org/pdf/2402.03300)\n* [Large Language Models for Mathematical Reasoning: Progresses 
and Challenges](https://arxiv.org/abs/2402.00157) (EACL 2024)
* [Large Language Models Cannot Self-Correct Reasoning Yet](https://arxiv.org/abs/2310.01798) (ICLR 2024)
* [At Which Training Stage Does Code Data Help LLM Reasoning?](https://arxiv.org/pdf/2309.16298) (ICLR 2024)
* [DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought](https://arxiv.org/abs/2412.17498) [ [code](https://github.com/krystalan/DRT-o1) ]
* [MathScale](https://arxiv.org/abs/2403.02884) - Scaling Instruction Tuning for Mathematical Reasoning
* [Frontier AI systems have surpassed the self-replicating red line](https://arxiv.org/pdf/2412.12140) - A paper from Fudan University indicating that LLMs have crossed the self-replication red line.

## Blogs
* [Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs](https://arxiv.org/abs/2412.21187)
* [Reproducing DeepSeek-R1-Zero with 1/30 of the training steps](https://zhuanlan.zhihu.com/p/25616750613) (in Chinese)

## Models

DeepSeek series:

| Model ID | ModelScope | Hugging Face |
|----------|------------|--------------|
| DeepSeek-R1 | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
| DeepSeek-V3 | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-V3) |
| DeepSeek-R1-Distill-Qwen-32B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) |
| DeepSeek-R1-Distill-Qwen-14B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) |
| DeepSeek-R1-Distill-Llama-8B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
| DeepSeek-R1-Distill-Qwen-7B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) |
| DeepSeek-R1-Distill-Qwen-1.5B | [Model Link](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | [Model Link](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) |
| DeepSeek-R1-GGUF | [Model Link](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-GGUF) | [Model Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF) |
| DeepSeek-R1-Distill-Qwen-32B-GGUF | [Model Link](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF) | [Model Link](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF) |
| DeepSeek-R1-Distill-Llama-8B-GGUF | [Model Link](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF) | [Model Link](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF) |

Qwen series:

| Model ID | ModelScope | Hugging Face |
|----------|------------|--------------|
| QwQ-32B-Preview | [Model Link](https://www.modelscope.cn/models/Qwen/QwQ-32B-Preview) | [Model Link](https://huggingface.co/Qwen/QwQ-32B-Preview) |
| QVQ-72B-Preview | [Model Link](https://www.modelscope.cn/models/Qwen/QVQ-72B-Preview) | [Model Link](https://huggingface.co/Qwen/QVQ-72B-Preview) |
| QwQ-32B-Preview-GGUF | [Model Link](https://www.modelscope.cn/models/unsloth/QwQ-32B-Preview-GGUF) | [Model Link](https://huggingface.co/unsloth/QwQ-32B-Preview-GGUF) |
| QVQ-72B-Preview-bnb-4bit | [Model Link](https://www.modelscope.cn/models/unsloth/QVQ-72B-Preview-bnb-4bit) | [Model Link](https://huggingface.co/unsloth/QVQ-72B-Preview-bnb-4bit) |

Others:

| Model ID | ModelScope | Hugging Face |
|----------|------------|--------------|
| Qwen2-VL-2B-GRPO-8k | - | [Model Link](https://huggingface.co/lmms-lab/Qwen2-VL-2B-GRPO-8k) |

## Infra

- [FlashMLA](https://github.com/deepseek-ai/FlashMLA) - [DeepSeek] An efficient MLA decoding kernel for Hopper GPUs, optimized for serving variable-length sequences.
- [Open R1](https://github.com/huggingface/open-r1) - Hugging Face's official repository for reproducing the DeepSeek-R1 training pipeline.
- [TinyZero](https://github.com/Jiayi-Pan/TinyZero) - A clean, minimal, accessible reproduction of DeepSeek R1-Zero.
- [SimpleRL-Reason](https://github.com/hkust-nlp/simpleRL-reason) - Uses OpenRLHF to reproduce DeepSeek-R1.
- [RAGEN](https://github.com/ZihanWang314/RAGEN) - A general-purpose reasoning-agent training framework that also reproduces DeepSeek-R1.
- [TRL](https://github.com/huggingface/trl) - Hugging Face's official training framework, with open-source support for GRPO and other RL algorithms.
- [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) - An RL framework supporting multiple RL algorithms, including REINFORCE++.
- [Logic-RL](https://github.com/Unakar/Logic-RL)
- [Align-Anything](https://github.com/PKU-Alignment/align-anything) - Training
All-modality Model with Feedback
- [R-Chain](https://github.com/modelscope/r-chain) - A lightweight toolkit for distilling reasoning models.
- [Math-Verify](https://github.com/huggingface/Math-Verify) - A robust mathematical-expression evaluation system designed for assessing LLM outputs on mathematical tasks.
- [EasyR1](https://github.com/hiyouga/EasyR1) - An efficient, scalable, multi-modality RL training framework based on veRL.
- [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) - [DeepSeek] Clean and efficient FP8 GEMM kernels with fine-grained scaling.
- [DualPipe](https://github.com/deepseek-ai/DualPipe) - [DeepSeek] Achieves full overlap of the forward and backward computation-communication phases while reducing pipeline bubbles.
- [ProfileData](https://github.com/deepseek-ai/profile-data) - [DeepSeek] Communication-computation overlap profiling strategies and low-level implementation details, based on PyTorch.
- [3FS](https://github.com/deepseek-ai/3FS) - [DeepSeek] A high-performance distributed file system designed for the challenges of AI training and inference workloads.
- [smallpond](https://github.com/deepseek-ai/smallpond) - [DeepSeek] A lightweight data processing framework built on DuckDB and 3FS.

## Datasets

* OpenR1-Math-220k - [ModelScope](https://modelscope.cn/datasets/open-r1/OpenR1-Math-220k) | [HuggingFace](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)
* OpenR1-Math-Raw - [ModelScope](https://modelscope.cn/datasets/open-r1/OpenR1-Math-Raw) | [HuggingFace](https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw)
* [MathR](https://modelscope.cn/datasets/modelscope/MathR/summary) - A dataset distilled from DeepSeek-R1 for NuminaMath hard-level problems.
* Dolphin-R1 ([HuggingFace](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) | [ModelScope](https://modelscope.cn/datasets/AI-ModelScope/dolphin-r1)) - An 800k-sample dataset for training DeepSeek-R1-Distill models.
* R1-Distill-SFT ([HuggingFace](https://huggingface.co/datasets/ServiceNow-AI/R1-Distill-SFT) | [ModelScope](https://modelscope.cn/datasets/ServiceNow-AI/R1-Distill-SFT))
* [NuminaMath-TIR](https://www.modelscope.cn/datasets/AI-MO/NuminaMath-TIR) - Tool-integrated reasoning (TIR) plays a crucial role in the AIMO competition.
* [NuminaMath-CoT](https://www.modelscope.cn/datasets/AI-MO/NuminaMath-CoT) - Approximately 860k math problems, each solution formatted in a Chain-of-Thought (CoT) manner.
* [BAAI-TACO](https://modelscope.cn/datasets/BAAI/TACO) - TACO is a benchmark for code generation with 26,443 problems.
* [OpenThoughts-114k](https://modelscope.cn/datasets/open-thoughts/OpenThoughts-114k) - An open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles.
* [Bespoke-Stratos-17k](https://modelscope.cn/datasets/bespokelabs/Bespoke-Stratos-17k) - A reasoning dataset of questions, reasoning traces, and answers.
* [Clevr_CoGenT_TrainA_R1](https://huggingface.co/datasets/MMInstruction/Clevr_CoGenT_TrainA_R1) - A multi-modal dataset for training multi-modal R1 models.
* [clevr_cogen_a_train](https://huggingface.co/datasets/leonardPKU/clevr_cogen_a_train) - An R1-distilled visual reasoning dataset.
* [s1K](https://huggingface.co/datasets/simplescaling/s1K) - The dataset used to train the s1 model.
* Chinese-DeepSeek-R1-Distill-data-110k - a 110k-sample Chinese dataset distilled from the full-size DeepSeek-R1 - [ModelScope](https://modelscope.cn/datasets/liucong/Chinese-DeepSeek-R1-Distill-data-110k) | [HuggingFace](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k)
* LLaVA-R1-100k - a LLaVA multi-modal reasoning dataset - [ModelScope](https://www.modelscope.cn/datasets/modelscope/LLaVA-R1-100k)
* MMMU-Reasoning-Distill-Validation - a multi-modal reasoning validation set distilled from the full-size DeepSeek-R1 - [ModelScope](https://www.modelscope.cn/datasets/modelscope/MMMU-Reasoning-Distill-Validation)

## Evaluation

* [Best practice for evaluating R1/o1-like reasoning models](https://evalscope.readthedocs.io/zh-cn/latest/best_practice/deepseek_r1_distill.html)
* [MATH-500](https://www.modelscope.cn/datasets/AI-ModelScope/MATH-500) - A subset of 500 problems from the MATH benchmark, created by OpenAI for its "Let's Verify Step by Step" paper.
* [AIME-2024](https://modelscope.cn/datasets/AI-ModelScope/AIME_2024) - Problems from the American Invitational Mathematics Examination (AIME) 2024.
* AIME-2025 - [ModelScope](https://modelscope.cn/datasets/TIGER-Lab/AIME25) | [HuggingFace](https://huggingface.co/datasets/opencompass/AIME2025) - American Invitational Mathematics Examination (AIME) 2025-I, held February 6th, 2025.
* [AIME-VALIDATION](https://www.modelscope.cn/datasets/AI-MO/aimo-validation-aime) - All 90 problems come from AIME 22, AIME 23, and AIME 24.
* [MATH-LEVEL-4](https://www.modelscope.cn/datasets/AI-MO/aimo-validation-math-level-4) - A subset of level-4 problems from the MATH benchmark.
* [MATH-LEVEL-5](https://www.modelscope.cn/datasets/AI-MO/aimo-validation-math-level-5) - A subset of level-5 problems from the MATH benchmark.
* [aimo-validation-amc](https://www.modelscope.cn/datasets/AI-MO/aimo-validation-amc) - All 83 samples come from AMC12 2022 and AMC12 2023.
* [GPQA-Diamond](https://modelscope.cn/datasets/AI-ModelScope/gpqa_diamond/summary) - The Diamond subset of the GPQA benchmark.
* [Codeforces-Python-Submissions](https://modelscope.cn/datasets/AI-ModelScope/Codeforces-Python-Submissions) - A dataset of Python submissions from Codeforces.

## RelatedRepos

### Replicates of DeepSeek-R1 and DeepSeek-R1-Zero

1. [HuggingFace Open R1](https://github.com/huggingface/open-r1)
2. [Simple Reinforcement Learning for Reasoning](https://github.com/hkust-nlp/simpleRL-reason)
3. [oatllm](https://oatllm.notion.site/oat-zero)
4. [TinyZero](https://github.com/Jiayi-Pan/TinyZero)
5. [32B-DeepSeek-R1-Zero](https://zhuanlan.zhihu.com/p/24078459991) (in Chinese)
6. [X-R1](https://github.com/dhcode-cpp/X-R1)
7. [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero)
8. [Logic-RL](https://github.com/Unakar/Logic-RL) - Reproduces R1-Zero on logic puzzles

### Advanced Reasoning for Coding

1. [SWE-RL](https://github.com/facebookresearch/swe-rl) - Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

### Advanced Reasoning for Multi-Modal

1. [R1-V](https://github.com/Deep-Agent/R1-V) - Multi-modal R1
2. [Open-R1-Multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal) - A fork adding multimodal model training to open-r1
3. [R1-Multimodal-Journey](https://github.com/FanqingM/R1-Multimodal-Journey) - A journey to replicate multimodal reasoning models, based on Open-R1-Multimodal
4. [VLM-R1](https://github.com/om-ai-lab/VLM-R1) | [DEMO](https://huggingface.co/spaces/omlab/VLM-R1-Referral-Expression) - A stable and generalizable R1-style large vision-language model
5. [Video-R1](https://github.com/tulerfeng/Video-R1) - Towards super reasoning ability in video-understanding MLLMs
6. [VL-Thinking](https://github.com/UCSC-VLAA/VL-Thinking) - An R1-derived visual instruction-tuning dataset for thinkable LVLMs
7. [Visual-RFT](https://github.com/Liuziyu77/Visual-RFT) - Visual Reinforcement Fine-Tuning
8. [Skywork-R1V](https://github.com/SkyworkAI/Skywork-R1V) - Pioneering multimodal reasoning with CoT
9. [R1-Omni](https://github.com/HumanMLLM/R1-Omni) - Explainable omni-multimodal emotion recognition with reinforcement learning
10. [R1-OneVision](https://github.com/Fancy-MLLM/R1-Onevision) - A visual language model capable of deep CoT reasoning

### Advanced Reasoning for Agent

1. [Search-R1](https://github.com/PeterGriffinJin/Search-R1) - An efficient, scalable RL training framework for reasoning & search-engine-calling interleaved LLMs, based on veRL
2. [ReSearch](https://github.com/Agent-RL/ReSearch) - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
3. [R1-Searcher](https://arxiv.org/pdf/2503.05592) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
4. [UI-TARS](https://github.com/bytedance/UI-TARS) - Pioneering automated GUI interaction with native agents

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=modelscope/awesome-deep-reasoning&type=Date)](https://star-history.com/#modelscope/awesome-deep-reasoning&Date)
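Many entries in this list (the DeepSeek-Math tech report, TRL, DAPO, EasyR1) revolve around GRPO. As a minimal illustrative sketch of its core idea only - estimating advantages relative to a group of completions sampled for the same prompt, with no value network - here is the group-relative normalization step in plain Python. The function name is hypothetical, each completion is assumed to receive a single scalar reward (e.g. from a rule-based verifier), and the full GRPO objective (per-token ratios, clipping, KL penalty) is not shown:

```python
def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage estimation (sketch): normalize each
    completion's reward against the mean and standard deviation of
    its own group, i.e. the other completions for the same prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    # eps guards against a group where every reward is identical.
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, four sampled completions scored by a correctness check:
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions receive a positive advantage, incorrect ones
# an equal-magnitude negative advantage.
```

Because the baseline is the group mean rather than a learned critic, the same scalar advantage is applied to every token of a completion during the policy update, which is what lets GRPO drop the value model that PPO requires.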