awesome-deep-reasoning
Collecting every awesome work about R1!
https://github.com/modelscope/awesome-deep-reasoning
Advanced Reasoning for Agent
- 
- 2025.03.04 - Visual-RFT - Visual Reinforcement Fine-Tuning
- 2025.03.01 - smallpond - A lightweight data processing framework built on DuckDB and 3FS.
- 2025.02.28 - 3FS - A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
- 2025.02.27 - DualPipe - Achieves full overlap of forward and backward computation-communication phases while reducing pipeline bubbles.
- 2025.02.27 - profile-data - Communication-computation overlap profiling strategies and low-level implementation details, based on PyTorch.
- 2025.02.26 - DeepGEMM - Clean and efficient FP8 GEMM kernels with fine-grained scaling.
- deep-research
- o3-mini & o3-mini-high
- the DeepSeek-R1 model
- Bailian
- VSCode co-pilot
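The "fine-grained scaling" behind FP8 GEMM kernels like the ones listed above means quantizing small blocks of a matrix with their own scale factors instead of one scale for the whole tensor, so quantization error is bounded by each block's local dynamic range. A minimal NumPy sketch of the idea, with int8 standing in for FP8 (NumPy has no FP8 dtype), an illustrative block size of 4 columns, and function names made up for this sketch:

```python
import numpy as np

def quantize_blockwise(x, block=4):
    """Quantize each `block`-column group of x to int8 with its own scale."""
    n_blocks = x.shape[1] // block
    q = np.empty_like(x, dtype=np.int8)
    scales = np.empty((x.shape[0], n_blocks))
    for b in range(n_blocks):
        cols = slice(b * block, (b + 1) * block)
        # Per-row, per-block scale: map the block's max magnitude to 127.
        s = np.abs(x[:, cols]).max(axis=1, keepdims=True) / 127.0
        s[s == 0] = 1.0  # avoid division by zero for all-zero blocks
        scales[:, b:b + 1] = s
        q[:, cols] = np.round(x[:, cols] / s).astype(np.int8)
    return q, scales

def dequantize_blockwise(q, scales, block=4):
    """Reconstruct an approximation of x from int8 values and block scales."""
    x = np.empty(q.shape)
    for b in range(scales.shape[1]):
        cols = slice(b * block, (b + 1) * block)
        x[:, cols] = q[:, cols] * scales[:, b:b + 1]
    return x
```

Because each block's scale is set by its own maximum, an outlier in one block no longer inflates the quantization error everywhere else, which is the point of fine-grained scaling.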
-
Related Repos
-
Advanced Reasoning for Multi-Modal
- VL-Thinking - An R1-Derived Visual Instruction Tuning Dataset for Thinkable LVLMs
- R1-V - Multi-modal R1
- Open-R1-Multimodal - A multimodal reasoning model based on OpenR1
- R1-Multimodal-Journey - A journey to replicate multimodal reasoning model based on Open-R1-Multimodal
- VLM-R1 - A stable and generalizable R1-style Large Vision-Language Model
- Video-R1 - Towards Super Reasoning Ability in Video Understanding MLLMs
- Visual-RFT - Visual Reinforcement Fine-Tuning
- R1-Omni - Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning
- R1-OneVision - A visual language model capable of deep CoT reasoning
-
Replications of DeepSeek-R1 and DeepSeek-R1-Zero
- HuggingFace Open R1
- Simple Reinforcement Learning for Reasoning
- oatllm
- TinyZero
- 32B-DeepSeek-R1-Zero
- X-R1
- Open-Reasoner-Zero
- Logic-RL - Reproducing R1-Zero on logic puzzles
-
Advanced Reasoning for Coding
- SWE-RL - Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
-
Papers
-
2025.01
- LlamaV-o1 - Rethinking Step-by-step Visual Reasoning in LLMs
- rStar-Math - Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
- LLMs Can Plan Only If We Tell Them - A new CoT method: AoT+
- SFT Memorizes, RL Generalizes - A DeepMind study on the respective effects of SFT and RL on generalization.
- DeepSeek-R1-Tech-Report
- Qwen-Math-PRM Tech-Report (MCTS/PRM)
- Qwen2.5 Tech-Report
- Kimi K1.5 Tech-Report
-
2025.04
- ReSearch - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
- Search-R1 - Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
- R1-Searcher - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
-
2025.03
- LLaVE - LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
- VisualPRM - VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
- DAPO - DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- What’s Behind PPO’s Collapse in Long-CoT? Value Optimization Holds the Secret
- OThink-MR1 - Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
- Embodied Reasoner - Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
-
2025.02
- Visual Perception Token - Enhancing visual reasoning by enabling the LLM to control its perception process.
- DeepSeek-V3 Tech-Report
- LIMO - Less is More for Reasoning: Use 817 samples to train a model that surpasses the o1 level models.
- Underthinking of Reasoning models - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
- Competitive Programming with Large Reasoning Models - OpenAI: Competitive Programming with Large Reasoning Models
- The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
- OverThink: Slowdown Attacks on Reasoning LLMs
- Think Less, Achieve More: Cut Reasoning Costs by 50% Without Sacrificing Accuracy - Sky-T1-32B-Flash, a reasoning language model that significantly reduces overthinking
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention - (DeepSeek) NSA: A natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling.
- MM-RLHF - MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
-
2024
- Qwen QwQ Technical blog - QwQ: Reflect Deeply on the Boundaries of the Unknown
- OpenAI-o1 Announcement - Learning to Reason with Large Language Models
- DeepSeek-Math Tech-Report (GRPO)
- Large Language Models for Mathematical Reasoning: Progresses and Challenges
- Large Language Models Cannot Self-Correct Reasoning Yet
- At Which Training Stage Does Code Data Help LLM Reasoning?
- DRT-o1 - Optimized Deep Reasoning Translation via Long Chain-of-Thought
- MathScale - Scaling Instruction Tuning for Mathematical Reasoning
- Frontier AI systems have surpassed the self-replicating red line - A paper from Fudan University arguing that frontier LLMs have crossed the self-replicating red line.
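GRPO, introduced in the DeepSeek-Math report listed above, drops PPO's learned value baseline: it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. A minimal sketch of just that advantage computation (the surrounding policy update and KL term are omitted; the reward values are illustrative):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: z-score each completion's reward
    against the other completions sampled for the same prompt."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, four sampled completions scored 1/0 by a rule-based verifier:
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions get positive advantage, incorrect ones negative, with no value network to train, which is what makes the recipe cheap enough for the R1-style replications above.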
-
Highlights
-
DeepSeek repos:
- DeepSeek-R1 - The official DeepSeek-R1 repository.
-
Qwen repos:
- Qwen-QwQ - Qwen 2.5 official repository, with QwQ.
- S1 from Stanford - From Fei-Fei Li's team, a distillation and test-time-compute approach that can match the performance of o1 and R1.
-
Blogs
-
Models
-
2024
- DeepSeek-R1
- DeepSeek-V3
- DeepSeek-R1-Distill-Qwen-32B
- DeepSeek-R1-Distill-Qwen-14B
- DeepSeek-R1-Distill-Llama-8B
- DeepSeek-R1-Distill-Qwen-7B
- DeepSeek-R1-Distill-Qwen-1.5B
- R1-GGUF
- R1-Distill-Qwen-32B-GGUF
- R1-Distill-Llama-8B-GGUF
- 32B-Preview
- 72B-Preview
- 32B-Preview-GGUF
- 72B-Preview-bnb-4bit
-
Datasets
-
2024
- OpenR1-Math-220k (ModelScope)
- OpenR1-Math-Raw (ModelScope)
- MathR - A dataset distilled from DeepSeek-R1 for NuminaMath hard-level problems.
- R1-Distill-SFT (HuggingFace)
- NuminaMath-TIR - Tool-integrated reasoning (TIR) plays a crucial role in this competition.
- NuminaMath-CoT - Approximately 860k math problems, where each solution is formatted in a Chain of Thought (CoT) manner.
- BAAI-TACO - TACO is a benchmark for code generation with 26443 problems.
- OpenThoughts-114k - Open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles!
- Bespoke-Stratos-17k - A reasoning dataset of questions, reasoning traces, and answers.
- Clevr_CoGenT_TrainA_R1 - A multi-modal dataset for training MM R1 model.
- clevr_cogen_a_train - An R1-distilled visual reasoning dataset.
- S1k - A dataset for training S1 model.
- DeepSeek-R1-Distill-data-110k (ModelScope)
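Datasets like NuminaMath-CoT and Bespoke-Stratos-17k above store a problem together with a step-by-step solution that ends in a final answer. A hypothetical record in that shape (field names and the example are illustrative, not any dataset's actual schema):

```python
# Illustrative CoT-formatted training record: the solution walks through
# the reasoning and ends with the answer in \boxed{...}, so a verifier
# can check it against the reference "answer" field.
sample = {
    "problem": "What is 17 * 23?",
    "solution": (
        "Split 23 into 20 + 3. "
        "17 * 20 = 340 and 17 * 3 = 51, "
        "so 17 * 23 = 340 + 51 = 391. "
        "The answer is \\boxed{391}."
    ),
    "answer": "391",
}
```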
-
Evaluation
-
2024
- Best practice for evaluating R1/o1-like reasoning models
- MATH-500 - A subset of 500 problems from the MATH benchmark that OpenAI created in their Let's Verify Step by Step paper
- AIME-2024 - This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2024.
- AIME-2025-I (ModelScope) - Problems from the American Invitational Mathematics Examination (AIME) 2025-I, held on February 6th, 2025.
- AIME-VALIDATION - All 90 problems come from AIME 22, AIME 23, and AIME 24
- MATH-LEVEL-4 - A subset of level 4 problems from the MATH benchmark.
- MATH-LEVEL-5 - A subset of level 5 problems from the MATH benchmark.
- aimo-validation-amc - All 83 samples come from AMC12 2022 and AMC12 2023.
- GPQA-Diamond - Diamond subset from GPQA benchmark.
- Codeforces-Python-Submissions - A dataset of Python submissions from Codeforces.
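Evaluating R1/o1-style models on benchmarks like MATH-500 or AIME usually reduces to pulling the final answer out of each completion and comparing it to the reference. A minimal sketch, assuming answers follow the MATH benchmark's `\boxed{...}` convention and using plain string equality where real harnesses do symbolic checking:

```python
import re

def extract_boxed(text):
    """Return the contents of the last simple (non-nested) \\boxed{...}
    in a completion, or None if there is none."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def accuracy(completions, references):
    """Fraction of completions whose boxed answer exactly matches the reference."""
    correct = sum(
        extract_boxed(c) == r for c, r in zip(completions, references)
    )
    return correct / len(references)
```

Real evaluation code also normalizes equivalent forms (e.g. `1/2` vs `0.5`) before comparing; this sketch only shows the extraction step.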
-