https://github.com/hkust-nlp/simpleRL-reason
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data.
- Host: GitHub
- URL: https://github.com/hkust-nlp/simpleRL-reason
- Owner: hkust-nlp
- License: MIT
- Created: 2025-01-25T07:16:58.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-25T15:14:17.000Z (4 months ago)
- Last Synced: 2025-01-25T15:29:53.739Z (4 months ago)
- Language: Python
- Homepage:
- Size: 11.4 MB
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 0
Awesome Lists containing this project
- awesome-rl-reasoning-recipes - hkust-nlp/simpleRL-reason
- awesome-llm-strawberry - HKUST
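- StarryDivineSky - hkust-nlp/simpleRL-reason - This project aims to reproduce the training process of DeepSeek-R1-Zero and DeepSeek-R1, focusing on small models and limited datasets. It mainly studies the application of reinforcement learning to reasoning tasks. Its distinguishing feature is that it explores whether high-performance reasoning models can be trained under resource constraints. By simplifying the model structure and optimizing the training strategy, the project tries to reach reasoning ability comparable to large models using small-scale data. Its working principle may involve imitating DeepSeek-R1's training framework and objective function, with targeted adjustments for small models. The project may include training scripts, model definitions, dataset processing code, and evaluation metrics. It offers researchers a low-cost platform for reproducing and improving on DeepSeek-R1's reasoning ability. The ultimate goal is to advance research on reinforcement learning for reasoning tasks, especially in resource-limited settings. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)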