https://github.com/Jiayi-Pan/TinyZero

Clean, minimal, accessible reproduction of DeepSeek R1-Zero
https://github.com/Jiayi-Pan/TinyZero

Last synced: 25 days ago
JSON representation

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Host: GitHub
URL: https://github.com/Jiayi-Pan/TinyZero
Owner: Jiayi-Pan
License: apache-2.0
Created: 2025-01-21T16:49:12.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-02-01T04:58:23.000Z (9 months ago)
Last Synced: 2025-02-01T05:25:36.226Z (9 months ago)
Language: Python
Homepage:
Size: 2.28 MB
Stars: 5,166
Watchers: 71
Forks: 552
Open Issues: 21
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

Awesome-RL-for-LRMs - Tiny-Zero
Awesome-RL-for-LRMs - Tiny-Zero
StarryDivineSky - Jiayi-Pan/TinyZero - Zero模型的简洁、最小化和可访问的复现项目。它旨在提供一个易于理解和使用的R1-Zero实现，方便研究者和开发者学习和探索。该项目专注于核心功能，去除冗余代码，力求清晰易懂。TinyZero可能包含模型结构定义、训练脚本、推理示例等。它强调可访问性，降低了运行和修改代码的门槛。该项目可能使用了PyTorch等深度学习框架。通过TinyZero，用户可以更轻松地理解R1-Zero的工作原理，并在此基础上进行二次开发或实验。它是一个轻量级的R1-Zero实现，适合快速原型验证和教学目的。项目目标是提供一个干净、易于理解的R1-Zero版本，促进相关技术的普及和发展。 (A01_文本生成_文本对话 / 大语言对话模型及数据)
Awesome-LLM - TinyZero - Clean, minimal, accessible reproduction of DeepSeek R1-Zero (Trending LLM Projects)
awesome-llm-and-aigc - TinyZero - Pan/TinyZero?style=social"/> : Clean, minimal, accessible reproduction of DeepSeek R1-Zero. TinyZero is a reproduction of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) in countdown and multiplication tasks. We built upon [veRL](https://github.com/volcengine/verl). (Summary)
awesome-deepseek - Jiayi-Pan/TinyZero - Zero, focused on accessibility and simplicity. (GitHub projects)
awesome-deepseek - Jiayi-Pan/TinyZero - Zero, focused on accessibility and simplicity. (GitHub projects)
awesome-llm-strawberry - Berkeley AI Research
awesome-rlvr - **TinyZero** - Pan/TinyZero?style=flat-square&logo=github)](https://github.com/Jiayi-Pan/TinyZero/stargazers) | Minimal reproduction of DeepSeek R1-Zero | (Codebases)
awesome-rlvr - **TinyZero** - line minimal reproduction of DeepSeek R1-Zero; 4 × RTX 4090 is enough for a 0.5 B LLM. | (Codebases)

README

          # TinyZero

![image](cover.png)

TinyZero is a reproduction of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) in countdown and multiplication tasks. We built upon [veRL](https://github.com/volcengine/verl).

Through RL, the 3B base LM develops self-verification and search abilities all on its own 

You can experience the Ahah moment yourself for < $30 

Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655

Full experiment log: https://wandb.ai/jiayipan/TinyZero

> 📢: We release [Apative Parallel Reasoning](https://github.com/Parallel-Reasoning/APR), where we explore a new dimension in scaling reasoining models

## Installation

```

conda create -n zero python=3.9

# install torch [or you can skip this step and let vllm to install the correct version for you]

pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121

# install vllm

pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1

pip3 install ray

# verl

pip install -e .

# flash attention 2

pip3 install flash-attn --no-build-isolation

# quality of life

pip install wandb IPython matplotlib

```

## Countdown task

**Data Preparation**

```

conda activate zero

python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}

```

### Run Training

```

conda activate zero

```

For the following code, if you see Out-of-vram, try add `critic.model.enable_gradient_checkpointing=True` to the script, and checkout the discussion [here](https://github.com/Jiayi-Pan/TinyZero/issues/5#issuecomment-2624161643)

**Single GPU**

Works for model <= 1.5B. For Qwen2.5-0.5B base, we know it fails to learn reasoning.

```

export N_GPUS=1

export BASE_MODEL={path_to_your_model}

export DATA_DIR={path_to_your_dataset}

export ROLLOUT_TP_SIZE=1

export EXPERIMENT_NAME=countdown-qwen2.5-0.5b

export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

```

**3B+ model**

In this case, the base model is able to develop sophisticated reasoning skills.

```

export N_GPUS=2

export BASE_MODEL={path_to_your_model}

export DATA_DIR={path_to_your_dataset}

export ROLLOUT_TP_SIZE=2

export EXPERIMENT_NAME=countdown-qwen2.5-3b

export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

```

### Instruct Ablation

We experiment with QWen-2.5-3B Instruct too.

**Data Preparation**

To follow chat template, we need to reprocess the data:

```

conda activate zero

python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}

```

**Training**

```

export N_GPUS=2

export BASE_MODEL={path_to_your_model}

export DATA_DIR={path_to_your_dataset}

export ROLLOUT_TP_SIZE=2

export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct

export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

```

## Acknowledge

* We run our experiments based on [veRL](https://github.com/volcengine/verl).

* We use Qwen2.5 series base model [Qwen2.5](https://github.com/QwenLM/Qwen2.5).

## Citation

```

@misc{tinyzero,

author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},

title        = {TinyZero},

howpublished = {https://github.com/Jiayi-Pan/TinyZero},

note         = {Accessed: 2025-01-24},

year         = {2025}

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Jiayi-Pan/TinyZero

Awesome Lists containing this project

README