Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Jiayi-Pan/TinyZero
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
https://github.com/Jiayi-Pan/TinyZero
Last synced: 7 days ago
JSON representation
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
- Host: GitHub
- URL: https://github.com/Jiayi-Pan/TinyZero
- Owner: Jiayi-Pan
- License: apache-2.0
- Created: 2025-01-21T16:49:12.000Z (18 days ago)
- Default Branch: main
- Last Pushed: 2025-02-01T04:58:23.000Z (8 days ago)
- Last Synced: 2025-02-01T05:25:36.226Z (8 days ago)
- Language: Python
- Homepage:
- Size: 2.28 MB
- Stars: 5,166
- Watchers: 71
- Forks: 552
- Open Issues: 21
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-LLM - TinyZero - Clean, minimal, accessible reproduction of DeepSeek R1-Zero (Trending LLM Projects)
- awesome-llm-and-aigc - TinyZero - Pan/TinyZero?style=social"/> : Clean, accessible reproduction of DeepSeek R1-Zero. TinyZero is a reproduction of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) in countdown and multiplication tasks. We built upon [veRL](https://github.com/volcengine/verl). (Summary)
- awesome-deepseek - Jiayi-Pan/TinyZero - Zero, focused on accessibility and simplicity. (GitHub projects)
- awesome-deepseek - Jiayi-Pan/TinyZero - Zero, focused on accessibility and simplicity. (GitHub projects)
- awesome-llm-strawberry - Berkeley AI Research
README
# TinyZero
![image](cover.png)TinyZero is a reproduction of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) in countdown and multiplication tasks. We built upon [veRL](https://github.com/volcengine/verl).
Through RL, the 3B base LM develops self-verification and search abilities all on its own
You can experience the Ahah moment yourself for < $30
Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655
Full experiment log: https://wandb.ai/jiayipan/TinyZero
Paper's on it's way!
## Installation
```
conda create -n zero python=3.9
# install torch [or you can skip this step and let vllm to install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
pip3 install ray# verl
pip install -e .# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib
```## Countdown task
**Data Preparation**
```
conda activate zero
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
```### Run Training
```
conda activate zero
```For the following code, if you see Out-of-vram, try add `critic.model.enable_gradient_checkpointing=True` to the script, and checkout the discussion [here](https://github.com/Jiayi-Pan/TinyZero/issues/5#issuecomment-2624161643)
**Single GPU**
Works for model <= 1.5B. For Qwen2.5-0.5B base, we know it fails to learn reasoning.
```
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERSbash ./scripts/train_tiny_zero.sh
```**3B+ model**
In this case, the base model is able to develop sophisticated reasoning skills.
```
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERSbash ./scripts/train_tiny_zero.sh
```### Instruct Ablation
We experiment with QWen-2.5-3B Instruct too.
**Data Preparation**
To follow chat template, we need to reprocess the data:
```
conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
```**Training**
```
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERSbash ./scripts/train_tiny_zero.sh
```## Acknowledge
* We run our experiments based on [veRL](https://github.com/volcengine/verl).
* We use Qwen2.5 series base model [Qwen2.5](https://github.com/QwenLM/Qwen2.5).## Citation
```
@misc{tinyzero,
author = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},
title = {TinyZero},
howpublished = {https://github.com/Jiayi-Pan/TinyZero},
note = {Accessed: 2025-01-24},
year = {2025}
}
```