{"id":24901081,"url":"https://github.com/Jiayi-Pan/TinyZero","last_synced_at":"2025-10-16T12:32:22.134Z","repository":{"id":274066110,"uuid":"920152532","full_name":"Jiayi-Pan/TinyZero","owner":"Jiayi-Pan","description":"Clean, minimal, accessible reproduction of DeepSeek R1-Zero","archived":false,"fork":false,"pushed_at":"2025-02-01T04:58:23.000Z","size":2394,"stargazers_count":5166,"open_issues_count":21,"forks_count":552,"subscribers_count":71,"default_branch":"main","last_synced_at":"2025-02-01T05:25:36.226Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jiayi-Pan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-21T16:49:12.000Z","updated_at":"2025-02-01T05:23:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"92f13cd2-6ad1-4c8a-9e0a-1c9b9517e953","html_url":"https://github.com/Jiayi-Pan/TinyZero","commit_stats":null,"previous_names":["jiayi-pan/tinyzero"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jiayi-Pan%2FTinyZero","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jiayi-Pan%2FTinyZero/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jiayi-Pan%2FTinyZero/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jiayi-Pan%2FTinyZero/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jiayi-Pan","download_url":"https://codelo
ad.github.com/Jiayi-Pan/TinyZero/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236719473,"owners_count":19194048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-01T21:14:59.815Z","updated_at":"2025-10-16T12:32:22.126Z","avatar_url":"https://github.com/Jiayi-Pan.png","language":"Python","funding_links":[],"categories":["Python","Projects","A01_文本生成_文本对话","Trending LLM Projects","Summary","GitHub projects","Open-source","Uncategorized","Codebases","Part 1: O1 Replication","RelatedRepos","7. Training \u0026 Fine-tuning Ecosystem","Agentic Systems"],"sub_categories":["Large Language Models","大语言对话模型及数据","Codebase","Uncategorized","Replicates of DeepSeek-R1 and DeepSeek-R1-Zero"],"readme":"# TinyZero\n\n![image](cover.png)\n\nTinyZero is a reproduction of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) in countdown and multiplication tasks. 
We built upon [veRL](https://github.com/volcengine/verl).\n\nThrough RL, the 3B base LM develops self-verification and search abilities all on its own.\n\nYou can experience the Aha moment yourself for \u003c $30.\n\nTwitter thread: https://x.com/jiayi_pirate/status/1882839370505621655\n\nFull experiment log: https://wandb.ai/jiayipan/TinyZero\n\n\u003e 📢: We release [Adaptive Parallel Reasoning](https://github.com/Parallel-Reasoning/APR), where we explore a new dimension in scaling reasoning models\n\n## Installation\n\n```\nconda create -n zero python=3.9\n# install torch [or you can skip this step and let vllm install the correct version for you]\npip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121\n# install vllm\npip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1\npip3 install ray\n\n# verl\npip install -e .\n\n# flash attention 2\npip3 install flash-attn --no-build-isolation\n# quality of life\npip install wandb IPython matplotlib\n```\n\n## Countdown task\n\n**Data Preparation**\n```\nconda activate zero\npython ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}\n```\n\n### Run Training\n```\nconda activate zero\n```\n\nFor the following code, if you run out of VRAM, try adding `critic.model.enable_gradient_checkpointing=True` to the script, and check out the discussion [here](https://github.com/Jiayi-Pan/TinyZero/issues/5#issuecomment-2624161643)\n\n**Single GPU**\n\n\nWorks for models \u003c= 1.5B. 
We know that the Qwen2.5-0.5B base model fails to learn reasoning.\n\n```\nexport N_GPUS=1\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=1\nexport EXPERIMENT_NAME=countdown-qwen2.5-0.5b\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash ./scripts/train_tiny_zero.sh\n```\n\n**3B+ model**\nIn this case, the base model is able to develop sophisticated reasoning skills.\n```\nexport N_GPUS=2\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=2\nexport EXPERIMENT_NAME=countdown-qwen2.5-3b\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash ./scripts/train_tiny_zero.sh\n```\n\n### Instruct Ablation\nWe also experiment with Qwen2.5-3B Instruct.\n**Data Preparation**\nTo follow the chat template, we need to reprocess the data:\n```\nconda activate zero\npython examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}\n```\n\n**Training**\n```\nexport N_GPUS=2\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=2\nexport EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash ./scripts/train_tiny_zero.sh\n```\n\n## Acknowledgements\n* We run our experiments based on [veRL](https://github.com/volcengine/verl).\n* We use the Qwen2.5 series base models [Qwen2.5](https://github.com/QwenLM/Qwen2.5).\n\n## Citation\n```\n@misc{tinyzero,\nauthor       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},\ntitle        = {TinyZero},\nhowpublished = {https://github.com/Jiayi-Pan/TinyZero},\nnote         = {Accessed: 2025-01-24},\nyear         = 
{2025}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJiayi-Pan%2FTinyZero","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJiayi-Pan%2FTinyZero","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJiayi-Pan%2FTinyZero/lists"}