{"id":26638034,"url":"https://github.com/spellcraftai/open-r1","last_synced_at":"2026-02-10T13:33:45.805Z","repository":{"id":276086252,"uuid":"925512162","full_name":"SpellcraftAI/open-r1","owner":"SpellcraftAI","description":"Internal fork of HF's open-r1.","archived":false,"fork":false,"pushed_at":"2025-02-06T07:06:50.000Z","size":548,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-24T17:28:22.614Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SpellcraftAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-01T03:28:25.000Z","updated_at":"2025-02-06T05:53:07.000Z","dependencies_parsed_at":"2025-02-06T08:32:21.201Z","dependency_job_id":null,"html_url":"https://github.com/SpellcraftAI/open-r1","commit_stats":null,"previous_names":["spellcraftai/open-r1"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SpellcraftAI/open-r1","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpellcraftAI%2Fopen-r1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpellcraftAI%2Fopen-r1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpellcraftAI%2Fopen-r1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpellcraftAI%2Fopen-r1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SpellcraftAI","download_url
":"https://codeload.github.com/SpellcraftAI/open-r1/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpellcraftAI%2Fopen-r1/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267866403,"owners_count":24157345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-24T17:24:56.809Z","updated_at":"2026-02-10T13:33:45.762Z","avatar_url":"https://github.com/SpellcraftAI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Open R1\n\n*A fully open reproduction of DeepSeek-R1. This repo is a work in progress; let's build it together!*\n\n**Table of Contents**  \n1. [Overview](#overview)  \n2. [Plan of attack](#plan-of-attack)  \n3. [Installation](#installation)  \n4. [Training models](#training-models)  \n   - [SFT](#sft)  \n   - [GRPO](#grpo)  \n5. [Evaluating models](#evaluating-models)  \n6. [Reproducing DeepSeek's evaluation results on MATH-500](#reproducing-deepseeks-evaluation-results-on-math-500)  \n7. [Data generation](#data-generation)  \n   - [Generate data from a smol distilled R1 model](#generate-data-from-a-smol-distilled-r1-model)  \n   - [Generate data from DeepSeek-R1](#generate-data-from-deepseek-r1)  \n8. 
[Contributing](#contributing)\n\n## Overview\n\nThe goal of this repo is to build the missing pieces of the R1 pipeline so that everybody can reproduce and build on top of it. The project is simple by design and mostly consists of:\n\n- `src/open_r1`: contains the scripts to train and evaluate models and to generate synthetic data:\n    - `grpo.py`: trains a model with GRPO on a given dataset.\n    - `sft.py`: performs simple supervised fine-tuning (SFT) of a model on a dataset.\n    - `evaluate.py`: evaluates a model on the R1 benchmarks.\n    - `generate.py`: generates synthetic data from a model using [Distilabel](https://github.com/argilla-io/distilabel).\n- `Makefile`: contains easy-to-run commands for each step in the R1 pipeline, built on the scripts above.\n\n### Plan of attack\n\nWe will use the DeepSeek-R1 [tech report](https://github.com/deepseek-ai/DeepSeek-R1) as a guide. The pipeline it describes can roughly be broken down into three main steps:\n\n* Step 1: replicate the R1-Distill models by distilling a high-quality corpus from DeepSeek-R1.\n* Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will likely involve curating new, large-scale datasets for math, reasoning, and code.\n* Step 3: show that we can go from a base model to an RL-tuned model via multi-stage training.\n\n\u003ccenter\u003e\n    \u003cimg src=\"assets/plan-of-attack.png\" width=\"500\"\u003e\n\u003c/center\u003e\n\n## Installation\n\n**Note: Libraries rely on CUDA 12.1. 
Double-check your system if you get segmentation faults.**\n\nFirst, create a virtual environment, either with Conda or `uv`.\n\n### Using `conda`\n\nUse an existing Conda environment, or create and activate a new one with:\n\n```shell\nconda create -n openr1 python=3.11\nconda activate openr1\n```\n\n### Using `uv`\n\nTo install `uv`, follow the [UV Installation Guide](https://docs.astral.sh/uv/getting-started/installation/).\n\n```shell\nuv venv openr1 --python 3.11 \u0026\u0026 source openr1/bin/activate \u0026\u0026 uv pip install --upgrade pip\n```\n\n### Dependencies\n\nFrom here, the easiest way to get started is to run:\n\n```shell\n./setup.sh\n```\n\nAlternatively, install everything manually:\n\n\u003cdetails\u003e\n\u003csummary\u003eBash\u003c/summary\u003e\n\n```shell\n# Install vLLM v0.7.1 or newer, fetching PyTorch wheels from the CUDA 12.1 index\npip install \"vllm\u003e=0.7.1\" --extra-index-url https://download.pytorch.org/whl/cu121\n\n# Make the nvJitLink library shipped in the NVIDIA pip wheels visible to the dynamic linker\nexport LD_LIBRARY_PATH=$(python -c \"import site; print(site.getsitepackages()[0] + '/nvidia/nvjitlink/lib')\"):$LD_LIBRARY_PATH\n\n# Install open-r1 in editable mode with the dev extras\npip install -e \".[dev]\"\n```\n\u003c/details\u003e\n\nInstalling vLLM also installs PyTorch `v2.5.1`. It is **very important** to use this version, since the vLLM binaries are compiled against it. You can then install the remaining dependencies for your specific use case via `pip install -e \".[LIST OF MODES]\"`. For most contributors, we recommend:\n\n```shell\npip install -e \".[dev]\"\n```\n\nNext, log in to your Hugging Face and Weights \u0026 Biases accounts:\n\n```shell\nhuggingface-cli login\nwandb login\n```\n\nFinally, check whether your system has Git LFS installed so that you can load and push models/datasets to the Hugging Face Hub:\n\n```shell\ngit-lfs --version\n```\n\nIf it isn't installed, run (on Debian/Ubuntu):\n\n```shell\nsudo apt-get install git-lfs\n```\n\n## Training models\n\nWe support training models with either DDP or DeepSpeed (ZeRO-2 and ZeRO-3). 
To switch between methods, simply change the path to the `accelerate` YAML config in `recipes/accelerate_configs`.\n\n\u003e [!NOTE]\n\u003e The training commands below are configured for a node of 8×H100 GPUs (80GB). For different hardware and topologies, you may need to tune the batch size and number of gradient accumulation steps.\n\n### SFT\n\nTo run SFT on a dataset distilled from DeepSeek-R1 with reasoning traces, such as [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k), run:\n\n```shell\nACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py --config recipes/qwen/Qwen2.5-1.5B-Instruct/sft/config_full.yaml\n```\n\nTo launch a Slurm job, run:\n\n```shell\nsbatch --output=/path/to/logs/%x-%j.out --error=/path/to/logs/%x-%j.err slurm/sft.slurm {model} {dataset} {accelerator}\n```\n\nHere `{model}` and `{dataset}` refer to the model and dataset IDs on the Hugging Face Hub, while `{accelerator}` refers to the choice of 🤗 Accelerate config in `recipes/accelerate_configs`.\n\n### GRPO\n\nTo train via the GRPO trainer, we use one GPU to run vLLM for faster generation and the remaining GPUs for training. For example, on a node with 8 GPUs, use the `recipes/accelerate_configs/zero3.yaml` config and override `num_processes` to run on 7 devices:\n\n```shell\nACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=7 src/open_r1/grpo.py --config recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml\n```\n\nTo launch a Slurm job, run:\n\n```shell\nsbatch --output=/path/to/logs/%x-%j.out --error=/path/to/logs/%x-%j.err slurm/grpo.slurm {model} {dataset} {accelerator}\n```\n\nYou can find more model configurations in the [recipes](./recipes).\n\n## Evaluating models\n\nWe use `lighteval` to evaluate models, with custom tasks defined in `src/open_r1/evaluate.py`. 
For models which fit on a single GPU, run:\n\n```shell\nMODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B\nMODEL_ARGS=\"pretrained=$MODEL,dtype=float16,max_model_length=32768,gpu_memory_utilisation=0.8\"\nTASK=aime24\nOUTPUT_DIR=data/evals/$MODEL\n\nlighteval vllm $MODEL_ARGS \"custom|$TASK|0|0\" \\\n    --custom-tasks src/open_r1/evaluate.py \\\n    --use-chat-template \\\n    --system-prompt=\"Please reason step by step, and put your final answer within \\boxed{}.\" \\\n    --output-dir $OUTPUT_DIR \n```\n\nTo increase throughput across multiple GPUs, use _data parallel_ as follows:\n\n```shell\nNUM_GPUS=8\nMODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B\nMODEL_ARGS=\"pretrained=$MODEL,dtype=float16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilisation=0.8\"\nTASK=aime24\nOUTPUT_DIR=data/evals/$MODEL\n\nlighteval vllm $MODEL_ARGS \"custom|$TASK|0|0\" \\\n    --custom-tasks src/open_r1/evaluate.py \\\n    --use-chat-template \\\n    --system-prompt=\"Please reason step by step, and put your final answer within \\boxed{}.\" \\\n    --output-dir $OUTPUT_DIR \n```\n\nFor large models which require sharding across GPUs, use _tensor parallel_ and run:\n\n```shell\nNUM_GPUS=8\nMODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B\nMODEL_ARGS=\"pretrained=$MODEL,dtype=float16,tensor_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilisation=0.8\"\nTASK=aime24\nOUTPUT_DIR=data/evals/$MODEL\n\nexport VLLM_WORKER_MULTIPROC_METHOD=spawn\nlighteval vllm $MODEL_ARGS \"custom|$TASK|0|0\" \\\n    --custom-tasks src/open_r1/evaluate.py \\\n    --use-chat-template \\\n    --system-prompt=\"Please reason step by step, and put your final answer within \\boxed{}.\" \\\n    --output-dir $OUTPUT_DIR \n```\n\nYou can also launch an evaluation with `make evaluate`, specifying the model, task, and optionally the parallelism technique and number of GPUs.\n\nTo evaluate on a single GPU:\n```shell\nmake evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B 
TASK=aime24\n```\n\nTo use data parallelism:\n\n```shell\nmake evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B TASK=aime24 PARALLEL=data NUM_GPUS=8\n```\n\nTo use tensor parallelism:\n\n```shell\nmake evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B TASK=aime24 PARALLEL=tensor NUM_GPUS=8\n```\n\n## Reproducing DeepSeek's evaluation results on MATH-500\n\nWe are able to reproduce DeepSeek's reported results on the MATH-500 benchmark to within a few points:\n\n| Model                         | MATH-500 (HF lighteval) | MATH-500 (DeepSeek reported) |\n| :---------------------------- | :---------------------: | :--------------------------: |\n| DeepSeek-R1-Distill-Qwen-1.5B |          81.6           |             83.9             |\n| DeepSeek-R1-Distill-Qwen-7B   |          91.8           |             92.8             |\n| DeepSeek-R1-Distill-Qwen-14B  |          94.2           |             93.9             |\n| DeepSeek-R1-Distill-Qwen-32B  |          95.0           |             94.3             |\n| DeepSeek-R1-Distill-Llama-8B  |          85.8           |             89.1             |\n| DeepSeek-R1-Distill-Llama-70B |          93.4           |             94.5             |\n\nTo reproduce these results, run the following commands:\n\n```shell\nsbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B math_500\nsbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Qwen-7B math_500\nsbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Qwen-14B math_500\nsbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Qwen-32B math_500 tp\nsbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Llama-8B math_500\nsbatch slurm/evaluate.slurm deepseek-ai/DeepSeek-R1-Distill-Llama-70B math_500 tp\n```\n\n## Data generation\n\n### Generate data from a smol distilled R1 model\n\nThe following example can be run on a single H100 GPU. First, install the following dependencies:\n\n```shell\nuv pip install \"distilabel[vllm]\u003e=1.5.2\"\n```\n\nNow save the following snippet into a file named `pipeline.py` and run it with `python pipeline.py`. 
It will generate 4 outputs for each of the 10 examples (replace `username` in the repository ID with your own user or organization name):\n\n```python\nfrom datasets import load_dataset\nfrom distilabel.models import vLLM\nfrom distilabel.pipeline import Pipeline\nfrom distilabel.steps.tasks import TextGeneration\n\n\n# Double the backslash so the prompt contains a literal \\boxed{} (a bare \\b is a backspace escape)\nprompt_template = \"\"\"\\\nYou will be given a problem. Please reason step by step, and put your final answer within \\\\boxed{}:\n{{ instruction }}\"\"\"\n\n# Take the first 10 problems from NuminaMath-TIR\ndataset = load_dataset(\"AI-MO/NuminaMath-TIR\", split=\"train\").select(range(10))\n\nmodel_id = \"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\"  # Swap in another smol distilled R1 model if you like\n\nwith Pipeline(\n    name=\"distill-qwen-7b-r1\",\n    description=\"A pipeline to generate data from a distilled r1 model\",\n) as pipeline:\n\n    llm = vLLM(\n        model=model_id,\n        tokenizer=model_id,\n        extra_kwargs={\n            \"tensor_parallel_size\": 1,\n            \"max_model_len\": 8192,\n        },\n        generation_kwargs={\n            \"temperature\": 0.6,\n            \"max_new_tokens\": 8192,\n        },\n    )\n    prompt_column = \"problem\"\n    text_generation = TextGeneration(\n        llm=llm,\n        template=prompt_template,\n        num_generations=4,  # 4 completions per problem\n        # Feed the dataset's \"problem\" column into the template's `instruction` variable\n        input_mappings={\"instruction\": prompt_column},\n    )\n\n\nif __name__ == \"__main__\":\n    distiset = pipeline.run(dataset=dataset)\n    distiset.push_to_hub(repo_id=\"username/numina-deepseek-r1-qwen-7b\")\n```\n\nTake a look at the sample dataset at [HuggingFaceH4/numina-deepseek-r1-qwen-7b](https://huggingface.co/datasets/HuggingFaceH4/numina-deepseek-r1-qwen-7b).\n\n### Generate data from DeepSeek-R1\n\nTo run the full DeepSeek-R1 model, we used 2 nodes, each with 8×H100 GPUs, launched via the Slurm script at `slurm/generate.slurm`. 
First, install the dependencies. For now, you need the vLLM dev wheel that [fixes R1 CUDA graph capture](https://github.com/vllm-project/vllm/commits/221d388cc5a836fa189305785ed7e887cea8b510/csrc/moe/moe_align_sum_kernels.cu):\n\n```shell\npip install https://wheels.vllm.ai/221d388cc5a836fa189305785ed7e887cea8b510/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu121\n\nuv pip install \"distilabel[vllm,ray,openai]\u003e=1.5.2\"\n```\n\nThen launch the generation job:\n\n```shell\nsbatch slurm/generate.slurm \\\n    --hf-dataset AI-MO/NuminaMath-TIR \\\n    --temperature 0.6 \\\n    --prompt-column problem \\\n    --model deepseek-ai/DeepSeek-R1 \\\n    --hf-output-dataset username/r1-dataset\n```\n\n\u003e [!NOTE]  \n\u003e While the job is running, you can set up an SSH tunnel through the cluster login node to access the Ray dashboard from your computer by running `ssh -L 8265:ray_ip_head_node:8265 \u003clogin_node\u003e` and then browsing to `http://localhost:8265`.\n\n## Contributing\n\nContributions are welcome. Please refer to https://github.com/huggingface/open-r1/issues/23.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspellcraftai%2Fopen-r1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspellcraftai%2Fopen-r1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspellcraftai%2Fopen-r1/lists"}