{"id":28905691,"url":"https://github.com/vainf/reasoning-sft","last_synced_at":"2025-10-28T22:16:16.087Z","repository":{"id":300096049,"uuid":"1005095154","full_name":"VainF/Reasoning-SFT","owner":"VainF","description":"SFT of Reasoning LLMs with Megatron-LM","archived":false,"fork":false,"pushed_at":"2025-06-19T21:19:39.000Z","size":1118,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-19T21:31:52.733Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VainF.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE_MEGATRON_LM","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-19T16:54:41.000Z","updated_at":"2025-06-19T21:19:43.000Z","dependencies_parsed_at":"2025-06-19T21:42:24.516Z","dependency_job_id":null,"html_url":"https://github.com/VainF/Reasoning-SFT","commit_stats":null,"previous_names":["vainf/reasoning-sft"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/VainF/Reasoning-SFT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VainF%2FReasoning-SFT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VainF%2FReasoning-SFT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VainF%2FReasoning-SFT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VainF%2FReasoning-SFT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VainF","download_ur
l":"https://codeload.github.com/VainF/Reasoning-SFT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VainF%2FReasoning-SFT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278867073,"owners_count":26059708,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-21T13:39:09.022Z","updated_at":"2025-10-08T00:10:56.119Z","avatar_url":"https://github.com/VainF.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Reasoning-SFT\n\n![image](https://github.com/user-attachments/assets/450042a1-6749-4015-83d5-1490db05e7fc)\n\n\nThis repository is a customized version of [NVIDIA Megatron-LM](https://github.com/NVIDIA/Megatron-LM), extended to support Supervised Fine-Tuning (SFT) of reasoning models. **Reasoning-SFT** applies prompt masking to train exclusively on the response. It was used to train the hybrid reasoning model [Thinkless-1.5B-Warmup](https://huggingface.co/Vinnnf/Thinkless-1.5B-Warmup). This code is also suitable for standard SFT. 
\n\n\u003ctable\u003e\n  \u003cthead\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e📄 \u003cstrong\u003ePaper Link\u003c/strong\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"http://arxiv.org/abs/2505.13379\"\u003eArXiv\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e💻 \u003cstrong\u003eThinkless GitHub\u003c/strong\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://github.com/VainF/Thinkless\"\u003eVainF/Thinkless\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e🤖 \u003cstrong\u003eRL Model\u003c/strong\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/Vinnnf/Thinkless-1.5B-RL-DeepScaleR\"\u003eThinkless-1.5B-RL-DeepScaleR\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e🐣 \u003cstrong\u003eWarmup Model\u003c/strong\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/Vinnnf/Thinkless-1.5B-Warmup\"\u003eThinkless-1.5B-Warmup\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e📊 \u003cstrong\u003eData for Warmup\u003c/strong\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/datasets/Vinnnf/Hybrid-OpenThoughts2-1M-1.5B\"\u003eHybrid-OpenThoughts2-1M-1.5B\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e📊 \u003cstrong\u003eData for RL\u003c/strong\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset\"\u003eagentica-org/DeepScaleR-Preview-Dataset\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n\n\n## Setup\n\nWe recommend using a Docker container to run this code, as installing Transformer Engine and Megatron-LM might be a bit complex.\n\n```bash\n# In your user account\ncd 
Megatron-SFT\npip install -r requirements.txt # install the transformers in your user account\ndocker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v \"$PWD\":\"$PWD\" -v $HOME:$HOME -w \"$PWD\" -it --rm nvcr.io/nvidia/pytorch:24.12-py3 \n```\n\nRunning the above command will mount both the current directory and your home directory into the Docker container. To mount additional directories, simply add `-v /path/to/dir:/path/to/dir` to the command.\n\nOnce inside the Docker container, install all necessary packages:\n```bash\n# In the docker\npip install -r requirements.txt\n```\n\n## Example: Hybrid Reasoning via SFT (DeepSeek-R1-Distill-Qwen-1.5B)\n\n![image](https://github.com/user-attachments/assets/2dcf76bd-af2d-425f-b25c-c5e050f11875)\n\nIn this example, we show how to fine-tune the ``deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`` to enable hybrid reasoning (Warm-up). \n\n\u003e [!IMPORTANT]\n\u003e Since Docker creates files with root permissions, we download and preprocess the models and data using your user account. This ensures you can easily modify the files later using your preferred editor, such as VSCode.\n\n### 0. GPU Resource\nThe default config for 1.5B LLM with `Tensor Parallel=1` and `Pipeline Parallel=1` (TP1PP1) requires ~70 GB of memory per GPU. If you're using GPUs with less memory, consider using TP2PP1 or TP1PP2 to distribute the model parameters across multiple GPUs. You can also decrease the sequence length from 16384 to 8192.\n\n### 1. 
Prepare the LLM\n\n```bash\n# In your user account \npython scripts/checkpoints/download_hf_models.py --model-card deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B\n```\nThe huggingface model will be saved in `assets/checkpoints`.\n```bash\nassets\n├── cache\n└── checkpoints\n    └── deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B\n        ├── config.json\n        ├── generation_config.json\n        ├── model.safetensors\n        ├── special_tokens_map.json\n        ├── tokenizer_config.json\n        └── tokenizer.json\n```\n\nThen, we modify the tokenizer files to replace the `\u003c|quad_start|\u003e` token with a control token `\u003cshort\u003e`. \n```\n#assets/checkpoints/deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B/tokenizer_config.json\n\"151650\": {\n      \"content\": \"\u003c|short|\u003e\", # originally \"\u003c|quad_start|\u003e\"\n      \"lstrip\": false,\n      \"normalized\": false,\n      \"rstrip\": false,\n      \"single_word\": false,\n      \"special\": false # originally true\n    },\n```\n\n```\n#assets/checkpoints/deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B/tokenizer.json\n    {\n      \"id\": 151650,\n      \"content\": \"\u003cshort\u003e\", # originally \"\u003c|quad_start|\u003e\"\n      \"single_word\": false,\n      \"lstrip\": false,\n      \"rstrip\": false,\n      \"normalized\": false,\n      \"special\": false # originally true\n    },\n```\n\nRemove the final `\u003cthink\u003e` in the chat template, and remove the split (`content = content.split('\u003c/think\u003e')[-1]`).\n```\n#assets/checkpoints/deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B/tokenizer_config.json\n  \"chat_template\": \"{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- 
for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'\u003c｜User｜\u003e' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'\u003c｜Assistant｜\u003e\u003c｜tool▁calls▁begin｜\u003e\u003c｜tool▁call▁begin｜\u003e' + tool['type'] + '\u003c｜tool▁sep｜\u003e' + tool['function']['name'] + '\\\\n' + '```json' + '\\\\n' + tool['function']['arguments'] + '\\\\n' + '```' + '\u003c｜tool▁call▁end｜\u003e'}}{%- set ns.is_first = true -%}{%- else %}{{'\\\\n' + '\u003c｜tool▁call▁begin｜\u003e' + tool['type'] + '\u003c｜tool▁sep｜\u003e' + tool['function']['name'] + '\\\\n' + '```json' + '\\\\n' + tool['function']['arguments'] + '\\\\n' + '```' + '\u003c｜tool▁call▁end｜\u003e'}}{{'\u003c｜tool▁calls▁end｜\u003e\u003c｜end▁of▁sentence｜\u003e'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'\u003c｜tool▁outputs▁end｜\u003e' + message['content'] + '\u003c｜end▁of▁sentence｜\u003e'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '\u003c/think\u003e' in content %}{% set content = content %}{% endif %}{{'\u003c｜Assistant｜\u003e' + content + '\u003c｜end▁of▁sentence｜\u003e'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'\u003c｜tool▁outputs▁begin｜\u003e\u003c｜tool▁output▁begin｜\u003e' + message['content'] + '\u003c｜tool▁output▁end｜\u003e'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\\\n\u003c｜tool▁output▁begin｜\u003e' + message['content'] + '\u003c｜tool▁output▁end｜\u003e'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'\u003c｜tool▁outputs▁end｜\u003e'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'\u003c｜Assistant｜\u003e\\\\n'}}{% endif %}\",\n```\n\n\n### 2. 
Convert to Megatron Format\n\nConvert the HF model to Megatron format:\n```bash\n# In the docker\nbash scripts/checkpoints/convert_deepseek_r1_to_megatron.sh 1.5B 1 1 \n```\nWe have three parameters here: \n* The model size: `1.5B`, `32B` etc.\n* Tensor Parallel and Pipeline Parallel: `1 1` means no tensor parallel and no pipeline parallel. You can try `2 1` or `1 2` to use tensor parallel or pipeline parallel, respectively.\n\nThe above command will create a megatron checkpoint like this:\n```bash\nassets\n├── cache\n└── checkpoints\n    ├── deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B\n    └── deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B_Megatron_TP1PP1\n        ├── iter_0000001\n        │   └── mp_rank_00\n        │       └── model_optim_rng.pt\n        └── latest_checkpointed_iteration.txt\n```\n\n### 3. Prepare the Hybrid Reasoning Dataset\n\nDownload the hybrid reasoning dataset from Huggingface and save it as a JSON file. We assume that the dataset already contains `instruction` and `output` fields. 
For other datasets, you may customize the [scripts/data/download_hf_dataset.py](scripts/data/download_hf_dataset.py).\n```bash\n# In your user account\npython scripts/data/download_hf_dataset.py --dataset-card Vinnnf/Hybrid-OpenThoughts2-1M-1.5B\n```\n```bash\nassets\n├── cache\n├── checkpoints\n└── data\n    └── Vinnnf-Hybrid-OpenThoughts2-1M-1.5B\n        └── Vinnnf-Hybrid-OpenThoughts2-1M-1.5B.json\n```\n\nPre-tokenize the dataset:\n```bash\n# In the docker\nbash scripts/data/tokenize_dataset.sh assets/checkpoints/deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B assets/data/Vinnnf-Hybrid-OpenThoughts2-1M-1.5B/Vinnnf-Hybrid-OpenThoughts2-1M-1.5B.json 16384\n```\n```\nassets/\n├── cache\n├── checkpoints\n└── data\n    └── Vinnnf-Hybrid-OpenThoughts2-1M-1.5B\n        ├── Tokenized-Vinnnf-Hybrid-OpenThoughts2-1M-1.5B-deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B-16384_text_document.bin\n        ├── Tokenized-Vinnnf-Hybrid-OpenThoughts2-1M-1.5B-deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B-16384_text_document.idx\n        └── Vinnnf-Hybrid-OpenThoughts2-1M-1.5B.json\n```\nThe parameters are:\n* The path to the tokenizer model (usually the HF model path)\n* The path to the dataset JSON file\n* The maximum sequence length for training. A value larger than 16384 is recommended.\n\n\n### 4. Training\n\nRun the fine-tuning script:\n```bash\n# In the Docker\nbash scripts/sft/SFT_Hybrid_R1_1.5B_OpenThoughts_1M.sh train\n```\n\nAuto Resume:\n```bash\n# In the Docker\nbash scripts/sft/SFT_Hybrid_R1_1.5B_OpenThoughts_1M.sh resume\n```\n\n### 5. Export to Huggingface Format\n```bash\n# In the Docker\nbash scripts/checkpoints/merge_and_export.sh PATH_TO_YOUR_CKPT assets/checkpoints/deepseek_ai_DeepSeek_R1_Distill_Qwen_1.5B assets/checkpoints/export/Hybrid_R1_1.5B\n```\n\n### 6. 
Training Loss for Reference\n\n\u003cimg width=\"780\" alt=\"image\" src=\"https://github.com/user-attachments/assets/19bc47f4-b481-4aff-aa11-3e0ac658ee74\" /\u003e\n\n## Acknowledgement\n\nThis implementation is also heavily based on [alibaba/Pai-Megatron-Patch](https://github.com/alibaba/Pai-Megatron-Patch/tree/main/toolkits/sft_data_preprocessing).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvainf%2Freasoning-sft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvainf%2Freasoning-sft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvainf%2Freasoning-sft/lists"}