{"id":28702269,"url":"https://github.com/modelscope/r-chain","last_synced_at":"2025-06-14T12:32:17.420Z","repository":{"id":276975515,"uuid":"930765318","full_name":"modelscope/r-chain","owner":"modelscope","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-10T03:15:48.000Z","size":35,"stargazers_count":5,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-10T04:26:24.911Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/modelscope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-11T07:08:30.000Z","updated_at":"2025-03-10T03:15:52.000Z","dependencies_parsed_at":"2025-02-11T13:37:03.268Z","dependency_job_id":"5044dfb3-1ce8-4dda-9a99-85b865ee6090","html_url":"https://github.com/modelscope/r-chain","commit_stats":null,"previous_names":["modelscope/r-chain"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/modelscope/r-chain","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fr-chain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fr-chain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fr-chain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fr-chain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/modelscope","download_url":"https://codelo
ad.github.com/modelscope/r-chain/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2Fr-chain/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259816188,"owners_count":22915829,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-14T12:30:57.189Z","updated_at":"2025-06-14T12:32:17.400Z","avatar_url":"https://github.com/modelscope.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## R-Chain: A lightweight toolkit for distilling reasoning models\n\nInspired by reasoning models such as the DeepSeek-R1 series, we put together r-chain to systematically reproduce the distillation process of reasoning models like DeepSeek-R1 for various tasks, including mathematical reasoning. This effort involves several key steps and outcomes:\n\n1. **Dataset Curation**: Curate mathematical distillation datasets, [MathR](https://www.modelscope.cn/datasets/modelscope/MathR) and [MathR-32B-Distill](https://www.modelscope.cn/datasets/modelscope/MathR-32B-Distill), which incorporate reasoning processes. These datasets are generated using the DeepSeek-R1 and DeepSeek-R1-Distill-Qwen-32B models, respectively.\n\n2. **Training and Evaluation**: Use each of the curated datasets separately to distill a smaller dense model, such as Qwen2.5-7B-Instruct. Evaluate the resulting model on reasoning benchmarks to validate the effectiveness of the curated data.\n\n3. 
**Reasoning response verification**: Verify the reasoning content generated by o1/R1-like models, and filter out incorrect reasoning content with rule-based and model-based strategies.\n\n### MathR and MathR-32B-Distill Dataset Construction\n1. **Problem Selection**: Utilize publicly available datasets such as [NuminaMath-CoT](https://www.modelscope.cn/datasets/AI-MO/NuminaMath-CoT), which includes problems of different kinds, such as amc_aime, math, gsm8k, and others.\n\n2. **Teacher Model Inference**: We generate responses from teacher models such as DeepSeek-R1 and DeepSeek-R1-Distill-Qwen-32B. The instruction prompt `\"Please reason step by step, and put your final answer within \\boxed{}.\"` is employed to guide and solicit output from the teacher model. After obtaining the `reasoning_content` and `content` from the teacher models, we format them using the template `f'\u003cthink\u003e{reasoning_content}\u003c/think\u003e\\n\\n\u003canswer\u003e{content}\u003c/answer\u003e'`. These formatted responses are then assembled into the standard `messages` format, making them ready for direct use in training. All data generated in this step is progressively uploaded to the `raw` subsets of the [MathR](https://www.modelscope.cn/datasets/modelscope/MathR) and [MathR-32B-Distill](https://www.modelscope.cn/datasets/modelscope/MathR-32B-Distill) datasets.\n\n3. **Response Filtering**: Even with strong teacher models such as DeepSeek-R1, responses to challenging math problems may still contain errors. To address this, we employ a rule-based approach to filter the `raw` datasets. We have implemented different filtering strategies tailored to the various problems in [NuminaMath-CoT](https://www.modelscope.cn/datasets/AI-MO/NuminaMath-CoT), depending on the source of the questions. 
The filtered data is uploaded to the `clean` subsets of the MathR and MathR-32B-Distill datasets.\n\n### Tools for Training and Evaluation\n**r-chain** is built upon existing tools such as [ms-swift](https://github.com/modelscope/ms-swift.git) and [evalscope](https://github.com/modelscope/evalscope.git) for performing supervised fine-tuning and evaluation, respectively.\n\n#### Supervised Fine-Tuning:\nTraining can be run with the following command:\n```\nbash examples/train_scripts/train_MathR-Distill-7B.sh\n```\nThe script leverages [ms-swift](https://github.com/modelscope/ms-swift.git) and performs SFT on Qwen2.5-7B-Instruct with the MathR and MathR-32B-Distill datasets. By default, training is configured to run on 8 GPUs; you may modify the script for other configurations.\n\n#### Deployment:\nOnce the model is trained, you may deploy it to a vLLM backend via\n```\nbash examples/evaluation_scripts/deploy_MathR-Distill-7B.sh\n```\nThis facilitates subsequent model evaluation.\n\n#### Evaluation:\nThe model can be evaluated with [evalscope](https://github.com/modelscope/evalscope.git) using the following script:\n```\npython examples/evaluation_scripts/eval_MathR_Distill_7B.py\n```\nBy default it evaluates on the MATH-500 and GPQA-Diamond benchmarks, with `Pass@1` as the evaluation metric. Each sample is generated five times and the result is the average of these five attempts.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelscope%2Fr-chain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodelscope%2Fr-chain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodelscope%2Fr-chain/lists"}