<h1 align="center">
<br>
rStar-Math
</h1>

<p align="center">
📃 <a href="https://huggingface.co/papers/2501.04519" target="_blank">[Paper]</a> 
</p>

Repo for "[rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking](https://huggingface.co/papers/2501.04519)".

Authors: [Xinyu Guan](https://gxy-2001.github.io/)\*, [Li Lyna Zhang](https://www.microsoft.com/en-us/research/people/lzhani/)\*, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, Mao Yang

<p align="center">
    <img src="images/main_table.png" width="1000">
        <br>
    <em>Table 1: rStar-Math enables frontier math reasoning in SLMs via deep thinking over 64 trajectories.</em>
</p>

## News 
- **[02/10/2025]** We are hiring interns! If you are interested in improving LLM reasoning, please send your CV to lzhani@microsoft.com.
- **[01/21/2025]** Our code has been open-sourced.
- **[01/09/2025]** Our paper is released: https://huggingface.co/papers/2501.04519.

Note: Our prior work [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https://huggingface.co/papers/2408.06195) is open-sourced on the `rStar-mutualreasoning` branch.

## Contents
- [Introduction](#Introduction)
- [Setup](#Setup)
- [Usage](#Usage)
- [Citation](#Citation)

## Introduction
We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1-mini, without distillation from superior models. rStar-Math achieves this by exercising "deep thinking" through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model. The diagram below presents an overview of the rStar-Math framework, highlighting its core components and processes.

<p align="center">
  <img src="images/rstar.png">
</p>

## Setup

We recommend using conda for environment management and running the code on an A100 80GB GPU with CUDA 12.4.
1. Create a Python 3.11 environment: 
```
conda create -y --name rstar python=3.11
conda init && source deactivate # init
conda activate rstar
```
2. Install the requirements:
```
pip install --upgrade pip
pip install -r requirements.txt

# optional: install flash-attn 2
# pip install flash-attn --no-build-isolation
```
3. Install the [evaluation toolkit](https://arxiv.org/abs/2404.13925):
```
git clone https://github.com/MARIO-Math-Reasoning/MARIO_EVAL.git
cd MARIO_EVAL
cd latex2sympy && pip install . && cd ..
pip install -e .
cd ..
```

vllm 0.6.6 requires torch 2.5.1, which in turn requires CUDA 12.4.
If your CUDA version is lower than 12.4, you can run the following command:
```
export LD_LIBRARY_PATH=$(python -c "import site; print(site.getsitepackages()[0] + '/nvidia/nvjitlink/lib')"):$LD_LIBRARY_PATH
```
This prevents the error: `undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12`.

## Usage

### Generate Training Data 

Most of the math problems are sourced from [NuminaMath](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) and [MetaMath](https://huggingface.co/datasets/meta-math/MetaMathQA). To generate your own data, reformat questions and answers into the [eval_data](eval_data/aime2024_test.json) format.

#### Bootstrapping round 

Use the following command to generate training data. Ensure that sufficient GPU memory is allocated for the model, and adjust `CUDA_VISIBLE_DEVICES` as well as the `llm_gpu_memory_utilization` and `tp` parameters in the configuration file.

```bash
MODEL="deepseek-ai/DeepSeek-Coder-V2-Instruct"
QAF="train set path"
CFG="config/sample_mcts.yaml"
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python main.py --qaf $QAF --custom_cfg $CFG --model_dir $MODEL
```

#### Round 2-4: From rStar Policy Model and Reward Model

After training the policy and reward models, use this command to generate enhanced training data.

```bash
MODEL="policy model dir"
RM="reward model dir"
QAF="train set path"
CFG="config/sft_sample_mcts.yaml"
CUDA_VISIBLE_DEVICES="0" python main.py --qaf $QAF --custom_cfg $CFG --model_dir $MODEL --reward_model_dir $RM
```

### Extracting SFT and RM Training Data
We provide scripts to extract the complete trace from MCTS data, which can be used both for data analysis and for constructing training datasets for Supervised Fine-Tuning and Reward Modeling.

#### Extract SFT Training Data
Run the following command to extract the complete trace from your MCTS data directory. It collects all the detailed information from the MCTS runs, which is useful for further analysis or for preparing training data:

```bash
python extra_sft_file.py --data_dir "MCTS file dir" --output_file "sft_extra_result.jsonl"
```

You can then sample the data to create a dataset for SFT training by running:

```bash
python train/sample_sft_data.py --data_file "sft_extra_result.jsonl" --output_file "sft.json" --n 2
```

#### Extract RM Training Data

Run the following command to extract all step-level pairwise preference data from your MCTS files.

```bash
python extra_rm_file.py --data_dir "MCTS file dir" --output_file "rm_extra_result.jsonl"
```

Then sample the data for Reward Modeling:

```bash
python train/sample_rm_data.py --data_file "rm_extra_result.jsonl" --output_file "rm.json"
```

### Inference & Evaluation

#### MCTS Inference with Policy Model and Reward Model

To reproduce the results in our main table, run the command below; it yields an MCTS result file. For each run, a trajectory is selected based on the highest overall response score. Refer to `run_example.sh` for a demonstration of executing multiple trajectories. Our final score is determined by a majority vote over the top-K highest-reward trajectories.

```bash
MODEL="policy model dir"
RM="reward model dir"
QAF="test set path"
CFG="config/sft_eval_mcts.yaml"
CUDA_VISIBLE_DEVICES="0" python main.py --qaf $QAF --custom_cfg $CFG --model_dir $MODEL --reward_model_dir $RM
```

Running this command with the default settings, or further increasing the number of nodes, may improve performance, but it requires a considerable amount of GPU resources. To reduce cost, consider lowering `n_generate_sample` and `iterations` to 16 and 8, respectively, which still delivers satisfactory results. Alternatively, replacing MCTS with step beam search improves search speed, though with a slight accuracy trade-off.
Use the following command to apply this strategy.

```bash
MODEL="policy model dir"
RM="reward model dir"
QAF="test set path"
CFG="config/sft_eval_bs.yaml"
CUDA_VISIBLE_DEVICES="0" python main.py --qaf $QAF --custom_cfg $CFG --model_dir $MODEL --reward_model_dir $RM
```

#### Greedy Decoding with Policy Model (SFT Pass@1 accuracy)

Running the following command will generate results using greedy decoding.

```bash
MODEL="policy model dir"
QAF="test set path"
CFG="config/sft_eval_greedy.yaml"
CUDA_VISIBLE_DEVICES="0" python main.py --qaf $QAF --custom_cfg $CFG --model_dir $MODEL
```

We've streamlined the evaluation process, enabling easy generation of greedy-decoding results for tasks like `gsm8k`, `math`, `math500`, `aime2024`, `amc23`, `collegemath`, `gaokao2023en`, `olympiadbench`, and `omni-math`. Results are saved in the model's directory by default.

```bash
MODEL="policy model dir"
python eval.py --model "$MODEL" --device 0 --task amc23
# evaluate a result file
python eval_output.py --file_path $MODEL"/amc23.jsonl"
```

### Fine-tune the Policy Model and Reward Model

The training script is configured for 8×MI300X GPUs by default. For NVIDIA GPUs with limited VRAM, reduce `per_device_train_batch_size` and increase `gradient_accumulation_steps` accordingly. You can also enable FlashAttention-2 with the `--attn_impl flash_attention_2` flag, which maintains accuracy similar to the `eager` implementation.

Example training data is available in `train/sft_data_examples.json` and `train/rm_data_examples.json`. For SFT, we typically select the two traces with the highest average scores from MCTS.
For PPM training, we pair the highest-scoring correct solution with the lowest-scoring incorrect one.

**SFT Train Script**
```bash
export NCCL_IB_DISABLE=1
export NCCL_P2P_DISABLE=0
export NCCL_P2P_LEVEL=NVL
export CUBLAS_WORKSPACE_CONFIG=:4096:8
export NCCL_BLOCKING_WAIT=0
export FLASH_ATTENTION_DETERMINISTIC=1
export MASTER_ADDR="localhost"
export MASTER_PORT="1939"
export GLOO_SOCKET_IFNAME="lo"
export NCCL_SOCKET_IFNAME="lo"

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --master_addr ${MASTER_ADDR} --master_port ${MASTER_PORT} --nproc_per_node=8 --use_env train/train_SFT.py \
    --model_name_or_path "Qwen/Qwen2.5-Math-7B" \
    --data_path "data_path" \
    --data_length 10000000 \
    --bf16 True \
    --output_dir "path_to_save" \
    --num_train_epochs 2 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100000 \
    --save_total_limit 2 \
    --learning_rate 7e-6 \
    --weight_decay 0.1 \
    --warmup_ratio 0 \
    --lr_scheduler_type "linear" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'Qwen2DecoderLayer'
    # --attn_impl flash_attention_2
```

**RM Train Script**
```bash
export WANDB_DISABLED=true
export CUBLAS_WORKSPACE_CONFIG=:4096:8
export FLASH_ATTENTION_DETERMINISTIC=1

accelerate launch --num_processes=8 train/train_RM.py \
    --model_name_or_path="sft_model_path" \
    --output_dir="path_to_save" \
    --pair_json_path "data_path" \
    --per_device_train_batch_size=16 \
    --per_device_eval_batch_size=16 \
    --num_train_epochs=2 \
    --gradient_accumulation_steps=4 \
    --gradient_checkpointing=True \
    --learning_rate=7e-6 \
    --remove_unused_columns=False \
    --optim="adamw_torch" \
    --logging_steps=1 \
    --eval_strategy="steps" \
    --eval_steps=750 \
    --save_steps=750 \
    --load_best_model_at_end \
    --save_total_limit=5 \
    --max_length=2048 \
    --bf16 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'Qwen2DecoderLayer'
    # --attn_impl flash_attention_2
```

---

## Citation
If you find this repo useful for your research, please consider citing the paper
```
@misc{guan2025rstar,
    title={rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking},
    author={Xinyu Guan and Li Lyna Zhang and Yifei Liu and Ning Shang and Youran Sun and Yi Zhu and Fan Yang and Mao Yang},
    year={2025},
    eprint={2501.04519},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
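
For reference, the final-answer selection described under Inference & Evaluation (a majority vote over the top-K highest-reward trajectories) can be sketched as below. This is a minimal illustration, not the repo's actual implementation; the `Trajectory` tuple and its field names are illustrative assumptions.

```python
from collections import Counter
from typing import NamedTuple

class Trajectory(NamedTuple):
    # Hypothetical container, for illustration only.
    answer: str    # final answer extracted from the trajectory
    reward: float  # overall response score from the reward model

def majority_vote_top_k(trajectories, k=8):
    """Pick the final answer by majority vote over the k highest-reward
    trajectories; ties are broken by the higher summed reward."""
    top_k = sorted(trajectories, key=lambda t: t.reward, reverse=True)[:k]
    votes = Counter(t.answer for t in top_k)
    return max(
        votes,
        key=lambda a: (votes[a], sum(t.reward for t in top_k if t.answer == a)),
    )
```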
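
The training-data selection rules described under Fine-tune the Policy Model and Reward Model (for SFT, keep the two traces with the highest average scores; for PPM, pair the highest-scoring correct solution with the lowest-scoring incorrect one) might be sketched as follows. The dict keys (`step_scores`, `correct`, `score`) are illustrative assumptions, not the repo's actual extraction format.

```python
def select_sft_traces(traces, n=2):
    """SFT: keep the n traces with the highest average step score."""
    def avg_score(trace):
        return sum(trace["step_scores"]) / len(trace["step_scores"])
    return sorted(traces, key=avg_score, reverse=True)[:n]

def build_preference_pair(solutions):
    """PPM: pair the highest-scoring correct solution with the
    lowest-scoring incorrect one; returns None if either side is missing."""
    correct = [s for s in solutions if s["correct"]]
    incorrect = [s for s in solutions if not s["correct"]]
    if not correct or not incorrect:
        return None
    return {
        "chosen": max(correct, key=lambda s: s["score"]),
        "rejected": min(incorrect, key=lambda s: s["score"]),
    }
```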