{"id":13754199,"url":"https://github.com/meta-math/MetaMath","last_synced_at":"2025-05-09T22:31:33.510Z","repository":{"id":196192325,"uuid":"694580809","full_name":"meta-math/MetaMath","owner":"meta-math","description":"MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models","archived":false,"fork":false,"pushed_at":"2024-02-01T15:29:09.000Z","size":12177,"stargazers_count":385,"open_issues_count":12,"forks_count":35,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-11-16T07:33:10.556Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://meta-math.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/meta-math.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-21T09:31:06.000Z","updated_at":"2024-11-13T07:35:53.000Z","dependencies_parsed_at":"2023-12-29T01:26:23.516Z","dependency_job_id":"4e7c5413-ba7b-42d7-86ef-7f205f7bf219","html_url":"https://github.com/meta-math/MetaMath","commit_stats":null,"previous_names":["meta-math/metamath"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-math%2FMetaMath","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-math%2FMetaMath/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-math%2FMetaMath/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-math%2FMetaMath/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/meta-math","download_url":"https://codeload.github.com/meta-math/MetaMath/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253335684,"owners_count":21892713,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:49.355Z","updated_at":"2025-05-09T22:31:29.022Z","avatar_url":"https://github.com/meta-math.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","SFT Statistics","Python"],"sub_categories":["大语言对话模型及数据","Code \u0026 Math"],"readme":"# MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models\r\n\r\n[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](CODE_LICENSE)\r\n[![Model Weight License](https://img.shields.io/badge/Model%20Weights%20License-LLaMA2-yellow)](MetaMath/LICENSE)\r\n[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)\r\n\r\n\u003cp align=\"center\"\u003e\r\n🤗 \u003ca href=\"https://huggingface.co/meta-math\" target=\"_blank\"\u003eHF Repo\u003c/a\u003e • 📃 \u003ca href=\"https://arxiv.org/abs/2309.12284\" target=\"_blank\"\u003e[MetaMath]\u003c/a\u003e\u003cbr\u003e\r\n\u003c/p\u003e\r\n\r\n\u003cp align=\"center\" width=\"100%\"\u003e\r\n\u003ca \u003e\u003cimg src=\"./imgs/metamath.svg\" alt=\"MetaMath\" style=\"width: 80%; min-width: 300px; display: block; margin: auto;\"\u003e\u003c/a\u003e\r\n\u003c/p\u003e\r\n\r\n\r\n## News\r\n- 🔥 Our **MetaMath-Llemma-7B** model achieves  **30.0 pass@1** on 
the MATH benchmark, surpassing all SOTA open-source LLMs at the 7B-13B scale! All the training scripts and the model are released.\r\n- 🔥 Our **MetaMath-Mistral-7B** model achieves **77.7 pass@1** on the [GSM8K benchmark](https://github.com/openai/grade-school-math), surpassing all SOTA open-source LLMs! All the training scripts and the model are released.\r\n- 🔥 The full **MetaMathQA** dataset is now released on Hugging Face: [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA/tree/main)!\r\n- 🔥 The GSM8K_Backward dataset is also released on Hugging Face at [GSM8K_Backward](https://huggingface.co/datasets/meta-math/GSM8K_Backward) for evaluating backward mathematical reasoning ability!\r\n- 🔥 Although the data augmentation for **MetaMathQA** is sourced from **ChatGPT 3.5**, our **MetaMath-70B** model outperforms the closed-source **ChatGPT 3.5** on GSM8K!\r\n- 🔥 Our **MetaMath-7B** model achieves **66.5 pass@1** on the [GSM8K benchmark](https://github.com/openai/grade-school-math), **11.6** points higher than the SOTA open-source LLM!\r\n- 🔥 Our **MetaMath-7B** model achieves **19.8 pass@1** on the [MATH benchmark](https://github.com/hendrycks/math), **9.1** points higher than the SOTA open-source LLM!\r\n\r\n| Model | Checkpoint | Paper  | GSM8k | MATH  | License|\r\n| ----- |------| ---- |------|-------| ----- |\r\n| MetaMath-70B-V1.0 | 🤗 \u003ca href=\"https://huggingface.co/meta-math/MetaMath-70B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e |  📃 \u003ca href=\"https://arxiv.org/abs/2309.12284\" target=\"_blank\"\u003e[MetaMath]\u003c/a\u003e| **82.3**  |  **26.6**\t| \u003ca href=\"https://ai.meta.com/resources/models-and-libraries/llama-downloads/\" target=\"_blank\"\u003eLlama 2  \u003c/a\u003e |\r\n| MetaMath-13B-V1.0 | 🤗 \u003ca href=\"https://huggingface.co/meta-math/MetaMath-13B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e |  📃 \u003ca href=\"https://arxiv.org/abs/2309.12284\" 
target=\"_blank\"\u003e[MetaMath]\u003c/a\u003e| **72.3**  |  **22.4** | \u003ca href=\"https://ai.meta.com/resources/models-and-libraries/llama-downloads/\" target=\"_blank\"\u003eLlama 2 \u003c/a\u003e |\r\n| MetaMath-7B-V1.0 | 🤗 \u003ca href=\"https://huggingface.co/meta-math/MetaMath-7B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e  |  📃 \u003ca href=\"https://arxiv.org/abs/2309.12284\" target=\"_blank\"\u003e[MetaMath]\u003c/a\u003e| \t **66.5**  |  **19.8** |  \u003ca href=\"https://ai.meta.com/resources/models-and-libraries/llama-downloads/\" target=\"_blank\"\u003eLlama 2  \u003c/a\u003e|\r\n| MetaMath-Mistral-7B | 🤗 \u003ca href=\"https://huggingface.co/meta-math/MetaMath-Mistral-7B\" target=\"_blank\"\u003eHF Link\u003c/a\u003e  |  📃 \u003ca href=\"https://arxiv.org/abs/2309.12284\" target=\"_blank\"\u003e[MetaMath]\u003c/a\u003e| \t **77.7**  |  **28.2** |  \u003ca href=\"http://www.apache.org/licenses/\" target=\"_blank\"\u003eApache License 2.0  \u003c/a\u003e|\r\n| MetaMath-Llemma-7B | 🤗 \u003ca href=\"https://huggingface.co/meta-math/MetaMath-Llemma-7B\" target=\"_blank\"\u003eHF Link\u003c/a\u003e  |  📃 \u003ca href=\"https://arxiv.org/abs/2309.12284\" target=\"_blank\"\u003e[MetaMath]\u003c/a\u003e| \t **69.2**  |  **30.0** |  \u003ca href=\"http://www.apache.org/licenses/\" target=\"_blank\"\u003eApache License 2.0  \u003c/a\u003e|\r\n                                                                                                                                                                                                                                                                                                   \r\n                                                                                                                                                                                                                                                                                                                                    
                                         \r\n\r\n## Comparing MetaMath with the LLM models.\r\n\r\n🔥 Comprehensive Results\r\n\r\n| Model               | GSM8k Pass@1 | MATH Pass@1 |\r\n|---------------------|--------------|-------------|\r\n| MPT-7B              | 6.8          | 3.0         |\r\n| Falcon-7B           | 6.8          | 2.3         |\r\n| LLaMA-1-7B          | 11.0         | 2.9         |\r\n| LLaMA-2-7B          | 14.6         | 2.5         |\r\n| MPT-30B             | 15.2         | 3.1         |\r\n| LLaMA-1-13B         | 17.8         | 3.9         |\r\n| GPT-Neo-2.7B        | 19.5         | --          |\r\n| Falcon-40B          | 19.6         | 2.5         |\r\n| Baichuan-chat-13B   | 23.9         | --          |\r\n| Vicuna-v1.3-13B     | 27.6         | --          |\r\n| LLaMA-2-13B         | 28.7         | 3.9         |\r\n| InternLM-7B         | 31.2         | --          |\r\n| ChatGLM-2-6B        | 32.4         | --          |\r\n| GPT-J-6B            | 34.9         | --          |\r\n| LLaMA-1-33B         | 35.6         | 3.9         |\r\n| LLaMA-2-34B         | 42.2         | 6.24        |\r\n| RFT-7B              | 50.3         | --          |\r\n| LLaMA-1-65B         | 50.9         | 10.6        |\r\n| Qwen-7B             | 51.6         | --          |\r\n| WizardMath-7B       | 54.9         | 10.7        |\r\n| LLaMA-2-70B         | 56.8         | 13.5        |\r\n| WizardMath-13B      | 63.9         | 14.0        |\r\n| 🔥 MetaMath-7B         | **66.5**     | **19.8**    |\r\n| 🔥 MetaMath-13B        | **72.3**     | **22.4**    |\r\n| 🔥 MetaMath-Mistral-7B | **77.7**     | **28.2**    |\r\n| 🔥 MetaMath-Llemma-7B  | **69.2**     | **30.0**    |\r\n| WizardMath-70B      | 81.6         | 22.7        |\r\n| 🔥 MetaMath-70B        | **82.3**     | **26.6**    |\r\n\r\n\u003ch2 id=\"env\"\u003eQuick Start\u003c/h2\u003e\r\n\r\nClone Metamath and install the required packages:\r\n\r\n```bash\r\ngit clone 
https://github.com/meta-math/MetaMath.git\r\ncd MetaMath\r\npip install -r requirements.txt\r\n```\r\n\r\nIf you encounter a Ray installation problem, please run:\r\n\r\n```bash\r\npip install --upgrade ray\r\npip install --upgrade pyarrow\r\npip install pandas\r\n```\r\n\r\n\u003ch2 id=\"Inference\"\u003eDataset Usage\u003c/h2\u003e\r\n\r\nRun the following code to load the data:\r\n\r\n```python\r\nfrom datasets import load_dataset\r\ndataset = load_dataset(\"meta-math/MetaMathQA\")\r\n```\r\n\r\n\r\n\u003ch2 id=\"train\"\u003eTraining\u003c/h2\u003e\r\n\r\nFirst prepare the Llama 2 base model and our **MetaMathQA** dataset, which is available on Hugging Face at [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA/tree/main). Then run:\r\n\r\n```\r\nbash run.sh\r\n```\r\nor\r\n\r\n```\r\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --master_addr ${MASTER_ADDR} --master_port ${MASTER_PORT} --nproc_per_node=8 --use_env train_math.py \\\r\n    --model_name_or_path \"meta-llama/Llama-2-7b-hf\" \\\r\n    --data_path \"path/to/metamathqa\" \\\r\n    --data_length 10000000 \\\r\n    --bf16 True \\\r\n    --output_dir \"path/to/save\" \\\r\n    --num_train_epochs 3 \\\r\n    --per_device_train_batch_size 4 \\\r\n    --per_device_eval_batch_size 4 \\\r\n    --gradient_accumulation_steps 4 \\\r\n    --evaluation_strategy \"no\" \\\r\n    --save_strategy \"steps\" \\\r\n    --save_steps 1000 \\\r\n    --save_total_limit 2 \\\r\n    --learning_rate 2e-5 \\\r\n    --weight_decay 0. 
\\\r\n    --warmup_ratio 0.03 \\\r\n    --lr_scheduler_type \"cosine\" \\\r\n    --logging_steps 1 \\\r\n    --fsdp \"full_shard auto_wrap\" \\\r\n    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \\\r\n    --tf32 True\r\n```\r\n\r\n### Supervised fine-tuning\r\n\r\nWe train MetaMath-7B with supervised fine-tuning, using the following hyperparameters:\r\n\r\n| Hyperparameter | LLaMA 2 7B |\r\n|----------------|-------------|\r\n| Batch size     | 128         |\r\n| Learning rate  | 2e-5        |\r\n| Epochs         | 3           |\r\n| Max length     | 512         |\r\n| LR scheduler   | cosine      |\r\n\r\n\u003ch2 id=\"evaluation\"\u003eEvaluation\u003c/h2\u003e\r\n\r\nWe use vLLM for fast generation:\r\n\r\n```\r\npython eval_gsm8k.py --model \"path/to/save\" --data_file ./data/test/GSM8K_test.jsonl\r\npython eval_math.py --model \"path/to/save\" --data_file ./data/test/MATH_test.jsonl\r\n```\r\nReplace \"path/to/save\" with the path to your fine-tuned model. You can also download our series of MetaMath models from Hugging Face:  \r\n🤗 \u003ca href=\"https://huggingface.co/meta-math/MetaMath-7B-V1.0\" target=\"_blank\"\u003eMetaMath 7B\u003c/a\u003e 🤗 \u003ca href=\"https://huggingface.co/meta-math/MetaMath-13B-V1.0\" target=\"_blank\"\u003eMetaMath 13B\u003c/a\u003e 🤗 \u003ca href=\"https://huggingface.co/meta-math/MetaMath-70B-V1.0\" target=\"_blank\"\u003eMetaMath 70B\u003c/a\u003e\r\n\r\nThe inference prompt for MetaMath models is:\r\n```\r\n\"Below is an instruction that describes a task. Write a response that appropriately completes the request.\\n\\n### Instruction:\\n{instruction}\\n\\n### Response: Let's think step by step.\"\r\n```\r\n\r\nThanks for the open source code of [WizardMath](https://github.com/nlpxucan/WizardLM/tree/main/WizardMath) and [RFT](https://github.com/OFA-Sys/gsm8k-ScRel/tree/main). 
Some of our code is based on them.\r\n\r\n\u003ch2 id=\"citation\"\u003eCitation\u003c/h2\u003e\r\nPlease cite the paper if you use or refer to the MetaMath models, code, data, or paper.\r\n\r\n```\r\n@article{yu2023metamath,\r\n  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},\r\n  author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},\r\n  journal={arXiv preprint arXiv:2309.12284},\r\n  year={2023}\r\n}\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeta-math%2FMetaMath","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmeta-math%2FMetaMath","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeta-math%2FMetaMath/lists"}