{"id":13907535,"url":"https://github.com/TIGER-AI-Lab/MAmmoTH","last_synced_at":"2025-07-18T05:32:56.359Z","repository":{"id":193891893,"uuid":"687815603","full_name":"TIGER-AI-Lab/MAmmoTH","owner":"TIGER-AI-Lab","description":"Code and data for \"MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning\" [ICLR 2024]","archived":false,"fork":false,"pushed_at":"2024-08-25T03:57:55.000Z","size":23307,"stargazers_count":374,"open_issues_count":6,"forks_count":49,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-06-13T07:08:14.120Z","etag":null,"topics":["llm","math","reasoning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TIGER-AI-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-09-06T04:12:15.000Z","updated_at":"2025-06-07T01:42:09.000Z","dependencies_parsed_at":"2023-09-10T16:43:57.787Z","dependency_job_id":"7cc6669a-4b1c-4256-b1a1-755543e07ee4","html_url":"https://github.com/TIGER-AI-Lab/MAmmoTH","commit_stats":null,"previous_names":["tiger-ai-lab/mammoth"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TIGER-AI-Lab/MAmmoTH","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FMAmmoTH","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FMAmmoTH/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FMAmmoTH/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FMAmmoTH
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TIGER-AI-Lab","download_url":"https://codeload.github.com/TIGER-AI-Lab/MAmmoTH/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FMAmmoTH/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265705433,"owners_count":23814454,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","math","reasoning"],"created_at":"2024-08-06T23:01:58.919Z","updated_at":"2025-07-18T05:32:53.495Z","avatar_url":"https://github.com/TIGER-AI-Lab.png","language":"Jupyter Notebook","funding_links":[],"categories":["HarmonyOS","Building","A01_文本生成_文本对话"],"sub_categories":["Windows Manager","LLM Models","大语言对话模型及数据"],"readme":"\n# **MAmmoTH** 🦣\nThis repo contains the code, data, and models for \"[MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning](https://arxiv.org/pdf/2309.05653.pdf)\". 
Our paper was accepted to ICLR 2024 as a spotlight.\n\n\u003cdiv align=\"center\"\u003e\n 🔥 🔥 🔥 Check out our \u003ca href = \"https://tiger-ai-lab.github.io/MAmmoTH/\"\u003e[Project Page]\u003c/a\u003e for more results and analysis!\n\u003c/div\u003e\n\n\u003cbr\u003e\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"mammoth_github.png\" width=\"80%\" title=\"Introduction Figure\"\u003e\n\u003c/div\u003e\n\n### Datasets and Models\nOur dataset and models are all available on Hugging Face.\n\n🤗 [MathInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/MathInstruct)\n\n| Size | Base Model: Llama-2 | Base Model: Code Llama | Base Model: Mistral |\n|-----|---------------------|------------------------|---------------------|\n| 7B | 🦣 [MAmmoTH-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-7B) | 🦣 [MAmmoTH-Coder-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-7B) | 🦣 [MAmmoTH-7B-Mistral](https://huggingface.co/TIGER-Lab/MAmmoTH-7B-Mistral) |\n| 13B | 🦣 [MAmmoTH-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-13B) | 🦣 [MAmmoTH-Coder-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-13B) | - |\n| 34B | - | 🦣 [MAmmoTH-Coder-34B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-34B) | - |\n| 70B | 🦣 [MAmmoTH-70B](https://huggingface.co/TIGER-Lab/MAmmoTH-70B) | - | - |\n\n## **What's New?**\n\n- [Dec. 4] We added training and evaluation support for MAmmoTH-7B-Mistral, which improves significantly over the Llama-2 version. We also have better support for vLLM. \n- [Oct. 
10] We updated our decoding method to hybrid decoding: the model first tries PoT to generate a program; if the program is not executable, it regenerates a CoT solution as the final answer. This hybrid decoding method improves performance significantly. See the appendix of our updated paper for more details. \n\n## Highlights\nThe results of our small MAmmoTH-7B-Mistral model are as follows:\n\n| **Model** | **Decoding** | **GSM** | **MATH** | **MMLU-Math** |\n|---------------------------|---------------|-----------|-----------|-----------|\n| MAmmoTH-7B | **Hybrid** | 53.6 | 31.5 | 44.5 |\n| MAmmoTH-Coder-7B | **Hybrid** | 59.4 | 33.4 | 47.2 |\n| MetaMath-7B-Mistral | **CoT** | 77.7 | 28.2 | 49.3 |\n| OpenChat-3.5-7B | **CoT** | 77.3 | 28.6 | 49.6 |\n| ChatGLM-3-6B | **CoT** | 72.3 | 25.7 | 45.6 |\n| DeepSeek-Coder-34B | **PoT** | 58.2 | 35.3 | 46.5 |\n| Grok-1 | **CoT** | 62.9 | 15.7 | - |\n| QWen-72B | **CoT** | 78.9 | 35.2 | - |\n| DeepSeek-67B-Chat | **CoT** | **84.1** | 32.6 | - |\n| MAmmoTH-7B-Mistral | **Hybrid** | 75.0 | **40.0** | **52.5** |\n\n## **Table of Contents**\n\n- [📌 Introduction](#introduction)\n- [⚙️ Installation](#installation)\n- [🛠️ Training and Inference](#training-and-inference)\n- [📜 License](#license)\n- [📖 Citation](#citation)\n\n## **Introduction**\nWe introduce MAmmoTH 🦣, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets, six of which are newly curated by this work. 
It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and ensures extensive coverage of diverse mathematical fields. \n\n## **Installation**\n\nClone this repository and install the required packages:\n\n```bash\ngit clone https://github.com/TIGER-AI-Lab/MAmmoTH.git\ncd MAmmoTH\npip install -r requirements.txt\n```\n\n## **Training and Inference**\n\n### **Data Loading**\n\nRun the following code to load the MathInstruct dataset from Hugging Face:\n\n```python\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"TIGER-Lab/MathInstruct\")\n```\n\n### **Quick Start**\nTo play with our model, run:\n\n```python\nfrom transformers import pipeline\npipeline = pipeline(\"text-generation\", \"TIGER-Lab/MAmmoTH-Coder-7B\")\n\nalpaca_template = \"Below is an instruction that describes a task. Write a response that appropriately completes the request.\\n\\n### Instruction:\\n{query}\\n\\n### Response:\"\n\nquery = \"Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. 
How much in dollars does she make every day at the farmers' market?\"\n\n### By default, MAmmoTH will output the Chain-of-thought (CoT) rationale\nrationale_prefix = \"\"\n\n### To get a Program-of-thought (PoT) rationale instead, append this suffix (it overrides the empty prefix above)\nrationale_prefix = \" Let's write a program.\"\n\ninput = alpaca_template.format(query = query + rationale_prefix)\n\noutput = pipeline(input)[0]['generated_text']\nprint(output)\n```\n\n### **Large-scale Evaluation**\n\nTo replicate the experimental results in our paper, run:\n\n```bash\n### For open-ended questions, the dataset should be one of \n### ['gsm8k', 'svamp', 'math', 'numglue', 'deepmind', 'simuleq'] \n### We first try PoT and if the generated program is not executable, we shift to CoT\n\ndataset='math'\n\npython run_open.py \\\n  --model \"TIGER-Lab/MAmmoTH-7B-Mistral\" \\\n  --shots 0 \\\n  --stem_flan_type \"pot_prompt\" \\\n  --batch_size 8 \\\n  --dataset $dataset \\\n  --model_max_length 1500 \\\n  --cot_backup \\\n  --print \\\n  --use_vllm\n```\n\nTo run self-consistency decoding with PoT/CoT using 10 samples, run:\n\n```bash\n### For open-ended questions, the dataset should be one of \n### ['gsm8k', 'svamp', 'math', 'numglue', 'deepmind', 'simuleq'] \n### We first try PoT and if the generated program is not executable, we shift to CoT\ndataset='gsm8k'\n\npython run_open_sc.py \\\n  --model \"TIGER-Lab/MAmmoTH-7B-Mistral\" \\\n  --shots 0 \\\n  --stem_flan_type \"pot_prompt\" \\\n  --batch_size 8 \\\n  --dataset $dataset \\\n  --model_max_length 1500 \\\n  --num_samples 10 \\\n  --print\n```\n\nTo evaluate on multiple-choice questions, run:\n\n```bash\n### For multiple-choice questions, the dataset should be one of \n### ['aqua', 'sat', 'mmlu_mathematics'].\n### We first try PoT and if the generated program is not executable, we shift to CoT\ndataset='aqua'\n\npython run_choice.py \\\n  --model \"TIGER-Lab/MAmmoTH-7B-Mistral\" \\\n  --shots 0 \\\n  --stem_flan_type \"pot_prompt\" \\\n  --batch_size 8 \\\n  --dataset $dataset \\\n  
--cot_backup \\\n  --print\n```\n\n### **Fine-tuning**\n\nTo train the 7B/13B model, run:\n\n```bash\ntorchrun --nproc_per_node [$WORKER_GPU] \\\n --master_addr [$WORKER_0_HOST] \\\n --node_rank [$ROLE_INDEX] \\\n --master_port [$WORKER_0_PORT] \\\n --nnodes [$WORKER_NUM] \\\ntrain.py \\\n    --model_name_or_path \"codellama/CodeLlama-7b-hf\" \\\n    --data_path \"TIGER-Lab/MathInstruct\" \\\n    --bf16 True \\\n    --output_dir checkpoints/MAmmoTH-Coder-7B \\\n    --num_train_epochs 3 \\\n    --per_device_train_batch_size 2 \\\n    --per_device_eval_batch_size 1 \\\n    --gradient_accumulation_steps 8 \\\n    --evaluation_strategy \"no\" \\\n    --save_strategy \"steps\" \\\n    --save_steps 2000\\\n    --save_total_limit 1 \\\n    --learning_rate 2e-5 \\\n    --weight_decay 0. \\\n    --warmup_ratio 0.03 \\\n    --lr_scheduler_type \"cosine\" \\\n    --logging_steps 1 \\\n    --fsdp \"full_shard auto_wrap\" \\\n    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \\\n    --tf32 True\n```\n\nTo train the 34B/70B model, run:\n```bash\ntorchrun --nproc_per_node [$WORKER_GPU] \\\n --master_addr [$WORKER_0_HOST] \\\n --node_rank [$ROLE_INDEX] \\\n --master_port [$WORKER_0_PORT] \\\n --nnodes [$WORKER_NUM] \\\ntrain.py \\\n    --model_name_or_path \"codellama/CodeLlama-34b-hf\" \\\n    --data_path \"TIGER-Lab/MathInstruct\" \\\n    --bf16 True \\\n    --output_dir checkpoints/MAmmoTH-Coder-34B \\\n    --num_train_epochs 3 \\\n    --per_device_train_batch_size 1 \\\n    --per_device_eval_batch_size 1 \\\n    --gradient_accumulation_steps 2 \\\n    --evaluation_strategy \"no\" \\\n    --save_strategy \"epoch\" \\\n    --save_total_limit 1 \\\n    --learning_rate 1e-5 \\\n    --weight_decay 0. \\\n    --warmup_ratio 0.03 \\\n    --lr_scheduler_type \"cosine\" \\\n    --logging_steps 1 \\\n    --deepspeed \"ds_config/ds_config_zero3.json\" \\\n    --tf32 True\n```\n\n## Prompt Format\n\nIf you want to do CoT:\n```\nBelow is an instruction that describes a task. 
Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n```\n\nIf you want to do PoT:\n```\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction} Let's write a program.\n\n### Response:\n```\n\n## WebUI\nWe use [llama2-webui](https://github.com/liltom-eth/llama2-webui) as our UI backend. To use the web UI with MAmmoTH, run:\n```\npip install gradio\ncd webui/llama2-webui\npython3 mammoth.py --model_path your_model_path --backend_type transformers \n```\n\n## **License**\nPlease check the license of each subset in our curated dataset MathInstruct.\n\n| Dataset Name | License Type |\n|--------------|----------------|\n| GSM8K | MIT |\n| GSM8K-RFT | Not listed |\n| AQuA-RAT | Apache-2.0 |\n| MATH | MIT |\n| TheoremQA | MIT |\n| Camel-Math | Attribution-NonCommercial 4.0 International |\n| NumGLUE | Apache-2.0 |\n| CrowdSourced (Lila) | Attribution 4.0 International |\n| MathQA | Apache-2.0 |\n| Our Curated | MIT |\n\n\n## **Citation**\n\nPlease cite our paper if you use our data, models, or code. Please also cite the original dataset papers. \n\n```\n@article{yue2023mammoth,\n  title={MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning},\n  author={Xiang Yue and Xingwei Qu and Ge Zhang and Yao Fu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},\n  journal={arXiv preprint arXiv:2309.05653},\n  year={2023}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTIGER-AI-Lab%2FMAmmoTH","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTIGER-AI-Lab%2FMAmmoTH","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTIGER-AI-Lab%2FMAmmoTH/lists"}