# MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](CODE_LICENSE)
[![Model Weight License](https://img.shields.io/badge/Model%20Weights%20License-LLaMA2-yellow)](MetaMath/LICENSE)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)


🤗 [HF Repo](https://huggingface.co/meta-math) • 📃 [MetaMath Paper](https://arxiv.org/abs/2309.12284)



## News
- 🔥 Our **MetaMath-Llemma-7B** model achieves **30.0 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), surpassing all SOTA open-source LLMs at the 7B–13B scale! All training scripts and the model are released.
- 🔥 Our **MetaMath-Mistral-7B** model achieves **77.7 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), surpassing all SOTA open-source LLMs! All training scripts and the model are released.
- 🔥 The full **MetaMathQA** dataset is now released on Hugging Face: [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA/tree/main)!
- 🔥 The **GSM8K_Backward** dataset is also released on Hugging Face ([GSM8K_Backward](https://huggingface.co/datasets/meta-math/GSM8K_Backward)) to evaluate reversal mathematical reasoning ability!
- 🔥 Although the data augmentation for **MetaMathQA** is sourced from **ChatGPT 3.5**, our **MetaMath-70B** model outperforms the closed-source **ChatGPT 3.5** on GSM8K!
- 🔥 Our **MetaMath-7B** model achieves **66.5 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), **11.6** points higher than the previous SOTA open-source LLM!
- 🔥 Our **MetaMath-7B** model achieves **19.8 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), **9.1** points higher than the previous SOTA open-source LLM!

| Model | Checkpoint | Paper | GSM8k | MATH | License |
| ----- | ------ | ---- | ------ | ------- | ----- |
| MetaMath-70B-V1.0 | 🤗 [HF Link](https://huggingface.co/meta-math/MetaMath-70B-V1.0) | 📃 [MetaMath](https://arxiv.org/abs/2309.12284) | **82.3** | **26.6** | Llama 2 |
| MetaMath-13B-V1.0 | 🤗 [HF Link](https://huggingface.co/meta-math/MetaMath-13B-V1.0) | 📃 [MetaMath](https://arxiv.org/abs/2309.12284) | **72.3** | **22.4** | Llama 2 |
| MetaMath-7B-V1.0 | 🤗 [HF Link](https://huggingface.co/meta-math/MetaMath-7B-V1.0) | 📃 [MetaMath](https://arxiv.org/abs/2309.12284) | **66.5** | **19.8** | Llama 2 |
| MetaMath-Mistral-7B | 🤗 [HF Link](https://huggingface.co/meta-math/MetaMath-Mistral-7B) | 📃 [MetaMath](https://arxiv.org/abs/2309.12284) | **77.7** | **28.2** | Apache License 2.0 |
| MetaMath-Llemma-7B | 🤗 [HF Link](https://huggingface.co/meta-math/MetaMath-Llemma-7B) | 📃 [MetaMath](https://arxiv.org/abs/2309.12284) | **69.2** | **30.0** | Apache License 2.0 |
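
The released checkpoints use the standard Hugging Face format, so they can be loaded directly with `transformers`. A minimal sketch follows; the model ID is taken from the table above, and automatic device placement via `accelerate` is assumed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-math/MetaMath-7B-V1.0"  # swap in any checkpoint from the table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # requires `accelerate` for automatic GPU placement
)
```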

## Comparing MetaMath with other LLMs

🔥 Comprehensive Results

| Model | GSM8k Pass@1 | MATH Pass@1 |
|---------------------|--------------|-------------|
| MPT-7B | 6.8 | 3.0 |
| Falcon-7B | 6.8 | 2.3 |
| LLaMA-1-7B | 11.0 | 2.9 |
| LLaMA-2-7B | 14.6 | 2.5 |
| MPT-30B | 15.2 | 3.1 |
| LLaMA-1-13B | 17.8 | 3.9 |
| GPT-Neo-2.7B | 19.5 | -- |
| Falcon-40B | 19.6 | 2.5 |
| Baichuan-chat-13B | 23.9 | -- |
| Vicuna-v1.3-13B | 27.6 | -- |
| LLaMA-2-13B | 28.7 | 3.9 |
| InternLM-7B | 31.2 | -- |
| ChatGLM-2-6B | 32.4 | -- |
| GPT-J-6B | 34.9 | -- |
| LLaMA-1-33B | 35.6 | 3.9 |
| LLaMA-2-34B | 42.2 | 6.24 |
| RFT-7B | 50.3 | -- |
| LLaMA-1-65B | 50.9 | 10.6 |
| Qwen-7B | 51.6 | -- |
| WizardMath-7B | 54.9 | 10.7 |
| LLaMA-2-70B | 56.8 | 13.5 |
| WizardMath-13B | 63.9 | 14.0 |
| 🔥 MetaMath-7B | **66.5** | **19.8** |
| 🔥 MetaMath-13B | **72.3** | **22.4** |
| 🔥 MetaMath-Mistral-7B | **77.7** | **28.2** |
| 🔥 MetaMath-Llemma-7B | **69.2** | **30.0** |
| WizardMath-70B | 81.6 | 22.7 |
| 🔥 MetaMath-70B | **82.3** | **26.6** |

## Quick Start

Clone MetaMath and install the required packages:

```bash
git clone https://github.com/meta-math/MetaMath.git
cd MetaMath
pip install -r requirements.txt
```

If you encounter a Ray installation problem, please run:

```bash
pip install --upgrade ray
pip install --upgrade pyarrow
pip install pandas
```

## Dataset Usage

Run the following command to load the data:

```python
from datasets import load_dataset
dataset = load_dataset("meta-math/MetaMathQA")
```
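
As a quick sanity check, you can inspect a single training example. The field names below (`type`, `query`, `response`) are assumed from the MetaMathQA dataset card:

```python
from datasets import load_dataset

dataset = load_dataset("meta-math/MetaMathQA")
sample = dataset["train"][0]
print(sample["type"])      # augmentation type, e.g. a GSM8K- or MATH-derived variant
print(sample["query"])     # the (possibly rewritten) question
print(sample["response"])  # the step-by-step answer
```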

## Training

You need to prepare the LLaMA-2 base model and our **MetaMathQA** dataset from Hugging Face ([MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA/tree/main)), then run:

```bash
bash run.sh
```
or

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --master_addr ${MASTER_ADDR} --master_port ${MASTER_PORT} --nproc_per_node=8 --use_env train_math.py \
--model_name_or_path "meta-llama/Llama-2-7b-hf" \
--data_path "path/to/metamathqa" \
--data_length 10000000 \
--bf16 True \
--output_dir "path/to/save" \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 2 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True
```
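
Note that recent PyTorch releases deprecate `torch.distributed.launch` in favor of `torchrun`. Assuming the same single-node, 8-GPU setup, the equivalent launcher line would be as below; `--use_env` is dropped because `torchrun` always passes the rank via environment variables:

```bash
torchrun --master_addr ${MASTER_ADDR} --master_port ${MASTER_PORT} --nproc_per_node=8 train_math.py \
    --model_name_or_path "meta-llama/Llama-2-7b-hf" \
    --data_path "path/to/metamathqa" # ...followed by the same remaining training arguments as above
```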

### Supervised fine-tuning

We supervised fine-tune MetaMath-7B with the following hyperparameters:

| Hyperparameter | LLaMA 2 7B |
|----------------|-------------|
| Batch size | 128 |
| Learning rate | 2e-5 |
| Epochs | 3 |
| Max length | 512 |
| LR scheduler | cosine |
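
For reference, the effective batch size of 128 is consistent with the launch command above: 4 (per-device batch) × 4 (gradient accumulation steps) × 8 (GPUs) = 128.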

## Evaluation

We use [vLLM](https://github.com/vllm-project/vllm) for fast generation:

```bash
python eval_gsm8k.py --model "path/to/save" --data_file ./data/test/GSM8K_test.jsonl
python eval_math.py --model "path/to/save" --data_file ./data/test/MATH_test.jsonl
```
where the "path/to/save" should be replaced by the finetuned model, you can also download our series of MetaMath models in huggingface:
🤗 MetaMath 7B 🤗 MetaMath 13B 🤗 MetaMath 70B

The inference prompt for our MetaMath is:
```
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
```
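
Putting the prompt and vLLM together, a minimal generation sketch looks like the following; the model ID is assumed, so replace it with your fine-tuned checkpoint path if needed:

```python
from vllm import LLM, SamplingParams

PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
)

llm = LLM(model="meta-math/MetaMath-7B-V1.0")  # or "path/to/save"
params = SamplingParams(temperature=0.0, max_tokens=512)  # greedy decoding

question = "James buys 5 packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. How much did he pay?"
outputs = llm.generate([PROMPT.format(instruction=question)], params)
print(outputs[0].outputs[0].text)
```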

Thanks to the open-source code of [WizardMath](https://github.com/nlpxucan/WizardLM/tree/main/WizardMath) and [RFT](https://github.com/OFA-Sys/gsm8k-ScRel/tree/main). Some of our code is based on them.

## Citation


Please cite our paper if you use the model, code, or data from MetaMath.

```bibtex
@article{yu2023metamath,
  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
  author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
  journal={arXiv preprint arXiv:2309.12284},
  year={2023}
}
```