https://github.com/tencentarc/llama-pro

[ACL 2024] Progressive LLaMA with Block Expansion.
https://github.com/tencentarc/llama-pro

llama llama2 llm

Last synced: about 1 year ago
JSON representation

[ACL 2024] Progressive LLaMA with Block Expansion.

Host: GitHub
URL: https://github.com/tencentarc/llama-pro
Owner: TencentARC
License: apache-2.0
Created: 2024-01-02T08:57:19.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-05-20T04:51:44.000Z (about 2 years ago)
Last Synced: 2024-10-29T17:12:22.717Z (over 1 year ago)
Topics: llama, llama2, llm
Language: Python
Homepage: https://tencentarc.github.io/LLaMA-Pro/
Size: 80.7 MB
Stars: 476
Watchers: 20
Forks: 35
Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          #    LLaMA Pro: Progressive LLaMA with Block Expansion



📃 Paper • 🤗 Demo & Model 



## News

* [2024/01/06] We open source the [LLaMA-Pro repository](https://github.com/TencentARC/LLaMA-Pro) and [Demo & Model](https://huggingface.co/TencentARC/LLaMA-Pro-8B). 

* [2024/01/07] Add how to run gradio demo locally in [demo](./demo/app.py)

* [2024/01/18] Add the training code in [open-instruct](https://github.com/hills-code/open-instruct/tree/llama-pro).

* [2024/02/23] We release the [Mistral-Pro-8B-v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1) with superior performance on a range of benchmarks. It enhances the code and math performance of Mistral and matches the performance of the recently dominant model, [Gemma](https://huggingface.co/google/gemma-7b).

![assets/mistral_pro_performance.png](assets/mistral_pro_performance.png)

* [2024/02/23] We release the evaluation code of [Mistral-Pro-8B-v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1) in [lm-evaluation-harness](https://github.com/hills-code/lm-evaluation-harness).

* [2024/02/23] We release [MetaMath-Mistral-Pro](https://huggingface.co/TencentARC/MetaMath-Mistral-Pro) that surpasses previous MetaMath series 7B models at both GSM8k and MATH. The evaluation is following [the official MetaMath repo](https://github.com/meta-math/MetaMath).

* [2024/05/08] Add the pre-train example script for cosmopedia in [open-instruct](https://github.com/hills-code/open-instruct/tree/llama-pro).

* [2024/05/16] [LLaMA Pro](https://arxiv.org/abs/2401.02415) has been accepted to the main conference of ACL 2024!

🔥 Comprehensive Results

| Model               | GSM8k Pass@1 | MATH Pass@1 |

|---------------------|--------------|-------------|

| WizardMath-7B       | 54.9         | 10.7        |

| LLaMA-2-70B         | 56.8         | 13.5        |

| WizardMath-13B      | 63.9         | 14.0        |

| MetaMath-7B         | 66.5     | 19.8    |

| MetaMath-13B        | 72.3     | 22.4    |

| MetaMath-Mistral-7B | 77.7     | 28.2    |

| MetaMath-Llemma-7B  | 69.2     | 30.0    |

| 🔥 **MetaMath-Mistral-Pro** | **78.4**     | **30.3**        |

## Acknowledgement

The code of instruction tuning is based on the official implementation of [open-instruct](https://github.com/allenai/open-instruct).

Thanks [huggingface](https://huggingface.co/TencentARC/LLaMA-Pro-8B) & [wisemodel](https://wisemodel.cn/models/TencentARC/LLaMA-Pro-8B) for hosting our checkpoint.

## Citation

The code and model in this repository is mostly developed for or derived from the paper below. Please cite it if you find the repository helpful.

```

@article{wu2024llama,

  title={Llama pro: Progressive llama with block expansion},

  author={Wu, Chengyue and Gan, Yukang and Ge, Yixiao and Lu, Zeyu and Wang, Jiahao and Feng, Ye and Luo, Ping and Shan, Ying},

  journal={arXiv preprint arXiv:2401.02415},

  year={2024}

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tencentarc/llama-pro

Awesome Lists containing this project

README