{"id":13437559,"url":"https://github.com/nlpxucan/WizardLM","last_synced_at":"2025-03-19T06:31:28.976Z","repository":{"id":155089712,"uuid":"631580207","full_name":"nlpxucan/WizardLM","owner":"nlpxucan","description":"LLMs build upon Evol Insturct: WizardLM, WizardCoder, WizardMath","archived":false,"fork":false,"pushed_at":"2024-08-05T08:31:05.000Z","size":12051,"stargazers_count":9357,"open_issues_count":168,"forks_count":730,"subscribers_count":110,"default_branch":"main","last_synced_at":"2025-03-18T23:41:58.635Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nlpxucan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-23T13:26:46.000Z","updated_at":"2025-03-18T09:01:21.000Z","dependencies_parsed_at":"2024-01-04T14:37:48.592Z","dependency_job_id":"fee3c3b3-f9cb-4d32-9fbf-afb5ba805b64","html_url":"https://github.com/nlpxucan/WizardLM","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nlpxucan%2FWizardLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nlpxucan%2FWizardLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nlpxucan%2FWizardLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nlpxucan%2FWizardLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nlpxucan","download_url":"https://codeload.github.com
/nlpxucan/WizardLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244371238,"owners_count":20442344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:00:58.288Z","updated_at":"2025-03-19T06:31:28.970Z","avatar_url":"https://github.com/nlpxucan.png","language":"Python","funding_links":[],"categories":["Python","Statistics","🔥 2024-2025 Trending Models","Open Source LLM","A01_文本生成_文本对话","Other my awesome lists","others","HarmonyOS","🧠 Large Language Models (LLMs)","💻 Software for Large Language Models","Projekte","Uncategorized","🧠 AI Code Models","Specialized Models"],"sub_categories":["🚀 Specialized Models","大语言对话模型及数据","Local / Self-hosted","Windows Manager","🔬 Research \u0026 Experimental","Other Cloud Provider Credits","🦄 LLMs","Uncategorized","By Deployment Model"],"readme":"## WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions\n\n\u003cp style=\"font-size:50px;\" align=\"center\"\u003e\n🏠 \u003ca href=\"https://wizardlm.github.io/\" target=\"_blank\"\u003eHome Page\u003c/a\u003e \u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \n\u003cp align=\"center\"\u003e\n🤗 \u003ca href=\"https://huggingface.co/WizardLMTeam\" target=\"_blank\"\u003eHF Repo\u003c/a\u003e • 🐦 \u003ca href=\"https://twitter.com/WizardLM_AI\" target=\"_blank\"\u003eTwitter\u003c/a\u003e • 📃 \u003ca href=\"https://arxiv.org/abs/2304.12244\" target=\"_blank\"\u003e[WizardLM] @ICLR2024\u003c/a\u003e  • 📃 \u003ca href=\"https://arxiv.org/abs/2306.08568\" 
target=\"_blank\"\u003e[WizardCoder] @ICLR2024\u003c/a\u003e    • 📃 \u003ca href=\"https://arxiv.org/abs/2308.09583\" target=\"_blank\"\u003e[WizardMath]\u003c/a\u003e \u003cbr\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    👋 Join our \u003ca href=\"https://discord.gg/VZjjHtWrKs\" target=\"_blank\"\u003eDiscord\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003ca \u003e\u003cimg src=\"imgs/WizardLM.png\" alt=\"WizardLM\" style=\"width: 20%; min-width: 300px; display: block; margin: auto;\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)\n[![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)\n[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)\n\n**Unofficial Video Introductions**\n\nThanks to our enthusiastic friends for these lively and engaging video introductions.\n1. [NEW WizardLM 70b 🔥 Giant Model...Insane Performance](https://www.youtube.com/watch?v=WdpiIXrO4_o)\n2. [GET WizardLM NOW! 7B LLM KING That Can Beat ChatGPT! I'm IMPRESSED!](https://www.youtube.com/watch?v=SaJ8wyKMBds)\n3. [WizardLM: Enhancing Large Language Models to Follow Complex Instructions](https://www.youtube.com/watch?v=I6sER-qivYk)\n4. [WizardCoder AI Is The NEW ChatGPT's Coding TWIN!](https://www.youtube.com/watch?v=XjsyHrmd3Xo)\n\n## News\n\n- 🔥🔥🔥[2024/01/04] We released **WizardCoder-33B-V1.1**, trained from deepseek-coder-33b-base. It is the **SOTA OSS Code LLM** on the [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html) and achieves **79.9 pass@1** on HumanEval, **73.2 pass@1** on HumanEval-Plus, **78.9 pass@1** on MBPP, and **66.9 pass@1** on MBPP-Plus. 
**WizardCoder-33B-V1.1** outperforms **ChatGPT 3.5**, **Gemini Pro**, and **DeepSeek-Coder-33B-instruct** on HumanEval and HumanEval-Plus pass@1. **WizardCoder-33B-V1.1** is comparable with **ChatGPT 3.5**, and surpasses **Gemini Pro** on MBPP and MBPP-Plus pass@1.\n- [2023/08/26] We released **WizardCoder-Python-34B-V1.0** , which achieves the **73.2 pass@1** and surpasses **GPT4 (2023/03/15)**, **ChatGPT-3.5**, and **Claude2** on the [HumanEval Benchmarks](https://github.com/openai/human-eval). For more details, please refer to [WizardCoder](https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder).\n- [2023/06/16] We released **WizardCoder-15B-V1.0** , which surpasses **Claude-Plus (+6.8)**, **Bard (+15.3)** and **InstructCodeT5+ (+22.3)** on the [HumanEval Benchmarks](https://github.com/openai/human-eval). For more details, please refer to [WizardCoder](https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder).\n\n\n|  Model  |  Checkpoint  | Paper    | HumanEval  |   HumanEval+ | MBPP | MBPP+ |\n| ----- |------| ---- |------|-------| ----- |  ----- |\n|  GPT-4-Turbo (Nov 2023)  | - | - | 85.4  | 81.7 | 83.0 | 70.7 |\n|  GPT-4 (May 2023)  | - | - | 88.4  | 76.8 | - | - |\n|  GPT-3.5-Turbo (Nov 2023)  | - | - | 72.6  | 65.9 | 81.7 | 69.4 |\n|  Gemini Pro  | - | - | 63.4  | 55.5 | 72.9 | 57.9 |\n|  DeepSeek-Coder-33B-instruct | - | - |  78.7 | 72.6 | 78.7 | 66.7 |\n|  WizardCoder-33B-V1.1  |   🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardCoder-33B-V1.1\" target=\"_blank\"\u003eHF Link\u003c/a\u003e   |  📃 \u003ca href=\"https://arxiv.org/abs/2306.08568\" target=\"_blank\"\u003e[WizardCoder]\u003c/a\u003e  |  79.9  | 73.2 | 78.9 | 66.9 |\n|  WizardCoder-Python-34B-V1.0  |   🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e   |  📃 \u003ca href=\"https://arxiv.org/abs/2306.08568\" target=\"_blank\"\u003e[WizardCoder]\u003c/a\u003e  |  73.2   | 64.6 | 73.2 | 59.9 |\n|  
WizardCoder-15B-V1.0  |   🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardCoder-15B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e   |  📃 \u003ca href=\"https://arxiv.org/abs/2306.08568\" target=\"_blank\"\u003e[WizardCoder]\u003c/a\u003e  |  59.8   | 52.4 | -- | -- |\n|  WizardCoder-Python-13B-V1.0  |   🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e   |  📃 \u003ca href=\"https://arxiv.org/abs/2306.08568\" target=\"_blank\"\u003e[WizardCoder]\u003c/a\u003e  |  64.0   | -- | -- | -- |\n|  WizardCoder-Python-7B-V1.0  |   🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e   |  📃 \u003ca href=\"https://arxiv.org/abs/2306.08568\" target=\"_blank\"\u003e[WizardCoder]\u003c/a\u003e  |  55.5   | -- | -- | -- |\n|  WizardCoder-3B-V1.0  |   🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardCoder-3B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e   |  📃 \u003ca href=\"https://arxiv.org/abs/2306.08568\" target=\"_blank\"\u003e[WizardCoder]\u003c/a\u003e  |  34.8   | -- | -- | -- |\n|  WizardCoder-1B-V1.0  |   🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardCoder-1B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e   |  📃 \u003ca href=\"https://arxiv.org/abs/2306.08568\" target=\"_blank\"\u003e[WizardCoder]\u003c/a\u003e  |  23.8   | -- | -- | -- |\n\n\n\n\n- [12/19/2023] 🔥 We released **WizardMath-7B-V1.1** trained from Mistral-7B, the **SOTA 7B math LLM**, achieves **83.2 pass@1** on GSM8k, and **33.0 pass@1** on MATH.\n\n- [12/19/2023] 🔥 **WizardMath-7B-V1.1** outperforms **ChatGPT 3.5**, **Gemini Pro**, **Mixtral MOE**, and **Claude Instant** on GSM8K pass@1.\n\n- [12/19/2023] 🔥 **WizardMath-7B-V1.1** is comparable with **ChatGPT 3.5**, **Gemini Pro**, and surpasses **Mixtral MOE** on MATH pass@1.\n\n\n- 🔥 Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on the GSM8K, 
including **ChatGPT 3.5**, **Claude Instant 1** and **PaLM 2 540B**.\n- 🔥 Our **WizardMath-70B-V1.0** model achieves  **81.6 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is **24.8** points higher than the SOTA open-source LLM.\n- 🔥 Our **WizardMath-70B-V1.0** model achieves  **22.7 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), which is **9.2** points higher than the SOTA open-source LLM.\n\n| Model | Checkpoint | Paper  | GSM8k | MATH  |\n| ----- |------| ---- |------|-------| \n| **WizardMath-7B-V1.1** | 🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardMath-7B-V1.1\" target=\"_blank\"\u003eHF Link\u003c/a\u003e  |  📃 \u003ca href=\"https://arxiv.org/abs/2308.09583\" target=\"_blank\"\u003e[WizardMath]\u003c/a\u003e| \t **83.2**  |  **33.0** | \n| WizardMath-70B-V1.0 | 🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardMath-70B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e |  📃 \u003ca href=\"https://arxiv.org/abs/2308.09583\" target=\"_blank\"\u003e[WizardMath]\u003c/a\u003e| **81.6**  |  **22.7**\t|\n| WizardMath-13B-V1.0 | 🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardMath-13B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e |  📃 \u003ca href=\"https://arxiv.org/abs/2308.09583\" target=\"_blank\"\u003e[WizardMath]\u003c/a\u003e| **63.9**  |  **14.0** |\n| WizardMath-7B-V1.0 | 🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardMath-7B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e  |  📃 \u003ca href=\"https://arxiv.org/abs/2308.09583\" target=\"_blank\"\u003e[WizardMath]\u003c/a\u003e| \t **54.9**  |  **10.7** |    \n\n\n- [08/09/2023] We released **WizardLM-70B-V1.0** model. Here is [Full Model Weight](https://huggingface.co/WizardLM/WizardLM-70B-V1.0). 
\n\n\u003cfont size=0.5\u003e\n    \n   \n| \u003csup\u003eModel\u003c/sup\u003e | \u003csup\u003eCheckpoint\u003c/sup\u003e | \u003csup\u003ePaper\u003c/sup\u003e |\u003csup\u003eMT-Bench\u003c/sup\u003e | \u003csup\u003eAlpacaEval\u003c/sup\u003e  | \u003csup\u003eGSM8k\u003c/sup\u003e | \u003csup\u003eHumanEval\u003c/sup\u003e  | \u003csup\u003eDemo\u003c/sup\u003e  | \u003csup\u003eLicense\u003c/sup\u003e|\n| ----- |------| ---- |------|-------| ----- | ----- | ----- | ----- | \n| \u003csup\u003e**WizardLM-70B-V1.0**\u003c/sup\u003e | \u003csup\u003e🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardLM-70B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e \u003c/sup\u003e|\u003csup\u003e📃**Coming Soon**\u003c/sup\u003e| \u003csup\u003e**7.78**\u003c/sup\u003e | \u003csup\u003e**92.91%**\u003c/sup\u003e\t |\u003csup\u003e**77.6%**\u003c/sup\u003e\t | \u003csup\u003e   **50.6**\u003c/sup\u003e| |\u003csup\u003e \u003ca href=\"https://ai.meta.com/resources/models-and-libraries/llama-downloads/\" target=\"_blank\"\u003eLlama 2 License \u003c/a\u003e\u003c/sup\u003e |\n| \u003csup\u003eWizardLM-13B-V1.2\u003c/sup\u003e | \u003csup\u003e🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardLM-13B-V1.2\" target=\"_blank\"\u003eHF Link\u003c/a\u003e \u003c/sup\u003e|  | \u003csup\u003e7.06\u003c/sup\u003e | \u003csup\u003e89.17%\u003c/sup\u003e\t |\u003csup\u003e55.3%\u003c/sup\u003e\t | \u003csup\u003e36.6   \u003c/sup\u003e| [Demo](http://47.103.63.15:50087/) |\u003csup\u003e \u003ca href=\"https://ai.meta.com/resources/models-and-libraries/llama-downloads/\" target=\"_blank\"\u003eLlama 2 License \u003c/a\u003e\u003c/sup\u003e |\n| \u003csup\u003eWizardLM-13B-V1.1\u003c/sup\u003e |\u003csup\u003e 🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardLM-13B-V1.1\" target=\"_blank\"\u003eHF Link\u003c/a\u003e \u003c/sup\u003e |  | \u003csup\u003e6.76\u003c/sup\u003e  |\u003csup\u003e86.32%\u003c/sup\u003e\t | \t | \u003csup\u003e25.0   \u003c/sup\u003e|  | 
\u003csup\u003eNon-commercial\u003c/sup\u003e|\n| \u003csup\u003eWizardLM-30B-V1.0\u003c/sup\u003e | \u003csup\u003e🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardLM-30B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e\u003c/sup\u003e  | | \u003csup\u003e7.01\u003c/sup\u003e |                    | |  \u003csup\u003e37.8  \u003c/sup\u003e|  | \u003csup\u003eNon-commercial\u003c/sup\u003e |\n| \u003csup\u003eWizardLM-13B-V1.0\u003c/sup\u003e | \u003csup\u003e🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardLM-13B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e \u003c/sup\u003e |  | \u003csup\u003e6.35\u003c/sup\u003e | \u003csup\u003e75.31%\u003c/sup\u003e |  | \u003csup\u003e 24.0   \u003c/sup\u003e |  | \u003csup\u003eNon-commercial\u003c/sup\u003e|\n| \u003csup\u003eWizardLM-7B-V1.0 \u003c/sup\u003e|  \u003csup\u003e🤗 \u003ca href=\"https://huggingface.co/WizardLM/WizardLM-7B-V1.0\" target=\"_blank\"\u003eHF Link\u003c/a\u003e \u003c/sup\u003e |\u003csup\u003e 📃 \u003ca href=\"https://arxiv.org/abs/2304.12244\" target=\"_blank\"\u003e[WizardLM]\u003c/a\u003e \u003c/sup\u003e|  |  |  |\u003csup\u003e19.1 \u003c/sup\u003e|  | \u003csup\u003e Non-commercial\u003c/sup\u003e|\n\u003c/font\u003e\n\n### Citation\n\nPlease cite the paper if you use the data or code from WizardLM.\n\n```\n@inproceedings{\nxu2024wizardlm,\ntitle={Wizard{LM}: Empowering Large Pre-Trained Language Models to Follow Complex Instructions},\nauthor={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Qingwei Lin and Daxin Jiang},\nbooktitle={The Twelfth International Conference on Learning Representations},\nyear={2024},\nurl={https://openreview.net/forum?id=CfXh93NDgH}\n}\n```\nPlease cite the paper if you use the data or code from WizardCoder.\n\n```\n@inproceedings{\nluo2024wizardcoder,\ntitle={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},\nauthor={Ziyang Luo and Can Xu and Pu Zhao and Qingfeng 
Sun and Xiubo Geng and Wenxiang Hu and Chongyang Tao and Jing Ma and Qingwei Lin and Daxin Jiang},\nbooktitle={The Twelfth International Conference on Learning Representations},\nyear={2024},\nurl={https://openreview.net/forum?id=UnUwSIgK5W}\n}\n```\n\nPlease cite the paper if you use the model, code, data, or paper from WizardMath.\n\n```\n@article{luo2023wizardmath,\n  title={WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct},\n  author={Luo, Haipeng and Sun, Qingfeng and Xu, Can and Zhao, Pu and Lou, Jianguang and Tao, Chongyang and Geng, Xiubo and Lin, Qingwei and Chen, Shifeng and Zhang, Dongmei},\n  journal={arXiv preprint arXiv:2308.09583},\n  year={2023}\n}\n```\n\n\n❗Regarding common concerns about the dataset:\n\nRecently, there have been clear changes in the open-source policy and regulations of our overall organization's code, data, and models.\nDespite this, we have worked hard to get approval to release the model weights first, but the data involves stricter auditing and is in review with our legal team.\nOur researchers have no authority to publicly release the data without authorization.\nThank you for your understanding.\n\n## Hiring\n\n- \u0026#x1F4E3; We are looking for highly motivated students to join us as interns to create more intelligent AI together. Please contact caxu@microsoft.com\n\n\u003c!-- Although on our **complexity-balanced test set**, **WizardLM-7B has more cases that are preferred by human labelers than ChatGPT** in the high-complexity instructions (difficulty level \u003e= 8), it still lags behind ChatGPT on the entire test set, and we also consider WizardLM to still be in a **baby state**. This repository will **continue to improve WizardLM**, train on larger scales, add more training data, and innovate more advanced large-model training methods. 
--\u003e\n\n\n\u003cb\u003eNote on model system prompt usage:\u003c/b\u003e\n\nTo obtain results **identical to our demo**, please strictly follow the prompts and invocation methods provided in **\"src/infer_wizardlm13b.py\"** when using our model for inference. Our model adopts the prompt format from \u003cb\u003eVicuna\u003c/b\u003e and supports **multi-turn** conversation.\n\n\u003cb\u003eFor WizardLM\u003c/b\u003e, the prompt should be as follows:\n\n```\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.\u003c/s\u003eUSER: Who are you? ASSISTANT: I am WizardLM.\u003c/s\u003e......\n```\n\n\u003cb\u003eFor WizardCoder\u003c/b\u003e, the prompt should be as follows:\n\n```\n\"Below is an instruction that describes a task. Write a response that appropriately completes the request.\\n\\n### Instruction:\\n{instruction}\\n\\n### Response:\"\n```\n\n\u003cb\u003eFor WizardMath\u003c/b\u003e, the prompts should be as follows:\n\n**Default version:**\n\n```\n\"Below is an instruction that describes a task. Write a response that appropriately completes the request.\\n\\n### Instruction:\\n{instruction}\\n\\n### Response:\"\n```\n\n\n**CoT version:** （❗For **simple** math questions, we do NOT recommend using the CoT prompt.） \n\n\n```\n\"Below is an instruction that describes a task. Write a response that appropriately completes the request.\\n\\n### Instruction:\\n{instruction}\\n\\n### Response: Let's think step by step.\"\n```\n\n### GPT-4 automatic evaluation\n\nWe adopt the automatic evaluation framework based on GPT-4 proposed by FastChat to assess the performance of chatbot models. As shown in the following figure, WizardLM-30B achieved better results than Guanaco-65B. 
\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003ca \u003e\u003cimg src=\"imgs/WizarLM30b-GPT4.png\" alt=\"WizardLM\" style=\"width: 100%; min-width: 300px; display: block; margin: auto;\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n### WizardLM-30B performance on different skills.\n\nThe following figure compares the skills of WizardLM-30B and ChatGPT on the Evol-Instruct test set. The result indicates that WizardLM-30B achieves 97.8% of ChatGPT’s performance on average, reaching almost 100% (or more) of ChatGPT’s capacity on 18 skills and more than 90% on 24 skills.\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003ca \u003e\u003cimg src=\"imgs/evol-testset_skills-30b.png\" alt=\"WizardLM\" style=\"width: 100%; min-width: 300px; display: block; margin: auto;\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n### WizardLM performance on NLP foundation tasks.\n\nThe following table compares WizardLMs and other LLMs on NLP foundation tasks. The results indicate that WizardLMs consistently outperform the LLaMA models of the same size. 
Furthermore, our WizardLM-30B model performs comparably to OpenAI's Text-davinci-003 on the MMLU and HellaSwag benchmarks.\n\n| Model            | MMLU 5-shot | ARC 25-shot | TruthfulQA 0-shot | HellaSwag 10-shot | Average    |\n|------------------|-------------|-------------|-------------------|-------------------|------------|\n| Text-davinci-003 | \u003cu\u003e56.9\u003c/u\u003e | **85.2**    | **59.3**          | \u003cu\u003e82.2\u003c/u\u003e       | **70.9**   |\n| Vicuna-13b 1.1   | 51.3        | 53.0        | 51.8              | 80.1              | 59.1       |\n| Guanaco 30B      | 57.6        | 63.7        | 50.7              | **85.1**          | 64.3       |\n| WizardLM-7B 1.0  | 42.7        | 51.6        | 44.7              | 77.7              | 54.2       |\n| WizardLM-13B 1.0 | 52.3        | 57.2        | 50.5              | 81.0              | 60.2       |\n| WizardLM-30B 1.0 | **58.8**    | \u003cu\u003e62.5\u003c/u\u003e | \u003cu\u003e52.4\u003c/u\u003e       | 83.3              | \u003cu\u003e64.2\u003c/u\u003e |\n\n### WizardLM performance on code generation.\n\nThe following table compares WizardLMs and several other LLMs on the HumanEval code generation task. The evaluation metric is pass@1. The results indicate that WizardLMs consistently outperform the LLaMA models of the same size. Furthermore, our WizardLM-30B model surpasses StarCoder and OpenAI's code-cushman-001. 
Moreover, our Code LLM, WizardCoder, demonstrates exceptional performance, achieving a pass@1 score of 57.3, surpassing the open-source SOTA by approximately 20 points.\n\n\n| Model            | HumanEval Pass@1 |\n|------------------|------------------|\n| LLaMA-7B         | 10.5             |\n| LLaMA-13B        | 15.8             |\n| CodeGen-16B-Multi| 18.3             |\n| CodeGeeX         | 22.9             |\n| LLaMA-33B        | 21.7             |\n| LLaMA-65B        | 23.7             |\n| PaLM-540B        | 26.2             |\n| CodeGen-16B-Mono | 29.3             |\n| code-cushman-001 | 33.5             |\n| StarCoder        | \u003cu\u003e33.6\u003c/u\u003e      |\n| WizardLM-7B 1.0      | 19.1             |\n| WizardLM-13B 1.0     | 24.0             |\n| WizardLM-30B 1.0     | **37.8**         |\n| WizardCoder-15B 1.0  | **57.3**         |\n\n## Call for Feedback\nWe welcome everyone to evaluate WizardLM with professional and difficult instructions, and to share examples of poor performance and suggestions in the [issue discussion](https://github.com/nlpxucan/WizardLM/issues) area. We are currently focused on improving Evol-Instruct and hope to address existing weaknesses and issues in the next version of WizardLM. After that, we will open the code and pipeline of the up-to-date Evol-Instruct algorithm and work with you to improve it.\n\n\n\n## Overview of Evol-Instruct\n\n[Evol-Instruct](https://github.com/nlpxucan/evol-instruct) is a novel method that uses LLMs instead of humans to automatically mass-produce open-domain instructions across various difficulty levels and skill ranges, to improve the performance of LLMs. 
You can easily embark on your own evolutionary journey with the [Evol Script](https://github.com/nlpxucan/WizardLM/tree/main/Evol-Instruct) we provide.\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003ca \u003e\u003cimg src=\"imgs/git_overall.png\" alt=\"WizardLM\" style=\"width: 86%; min-width: 300px; display: block; margin: auto;\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003ca \u003e\u003cimg src=\"imgs/git_running.png\" alt=\"WizardLM\" style=\"width: 86%; min-width: 300px; display: block; margin: auto;\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n## Disclaimer\n\nThe resources, including code, data, and model weights, associated with this project are restricted for academic research purposes only and cannot be used for commercial purposes. The content produced by any version of WizardLM is influenced by uncontrollable variables such as randomness, and therefore, the accuracy of the output cannot be guaranteed by this project. This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=nlpxucan/WizardLM\u0026type=Timeline)](https://star-history.com/#nlpxucan/WizardLM\u0026Timeline)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnlpxucan%2FWizardLM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnlpxucan%2FWizardLM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnlpxucan%2FWizardLM/lists"}