{"id":14964615,"url":"https://github.com/internlm/xtuner","last_synced_at":"2025-10-30T20:56:13.281Z","repository":{"id":191515488,"uuid":"664913876","full_name":"InternLM/xtuner","owner":"InternLM","description":"An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)","archived":false,"fork":false,"pushed_at":"2025-05-07T10:15:05.000Z","size":2224,"stargazers_count":4532,"open_issues_count":249,"forks_count":341,"subscribers_count":35,"default_branch":"main","last_synced_at":"2025-05-08T21:15:09.507Z","etag":null,"topics":["agent","baichuan","chatbot","chatglm2","chatglm3","conversational-ai","internlm","large-language-models","llama2","llama3","llava","llm","llm-training","mixtral","msagent","peft","phi3","qwen","supervised-finetuning"],"latest_commit_sha":null,"homepage":"https://xtuner.readthedocs.io/zh-cn/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InternLM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-07-11T03:18:13.000Z","updated_at":"2025-05-08T16:48:07.000Z","dependencies_parsed_at":"2023-08-30T08:28:31.060Z","dependency_job_id":"25ed7b69-68cc-4580-8e7b-00871103677b","html_url":"https://github.com/InternLM/xtuner","commit_stats":{"total_commits":330,"total_committers":33,"mean_commits":10.0,"dds":0.5151515151515151,"last_synced_commit":"90192ffe42612b0f88409432e7b4860294432bcc"},"previous_names":["internlm/xtuner"],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2Fxtuner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2Fxtuner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2Fxtuner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InternLM%2Fxtuner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/InternLM","download_url":"https://codeload.github.com/InternLM/xtuner/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253505644,"owners_count":21918941,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","baichuan","chatbot","chatglm2","chatglm3","conversational-ai","internlm","large-language-models","llama2","llama3","llava","llm","llm-training","mixtral","msagent","peft","phi3","qwen","supervised-finetuning"],"created_at":"2024-09-24T13:33:30.283Z","updated_at":"2025-10-30T20:56:13.274Z","avatar_url":"https://github.com/InternLM.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg 
src=\"https://github.com/InternLM/lmdeploy/assets/36994684/0cf8d00f-e86b-40ba-9b54-dc8f1bc6c8d8\" width=\"600\"/\u003e\n  \u003cbr /\u003e\u003cbr /\u003e\n\n[![GitHub Repo stars](https://img.shields.io/github/stars/InternLM/xtuner?style=social)](https://github.com/InternLM/xtuner/stargazers)\n[![license](https://img.shields.io/github/license/InternLM/xtuner.svg)](https://github.com/InternLM/xtuner/blob/main/LICENSE)\n[![PyPI](https://img.shields.io/pypi/v/xtuner)](https://pypi.org/project/xtuner/)\n[![Downloads](https://static.pepy.tech/badge/xtuner)](https://pypi.org/project/xtuner/)\n[![issue resolution](https://img.shields.io/github/issues-closed-raw/InternLM/xtuner)](https://github.com/InternLM/xtuner/issues)\n[![open issues](https://img.shields.io/github/issues-raw/InternLM/xtuner)](https://github.com/InternLM/xtuner/issues)\n\n👋 join us on [![Static Badge](https://img.shields.io/badge/-grey?style=social\u0026logo=wechat\u0026label=WeChat)](https://cdn.vansin.top/internlm/xtuner.jpg)\n[![Static Badge](https://img.shields.io/badge/-grey?style=social\u0026logo=twitter\u0026label=Twitter)](https://twitter.com/intern_lm)\n[![Static Badge](https://img.shields.io/badge/-grey?style=social\u0026logo=discord\u0026label=Discord)](https://discord.gg/xa29JuW87d)\n\n🔍 Explore our models on\n[![Static Badge](https://img.shields.io/badge/-gery?style=social\u0026label=🤗%20Huggingface)](https://huggingface.co/xtuner)\n[![Static Badge](https://img.shields.io/badge/-gery?style=social\u0026label=🤖%20ModelScope)](https://www.modelscope.cn/organization/xtuner)\n[![Static Badge](https://img.shields.io/badge/-gery?style=social\u0026label=🧰%20OpenXLab)](https://openxlab.org.cn/usercenter/xtuner)\n[![Static Badge](https://img.shields.io/badge/-gery?style=social\u0026label=🧠%20WiseModel)](https://www.wisemodel.cn/organization/xtuner)\n\nEnglish | [简体中文](README_zh-CN.md)\n\n\u003c/div\u003e\n\n## 🚀 Speed Benchmark\n\n\u003cdiv align=center\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/fa42d587-068d-427b-b88c-25a164b3511c\" style=\"width:80%\"\u003e\n\u003c/div\u003e\n\n## 🎉 News\n\n- **\\[2025/09\\]** XTuner V1 Released! A Next-Generation Training Engine Built for Ultra-Large MoE Models\n\n## 📖 XTuner V1\n\nXTuner V1 is a next-generation LLM training engine specifically designed for ultra-large-scale MoE models. 
**⚡ Superior Efficiency**

- **Massive scale:** Supports MoE training at up to 1T parameters
- **Breakthrough performance:** First to achieve FSDP training throughput that surpasses traditional 3D-parallel schemes for MoE models above the 200B scale
- **Hardware optimization:** Achieves training efficiency on the Ascend A3 Supernode that exceeds NVIDIA H800

<div align="center">
  <img src="https://github.com/user-attachments/assets/98519a93-1ce8-49f0-a7ab-d7968c9d67a6" style="width:90%">
</div>

## 🔥 Roadmap

XTuner V1 is committed to continuously improving training efficiency for pre-training, instruction fine-tuning, and reinforcement learning of ultra-large MoE models, with a special focus on Ascend NPU optimization.

### 🚀 Training Engine

Our vision is to establish XTuner V1 as a versatile training backend that integrates seamlessly with the broader open-source ecosystem.

| Model       | GPU (FP8) | GPU (BF16) | NPU (BF16) |
|-------------|-----------|------------|------------|
| Intern S1   | ✅        | ✅         | ✅         |
| Intern VL   | ✅        | ✅         | ✅         |
| Qwen3 Dense | ✅        | ✅         | ✅         |
| Qwen3 MoE   | ✅        | ✅         | ✅         |
| GPT OSS     | ✅        | ✅         | 🚧         |
| DeepSeek V3 | ✅        | ✅         | 🚧         |
| KIMI K2     | ✅        | ✅         | 🚧         |

### 🧠 Algorithm

The algorithm component is actively evolving. We welcome community contributions: with XTuner V1, scale your algorithms to unprecedented sizes!

**Implemented**

- ✅ **Multimodal Pre-training** - Full support for vision-language model training
- ✅ **Multimodal Supervised Fine-tuning** - Optimized for instruction following
- ✅ [GRPO](https://arxiv.org/pdf/2402.03300) - Group Relative Policy Optimization (a minimal advantage sketch follows this list)
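For readers new to GRPO, the "group-relative" part (following the linked DeepSeekMath paper, not XTuner's internals) replaces a learned critic: sample a group of responses per prompt, score them, and normalize each reward against its own group's mean and standard deviation. A minimal sketch:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages for rewards of shape [prompts, group_size]."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# One prompt, four sampled responses scored by a reward model or verifier.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.0]])
print(grpo_advantages(rewards))  # positive for above-group-average samples
```

In the paper's formulation these advantages then weight a PPO-style clipped objective with a KL penalty against the reference policy.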
**Coming Soon**

- 🔄 [MPO](https://arxiv.org/pdf/2411.10442) - Mixed Preference Optimization
- 🔄 [DAPO](https://arxiv.org/pdf/2503.14476) - Dynamic Sampling Policy Optimization
- 🔄 **Multi-turn Agentic RL** - Advanced agent training capabilities

### ⚡ Inference Engine Integration

Seamless deployment with leading inference frameworks:

- [x] LMDeploy
- [ ] vLLM
- [ ] SGLang

### Data Preparation

- You can use [GraphGen](https://github.com/open-sciencelab/GraphGen) to create synthetic data for fine-tuning.

## 🤝 Contributing

We appreciate all contributions to XTuner. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contribution guidelines.

## 🙏 Acknowledgement

The development of XTuner V1's training engine has been greatly inspired by and built upon the excellent work of the open-source community. We extend our sincere gratitude to the following pioneering projects:

**Training Engine:**

- [Torchtitan](https://github.com/pytorch/torchtitan) - A PyTorch-native platform for training generative AI models
- [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) - Microsoft's deep learning optimization library
- [MindSpeed](https://gitee.com/ascend/MindSpeed) - Ascend's high-performance training acceleration library
- [Megatron](https://github.com/NVIDIA/Megatron-LM) - NVIDIA's large-scale transformer training framework

**Reinforcement Learning:**

XTuner V1's reinforcement learning capabilities have been enhanced through insights and best practices from:

- [veRL](https://github.com/volcengine/verl) - Volcano Engine Reinforcement Learning for LLMs
- [SLIME](https://github.com/THUDM/slime) - THU's scalable RLHF implementation
- [AReaL](https://github.com/inclusionAI/AReaL) - Ant Reasoning Reinforcement Learning for LLMs
- [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) - An easy-to-use, scalable, and high-performance RLHF framework based on Ray

We are deeply grateful to all contributors and maintainers of these projects for advancing the field of large-scale model training.

## 🖊️ Citation

```bibtex
@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished = {\url{https://github.com/InternLM/xtuner}},
    year={2023}
}
```

## License

This project is released under the [Apache License 2.0](LICENSE). Please also adhere to the licenses of the models and datasets being used.