https://github.com/ByteDance-Seed/VeOmni

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Host: GitHub
URL: https://github.com/ByteDance-Seed/VeOmni
Owner: ByteDance-Seed
License: apache-2.0
Created: 2025-03-28T03:42:42.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2026-04-27T21:54:31.000Z (about 2 months ago)
Last Synced: 2026-04-27T23:11:38.410Z (about 2 months ago)
Language: Python
Homepage:
Size: 11.5 MB
Stars: 1,866
Watchers: 15
Forks: 183
Open Issues: 88
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Agents: AGENTS.md

Awesome Lists containing this project

awesome-LLM-resources - VeOmni - Centric Distributed Recipe Zoo. (Fine-Tuning)
awesome-opensource-ai - VeOmni (ByteDance) - Versatile framework for both single- and multi-modal pre-training and post-training. Model-centric distributed recipe zoo supporting text, vision, audio, and video models with unified training interface. Apache 2.0 licensed. ![GitHub stars](https://img.shields.io/github/stars/ByteDance-Seed/VeOmni?style=social) (7. Training & Fine-tuning Ecosystem)

README

# VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

[![GitHub Repo stars](https://img.shields.io/github/stars/ByteDance-Seed/VeOmni)](https://github.com/ByteDance-Seed/VeOmni/stargazers)

[![Paper](https://img.shields.io/badge/Paper-red)](https://arxiv.org/abs/2508.02317)

[![Documentation](https://img.shields.io/badge/Documentation-blue)](https://veomni.readthedocs.io/en/latest/)

[![WeChat](https://img.shields.io/badge/WeChat-green?logo=wechat)](https://raw.githubusercontent.com/ByteDance-Seed/VeOmni/refs/heads/main/docs/assets/wechat.png)



## 🍪 Overview

VeOmni is a versatile framework for both single- and multi-modal pre-training and post-training. It empowers users to seamlessly scale models of any modality across various accelerators, offering both flexibility and user-friendliness.

Our guiding principles when building VeOmni are:

- **Flexibility and Modularity**: VeOmni is built with a modular design, allowing users to decouple most components and replace them with their own implementations as needed.

- **Trainer-free**: VeOmni supports linear training scripts that avoid rigid, structured trainer classes (e.g., [PyTorch-Lightning](https://github.com/Lightning-AI/pytorch-lightning) or the [HuggingFace](https://huggingface.co/docs/transformers/v4.50.0/en/main_classes/trainer#transformers.Trainer) Trainer). These scripts expose the entire training logic to users for maximum transparency and control. VeOmni also provides a basic trainer for text-only and VLM/omni model training, as well as an RL trainer that serves as a training backend for reinforcement learning.

- **Omni model native**: VeOmni enables users to effortlessly scale any omni-model across devices and accelerators.

- **Torch native**: VeOmni is designed to leverage PyTorch’s native functions to the fullest extent, ensuring maximum compatibility and performance.
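As a rough illustration of the trainer-free style, the following plain-Python sketch writes every step of training (data, forward pass, loss, gradient, optimizer update, logging) inline in one loop instead of hiding it inside a Trainer class. It is a hypothetical toy (a single weight fit to y = 3x with a hand-derived gradient) and does not use VeOmni's API; a real VeOmni script would presumably use PyTorch modules, autograd, and distributed wrappers instead.

```python
# Hypothetical toy example of a "linear" training script (no Trainer class).
# This is NOT VeOmni's API: it fits y = 3x with one weight and plain SGD.


def make_dataset(n: int) -> list[tuple[float, float]]:
    # Toy data: x in [0, 1), target y = 3x.
    return [(i / n, 3.0 * i / n) for i in range(n)]


def train(num_epochs: int = 200, lr: float = 0.5, batch_size: int = 8) -> float:
    data = make_dataset(32)
    w = 0.0  # the single model parameter
    for epoch in range(num_epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Forward pass and mean-squared-error loss.
            preds = [w * x for x, _ in batch]
            loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, batch)) / len(batch)
            # Backward pass: d(loss)/dw = mean(2 * (w*x - y) * x).
            grad = sum(2 * (p - y) * x for p, (x, y) in zip(preds, batch)) / len(batch)
            # Optimizer step (plain SGD).
            w -= lr * grad
        # Logging is just a line in the loop, not a callback.
        if epoch % 50 == 0:
            print(f"epoch={epoch} loss={loss:.6f} w={w:.4f}")
    return w


if __name__ == "__main__":
    print(f"learned w = {train():.4f}")
```

Because nothing is abstracted away, changing the loss, the update rule, or the logging means editing the loop directly, which is the transparency the trainer-free principle is after.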

## 🔥 Latest News

- [2025/11] Our paper [OmniScale: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo](https://arxiv.org/abs/2508.02317) was accepted to AAAI 2026.

- [2025/09] We release the first official version of VeOmni, [v0.1.0](https://github.com/ByteDance-Seed/VeOmni/pull/75).

- [2025/08] We release the [VeOmni tech report](https://arxiv.org/abs/2508.02317) and open the [WeChat group](./docs/assets/wechat.png). Feel free to join us!

- [2025/04] We release VeOmni!

## 📚 Key Features

- **FSDP** and **FSDP2** training backends.

- **Sequence Parallelism** with [DeepSpeed Ulysses](https://arxiv.org/abs/2309.14509), supporting both non-async and async modes.

- **Expert Parallelism** for large MoE model training, e.g., [Qwen3-MoE](https://veomni.readthedocs.io/en/latest/key_features/ep_fsdp2.html).

- Efficient **GroupGemm** kernel for MoE models; [Liger-Kernel](https://github.com/linkedin/Liger-Kernel) support.

- Compatible with HuggingFace Transformers models, e.g., [Qwen3](https://veomni.readthedocs.io/en/latest/examples/qwen3.html), [Qwen3-VL](https://veomni.readthedocs.io/en/latest/examples/qwen3_vl.html), and Qwen3-MoE.

- Dynamic batching strategy and Omnidata processing.

- [**Torch Distributed Checkpoint**](https://docs.pytorch.org/docs/stable/distributed.checkpoint.html) for checkpointing.

- Training on both NVIDIA GPUs and Ascend NPUs.

- Experiment tracking with wandb.
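Ulysses-style sequence parallelism wraps attention in a pair of all-to-all exchanges: before attention each rank holds a contiguous slice of the sequence with all heads; after the first all-to-all each rank holds the full sequence for a subset of heads, so attention can be computed locally; a second all-to-all restores the sequence-sharded layout. The sketch below simulates only that layout change in a single process with plain Python lists. It is an illustration of the general technique, not VeOmni's implementation; the helper names are hypothetical, and it assumes the sequence length and head count are divisible by the number of ranks.

```python
# Single-process simulation of the DeepSpeed-Ulysses all-to-all layout change.
# Hypothetical helpers; each element is (global_token_index, head_index).

Layout = list[list[tuple[int, int]]]  # [position][head] -> (token, head)


def shard_by_sequence(seq_len: int, num_heads: int, world: int) -> list[Layout]:
    """Before attention: rank r holds a contiguous sequence chunk with all heads."""
    assert seq_len % world == 0, "assumes seq_len divisible by world size"
    chunk = seq_len // world
    return [
        [[(t, h) for h in range(num_heads)] for t in range(r * chunk, (r + 1) * chunk)]
        for r in range(world)
    ]


def all_to_all_seq_to_head(shards: list[Layout]) -> list[Layout]:
    """Simulated all-to-all: rank r ends up with the full sequence for head group r."""
    world = len(shards)
    num_heads = len(shards[0][0])
    assert num_heads % world == 0, "assumes num_heads divisible by world size"
    heads_per_rank = num_heads // world
    out = []
    for dst in range(world):
        hs = slice(dst * heads_per_rank, (dst + 1) * heads_per_rank)
        # Gather, in source-rank order, each source's sequence chunk
        # restricted to the destination's head group.
        out.append([row[hs] for src in range(world) for row in shards[src]])
    return out


def all_to_all_head_to_seq(head_shards: list[Layout]) -> list[Layout]:
    """Inverse all-to-all after attention: back to sequence-sharded, all heads."""
    world = len(head_shards)
    chunk = len(head_shards[0]) // world
    out = []
    for dst in range(world):
        rows = range(dst * chunk, (dst + 1) * chunk)
        # Re-join every source's head group at each position in dst's chunk.
        out.append([[v for src in range(world) for v in head_shards[src][t]] for t in rows])
    return out
```

For example, with 8 tokens, 4 heads, and 2 ranks, rank 1 goes from holding tokens 4 to 7 with heads 0 to 3 to holding tokens 0 to 7 with heads 2 and 3. On real hardware the exchange would be a collective such as `torch.distributed.all_to_all_single` on tensors rather than list slicing.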
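Expert parallelism places different MoE experts on different ranks, so each token's hidden state must be sent to the rank that owns each of its top-k experts. The sketch below computes such a dispatch plan: which (token, expert) pairs go to which rank, and the per-rank send counts an all-to-all would need. It assumes experts are split into equal contiguous blocks across expert-parallel ranks, which is a common layout but not necessarily VeOmni's; the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch of MoE token dispatch planning under expert parallelism.


def expert_owner(expert_id: int, num_experts: int, ep_size: int) -> int:
    """Rank owning an expert, assuming equal contiguous blocks of experts per rank."""
    assert num_experts % ep_size == 0, "assumes experts divide evenly across ranks"
    return expert_id // (num_experts // ep_size)


def build_dispatch_plan(
    topk_experts: list[list[int]], num_experts: int, ep_size: int
) -> tuple[dict[int, list[tuple[int, int]]], list[int]]:
    """Map each destination rank to the (token_index, expert_id) pairs sent to it."""
    plan: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for token_idx, experts in enumerate(topk_experts):
        for e in experts:
            plan[expert_owner(e, num_experts, ep_size)].append((token_idx, e))
    # Per-destination send counts, as used to size the all-to-all splits.
    send_counts = [len(plan[r]) for r in range(ep_size)]
    return dict(plan), send_counts
```

With 8 experts on 4 ranks (2 experts per rank), a token routed to experts 0 and 5 is sent once to rank 0 and once to rank 2. After the experts run, a reverse all-to-all returns the outputs so each token's top-k results can be combined on its original rank.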
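Dynamic batching groups variable-length samples so that each batch holds a roughly constant number of tokens rather than a fixed number of samples, which reduces padding waste. VeOmni's exact strategy is not described in this README; the sketch below shows one simple greedy variant that packs samples in their given order under a hypothetical `max_tokens` budget.

```python
# Hypothetical greedy token-budget batching; not VeOmni's actual strategy.


def dynamic_batches(lengths: list[int], max_tokens: int) -> list[list[int]]:
    """Pack sample indices into batches whose total token count <= max_tokens."""
    batches: list[list[int]] = []
    current: list[int] = []
    current_tokens = 0
    for idx, n in enumerate(lengths):
        if n > max_tokens:
            raise ValueError(f"sample {idx} has {n} tokens > max_tokens={max_tokens}")
        # Close the current batch if adding this sample would exceed the budget.
        if current and current_tokens + n > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n
    if current:
        batches.append(current)
    return batches
```

For sample lengths `[300, 500, 200, 900, 100, 400]` and a budget of 1000 tokens, this yields three batches, `[[0, 1, 2], [3, 4], [5]]`, each containing a different number of samples. Sorting or bucketing by length before packing is a common refinement that packs batches more tightly.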

## 📝 Upcoming Features and Changes

- VeOmni v0.2 Roadmap https://github.com/ByteDance-Seed/VeOmni/issues/268, https://github.com/ByteDance-Seed/VeOmni/issues/271

- ViT balance tool https://github.com/ByteDance-Seed/VeOmni/issues/280

- Validation dataset during training https://github.com/ByteDance-Seed/VeOmni/issues/247

- RL post training for omni-modality models with VeRL https://github.com/ByteDance-Seed/VeOmni/issues/262

## 🚀 Getting Started

Full documentation: [veomni.readthedocs.io](https://veomni.readthedocs.io/en/latest/)

### Quick Start

  - [Installation](https://veomni.readthedocs.io/en/latest/get_started/installation/install.html)

  - [Quick Start with Qwen3](https://veomni.readthedocs.io/en/latest/examples/qwen3.html)

## ✏️ Supported Models

| Model                                                    | Model size                    | Example config file                                                   |
| -------------------------------------------------------- | ----------------------------- | --------------------------------------------------------------------- |
| [DeepSeek2.5/3/R1](https://huggingface.co/deepseek-ai)   | 236B/671B                     | [deepseek.yaml](configs/text/deepseek.yaml)                           |
| [Llama3-3.3](https://huggingface.co/meta-llama)          | 1B/3B/8B/70B                  | [llama3.yaml](configs/text/llama3.yaml)                               |
| [Qwen2-3](https://huggingface.co/Qwen)                   | 0.5B/1.5B/3B/7B/14B/32B/72B   | [qwen2_5.yaml](configs/text/qwen2_5.yaml)                             |
| [Qwen2-3 VL/QVQ](https://huggingface.co/Qwen)            | 2B/3B/7B/32B/72B              | [qwen3_vl_dense.yaml](configs/multimodal/qwen3_vl/qwen3_vl_dense.yaml)|
| [Qwen3-VL MoE](https://huggingface.co/Qwen)              | 30BA3B/235BA22B               | [qwen3_vl_moe.yaml](configs/multimodal/qwen3_vl/qwen3_vl_moe.yaml)    |
| [Qwen3-MoE](https://huggingface.co/Qwen)                 | 30BA3B/235BA22B               | [qwen3-moe.yaml](configs/text/qwen3-moe.yaml)                         |
| [Qwen2-3 Omni](https://huggingface.co/Qwen)              | 7B/30BA3B                     | [qwen25_omni.yaml](configs/multimodal/qwen25_omni/qwen25_omni.yaml)   |
| [Wan](https://huggingface.co/Wan-AI)                     | Wan2.1-I2V-14B-480P           | [wan_sft.yaml](configs/dit/wan_sft.yaml)                              |
| Omni Model                                               | Any Modality Training         | [seed_omni.yaml](configs/multimodal/omni/seed_omni.yaml)              |

To add support for new models in VeOmni, see [Support New Models](https://veomni.readthedocs.io/en/latest/usage/support_new_models/guide_and_checklist.html).

## ⛰️ Performance

For more details, please refer to our [paper](https://arxiv.org/abs/2508.02317).

## 💡 Awesome work using VeOmni

- [dFactory: Easy and Efficient dLLM Fine-Tuning](https://github.com/inclusionAI/dFactory)

- [LMMs-Engine](https://github.com/EvolvingLMMs-Lab/lmms-engine)

- [UI-TARS: Pioneering Automated GUI Interaction with Native Agents](https://github.com/bytedance/UI-TARS)

- [OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft](https://arxiv.org/pdf/2509.13347)

- [UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning](https://arxiv.org/abs/2509.02544)

- [Open-dLLM: Open Diffusion Large Language Models](https://github.com/pengzhangzhi/Open-dLLM)

- [LingBot-VLA: A Pragmatic VLA Foundation Model](https://github.com/Robbyant/lingbot-vla)

## 🎨 Contributing

Contributions from the community are welcome! Please check out [CONTRIBUTING.md](CONTRIBUTING.md) and our project roadmap (to be updated).

## 📝 Citation and Acknowledgement

If you find VeOmni useful for your research and applications, feel free to give us a star ⭐ or cite us using:

```bibtex
@article{ma2025veomni,
  title={VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo},
  author={Ma, Qianli and Zheng, Yaowei and Shi, Zhelun and Zhao, Zhongkai and Jia, Bin and Huang, Ziyue and Lin, Zhiqi and Li, Youjie and Yang, Jiacheng and Peng, Yanghua and others},
  journal={arXiv preprint arXiv:2508.02317},
  year={2025}
}
```

Thanks to the following projects for their excellent work:

- [ByteCheckpoint](https://arxiv.org/abs/2407.20143)

- [veScale](https://github.com/volcengine/veScale)

- [Liger-Kernel](https://github.com/linkedin/Liger-Kernel)

- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)

- [torchtitan](https://github.com/pytorch/torchtitan/)

- [torchtune](https://github.com/pytorch/torchtune)

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=ByteDance-Seed/VeOmni&type=date&legend=top-left)](https://www.star-history.com/#ByteDance-Seed/VeOmni&type=date&legend=top-left)

## 🌱 About [ByteDance Seed Team](https://team.doubao.com/)

Founded in 2023, the ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society. You can get to know ByteDance Seed better through the following channels👇
