https://github.com/PKU-YuanGroup/Open-Sora-Plan

This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.
https://github.com/PKU-YuanGroup/Open-Sora-Plan

Last synced: 3 months ago
JSON representation

This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.

Host: GitHub
URL: https://github.com/PKU-YuanGroup/Open-Sora-Plan
Owner: PKU-YuanGroup
License: mit
Created: 2024-02-20T14:01:03.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-29T09:07:51.000Z (8 months ago)
Last Synced: 2024-10-29T10:58:11.343Z (8 months ago)
Language: Python
Homepage:
Size: 8.38 MB
Stars: 11,481
Watchers: 155
Forks: 1,022
Open Issues: 241
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

Awesome-Video-Diffusion - Open-Sora-Plan
ai-game-devtools - Open-Sora - Sora Plan. | | | Video | (<span id="video">Video</span> / <span id="tool">Tool (AI LLM)</span>)
StarryDivineSky - PKU-YuanGroup/Open-Sora-Plan - Sora-Plan v1.1.0，它显着增强了视频生成质量和文本控制功能。该项目旨在创建一个简单且可扩展的存储库，以重现 Sora（OpenAI，但我们更愿意将其称为“ClosedAI”）。我们希望开源社区能够为这个项目做出贡献。欢迎拉取请求!!本项目希望通过开源社区的力量复现Sora，由北大-兔展AIGC联合实验室共同发起，当前版本离目标差距仍然较大，仍需持续完善和快速迭代。项目阶段：设置代码库并在landscape景观数据集上训练无条件模型。训练可提高分辨率和持续时间的模型。在landscape景观数据集上进行text2video实验。在 video2text 数据集上训练 1080p 模型。具有更多条件的控制模型。 (视频生成、补帧、摘要 / 网络服务_其他)
jimsghstars - PKU-YuanGroup/Open-Sora-Plan - This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project. (Python)
AiTreasureBox - PKU-YuanGroup/Open-Sora-Plan - 06-13_11984_-1](https://img.shields.io/github/stars/PKU-YuanGroup/Open-Sora-Plan.svg)|This project aim to reproduce Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project.| (Repos)
awesome-diffusion-categorized - [Code
awesome-llm-and-aigc - Open Sora Plan - YuanGroup/Open-Sora-Plan?style=social"/> : This project aim to reproducing [Sora](https://openai.com/sora) (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project. 本项目希望通过开源社区的力量复现Sora，由北大-兔展AIGC联合实验室共同发起，当前我们资源有限仅搭建了基础架构，无法进行完整训练，希望通过开源社区逐步增加模块并筹集资源进行训练，当前版本离目标差距巨大，仍需持续完善和快速迭代，欢迎Pull request！！！[Project Page](https://pku-yuangroup.github.io/Open-Sora-Plan/) [中文主页](https://pku-yuangroup.github.io/Open-Sora-Plan/blog_cn.html) (Summary)
awesome-llm-and-aigc - Open Sora Plan - YuanGroup/Open-Sora-Plan?style=social"/> : This project aim to reproducing [Sora](https://openai.com/sora) (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project. 本项目希望通过开源社区的力量复现Sora，由北大-兔展AIGC联合实验室共同发起，当前我们资源有限仅搭建了基础架构，无法进行完整训练，希望通过开源社区逐步增加模块并筹集资源进行训练，当前版本离目标差距巨大，仍需持续完善和快速迭代，欢迎Pull request！！！[Project Page](https://pku-yuangroup.github.io/Open-Sora-Plan/) [中文主页](https://pku-yuangroup.github.io/Open-Sora-Plan/blog_cn.html) (Summary)
awesome-LLM-resources - Open-Sora-Plan
Awesome-Open-AI-Sora - Open-Sora-Plan - to-video model. It invites the open-source community to contribute towards developing the model. (GitHub projects)

README

        

 Open-Sora Plan



This project aims to create a simple and scalable repo, to reproduce [Sora](https://openai.com/sora) (OpenAI, but we prefer to call it "ClosedAI" ). We wish the open-source community can contribute to this project. Pull requests are welcome! The current code supports complete training and inference using the Huawei Ascend AI computing system. Models trained on Huawei Ascend can also output video quality comparable to industry standards.

本项目希望通过开源社区的力量复现Sora，由北大-兔展AIGC联合实验室共同发起，当前版本离目标差距仍然较大，仍需持续完善和快速迭代，欢迎Pull request！目前代码同时支持使用国产AI计算系统（华为昇腾）进行完整的训练和推理。基于昇腾训练出的模型，也可输出持平业界的视频质量。

我们正在快速迭代新版本，欢迎更多合作者或算法工程师加入，[算法工程师招聘-兔展智能.pdf](https://github.com/user-attachments/files/19107972/-.pdf)



[![arXiv](https://img.shields.io/badge/Arxiv-Open--Sora%20Plan-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2412.00131)

[![arXiv](https://img.shields.io/badge/Arxiv-WF--VAE-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2411.17459)

[![License](https://img.shields.io/badge/License-Apache-yellow)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/LICENSE)  


[![slack badge](https://img.shields.io/badge/Discord-join-blueviolet?logo=discord&amp)](https://discord.gg/DFZg5678)

[![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/issues/53#issuecomment-1987226516)

[![Twitter](https://img.shields.io/badge/-Twitter@LinBin46984-black?logo=twitter&logoColor=1D9BF0)](https://x.com/LinBin46984/status/1795018003345510687) 

[![Modelers](https://img.shields.io/badge/%E9%AD%94%E4%B9%90-%E6%A8%A1%E5%9E%8B%E4%BD%93%E9%AA%8C-blue)](https://modelers.cn/spaces/MindSpore-Lab/Open_Sora_Plan) 


[![GitHub repo stars](https://img.shields.io/github/stars/PKU-YuanGroup/Open-Sora-Plan?style=flat&logo=github&logoColor=whitesmoke&label=Stars)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/stargazers) 

[![GitHub repo forks](https://img.shields.io/github/forks/PKU-YuanGroup/Open-Sora-Plan?style=flat&logo=github&logoColor=whitesmoke&label=Forks)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/network) 

[![GitHub repo watchers](https://img.shields.io/github/watchers/PKU-YuanGroup/Open-Sora-Plan?style=flat&logo=github&logoColor=whitesmoke&label=Watchers)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/watchers) 

[![GitHub repo size](https://img.shields.io/github/repo-size/PKU-YuanGroup/Open-Sora-Plan?style=flat&logo=github&logoColor=whitesmoke&label=Repo%20Size)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/archive/refs/heads/main.zip) 


[![GitHub repo contributors](https://img.shields.io/github/contributors-anon/PKU-YuanGroup/Open-Sora-Plan?style=flat&label=Contributors)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/graphs/contributors) 

[![GitHub Commit](https://img.shields.io/github/commit-activity/m/PKU-YuanGroup/Open-Sora-Plan?label=Commit)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/commits/main/)

[![Pr](https://img.shields.io/github/issues-pr-closed-raw/PKU-YuanGroup/Open-Sora-Plan.svg?label=Merged+PRs&color=green)](https://github.com/PKU-YuanGroup/Open-Sora-Plan/pulls)

[![GitHub issues](https://img.shields.io/github/issues/PKU-YuanGroup/Open-Sora-Plan?color=critical&label=Issues)](https://github.com/PKU-YuanGroup/Video-LLaVA/issues?q=is%3Aopen+is%3Aissue)

[![GitHub closed issues](https://img.shields.io/github/issues-closed/PKU-YuanGroup/Open-Sora-Plan?color=success&label=Issues)](https://github.com/PKU-YuanGroup/Video-LLaVA/issues?q=is%3Aissue+is%3Aclosed)





 If you like our project, please give us a star ⭐ on GitHub for latest update.  

# 📣 News

* `COMING SOON` ⚡️⚡️⚡️ For large model parallelisation training, TP & SP and more strategies are coming...

  

  > 近期将新增华为昇腾多模态MindSpeed-MM分支，借助华为MindSpeed-MM套件的能力支撑Open-Sora Plan参数的扩增，为更大参数规模的模型训练提供TP、SP等分布式训练能力。

* **[2024.12.03]** ⚡️ We released our [arxiv paper](https://arxiv.org/abs/2412.00131) and WF-VAE [paper](https://arxiv.org/abs/2411.17459) for v1.3. The next more powerful version is coming soon.

* **[2024.10.16]** 🎉 We released version 1.3.0, featuring: **WFVAE**, **prompt refiner**, **data filtering strategy**, **sparse attention**, and **bucket training strategy**. We also support 93x480p within **24G VRAM**. More details can be found at our latest [report](docs/Report-v1.3.0.md).

* **[2024.08.13]** 🎉 We are launching Open-Sora Plan v1.2.0 **I2V** model, which is based on Open-Sora Plan v1.2.0. The current version supports image-to-video generation and transition generation (the starting and ending frames conditions for video generation). Check out the Image-to-Video section in this [report](https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/docs/Report-v1.2.0.md#training-image-to-video-diffusion-model).

* **[2024.07.24]** 🔥🔥🔥 v1.2.0 is here! Utilizing a 3D full attention architecture instead of 2+1D. We released a true 3D video diffusion model trained on 4s 720p. Check out our latest [report](docs/Report-v1.2.0.md).

* **[2024.05.27]** 🎉 We are launching Open-Sora Plan v1.1.0, which significantly improves video quality and length, and is fully open source! Please check out our latest [report](docs/Report-v1.1.0.md). Thanks to [ShareGPT4Video's](https://sharegpt4video.github.io/) capability to annotate long videos.

* **[2024.04.09]** 🤝 Excited to share our latest exploration on metamorphic time-lapse video generation: [MagicTime](https://github.com/PKU-YuanGroup/MagicTime), which learns real-world physics knowledge from time-lapse videos.

* **[2024.04.07]** 🎉🎉🎉 Today, we are thrilled to present Open-Sora-Plan v1.0.0, which significantly enhances video generation quality and text control capabilities. See our [report](docs/Report-v1.0.0.md). Thanks to HUAWEI NPU for supporting us.

* **[2024.03.27]** 🚀🚀🚀 We release the report of [VideoCausalVAE](docs/CausalVideoVAE.md), which supports both images and videos. We present our reconstructed video in this demonstration as follows. The text-to-video model is on the way.

* **[2024.03.01]** 🤗 We launched a plan to reproduce Sora, called Open-Sora Plan! Welcome to **watch** 👀 this repository for the latest updates.

# 😍 Gallery

Text & Image to Video Generation. 

[![Demo Video of Open-Sora Plan V1.3](https://github.com/user-attachments/assets/4ff1d873-3dde-4905-a907-dbff51174c20)](https://www.bilibili.com/video/BV1KR2fYPEF5/?spm_id_from=333.999.0.0&vd_source=cfda99203e659100629b465161f1d87d)

# 😮 Highlights

Open-Sora Plan shows excellent performance in video generation.

### 🔥 High performance CausalVideoVAE, but with lower training cost

- High compression ratio with excellent performance, capable of **compressing videos by 256 times (4×8×8)**. Causal convolution supports simultaneous inference of images and videos but only need **1 node to train**.

### 🚀 Video Diffusion Model based on 3D attention, joint learning of spatiotemporal features.

- With **a new sparse attention architecture** instead of a 2+1D model, 3D attention can better capture joint spatial and temporal features.



    



# 🤗 Demo

### Gradio Web UI

Highly recommend trying out our web demo by the following command.

```bash

python -m opensora.serve.gradio_web_server --model_path "path/to/model" \

    --ae WFVAEModel_D8_4x8x8 --ae_path "path/to/vae" \

    --caption_refiner "path/to/refiner" \

    --text_encoder_name_1 "path/to/text_enc" --rescale_betas_zero_snr

```

### ComfyUI

Coming soon...

# 🐳 Resource

| Version | Architecture |  Diffusion Model | CausalVideoVAE | Data | Prompt Refiner |

|:---|:---|:---|:---|:---|:---|

| v1.3.0 [4] | Skiparse 3D | [Anysize in 93x640x640](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.3.0/tree/main/any93x640x640)[3], [Anysize in 93x640x640_i2v](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.3.0/tree/main/any93x640x640_i2v)[3] | [Anysize](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.3.0/tree/main/vae)| [prompt_refiner](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.3.0/tree/main/prompt_refiner) | [checkpoint](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.3.0/tree/main/prompt_refiner)| |

| v1.2.0 | Dense 3D | [93x720p](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.2.0/tree/main/93x720p), [29x720p](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.2.0/tree/main/29x720p)[1], [93x480p](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.2.0/tree/main/93x480p)[1,2], [29x480p](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.2.0/tree/main/29x480p), [1x480p](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.2.0/tree/main/1x480p), [93x480p_i2v](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.2.0/tree/main/93x480p_i2v) | [Anysize](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.2.0/tree/main/vae)| [Annotations](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.2.0) | - |

| v1.1.0 | 2+1D | [221x512x512](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main/221x512x512), [65x512x512](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main/65x512x512) |[Anysize](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main/vae) |[Data and Annotations](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0)| - |

| v1.0.0 | 2+1D | [65x512x512](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.0.0/tree/main/65x512x512), [65x256x256](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.0.0/tree/main/65x256x256), [17x256x256](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.0.0/tree/main/17x256x256) | [Anysize](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.0.0/tree/main/vae) | [Data and Annotations](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.0.0)| - |

> [1] Please note that the weights for v1.2.0 29×720p and 93×480p were trained on Panda70M and have not undergone final high-quality data fine-tuning, so they may produce watermarks.

> [2] We fine-tuned 3.5k steps from 93×720p to get 93×480p for community research use.

> [3] The model is trained arbitrarily on stride=32. So keep the resolution of the inference a multiple of 32. Frames need to be 4n+1, e.g. 93, 77, 61, 45, 29, 1 (image).

> [4] Model weights are also available at [OpenMind](https://modelers.cn/models/linbin/Open-Sora-Plan-v1.3.0) and [WiseModel](https://wisemodel.cn/models/PKU-YUAN/Open-Sora-Plan-v1.3.0).

> [!Warning]

>

> 


> 

> 🚨 For version 1.2.0, we no longer support 2+1D models.

> 

> 


# ⚙️ Requirements and Installation

1. Clone this repository and navigate to Open-Sora-Plan folder

```

git clone https://github.com/PKU-YuanGroup/Open-Sora-Plan

cd Open-Sora-Plan

```

2. Install required packages

We recommend the requirements as follows.

* Python >= 3.8

* Pytorch >= 2.1.0

### GPU

```

conda create -n opensora python=3.8 -y

conda activate opensora

pip install -e .

```

### NPU

```

pip install torch_npu==2.1.0.post6

# ref https://github.com/dmlc/decord

git clone --recursive https://github.com/dmlc/decord

mkdir build && cd build 

cmake .. -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release -DFFMPEG_DIR=/usr/local/ffmpeg 

make 

cd ../python 

pwd=$PWD 

echo "PYTHONPATH=$PYTHONPATH:$pwd" >> ~/.bashrc 

source ~/.bashrc 

python3 setup.py install --user

```

3. Install optional requirements such as static type checking:

```

pip install -e '.[dev]'

```

# 🗝️ Training & Inferencing

## 🗜️ CausalVideoVAE

The data preparation, training, inferencing and evaluation can be found [here](docs/VAE.md)

## 📖 Prompt Refiner

The data preparation, training, inferencing can be found [here](docs/Prompt_Refiner.md)

## 📜 Text-to-Video 

The data preparation, training and inferencing can be found [here](docs/T2V.md)

## 🖼️ Image-to-Video

The data preparation, training and inferencing can be found [here](docs/I2V.md)

# ⚡️ Extra Save Memory

## 🔆 Training

During training, the entire EMA model remains in VRAM. You can enable `--offload_ema` or disable `--use_ema`. Additionally, VAE tiling is disabled by default, but you can pass `--enable_tiling` or disable `--vae_fp32`. Finally, a temporary but extreme saving memory option is enable `--extra_save_mem` to offload the text encoder and VAE to the CPU when not in use, though this will significantly slow down performance.

We currently have two plans: one is to continue using the Deepspeed/FSDP approach, sharding the EMA and text encoder across ranks with Zero3, which is sufficient for training 10-15B models. The other is to adopt MindSpeed for various parallel strategies, enabling us to scale the model up to 30B.

## ⚡️ 24G VRAM Inferencing

Please first ensure that you understand how to inference. Refer to the [inference](https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/docs/T2V.md#inference) instructions in Text-to-Video.

Simply specify `--save_memory`, and during inference, `enable_model_cpu_offload()`, `enable_sequential_cpu_offload()`, and `vae.vae.enable_tiling()` will be automatically activated.

# 💡 How to Contribute

We greatly appreciate your contributions to the Open-Sora Plan open-source community and helping us make it even better than it is now!

For more details, please refer to the [Contribution Guidelines](docs/Contribution_Guidelines.md)

# 👍 Acknowledgement and Related Work

* [Allegro](https://github.com/rhymes-ai/Allegro): Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input based on our Open-Sora Plan. The significance of open-source is becoming increasingly tangible.

* [Latte](https://github.com/Vchitect/Latte): It is a wonderful 2+1D video generation model.

* [PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha): Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis.

* [ShareGPT4Video](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4Video): Improving Video Understanding and Generation with Better Captions.

* [VideoGPT](https://github.com/wilson1yan/VideoGPT): Video Generation using VQ-VAE and Transformers.

* [DiT](https://github.com/facebookresearch/DiT): Scalable Diffusion Models with Transformers.

* [FiT](https://github.com/whlzy/FiT): Flexible Vision Transformer for Diffusion Model.

* [Positional Interpolation](https://arxiv.org/abs/2306.15595): Extending Context Window of Large Language Models via Positional Interpolation.

# 🔒 License

* See [LICENSE](LICENSE) for details.

## ✨ Star History

[![Star History](https://api.star-history.com/svg?repos=PKU-YuanGroup/Open-Sora-Plan)](https://star-history.com/#PKU-YuanGroup/Open-Sora-Plan&Date)

# ✏️ Citing

```bibtex

@article{lin2024open,

  title={Open-Sora Plan: Open-Source Large Video Generation Model},

  author={Lin, Bin and Ge, Yunyang and Cheng, Xinhua and Li, Zongjian and Zhu, Bin and Wang, Shaodong and He, Xianyi and Ye, Yang and Yuan, Shenghai and Chen, Liuhan and others},

  journal={arXiv preprint arXiv:2412.00131},

  year={2024}

}

```

```bibtex

@article{li2024wf,

  title={WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model},

  author={Li, Zongjian and Lin, Bin and Ye, Yang and Chen, Liuhan and Cheng, Xinhua and Yuan, Shenghai and Yuan, Li},

  journal={arXiv preprint arXiv:2411.17459},

  year={2024}

}

```

# 🤝 Community contributors

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/PKU-YuanGroup/Open-Sora-Plan

Awesome Lists containing this project

README

Open-Sora Plan