https://github.com/Yuanshi9815/Video-Infinity

Video-Infinity generates long videos quickly using multiple GPUs without extra training.

Host: GitHub
URL: https://github.com/Yuanshi9815/Video-Infinity
Owner: Yuanshi9815
Created: 2024-05-22T20:36:32.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-08-04T11:34:30.000Z (11 months ago)
Last Synced: 2024-11-17T10:39:53.708Z (8 months ago)
Language: Python
Homepage: https://video-infinity.tanzhenxiong.com
Size: 1.64 MB
Stars: 163
Watchers: 1
Forks: 15
Open Issues: 6
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-llm-projects - Video-Infinity - Infinity generates long videos quickly using multiple GPUs without extra training. (Projects / 🎥 Video)

README

# Video-Infinity

> **Video-Infinity: Distributed Long Video Generation**
>

> Zhenxiong Tan,
> [Xingyi Yang](https://adamdad.github.io/),
> [Songhua Liu](http://121.37.94.87/),
> and
> [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
>

> [Learning and Vision Lab](http://lv-nus.org/), National University of Singapore
>

## TL;DR (Too Long; Didn't Read)
Video-Infinity generates long videos quickly using multiple GPUs without extra training. Feel free to visit our
[project page](https://video-infinity.tanzhenxiong.com)
for more information and generated videos.

## Features
* **Distributed 🌐**: Utilizes multiple GPUs to generate long-form videos.
* **High-Speed 🚀**: Produces 2,300 frames in just 5 minutes.
* **Training-Free 🎓**: Generates long videos without requiring additional training for existing models.

## Setup
### Installation Environment
```bash
conda create -n video_infinity_vc2 python=3.10
conda activate video_infinity_vc2
pip install -r requirements.txt
```

## Usage
### Quick Start
- **Basic Usage**
```bash
python inference.py --config examples/config.json
```
- **Multi-Prompts**
```bash
python inference.py --config examples/multi_prompts.json
```
- **Single GPU**
```bash
python inference.py --config examples/single_gpu.json
```

### Config
#### Basic Config
| Parameter | Description |
| ----------- | -------------------------------------- |
| `devices` | The list of GPU devices to use. |
| `base_path` | The path to save the generated videos. |

#### Pipeline Config
| Parameter | Description |
| ------------ | ---------------------------------------------------------------------------------------------------- |
| `prompts` | The list of text prompts. **Note**: The number of prompts should be greater than the number of GPUs. |
| `file_name` | The name of the generated video. |
| `num_frames` | The number of frames to generate on **each GPU**. |

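Because `num_frames` is counted per GPU, the total video length grows with the number of devices. The sketch below estimates it, assuming each device contributes one non-overlapping segment and that the config is available as a flat dict with `devices` and `num_frames` keys. The flat layout and the function name `total_frames` are assumptions made for illustration, not taken from the repository.

```python
def total_frames(config: dict) -> int:
    """Estimate the total video length in frames.

    Assumes one non-overlapping segment of `num_frames` per device.
    The flat dict layout is hypothetical; adapt the lookups to the real config.
    """
    num_devices = len(config["devices"])
    return num_devices * config["num_frames"]


# Hypothetical values, chosen only for illustration.
example = {"devices": [0, 1, 2, 3], "num_frames": 24}
print(total_frames(example))  # 4 devices * 24 frames = 96
```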
#### Video-Infinity Config
| Parameter | Description |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `*.padding` | The number of local context frames. |
| `attn.topk` | The number of global context frames for the `Attention` model. |
| `attn.local_phase` | Biases the attention when the denoising timestep is less than `t`: adds a `local_bias` to the local context frames and a `global_bias` to the global context frames. |
| `attn.global_phase` | Same as `local_phase`, but biases the attention when the denoising timestep is greater than `t`. |
| `attn.token_num_scale` | If `True`, the scale factor is rescaled by the number of tokens. Default is `False`. See this [paper](https://arxiv.org/abs/2306.08645) for details. |

#### How to Set Config
- To avoid losing high-frequency information, we recommend keeping the sum of `padding` and `attn.topk` below 24 (roughly the default number of frames in the `VideoCrafter2` model).
- If you want a larger `padding` or `attn.topk`, set `attn.token_num_scale` to `True`.
- Higher values of `local_phase.t` and `global_phase.t` produce more stable videos but may reduce their diversity.
- More `padding` will provide more local context.
- A higher `attn.topk` will bring about overall stability in the videos.

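The first two guidelines above can be turned into a quick sanity check. This is a minimal sketch, not part of the repository: the function name `check_context_budget` and its arguments are hypothetical, and the values must be read out of your own config before calling it.

```python
def check_context_budget(
    padding: int,
    topk: int,
    token_num_scale: bool = False,
    frame_budget: int = 24,
) -> list[str]:
    """Return warnings when settings depart from the guidelines above.

    `frame_budget` defaults to 24, the threshold recommended for the sum
    of `padding` and `attn.topk`.
    """
    warnings = []
    if padding + topk >= frame_budget and not token_num_scale:
        warnings.append(
            f"padding + attn.topk = {padding + topk} is not below "
            f"{frame_budget}; consider setting attn.token_num_scale to True."
        )
    return warnings


# Hypothetical values, chosen only for illustration.
for message in check_context_budget(padding=16, topk=16):
    print(message)
```

An empty list means the settings stay within the recommended budget, or that `attn.token_num_scale` is already enabled.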
## Citation
```
@article{
tan2024videoinf,
title={Video-Infinity: Distributed Long Video Generation},
author={Zhenxiong Tan and Xingyi Yang and Songhua Liu and Xinchao Wang},
journal={arXiv preprint arXiv:2406.16260},
year={2024}
}
```

## Acknowledgements
Our project is based on the [VideoCrafter2](https://ailab-cvc.github.io/videocrafter2) model. We would like to thank the authors for their excellent work! ❤️
