SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
- Host: GitHub
- URL: https://github.com/skyworkai/skyreels-a1
- Owner: SkyworkAI
- License: other
- Created: 2025-02-13T02:37:51.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-04-23T15:09:10.000Z (about 2 months ago)
- Last Synced: 2025-04-23T16:25:08.900Z (about 2 months ago)
- Topics: condition-render, portrait-animation, video-diffusion-transformers
- Language: Python
- Homepage: https://www.skyreels.ai
- Size: 56.1 MB
- Stars: 488
- Watchers: 10
- Forks: 55
- Open Issues: 17
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
Skywork AI, Kunlun Inc.
🔥 For more results, visit our homepage 🔥
👋 Join our Discord

This repo, named **SkyReels-A1**, contains the official PyTorch implementation of our paper [SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers](https://arxiv.org/abs/2502.10841).
## 🔥🔥🔥 News!!
* Apr 3, 2025: 🔥 We release [SkyReels-A2](https://github.com/SkyworkAI/SkyReels-A2). This is an open-sourced controllable video generation framework capable of assembling arbitrary visual elements.
* Mar 4, 2025: 🔥 We release the audio-driven portrait image animation pipeline. Try it out on the [Huggingface Spaces Demo](https://huggingface.co/spaces/Skywork/skyreels-a1-talking-head)!
* Feb 18, 2025: 👋 We release the inference code and model weights of SkyReels-A1. [Download](https://huggingface.co/Skywork/SkyReels-A1)
* Feb 18, 2025: 🎉 We have made our technical report available as open source. [Read](https://skyworkai.github.io/skyreels-a1.github.io/report.pdf)
* Feb 18, 2025: 🔥 Our online LipSync demo is now available on SkyReels! Try it out at [LipSync](https://www.skyreels.ai/home/tools/lip-sync?refer=navbar).
* Feb 18, 2025: 🔥 We have open-sourced the I2V video generation model [SkyReels-V1](https://github.com/SkyworkAI/SkyReels-V1), the first and most advanced open-source human-centric video foundation model.

## 📑 TODO List
- [x] Checkpoints
- [x] Inference Code
- [x] Web Demo (Gradio)
- [x] Audio-driven Portrait Image Animation Pipeline
- [x] Inference Code for Long Videos
- [ ] User-Level GPU Inference on RTX4090
- [ ] ComfyUI

## Getting Started 🏁
### 1. Clone the code and prepare the environment 🛠️
First, clone the repository and create the conda environment:
```bash
git clone https://github.com/SkyworkAI/SkyReels-A1.git
cd SkyReels-A1

# create env using conda
conda create -n skyreels-a1 python=3.10
conda activate skyreels-a1
```
Then, install the remaining dependencies:
```bash
pip install -r requirements.txt
```
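After installing the dependencies, it can be worth sanity-checking that PyTorch sees your GPU before downloading the weights. This is a generic check, not part of the SkyReels-A1 scripts:

```python
# generic environment sanity check (not part of the SkyReels-A1 scripts)
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```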
### 2. Download pretrained weights 📥

You can download the pretrained weights from HuggingFace:
```bash
# !pip install -U "huggingface_hub[cli]"
huggingface-cli download Skywork/SkyReels-A1 --local-dir local_path --exclude "*.git*" "README.md" "docs"
```
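If you prefer to do this from Python, the same download can be performed with `huggingface_hub.snapshot_download`; a minimal sketch, where the `local_dir` is an assumption and should match the layout shown below:

```python
# minimal sketch: download the SkyReels-A1 weights programmatically
# (local_dir is an assumption; point it wherever your setup expects the weights)
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Skywork/SkyReels-A1",
    local_dir="pretrained_models",
    ignore_patterns=["*.git*", "README.md", "docs/*"],
)
```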
The FLAME, mediapipe, and smirk models are located in the `SkyReels-A1/extra_models` folder. The expected directory structure of the pretrained models is as follows:
```text
pretrained_models
├── FLAME
├── SkyReels-A1-5B
│   ├── pose_guider
│   ├── scheduler
│   ├── tokenizer
│   ├── siglip-so400m-patch14-384
│   ├── transformer
│   ├── vae
│   └── text_encoder
├── mediapipe
└── smirk
```
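A quick way to confirm that everything landed in the right place is to check for the expected paths; a small helper sketch based on the layout above (not part of the repo):

```python
# small helper sketch: verify the pretrained_models layout shown above
# (not part of the SkyReels-A1 repo)
from pathlib import Path

root = Path("pretrained_models")
expected = [
    "FLAME",
    "SkyReels-A1-5B/transformer",
    "SkyReels-A1-5B/vae",
    "SkyReels-A1-5B/text_encoder",
    "mediapipe",
    "smirk",
]
for rel in expected:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:8s} {root / rel}")
```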
#### Download DiffposeTalk assets and pretrained weights (For Audio-driven)
- We use [diffposetalk](https://github.com/DiffPoseTalk/DiffPoseTalk/tree/main) to generate FLAME coefficients from audio, thereby constructing the motion signals.
- Download the diffposetalk code and follow its README to download the weights and related data.
- Then place them in the specified directory.
```bash
cp -r ${diffposetalk_root}/style pretrained_models/diffposetalk
cp ${diffposetalk_root}/experiments/DPT/head-SA-hubert-WM/checkpoints/iter_0110000.pt pretrained_models/diffposetalk
cp ${diffposetalk_root}/datasets/HDTF_TFHP/lmdb/stats_train.npz pretrained_models/diffposetalk
```

- Or you can download the style files from [link](https://drive.google.com/file/d/1XT426b-jt7RUkRTYsjGvG-wS4Jed2U1T/view?usp=sharing) and stats_train.npz from [link](https://drive.google.com/file/d/1_I5XRzkMP7xULCSGVuaN8q1Upplth9xR/view?usp=sharing).
```text
pretrained_models
├── FLAME
├── SkyReels-A1-5B
├── mediapipe
├── diffposetalk
│   ├── style
│   ├── iter_0110000.pt
│   └── stats_train.npz
└── smirk
```
#### Download Frame interpolation Model pretrained weights (For Long Video Inference and Dynamic Resolution)
- We use [FILM](https://github.com/dajes/frame-interpolation-pytorch) to generate transition frames, making the video transitions smoother (set `use_interpolation` to `True`).
- Download [film_net_fp16.pt](https://github.com/dajes/frame-interpolation-pytorch/releases), and place it in the specified directory.
```text
pretrained_models
├── FLAME
├── SkyReels-A1-5B
├── mediapipe
├── diffposetalk
├── film_net
│   └── film_net_fp16.pt
└── smirk
```

### 3. Inference 🚀
You can run the inference scripts as follows:
```bash
python inference.py

# inference audio to video
python inference_audio.py
```

If the script runs successfully, you will get an output mp4 file containing the driving video, the input image or video, and the generated result.
#### Long Video Inference
Now, you can run the long video inference scripts to obtain portrait animations of any length:
```bash
python inference_long_video.py

# inference audio to video
python inference_audio_long_video.py
```

#### Dynamic Resolution
All inference scripts now support dynamic resolution: simply set `target_fps` to the desired fps. Recommended values are 12 fps (native), 24 fps, 48 fps, and 60 fps; other settings such as 25 fps and 30 fps may cause unstable frame rates.
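Note that all of the recommended values are integer multiples of the 12 fps native rate, which is presumably why they behave well. A purely illustrative helper (not part of the repo) can flag problematic settings before a long run:

```python
# illustrative helper: warn about target_fps values that are not integer
# multiples of the 12 fps native rate (not part of the SkyReels-A1 scripts)
NATIVE_FPS = 12

def check_target_fps(target_fps: int) -> None:
    if target_fps % NATIVE_FPS == 0:
        print(f"{target_fps} fps: OK (x{target_fps // NATIVE_FPS} of the native rate)")
    else:
        print(f"{target_fps} fps: not a multiple of {NATIVE_FPS} fps, "
              "frame rates may be unstable")

for fps in (12, 24, 25, 30, 48, 60):
    check_target_fps(fps)
```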
## Gradio Interface 🤗
We provide a [Gradio](https://huggingface.co/docs/hub/spaces-sdks-gradio) interface for a better experience; just run:
```bash
python app.py
```

The graphical interactive interface is shown below:

## Metric Evaluation 👓
We also provide scripts for automatically computing the metrics reported in the paper, including SimFace, FID, and the L1 distances for expression and motion.
All scripts can be found in the `eval` folder. After setting the path to the video results, run the following commands in sequence:
```bash
python arc_score.py
python expression_score.py
python pose_score.py
```
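As a rough illustration of the face-similarity idea behind `arc_score.py`: identity preservation is commonly measured as the cosine similarity between face-recognition (e.g. ArcFace) embeddings of the source and generated frames. A generic sketch, assuming the embeddings have already been extracted (this is not the repo's exact code):

```python
# generic sketch: cosine similarity between face embeddings
# (assumes ArcFace-style embeddings already extracted; not the repo's exact code)
import numpy as np

def face_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

# toy example with random 512-d embeddings
rng = np.random.default_rng(0)
print(face_similarity(rng.normal(size=512), rng.normal(size=512)))
```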
## Acknowledgements 💐

We would like to thank the contributors of the [CogVideoX](https://github.com/THUDM/CogVideo), [finetrainers](https://github.com/a-r-r-o-w/finetrainers), and [DiffPoseTalk](https://github.com/DiffPoseTalk/DiffPoseTalk) repositories for their open research and contributions.

## Citation 💖
If you find SkyReels-A1 useful for your research, please 🌟 this repo and cite our work using the following BibTeX:
```bibtex
@article{qiu2025skyreels,
  title={Skyreels-a1: Expressive portrait animation in video diffusion transformers},
  author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
  journal={arXiv preprint arXiv:2502.10841},
  year={2025}
}
```

## Star History
[](https://www.star-history.com/#SkyworkAI/SkyReels-A1&Date)