https://github.com/mayuelala/FollowYourPose

[AAAI 2024] Follow-Your-Pose: This repo is the official implementation of "Follow-Your-Pose : Pose-Guided Text-to-Video Generation using Pose-Free Videos"

🕺🕺🕺 Follow-Your-Pose 💃💃💃
Pose-Guided Text-to-Video Generation using Pose-Free Videos (AAAI 2024)

[Yue Ma*](https://mayuelala.github.io/), [Yingqing He*](https://github.com/YingqingHe), [Xiaodong Cun](http://vinthony.github.io/), [Xintao Wang](https://xinntao.github.io/), [Siran Chen](https://github.com/Sranc3), [Ying Shan](https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ), [Xiu Li](https://scholar.google.com/citations?user=Xrh1OIUAAAAJ&hl=zh-CN), and [Qifeng Chen](https://cqf.io)


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mayuelala/FollowYourPose/blob/main/quick_demo.ipynb) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/YueMafighting/FollowYourPose) [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/houshaowei/FollowYourPose) ![visitors](https://visitor-badge.laobi.icu/badge?page_id=mayuelala.FollowYourPose&left_color=green&right_color=red) [![GitHub](https://img.shields.io/github/stars/mayuelala/FollowYourPose?style=social)](https://github.com/mayuelala/FollowYourPose)




"The man is sitting on chair, on the park"
"The Iron man, on the street"

"The stormtrooper, in the gym"
"The astronaut, earth background, Cartoon Style"

## 💃💃💃 Demo Video

https://github.com/mayuelala/FollowYourPose/assets/38033523/e021bce6-b9bd-474d-a35a-7ddff4ab8e75

## 💃💃💃 Abstract
TL;DR: We tune a text-to-image model (e.g., Stable Diffusion) to generate character videos from a pose sequence and a text description.


> Generating text-editable and pose-controllable character videos have an imperious demand in creating various digital human. Nevertheless, this task has been restricted by the absence of a comprehensive dataset featuring paired video-pose captions and the generative prior models for videos. In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image pose pair and pose-free video) and the pre-trained text-to-image (T2I) model to obtain the pose-controllable character videos. Specifically, in the first stage, only the keypoint-image pairs are used only for a controllable text-to-image generation. We learn a zero-initialized convolutional encoder to encode the pose information. In the second stage, we finetune the motion of the above network via a pose-free video dataset by adding the learnable temporal self-attention and reformed cross-frame self-attention blocks. Powered by our new designs, our method successfully generates continuously pose-controllable character videos while keeps the editing and concept composition ability of the pre-trained T2I model. The code and models will be made publicly available.

## 🕺🕺🕺 Changelog

- **[2024.03.15]** 🔥 🔥 🔥 We release the Second Follower [Follow-Your-Click](https://follow-your-click.github.io/), the first framework to achieve regional image animation. Try it now! Please give us a star! ⭐️⭐️⭐️ 😄
- **[2023.12.09]** 🔥 The paper is accepted by AAAI 2024!
- **[2023.08.30]** 🔥 Release some new results!
- **[2023.07.06]** 🔥 Release a new version of the `OpenXLab (浦源内容平台) demo` [![浦源内容平台 Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20%E6%B5%A6%E6%BA%90%E5%86%85%E5%AE%B9%E5%B9%B3%E5%8F%B0-Spaces-blue)](https://openxlab.org.cn/apps/detail/houshaowei/FollowYourPose)! Thanks to Shanghai AI Lab for the support!
- **[2023.04.12]** 🔥 Release a local gradio demo. You can run it locally with only a single A100/3090.
- **[2023.04.11]** 🔥 Release some cases in the `huggingface demo`.
- **[2023.04.10]** 🔥 Release a new version of the `huggingface demo` [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/YueMafighting/FollowYourPose), which supports both `raw video` and `skeleton video` as input. Enjoy it!
- **[2023.04.07]** Release the first version of the `huggingface demo`. Enjoy the fun of following your pose! You need to download the [skeleton video](https://github.com/mayuelala/FollowYourPose/tree/main/pose_example) or make your own skeleton video with [mmpose](https://mmpose.readthedocs.io/en/latest/model_zoo_papers/backbones.html#hrnet-cvpr-2019). Additionally, a second version that takes the `video format` as input is coming.
- **[2023.04.07]** Release a `colab notebook` [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mayuelala/FollowYourPose/blob/main/quick_demo.ipynb) and update the `requirements` for installation!
- **[2023.04.06]** Release `code`, `config` and `checkpoints`!
- **[2023.04.03]** Release Paper and Project page!

## 💃💃💃 HuggingFace Demo


## 🎤🎤🎤 Todo

- [X] Release the code, config and checkpoints for teaser
- [X] Colab
- [X] Hugging face gradio demo
- [ ] Release more applications

## 🍻🍻🍻 Setup Environment
Our method is trained using CUDA 11, accelerate, and xformers on 8 A100 GPUs.
```bash
conda create -n fupose python=3.8
conda activate fupose

pip install -r requirements.txt
```

`xformers` is recommended on A100 GPUs to reduce memory use and running time.

xformers installation: we have found it to be unstable. You may try the following prebuilt wheel (built for Python 3.8 on Linux x86_64):

```bash
wget https://github.com/ShivamShrirao/xformers-wheels/releases/download/4c06c79/xformers-0.0.15.dev0+4c06c79.d20221201-cp38-cp38-linux_x86_64.whl
pip install xformers-0.0.15.dev0+4c06c79.d20221201-cp38-cp38-linux_x86_64.whl
```

Our environment is similar to that of Tune-A-Video ([official](https://github.com/showlab/Tune-A-Video), [unofficial](https://github.com/bryandlee/Tune-A-Video)). You may check those repositories for more details.
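
As a quick sanity check after installation, the sketch below reports whether the interpreter matches the Python 3.8 environment created above and whether a few packages are importable. The helper name `check_environment` and the default package list are assumptions for illustration; the authoritative list of dependencies is `requirements.txt`.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def check_environment(
    packages: Iterable[str] = ("torch", "accelerate", "xformers"),
) -> Dict[str, bool]:
    """Return a report of interpreter version and importable packages.

    The default package names are an assumption; adjust them to match
    the contents of requirements.txt.
    """
    # The conda environment above is created with python=3.8, and the
    # xformers wheel is tagged cp38, so other versions may not work.
    report = {"python_3_8": sys.version_info[:2] == (3, 8)}
    for name in packages:
        # find_spec returns None for a top-level package that is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Calling `check_environment()` inside the `fupose` environment returns a dictionary such as `{"python_3_8": True, "torch": True, ...}`; any `False` entry points at a step worth repeating.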

## 💃💃💃 Training
We fix a bug in Tune-A-Video and fine-tune Stable Diffusion v1.4 on 8 A100 GPUs.
To fine-tune the text-to-image diffusion models for text-to-video generation, run this command:

```bash
TORCH_DISTRIBUTED_DEBUG=DETAIL accelerate launch \
--multi_gpu --num_processes=8 --gpu_ids '0,1,2,3,4,5,6,7' \
train_followyourpose.py \
--config="configs/pose_train.yaml"
```
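
If you have a different number of GPUs, the launch command has to be adapted. The following is a minimal sketch that rebuilds the command above for `num_gpus` devices, using only the flags that already appear in this README. The helper `build_train_command` is hypothetical, and whether `configs/pose_train.yaml` trains well on fewer than 8 GPUs has not been verified here.

```python
import shlex
from typing import List


def build_train_command(
    num_gpus: int, config: str = "configs/pose_train.yaml"
) -> List[str]:
    """Build the `accelerate launch` argument list for num_gpus devices."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    gpu_ids = ",".join(str(i) for i in range(num_gpus))
    cmd = ["accelerate", "launch"]
    if num_gpus > 1:
        # Multi-GPU flags are only needed when more than one device is used,
        # mirroring the single-GPU inference command in this README.
        cmd += ["--multi_gpu", f"--num_processes={num_gpus}"]
    cmd += ["--gpu_ids", gpu_ids, "train_followyourpose.py", f"--config={config}"]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line for a 4-GPU machine.
    print(shlex.join(build_train_command(4)))
```

The returned list can be passed to `subprocess.run`, with `TORCH_DISTRIBUTED_DEBUG=DETAIL` supplied through the `env` argument if the extra debugging output is wanted.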

## 🕺🕺🕺 Inference
Once the training is done, run inference:

```bash
TORCH_DISTRIBUTED_DEBUG=DETAIL accelerate launch \
--gpu_ids '0' \
txt2video.py \
--config="configs/pose_sample.yaml" \
--skeleton_path="./pose_example/vis_ikun_pose2.mov"
```
You can create the pose video with [mmpose](https://github.com/open-mmlab/mmpose); we detect the skeleton with [HRNet](https://mmpose.readthedocs.io/en/latest/model_zoo_papers/backbones.html#hrnet-cvpr-2019). Running the mmpose video demo is enough to obtain the pose video. Remember to replace the background with black.
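
To illustrate what a skeleton frame on a black background looks like, here is a minimal pure-Python sketch that rasterizes skeleton edges onto an all-black canvas. It does not use mmpose; the function `draw_skeleton_on_black`, the `(x, y)` keypoint format, and the edge list are assumptions for illustration, and the colors and line widths of the skeleton videos in `pose_example` may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]       # (x, y) pixel coordinates
Color = Tuple[int, int, int]  # RGB


def draw_skeleton_on_black(
    keypoints: Sequence[Point],
    edges: Sequence[Tuple[int, int]],
    width: int,
    height: int,
    color: Color = (255, 255, 255),
) -> List[List[Color]]:
    """Draw one-pixel-wide skeleton edges on a black RGB canvas."""
    # Start from a fully black frame so nothing from the source video remains.
    canvas = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
    for a, b in edges:
        (x0, y0), (x1, y1) = keypoints[a], keypoints[b]
        # Sample the segment densely enough to touch every pixel along it.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                canvas[y][x] = color
    return canvas
```

Each frame of the pose video would be produced this way from the keypoints detected for that frame, then encoded into a video with a tool of your choice.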

## 💃💃💃 Local Gradio Demo
You can run the gradio demo locally; it needs only a single `A100/3090`.
```bash
python app.py
```
The demo then runs at the local URL `http://0.0.0.0:Port`.

## 🕺🕺🕺 Weight
[Stable Diffusion] [Stable Diffusion](https://arxiv.org/abs/2112.10752) is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The pre-trained Stable Diffusion models can be downloaded from Hugging Face (e.g., [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4))

[FollowYourPose] We also provide our pretrained checkpoints on [Huggingface](https://huggingface.co/YueMafighting/FollowYourPose_v1/tree/main). Download them and put them into the `checkpoints` folder to run inference with our models.

```bash
FollowYourPose
├── checkpoints
│ ├── followyourpose_checkpoint-1000
│ │ ├──...
│ ├── stable-diffusion-v1-4
│ │ ├──...
│ └── pose_encoder.pth
```
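
Before running inference, it can help to confirm that the layout above is in place. The sketch below lists which of the three top-level entries are missing from a `checkpoints` directory; the helper name `missing_checkpoint_entries` is hypothetical, and it does not inspect the contents of the subfolders.

```python
from pathlib import Path
from typing import List

# Top-level entries taken from the directory tree shown above.
EXPECTED_ENTRIES = (
    "followyourpose_checkpoint-1000",
    "stable-diffusion-v1-4",
    "pose_encoder.pth",
)


def missing_checkpoint_entries(checkpoints_dir: str = "checkpoints") -> List[str]:
    """Return the expected entries that do not exist under checkpoints_dir."""
    root = Path(checkpoints_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list means all three entries were found; otherwise the returned names tell you what still needs to be downloaded.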

## 💃💃💃 Results
We show our results regarding various pose sequences and text prompts.

Note that the mp4 and gif files on this GitHub page are compressed.
Please check our [Project Page](https://follow-your-pose.github.io/) for mp4 files of original video results.



"Trump, on the mountain"
"man, on the mountain"
"astronaut, on mountain"



"girl, simple background"
"A Iron man, on the beach"
"A Hulk, on the mountain"



"A policeman, on the street"
"A girl, in the forest"
"A Iron man, on the street"



"A Robot, in Sahara desert"
"A Iron man, on the beach"
"A panda, on the sea"



"A man in the park, Van Gogh style"
"The fireman in the beach"
"Batman, brown background"



"A Hulk, on the sea"
"A superman, in the forest"
"A Iron man, in the snow"



"A man in the forest, Minecraft."
"A man in the sea, at sunset"
"James Bond, grey simple background"



"A Panda on the sea."
"A Stormtrooper on the sea"
"A astronaut on the moon"



"A astronaut on the moon."
"A Robot in Antarctica."
"A Iron man on the beach."



"The Obama in the desert"
"Astronaut on the beach."
"Iron man on the snow"



"A Stormtrooper on the sea"
"A Iron man on the beach."
"A astronaut on the moon."



"Astronaut on the beach"
"Superman on the forest"
"Iron man on the beach"



"Astronaut on the beach"
"Robot in Antarctica"
"The Stormtroopers, on the beach"

## 🎼🎼🎼 Citation
If you think this project is helpful, please feel free to leave a star⭐️⭐️⭐️ and cite our paper:
```bibtex
@article{ma2023follow,
title={Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos},
author={Ma, Yue and He, Yingqing and Cun, Xiaodong and Wang, Xintao and Shan, Ying and Li, Xiu and Chen, Qifeng},
journal={arXiv preprint arXiv:2304.01186},
year={2023}
}
```

## 👯👯👯 Acknowledgements

This repository borrows heavily from [Tune-A-Video](https://github.com/showlab/Tune-A-Video) and [FateZero](https://github.com/ChenyangQiQi/FateZero). Thanks to the authors for sharing their code and models.

## 🕺🕺🕺 Maintenance

This is the codebase for our research work. We are still working hard to update this repo, and more details will follow in the coming days. If you have any questions or ideas to discuss, feel free to contact [Yue Ma](mailto:y-ma21@mails.tsinghua.edu.cn), [Yingqing He](https://github.com/YingqingHe), or [Xiaodong Cun](mailto:vinthony@gmail.com).

## ⭐️⭐️⭐️ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=mayuelala/FollowYourPose&type=Date)](https://star-history.com/#mayuelala/FollowYourPose&Date)