https://github.com/MRzzm/HDTF

the dataset and code for "Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset"

Host: GitHub
URL: https://github.com/MRzzm/HDTF
Owner: MRzzm
License: gpl-3.0
Created: 2021-03-28T13:04:29.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2024-05-12T10:40:56.000Z (about 1 year ago)
Last Synced: 2024-11-10T09:37:51.639Z (6 months ago)
Language: Python
Size: 3.38 MB
Stars: 347
Watchers: 14
Forks: 65
Open Issues: 18
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

Awesome-Talking-Head-Synthesis - Download link - The High-definition Talking-Face Dataset (HDTF) is a large in-the-wild high-resolution audio-visual dataset consisting of approximately 362 different videos totaling 15.8 hours. Original video resolutions are 720P or 1080P, and each cropped video is resized to 512 × 512. | (Datasets)

README

# HDTF
Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset
paper supplementary [demo video](https://www.youtube.com/watch?v=uJdBgWYBTww)

## Details of HDTF dataset
**./HDTF_dataset** consists of the *YouTube video URL*, the *video resolution* (as used in our method, which may not be the best available resolution), the *time stamps of the talking face*, the *facial region* (as used in our method) and *the zoom scale* of the cropped window.

**xx_video_url.txt:**

```
format: video name | video youtube url
```
**xx_resolution.txt:**
```
format: video name | resolution(in our method)
```
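As a minimal sketch, a file in the `video name | value` layout described above could be read like this. The helper names `parse_pair_line` and `parse_pairs` are hypothetical, the sample URL is a placeholder, and the whitespace fallback is an assumption in case the files do not contain a literal `|`:

```python
def parse_pair_line(line):
    """Split one 'video name | value' line into (name, value)."""
    if "|" in line:
        name, value = line.split("|", 1)
    else:
        # Assumption: fall back to whitespace-separated fields.
        name, value = line.split(None, 1)
    return name.strip(), value.strip()


def parse_pairs(text):
    """Parse the text of a whole file into a dict, skipping blank lines."""
    pairs = {}
    for line in text.splitlines():
        if line.strip():
            name, value = parse_pair_line(line)
            pairs[name] = value
    return pairs


# Placeholder content, not a real entry from the dataset.
sample = "Radio11 | https://www.youtube.com/watch?v=EXAMPLE_ID\n"
print(parse_pairs(sample))
```

The same function can be applied to **xx_video_url.txt** and **xx_resolution.txt**, since both are described with the same two-field layout.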

## Processing of HDTF dataset
To use the HDTF dataset, process it as follows:

- We provide the video names and URLs in **xx_video_url.txt** (the highest definition of the videos is 1080P or 720P). Convert each video to **.mp4** format, and convert interlaced video to progressive video as well.

- We split each long original video into talking head clips using the time stamps in **xx_annotion_time.txt**. Name each split clip **video name_clip index.mp4**. For example, split the video *Radio11.mp4 00:30-01:00 01:30-02:30* into *Radio11_0.mp4* and *Radio11_1.mp4*.

- Our work does not always download videos with the best resolution, so we provide two cropping methods. Thanks @universome and @Feii Yin for pointing out this problem!

1. Download the video at the reference resolution in **xx_resolution.txt** and crop the facial region with the fixed window size in **xx_crop_wh.txt**. (This method is the same as ours, but the downloaded video may not be the best resolution.)
2. First, download the video at the best resolution. Then, detect the facial landmarks in the split talking head clips and compute the square window of the face: compute the facial region in each frame and merge all regions into one square range. Next, enlarge the window size with the ratio in **xx_crop_ratio.txt**. Finally, crop the facial region.

- We resize all cropped videos into **512 x 512** resolution.
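The clip naming rule in the splitting step above can be sketched as follows. This assumes each annotation line looks like the README example (`Radio11.mp4 00:30-01:00 01:30-02:30`); the function names are hypothetical, and the `ffmpeg` argument list is only one possible way to cut a clip, not necessarily the command used by the authors:

```python
def to_seconds(stamp):
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def plan_clips(line):
    """Turn one annotation line into (clip_name, start, end) tuples."""
    video, *intervals = line.split()
    stem = video.rsplit(".", 1)[0]  # drop the extension if there is one
    clips = []
    for index, interval in enumerate(intervals):
        start, end = interval.split("-")
        clip_name = f"{stem}_{index}.mp4"
        clips.append((clip_name, to_seconds(start), to_seconds(end)))
    return clips


def ffmpeg_command(source, clip_name, start, end):
    """Build (but do not run) an ffmpeg argument list that cuts one clip."""
    return ["ffmpeg", "-i", source, "-ss", str(start), "-to", str(end), clip_name]


for clip_name, start, end in plan_clips("Radio11.mp4 00:30-01:00 01:30-02:30"):
    print(ffmpeg_command("Radio11.mp4", clip_name, start, end))
```

Each printed list can be passed to `subprocess.run` once the source video has been downloaded and converted.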

The HDTF dataset is available to download under a Creative Commons Attribution 4.0 International License. **Thanks @universome for providing the data processing script; please visit [here](https://github.com/universome/HDTF) for more details.** If you face any problems when processing HDTF, please contact me.
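The second cropping method described above (merge the per-frame facial regions into one square, then enlarge it) can be sketched roughly as below. This is an illustration under assumptions, not the authors' script: the function name `square_crop_window` is hypothetical, face boxes are assumed to be `(x0, y0, x1, y1)` tuples from any landmark or face detector, and the ratio is assumed to scale the side of the square around its center.

```python
def square_crop_window(boxes, ratio):
    """Merge per-frame face boxes into one enlarged square window.

    boxes: iterable of (x0, y0, x1, y1) tuples, one per frame.
    ratio: enlargement factor (assumed to scale the square's side).
    Returns the window as (x0, y0, x1, y1).
    """
    boxes = list(boxes)
    # Union of all per-frame facial regions.
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    # Make the union square using its longer side, then enlarge it.
    side = max(x1 - x0, y1 - y0) * ratio
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half = side / 2
    return (cx - half, cy - half, cx + half, cy + half)


# Two made-up face boxes from two frames of one clip.
window = square_crop_window([(10, 20, 50, 60), (20, 10, 70, 50)], ratio=1.5)
print(window)
```

The returned window may extend past the frame borders, so a caller would still need to clamp or pad before cropping and resizing to 512 x 512.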

## Inference code
#### code of audio-to-animation
coming soon......

#### code of constructing approximate dense flow
The code is in **./code_constructing_Fapp**; please visit [here](https://github.com/MRzzm/HDTF/tree/main/code_constructing_Fapp) for more details.

#### code of animation-to-video module
The code is in **./code_animation2video**; please visit [here](https://github.com/MRzzm/HDTF/tree/main/code_animation2video) for more details.

#### code of reproducing other works
coming soon......

## Reference
If you use HDTF, please cite:

```
@inproceedings{zhang2021flow,
title={Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset},
author={Zhang, Zhimeng and Li, Lincheng and Ding, Yu and Fan, Changjie},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3661--3670},
year={2021}
}
```
