Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome-Talking-Head-Synthesis

💬 An extensive collection of exceptional resources dedicated to the captivating world of talking face synthesis! ⭐ If you find this repo useful, please give it a star! 🤩
https://github.com/Kedreamix/Awesome-Talking-Head-Synthesis

  • Datasets

    • Download link
    • Download link
    • Download link - visual dataset for speaker recognition that encompasses both the VoxCeleb1 and VoxCeleb2 datasets.
    • Download link
    • Download link - visual dataset primarily used for speaker recognition tasks, though it can also be used to train talking-head generation models. To obtain download permission and access the dataset, apply [here](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/). Requires 300 GB+ of storage space.
    • Download link - visual dataset focused on analyzing the visual speech of former US President Barack Obama. All video samples are collected from his weekly address footage. Unlike previous datasets, it exclusively centers on Barack Obama and does not provide any human annotations.
    • Download link
    • Download link - speaking video dataset drawn from BBC programs, featuring over 1,000 speakers with various speaking styles and head poses. Each video is 1.16 seconds long (29 frames) and contains the target word along with its context.
    • Download link
    • Download link - HQ is a high-quality video dataset comprising 35,666 clips with a resolution of at least 512x512. It includes 15,653 identities, and each clip is manually labeled with 83 facial attributes, spanning appearance, action, and emotion. The dataset's diversity and temporal coherence make it a valuable resource for tasks like unconditional video generation and video facial attribute editing.
    • Download link - definition Talking-Face Dataset is a large in-the-wild high-resolution audio-visual dataset consisting of approximately 362 different videos totaling 15.8 hours. Original video resolutions are 720p or 1080p, and each cropped video is resized to 512 × 512.
    • Download link - CREMA-D is a diverse dataset with 7,442 original clips featuring 91 actors (48 male and 43 female, aged 20 to 74) representing various races and ethnicities. The actors speak from a set of 12 sentences, expressing six different emotions (Anger, Disgust, Fear, Happy, Neutral, and Sad) at four emotion levels (Low, Medium, High, and Unspecified). Emotion and intensity ratings were gathered through crowd-sourcing, with 2,443 participants rating 90 unique clips each (30 audio, 30 visual, and 30 audio-visual). Over 95% of the clips have more than 7 ratings. For additional details on CREMA-D, refer to the [paper link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4313618/).
    • Download link
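
One entry in the list above notes a storage requirement of 300 GB+. A minimal sketch of checking free disk space before starting such a download might look like the following. The function name `has_enough_space`, the target directory, and the choice of decimal gigabytes (10^9 bytes) are assumptions made for illustration; they are not part of any dataset's tooling.

```python
import shutil

# Assumption: 1 GB = 10**9 bytes. The source only says "300 GB+",
# so treat this threshold as a lower bound, not an exact figure.
BYTES_PER_GB = 10**9
REQUIRED_GB = 300


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem containing `path` has at least
    `required_gb` gigabytes free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


if __name__ == "__main__":
    target_dir = "."  # hypothetical download directory
    if has_enough_space(target_dir):
        print("Enough free space for the download.")
    else:
        free_gb = shutil.disk_usage(target_dir).free / BYTES_PER_GB
        print(f"Only {free_gb:.1f} GB free; at least {REQUIRED_GB} GB is needed.")
```

Running the check before requesting access avoids a partially completed download; extracted archives may need additional headroom beyond the stated figure.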