https://github.com/Kedreamix/Awesome-Talking-Head-Synthesis

💬 An extensive collection of exceptional resources dedicated to the captivating world of talking face synthesis! ⭐ If you find this repo useful, please give it a star! 🤩
https://github.com/Kedreamix/Awesome-Talking-Head-Synthesis
List: Awesome-Talking-Head-Synthesis
arxiv audio-driven paper synthesis talking-face-generation talking-head talking-head-video-generation
Last synced: 5 months ago
JSON representation
💬 An extensive collection of exceptional resources dedicated to the captivating world of talking face synthesis! ⭐ If you find this repo useful, please give it a star! 🤩
Host: GitHub
URL: https://github.com/Kedreamix/Awesome-Talking-Head-Synthesis
Owner: Kedreamix
License: mit
Created: 2023-12-06T13:27:59.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-08-27T07:23:27.000Z (9 months ago)
Last Synced: 2024-08-27T08:39:35.432Z (9 months ago)
Topics: arxiv, audio-driven, paper, synthesis, talking-face-generation, talking-head, talking-head-video-generation
Homepage: https://kedreamix.github.io/
Size: 44.9 KB
Stars: 663
Watchers: 24
Forks: 34
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

ultimate-awesome - Awesome-Talking-Head-Synthesis - 💬 An extensive collection of exceptional resources dedicated to the captivating world of talking face synthesis! ⭐ If you find this repo useful, please give it a star! 🤩. (Other Lists / Julia Lists)
README

        # Awesome-Talking-Head-Synthesis

- [Datasets](#datasets)

- [Survey](#survey)

- [Funny Work](#funny-work)

- [Audio-driven](#audio-driven)

- [Text-driven](#text-driven)

- [NeRF \& 3D \& Gaussian Splatting](#nerf--3d--gaussian-splatting)

- [Metrics](#metrics)

- [Tools \& Software](#tools--software)

- [Slides \& Presentations](#slides--presentations)

- [References](#references)

- [Star History](#star-history)

This repository organizes papers, codes and resources related to generative adversarial networks (GANs) 🤗 and neural radiance fields (NeRF) 🎨, with a main focus on image-driven and audio-driven talking head synthesis papers and released codes. 👤

Papers for Talking Head Synthesis, released codes collections. ✍️

Most papers are linked to PDFs on "arXiv" or journal/conference websites 📚. However, some papers require an academic license to view 🔐.

🔆 This project Awesome-Talking-Head-Synthesis is ongoing - pull requests are welcome! If you have any suggestions (missing papers, new papers, key researchers or typos), please feel free to edit and submit a PR. You can also open an issue or contact me directly via email. 📩

⭐ If you find this repo useful, please give it a star! 🤩

**2023.12 Update** 📆

Thank you to https://github.com/Curated-Awesome-Lists/awesome-ai-talking-heads, I have added some of its contents, such as `Tools & Software` and `Slides & Presentations`. 🙏 I hope this will be helpful.😊

If you have any feedback or ideas on extending this aggregated resource, please open an issue or PR - community contributions are vital to advancing this shared knowledge. 🤝

Let's keep pushing forward to recreate ever more realistic digital human faces! 💪 We've come so far but still have a long way to go. With continued research 🔬 and collaboration, I'm sure we'll get there! 🤗

Please feel free to star ⭐ and share this repo if you find it a valuable resource. Your support helps motivate me to keep maintaining and improving it. 🥰 Let me know if you have any other questions!

## Datasets

![在这里插入图片描述](https://img-blog.csdnimg.cn/direct/841257d9dee74547bbd4f717794a9492.png#pic_center)

| Dataset                       | Download Link                                                | Description                                                  |

| ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |

| Faceforensics++               | [Download link](https://github.com/ondyari/FaceForensics)    |                                                              |

| CelebV                        | [Download link](https://drive.google.com/file/d/1jQ6d76T5GQuvQH4dq8_Wq1T0cxvN0_xp/view) |                                                              |

| VoxCeleb                      | [Download link](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/) | `VoxCeleb`, a comprehensive audio-visual dataset for speaker recognition, encompasses both VoxCeleb1 and VoxCeleb2 datasets. |

| VoxCeleb1                     | [Download link](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html) | `VoxCeleb1` contains over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube. |

| VoxCeleb2                     | [Download link](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html) | Extracted from YouTube videos, VoxCeleb2 includes video URLs and discourse timestamps. As the largest public audio-visual dataset, it is primarily used for speaker recognition tasks. However, it can also be utilized for training talking-head generation models. To obtain download permission and access the dataset, apply [here](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/). Requires 300 GB+ storage space. |

| ObamaSet                      | [Download link](https://github.com/supasorn/synthesizing_obama_network_training) | `ObamaSet` is a specialized audio-visual dataset focused on analyzing the visual speech of former US President Barack Obama. All video samples are collected from his weekly address footage. Unlike previous datasets, it exclusively centers on Barack Obama and does not provide any human annotations. |

| TalkingHead-1KH               | [Download link](https://github.com/tcwang0509/TalkingHead-1KH) | The dataset consists of 500k video clips, of which about 80k are greater than 512x512 resolution. Only videos under permissive licenses are included. Note that the number of videos differ from that in the original paper because a more robust preprocessing script was used to split the videos. |

| LRW (Lip Reading in the Wild) | [Download link](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrw1.html) | LRW, a diverse English-speaking video dataset from the BBC program, features over 1000 speakers with various speaking styles and head poses. Each video is 1.16 seconds long (29 frames) and involves the target word along with context. |

| MEAD 2020                     | [Download link](https://github.com/uniBruce/Mead)            | MEAD 2020 is a Talking Head dataset annotated with emotion labels and intensity labels. The dataset focuses on facial generation for natural emotional speech, covering eight different emotions on three intensity levels. |

| CelebV-HQ                     | [Download link](https://github.com/CelebV-HQ/CelebV-HQ)      | CelebV-HQ is a high-quality video dataset comprising 35,666 clips with a resolution of at least 512x512. It includes 15,653 identities, and each clip is manually labeled with 83 facial attributes, spanning appearance, action, and emotion. The dataset's diversity and temporal coherence make it a valuable resource for tasks like unconditional video generation and video facial attribute editing.  [百度网盘](https://pan.baidu.com/s/1TGzOwUcXsRw72l4gaWre_w?pwd=pg71) [GoogleDriver](https://drive.google.com/drive/folders/19DLr27P9xMOTn_W6hxpxxm8_5jJoX-nR) |

| HDTF                          | [Download link](https://github.com/MRzzm/HDTF)               | HDTF, the High-definition Talking-Face Dataset, is a large in-the-wild high-resolution audio-visual dataset consisting of approximately 362 different videos totaling 15.8 hours. Original video resolutions are 720 P or 1080 P, and each cropped video is resized to 512 × 512. |

| CREMA-D                       | [Download link](https://github.com/CheyneyComputerScience/CREMA-D) | CREMA-D is a diverse dataset with 7,442 original clips featuring 91 actors, including 48 male and 43 female actors aged 20 to 74, representing various races and ethnicities. The dataset includes recordings of actors speaking from a set of 12 sentences, expressing six different emotions (Anger, Disgust, Fear, Happy, Neutral, and Sad) at four emotion levels (Low, Medium, High, and Unspecified). Emotion and intensity ratings were gathered through crowd-sourcing, with 2,443 participants rating 90 unique clips each (30 audio, 30 visual, and 30 audio-visual). Over 95% of the clips have more than 7 ratings. For additional details on CREMA-D, refer to the [paper link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4313618/). |

| LRS2                          | [Download link](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html) | LRS2 is a lip reading dataset that includes videos recorded in diverse settings, suitable for studying lip reading and visual speech recognition. |

| GRID                          | [Download link](http://spandh.dcs.shef.ac.uk/avlombard/)     | The GRID dataset was recorded in a laboratory setting with 34 volunteers, each speaking 1000 phrases, totaling 34,000 utterance instances. Phrases follow specific rules, with six words randomly selected from six categories: "command," "color," "preposition," "letter," "number," and "adverb." Access the dataset [here](https://spandh.dcs.shef.ac.uk/gridcorpus/). |

| SAVEE                         | [Download link](http://kahlan.eps.surrey.ac.uk/savee/Download.html) | The SAVEE (Surrey Audio-Visual Expressed Emotion) database is a crucial component for developing an automatic emotion recognition system. It features recordings from 4 male actors expressing 7 different emotions, totaling 480 British English utterances. These sentences, selected from the standard TIMIT corpus, are phonetically balanced for each emotion. Recorded in a high-quality visual media lab, the data undergoes processing and labeling. Performance evaluation involves 10 subjects rating recordings under audio, visual, and audio-visual conditions. Classification systems for each modality achieve speaker-independent recognition rates of 61%, 65%, and 84% for audio, visual, and audio-visual, respectively. |

| BIWI(3D)                      | [Download link](https://data.vision.ee.ethz.ch/cvl/datasets/b3dac2.en.html) | The Biwi 3D Audiovisual Corpus of Affective Communication serves as a compromise between data authenticity and quality, acquired at ETHZ in collaboration with SYNVO GmbH. |

| VOCA                          | [Download link](https://voca.is.tue.mpg.de/)                 | VOCA is a 4D-face dataset with approximately 29 minutes of 4D face scans and synchronized audio from 12-bit speakers. It greatly facilitates research in 3D VSG. |

| Multiface(3D)                 | [Download link](https://github.com/facebookresearch/multiface) | The Multiface Dataset consists of high-quality multi-view video recordings of 13 people displaying various facial expressions. It contains approximately 12,200 to 23,000 frames per subject, captured at 30 fps from around 40 to 160 camera views with uniform lighting. The dataset's size is 65TB and includes raw images (2048x1334 resolution), tracked and meshed heads, 1024x1024 unwrapped face textures, camera calibration metadata, and audio. This repository provides code for downloading the dataset and building a codec avatar using a deep appearance model. |

| MMFace4D                      | [Download link](https://wuhaozhe.github.io/mmface4d/)        | The MMFace4D dataset is a large-scale multi-modal dataset for audio-driven 3D facial animation research. It contains over 35,000 sequences captured from 431 subjects ranging in age from 15 to 68 years old. Various sentences from scenarios such as news broadcasting, conversations and storytelling were recorded, totaling around 11,000 utterances. High-fidelity data was captured using three synchronized RGB-D cameras to obtain high-resolution 3D meshes and textures. A reconstruction pipeline was developed to fuse the multi-view data and generate topology-consistent 3D mesh sequences. In addition to the 3D facial motions, synchronized speech audio is also provided. The final dataset covers a wide range of expressive talking styles and facial expressions through a diverse set of subjects and utterances. With its large scale, high quality of data and strong diversity, the MMFace4D dataset provides an ideal benchmark for developing and evaluating audio-driven 3D facial animation models. |

| VFHQ                          | [Download link](https://liangbinxie.github.io/projects/vfhq) | Most of the existing video face super-resolution (VFSR) methods are trained and evaluated on VoxCeleb1, which is designed specifically for speaker identification and the frames in this dataset are of low quality. As a consequence, the VFSR models trained on this dataset can not output visual-pleasing results. In this paper, we develop an automatic and scalable pipeline to collect a high-quality video face dataset (VFHQ), which contains over **16,000 high-fidelity clips** of **diverse interview scenarios**. To verify the necessity of VFHQ, we further conduct experiments and demonstrate that VFSR models trained on our VFHQ dataset can generate results with sharper edges and finer textures than those trained on VoxCeleb1. In addition, we show that the temporal information plays a pivotal role in eliminating video consistency issues as well as further improving visual performance. Based on VFHQ, by analyzing the benchmarking study of several state-of-the-art algorithms under bicubic and blind settings. |

| MultiTalk                     | [Download link](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset) | MultiTalk dataset is a new multilingual 2D video dataset featuring over 420 hours of talking videos across 20 languages. It contains 293,812 clips with a resolution of 512x512, a frame rate of 25 fps, and an average duration of 5.19 seconds per clip. The dataset shows a balanced distribution across languages, with each language representing between 2.0% and 9.7% of the total. |

| CN-CVS                        | [Download link](https://cnceleb.org)                         | CN-Celeb-AV is a multi-genre audio-visual person recognition dataset covering 11 different genres in the real world, collected from multiple Chinese open media sources. CN-CVS is a large-scale continuous visual-speech dataset in Mandarin Chinese consisting of short clips collected from TV news and Internet speech shows. |

---

## Survey

| Year | Title                                                        | Conference/Journal |

| ---- | ------------------------------------------------------------ | ------------------ |

| 2024 | [Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey](https://arxiv.org/abs/2411.17911v1) | arXiv 2024         |

| 2024 | [A Survey on 3D Human Avatar Modeling — From Reconstruction to Generation](http://arxiv.org/abs/2406.04253v1) | arXiv 2024         |

| 2024 | [Deepfake Generation and Detection: A Benchmark and Survey](https://arxiv.org/abs/2403.17881v1) [Github](https://github.com/flyingby/Awesome-Deepfake-Generation-and-Detection) | arXiv 2024         |

| 2024 | [A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos ](http://arxiv.org/abs/2403.06421v1) [Code](https://github.com/zwx8981/ADTH-QA) | arXiv 2024         |

| 2024 | [How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey](https://arxiv.org/abs/2402.13255) 3DGS+SLAM🔥🔥🔥 | arXiv 2024         |

| 2024 | [3D Gaussian as a New Vision Era: A Survey](https://arxiv.org/abs/2402.07181) 3DGS🔥🔥🔥 | arXiv 2024         |

| 2024 | [Advances in 3D Generation: A Survey](https://arxiv.org/abs/2401.17807) | arXiv 2024         |

| 2024 | [A Survey on 3D Gaussian Splatting](https://arxiv.org/pdf/2401.03890.pdf) 3DGS🔥🔥🔥**on going** | arXiv 2024         |

| 2024 | [Neural Radiance Fields: Past, Present, and Future](https://arxiv.org/pdf/2304.10050.pdf)  NeRF🔥🔥🔥 **Amazing 413 pages** | arXiv 2024         |

| 2023 | [From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications](https://arxiv.org/abs/2308.16041) | arXiv 2023         |

| 2023 | [Human-Computer Interaction System: A Survey of Talking-Head Generation](https://www.mdpi.com/2079-9292/12/1/218) | IEEE               |

| 2023 | [Talking human face generation: A survey](https://dl.acm.org/doi/10.1016/j.eswa.2023.119678) | ACM                |

| 2022 | [Deep Learning for Visual Speech Analysis: A Survey](https://arxiv.org/abs/2205.10839) | arXiv 2022         |

| 2020 | [What comprises a good talking-head video generation?: A Survey and Benchmark](https://arxiv.org/abs/2005.03201) | arXiv 2020         |

---

## Funny Work

| Year | Title                                                        | Code                                                         | Project                                                      | Keywords                  |

| ---- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------- |

| 2024 | [Audio2Photoreal] [From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations](https://arxiv.org/pdf/2401.01885.pdf) | [Code](https://github.com/facebookresearch/audio2photoreal/) | [Project](https://people.eecs.berkeley.edu/~evonne_ng/projects/audio2photoreal/#) | Photoreal                 |

| 2024 | [Animate Anyone] [Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation](https://arxiv.org/pdf/2311.17117.pdf) | [Code](https://github.com/HumanAIGC/AnimateAnyone)           | [Project](https://humanaigc.github.io/animate-anyone/)       | 🔥Animate (阿里科目三驱动) |

| 2024 | [3DGAN] [What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs](https://research.nvidia.com/labs/nxp/wysiwyg/media/WYSIWYG.pdf) |                                                              | [Project](https://research.nvidia.com/labs/nxp/wysiwyg/)     | 🔥Nvidia                   |

| 2024 | [LivePortrait] [LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control](https://arxiv.org/pdf/2407.03168) | [Codea](https://github.com/KwaiVGI/LivePortrait)             | [Project](https://liveportrait.github.io/)                   | 🔥快手                     |

---

## Audio-driven

| Year | Title                                                        | Conference/Journal                                           | Code                                                         | Project                                                      | Keywords                                                     |

| ---- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |

| 2024 | [GLCF] [GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection](http://arxiv.org/abs/2412.13656v1) | Arxiv 2024                                                   |                                                              |                                                              | Dataset                                                      |

| 2024 | [VQTalker] [VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization](http://arxiv.org/abs/2412.09892v1) | Arxiv 2024                                                   | [Code](https://github.com/X-LANCE/VQTalker)                  | [Project](https://x-lance.github.io/VQTalker)                | visemes, code book                                           |

| 2024 | [PointTalk] [PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis](http://arxiv.org/abs/2412.08504v1) | AAAI 2025                                                    |                                                              |                                                              | Point Cloud, Gaussian Splatting                              |

| 2024 | [EmotiveTalk] [EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion](http://arxiv.org/abs/2411.16726v2) | CVPR 2025 review                                             |                                                              | [Project](https://emotivetalk.github.io)                     | Emotion, Expressive, Diffusion                               |

| 2024 | [GoHD] [GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression](http://arxiv.org/abs/2412.09296v2) | AAAI 2025                                                    | [Code](https://github.com/Jia1018/GoHD)                      |                                                              | Gaze-oriented                                                |

| 2024 | [EmoDubber] [EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing](http://arxiv.org/abs/2412.08988v1) | Arxiv 2024                                                   |                                                              | [Project](https://galaxycong.github.io/EmoDub/)              | Emotion, Dubber                                              |

| 2024 | [PortraitTalk] [PortraitTal: Towards Customizable One-Shot Audio-to-Talking Face Generation](http://arxiv.org/abs/2412.07754v1) | Arxiv 2024                                                   |                                                              |                                                              | Diffusion, Attention                                         |

| 2024 | [DEEPTalk] [DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation](http://arxiv.org/abs/2408.06010v2) | AAAI 2025                                                    | [Code](https://github.com/whwjdqls/DEEPTalk)                 | [Project](http://whwjdqls.github.io/deeptalk.github.io/)     | 3D face, Emotion                                             |

| 2024 | [LatentSync] [LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync](http://arxiv.org/abs/2412.09262v1) | Arxiv 2024                                                   |                                                              |                                                              | Diffusion, SyncNet                                           |

| 2024 | [GoHD] [GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression](http://arxiv.org/abs/2412.09296v1) | AAAI 2025                                                    | [Code](https://github.com/Jia1018/GoHD)                      |                                                              | Gaze                                                         |

| 2024 | [FLOAT] [FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait](http://arxiv.org/abs/2412.01064v2) | Arxiv 2024                                                   | Soon                                                         | [Project](https://deepbrainai-research.github.io/float/)     | Diffusion                                                    |

| 2024 | [SVP] [SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model](http://arxiv.org/abs/2409.03270v2) | Arxiv 2024                                                   |                                                              |                                                              | Diffusion, Style                                             |

| 2024 | [Mini-Omni] [Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming](http://arxiv.org/abs/2408.16725v3) | Technical report                                             | [Code](https://github.com/gpt-omni/mini-omni)                |                                                              | Omni！！！                                                   |

| 2024 | [ControlTalk] [Controllable Talking Face Generation by Implicit Facial Keypoints Editing](http://arxiv.org/abs/2406.02880v2) | Arxiv 2024                                                   | [Code](https://github.com/NetEase-Media/ControlTalk)         |                                                              | Face Edit                                                    |

| 2024 | [SPEAK] [SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation](http://arxiv.org/abs/2405.07257v3) | Arxiv 2024                                                   |                                                              | [Demo](https://anonymous.4open.science/r/SPEAK-8A22)         |                                                              |

| 2024 | [LokiTalk] [LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis](http://arxiv.org/abs/2411.19525v1) | Arxiv 2024                                                   |                                                              |                                                              | NeRF                                                         |

| 2024 | [MEMO] [MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation](http://arxiv.org/abs/2412.04448v1) | Arxiv 2024                                                   | [Code](https://github.com/memoavatar/memo)                   | [Project](https://memoavatar.github.io/)                     | Memory                                                       |

| 2024 | [INFP] [INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations](http://arxiv.org/abs/2412.04037v1) | Arxiv 2024                                                   |                                                              | [Project](https://grisoon.github.io/INFP/)                   | Dyadic Conversations                                         |

| 2024 | [IF-MDM] [IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation](http://arxiv.org/abs/2412.04000v1) | Arxiv 2024                                                   |                                                              | [Project](https://bit.ly/ifmdm_supplementary)                | Motion Diffusion Model                                       |

| 2024 | [MemFace] [Memories are One-to-Many Mapping Alleviators in Talking Face Generation](https://arxiv.org/abs/2212.05005v4) | IEEE 2024                                                    |                                                              | [Project](https://memoryface.github.io/)                     | Memory                                                       |

| 2024 | [Ditto] [Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis](http://arxiv.org/abs/2411.19509v1) | Arxiv 2024                                                   |                                                              |                                                              | Diffusion                                                    |

| 2024 | [GaussianSpeech] [GaussianSpeech: Audio-Driven Gaussian Avatars](http://arxiv.org/abs/2411.18675v1) | Arxiv 2024                                                   | [Code](https://github.com/shivangi-aneja/GaussianSpeech)     | [Project](https://shivangi-aneja.github.io/projects/gaussianspeech/) | 3DGS, 3D                                                     |

| 2024 | [LetsTalk] [LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis](http://arxiv.org/abs/2411.16748v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [EmotiveTalk] [EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion](http://arxiv.org/abs/2411.16726v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [Sonic] [Sonic: Shifting Focus to Global Audio Perception in Portrait Animation](http://arxiv.org/abs/2411.16331v1) | Arxiv 2024                                                   |                                                              | [Project](https://jixiaozhong.github.io/Sonic/)              | Diffusion                                                    |

| 2024 | [S^3D-NeRF] [S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis](https://arxiv.org/abs/2408.09347) | ECCV 2024                                                    |                                                              |                                                              |                                                              |

| 2024 | [LES-Talker] [LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space](http://arxiv.org/abs/2411.09268v1) | Arxiv 2024                                                   |                                                              | [Project](https://peterfanfan.github.io/LES-Talker/)         | Fine-Grained Emotion                                         |

| 2024 | [JoyVASA] [JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation](http://arxiv.org/abs/2411.09209v1) | Arxiv 2024                                                   | [Code](https://github.com/jdh-algo/JoyVASA)                  | [Project](https://jdh-algo.github.io/JoyVASA)                | Diffusion, VASA                                              |

| 2024 | [JoyHallo] [JoyHallo: Digital human model for Mandarin](https://arxiv.org/abs/2409.13268) | Arxiv 2024                                                   | [Code](https://github.com/jdh-algo/JoyHallo)                 | [Project](https://jdh-algo.github.io/JoyHallo/)              | Diffusion, Hallo                                             |

| 2024 | [Hallo2] [Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation](https://arxiv.org/abs/2410.07718) | Arxiv 2024                                                   | [Code](https://github.com/fudan-generative-vision/hallo2)    | [Project](https://fudan-generative-vision.github.io/hallo2/#/) | Diffusion, Hallo                                             |

| 2024 | [Audio-Driven Emotional 3D Talking-Head Generation](http://arxiv.org/abs/2410.17262v1) | Arxiv 2024                                                   |                                                              |                                                              | Emotion                                                      |

| 2024 | [Stereo-Talker] [Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts](http://arxiv.org/abs/2410.23836v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [Takin-ADA] [Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization](http://arxiv.org/abs/2410.14283v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [DAWN] [DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation](http://arxiv.org/abs/2410.13726v2) | Arxiv 2024                                                   | [Code](https://github.com/Hanbo-Cheng/DAWN-pytorch)          | [Project](https://hanbo-cheng.github.io/DAWN/)               | Non-autoregressive Diffusion                                 |

| 2024 | [LaDTalk] [LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details](http://arxiv.org/abs/2410.00990v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [Diverse Code Query Learning for Speech-Driven Facial Animation](http://arxiv.org/abs/2409.19143v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [TalkinNeRF] [TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans](http://arxiv.org/abs/2409.16666v1) | ECCVW 2024                                                   |                                                              | [Project](https://aggelinacha.github.io/TalkinNeRF/)         | NeRF                                                         |

| 2024 | [ProbTalk3D] [ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE](http://arxiv.org/abs/2409.07966v2) | SIGGRAPH MIG 2024                                            | [Code](https://github.com/uuembodiedsocialai/ProbTalk3D/)    | [Project](https://uuembodiedsocialai.github.io/ProbTalk3D/)  | 3D                                                           |

| 2024 | [JEAN] [JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation](http://arxiv.org/abs/2409.12156v1) | BMVC 2024                                                    |                                                              | [Project](https://starc52.github.io/publications/2024-07-19-JEAN) | NeRF                                                         |

| 2024 | [3DFacePolicy] [3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy](http://arxiv.org/abs/2409.10848v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [LawDNet] [LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation](http://arxiv.org/abs/2409.09326v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [StyleTalk++] [StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads](http://arxiv.org/abs/2409.09292v1) | TPAMI 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [DiffTED] [DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures](http://arxiv.org/abs/2409.07649v1) | Arxiv 2024                                                   |                                                              |                                                              | diffusion                                                    |

| 2024 | [EMOdiffhead] [EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion](http://arxiv.org/abs/2409.07255v1) | Arxiv 2024                                                   |                                                              |                                                              | Diffusion                                                    |

| 2024 | [PersonaTalk] [PersonaTalk: Bring Attention to Your Persona in Visual Dubbing](http://arxiv.org/abs/2409.05379v1) | SIGGRAPH Asia 2024                                           |                                                              | [Project](https://grisoon.github.io/PersonaTalk/)            |                                                              |

| 2024 | [KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation](http://arxiv.org/abs/2409.05330v1) | Arxiv 2024                                                   |                                                              |                                                              | KAN                                                          |

| 2024 | [TalkLoRA] [TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation](http://arxiv.org/abs/2408.13714v1) | Arxiv 2024                                                   |                                                              |                                                              | LoRA                                                         |

| 2024 | [Avatar Concept Slider] [Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control](http://arxiv.org/abs/2408.13995v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [G3FA] [G3FA: Geometry-guided GAN for Face Animation](http://arxiv.org/abs/2408.13049v1) | BMVC 2024                                                    | [Code](https://github.com/dfki-av/G3FA)                      |                                                              |                                                              |

| 2024 | [EmoFace] [EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention](http://arxiv.org/abs/2408.11518v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [Meta-Face] [Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation](http://arxiv.org/abs/2408.09357v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [DEEPTalk] [DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation](http://arxiv.org/abs/2408.06010v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model](http://arxiv.org/abs/2408.05416v1) | TIP ?                                                        |                                                              |                                                              |                                                              |

| 2024 | [Style-Preserving Lip Sync via Audio-Aware Style Reference](http://arxiv.org/abs/2408.05412v1) | TIP ?                                                        |                                                              |                                                              |                                                              |

| 2024 | [Talk to the Wall] [Talk to the Wall: The Role of Speech Interaction in Collaborative Visual Analytics](http://arxiv.org/abs/2408.03813v2) | TVCG 2024                                                    |                                                              | [Project](https://osf.io/8gpv2/)                             | Collaborative                                                |

| 2024 | [MDT-A2G] [MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation](http://arxiv.org/abs/2408.03312v1) | Arxiv 2024                                                   |                                                              |                                                              | Co-Speech Gesture                                            |

| 2024 | [GLDiTalker] [GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer](http://arxiv.org/abs/2408.01826v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [UniTalker] [UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model](http://arxiv.org/abs/2408.00762v1) | Arxiv 2024                                                   | [Code](https://github.com/X-niper/UniTalker)                 |                                                              |                                                              |

| 2024 | [DiM-Gesture] [DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework](http://arxiv.org/abs/2408.00370v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [What if Red Can Talk?] [What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models](http://arxiv.org/abs/2407.20382v1) | ACL Wordplay 2024                                            |                                                              |                                                              |                                                              |

| 2024 | [LinguaLinker] [LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement](http://arxiv.org/abs/2407.18595v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [RealTalk] [RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network](http://arxiv.org/abs/2406.18284v2) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation](http://arxiv.org/abs/2408.01732v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [JambaTalk] [JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model](http://arxiv.org/abs/2408.01627v1) | Arxiv 2024                                                   |                                                              |                                                              | 3D                                                           |

| 2024 | [Talk Less, Interact Better] [Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs](http://arxiv.org/abs/2408.01417v1) | COLM 2024                                                    | [Code](https://github.com/lil-lab/ICCA)                      |                                                              | LLM                                                          |

| 2024 | [Digital Avatars] [Digital Avatars: Framework Development and Their Evaluation](http://arxiv.org/abs/2408.04068v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [EmoTalk3D] [EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head](http://arxiv.org/abs/2408.00297v1) | ECCV 2024                                                    |                                                              | [Project](https://nju-3dv.github.io/projects/EmoTalk3D)      |                                                              |

| 2024 | [PAV] [PAV: Personalized Head Avatar from Unstructured Video Collection](http://arxiv.org/abs/2407.21047v1) | ECCV 2024                                                    |                                                              | [Project](https://akincaliskan3d.github.io/PAV/)             |                                                              |

| 2024 | [Text-based Talking Video Editing with Cascaded Conditional Diffusion](http://arxiv.org/abs/2407.14841v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [EmoFace] [EmoFace: Audio-driven Emotional 3D Face Animation](http://arxiv.org/abs/2407.12501v1) | IEEE Conference Virtual Reality and 3D User Interfaces (VR). IEEE, 2024 | [Code](https://github.com/SJTU-Lucy/EmoFace)                 |                                                              |                                                              |

| 2024 | [Learning Online Scale Transformation for Talking Head Video Generation](http://arxiv.org/abs/2407.09965v1) | Arxiv 2024                                                   |                                                              |                                                              |                                                              |

| 2024 | [EchoMimic] [EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning](https://github.com/BadToBest/EchoMimic) | Arxiv 2024                                                   | [Code](https://github.com/BadToBest/EchoMimic)               | [Project](https://badtobest.github.io/echomimic.html)        | 🔥阿里                                                        |

| 2024 | [Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN](http://arxiv.org/abs/2407.05577v1) | Arxiv 2024                                                   |                                                              |                                                              | StyleGAN                                                     |

| 2024 | [Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert](http://arxiv.org/abs/2407.01034v1) | Interspeech 2024                                             |                                                              | [Project](https://3d-talking-head-avguide.github.io/)        | 3D                                                           |

| 2024 | [MultiTalk] [MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset](http://arxiv.org/abs/2406.14272v1) | Interspeech 2024                                             | [Code](https://github.com/postech-ami/MultiTalk)             | [Project](https://multi-talk.github.io/)                     | 3D, Dataset                                                  |

| 2024 | [NLDF] [NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation](http://arxiv.org/abs/2406.11259v1) | Arxiv 2024                                                   |                                                              |                                                              | NeRF                                                         |

| 2024 | [Make Your Actor Talk] [Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement](http://arxiv.org/abs/2406.08096v2) | Arxiv 2024                                                   |                                                              | [Project](https://ingrid789.github.io/MyTalk/)               |                                                              |

| 2024 | [Talk With Human-like Agents] [Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction](http://arxiv.org/abs/2406.12707v1) | ACL 2024                                                     | [Code](https://github.com/Haoqiu-Yan/PerceptiveAgent)        |                                                              |                                                              |

| 2024 | [V-Express] [V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation](https://arxiv.org/abs/2406.02511) | Technique Report                                             | [Code](https://github.com/tencent-ailab/V-Express)           | [Project](https://tenvence.github.io/p/v-express/)           | 🔥EMO, Diffusion, Open-source                                 |

| 2024 | [CVTHead] [CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer](https://arxiv.org/abs/2311.06443) | WACV 2024                                                    | [Code](https://github.com/HowieMa/CVTHead)                   |                                                              |                                                              |

| 2024 | [Hallo] [Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation](https://arxiv.org/abs/2406.08801) | Arxiv 2024                                                   | [Code](https://github.com/fudan-generative-vision/hallo)     | [Project](https://fudan-generative-vision.github.io/hallo)   | 🔥EMO, Diffusion, Open-source                                 |

| 2024 | [Emotional Conversation] [Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation](http://arxiv.org/abs/2406.07895v1) | Arxiv 2024                                                   |                                                              |                                                              | Emotion                                                      |

| 2024 | [MultiDialog] [Let’s Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation](http://arxiv.org/abs/2406.07867v1) | ACL 2024                                                     | [Code](https://github.com/MultiDialog/MultiDialog)           | [Project](https://multidialog.github.io)                     | [dataset](https://huggingface.co/datasets/IVLLab/MultiDialog) |

| 2024 | [ControlTalk] [Controllable Talking Face Generation by Implicit Facial Keypoints Editing](http://arxiv.org/abs/2406.02880v1) | Arxiv 2024                                                   |                                                              |                                                              | Controller                                                   |

| 2024 | [InstructAvatar] [InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation](http://arxiv.org/abs/2405.15758v1) | Arxiv 2024                                                   | [Code](https://github.com/wangyuchi369/InstructAvatar)       | [Project](https://wangyuchi369.github.io/InstructAvatar/)    | Text-Guided                                                  |

| 2024 | [Listen, Disentangle, and Control] [Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation](http://arxiv.org/abs/2403.17881v4) | Arxiv 2024                                                   | [Code](https://github.com/flyingby/Awesome-Deepfake-Generation-and-Detection) |                                                              | A Brenchmark and Survey                                      |

| 2024 | [NeRFFaceSpeech] [NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior](http://arxiv.org/abs/2405.05749v2) | CVPRW 2024                                                   | [Code](https://github.com/rlgnswk/NeRFFaceSpeech_Code/)      | [Project](https://rlgnswk.github.io/NeRFFaceSpeech_ProjectPage/) | SadTalker+NeRF                                               |

| 2024 | [SwapTalk] [SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space](http://arxiv.org/abs/2405.05636v1) | Arxiv 2024                                                   |                                                              | [Project](https://swaptalk.cc/)                              |                                                              |

| 2024 | [AniTalker] [AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding](http://arxiv.org/abs/2405.03121v1) | Arxiv 2024                                                   | [Code](https://github.com/X-LANCE/AniTalker)                 | [Project](https://x-lance.github.io/AniTalker/)              |                                                              |

| 2024 | [EMOPortraits] [EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars](http://arxiv.org/abs/2404.19110v1) | Arxiv 2024                                                   |                                                              |                                                              | EMO                                                          |

| 2024 | [GaussianTalker] [GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting](https://arxiv.org/abs/2404.16012v2) | ACMM 2024                                                    | [Code](https://github.com/KU-CVLAB/gaussiantalker)           | [Project](https://ku-cvlab.github.io/GaussianTalker/)        | 🔥Gaussian Splatting                                          |

| 2024 | [CSTalk] [CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation](http://arxiv.org/abs/2404.18604v1) | Arxiv 2024                                                   |                                                              |                                                              | Emotion                                                      |

| 2024 | [GSTalker] [GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting](http://arxiv.org/abs/2404.19040v1) | Arxiv 2024                                                   |                                                              |                                                              | 🔥Gaussian Splatting                                          |

| 2024 | [GaussianTalker] [GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting](https://arxiv.org/abs/2404.14037v3) | ACMM 2024                                                    |                                                              | [Project](https://yuhongyun777.github.io/GaussianTalker/)    | 🔥Gaussian Splatting                                          |

| 2024 | [TalkingGaussian] [TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting](https://arxiv.org/pdf/2404.15264.pdf) | ECCV 2024                                                    | [Code](https://github.com/Fictionarry/TalkingGaussian)       | [Project](https://fictionarry.github.io/TalkingGaussian/)    | 🔥Gaussian Splatting                                          |

| 2024 | [Learn2Talk] [Learn2Talk: 3D Talking Face Learns from 2D Talking Face](http://arxiv.org/abs/2404.12888v1) | Arxiv 2024                                                   |                                                              |                                                              | 🔥Gaussian Splatting                                          |

| 2024 | [VASA-1] [VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time](https://arxiv.org/abs/2404.10667) | NeurIPS 2024 (Oral)                                          |                                                              | [Project](https://www.microsoft.com/en-us/research/project/vasa-1/) | 🔥🔥🔥Awesome，Microsoft                                        |

| 2024 | [Pose-Aware 3D Talking Face Synthesis using Geometry-guided Audio-Vertices Attention](https://ieeexplore.ieee.org/abstract/document/10452856) | IEEE 2024                                                    | [Code](https://github.com/sharlingw/PATFS)                   |                                                              |                                                              |

| 2024 | [THQA] [THQA: A Perceptual Quality Assessment Database for Talking Heads](http://arxiv.org/abs/2404.09003v1) | Arix 2024                                                    | [Code](https://github.com/zyj-2000/THQA)                     |                                                              |                                                              |

| 2024 | [EDTalk] [EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis](https://arxiv.org/abs/2404.01647v1) | ECCV 2024 (Oral)                                             | [Code](https://github.com/tanshuai0219/EDTalk)               | [Project](https://tanshuai0219.github.io/EDTalk/)            | Emotion                                                      |

| 2024 | [FaceChain-ImagineID] [FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio](https://arxiv.org/abs/2403.01901v2) | Arix 2024                                                    | [Code](https://github.com/modelscope/facechain)              |                                                              |                                                              |

| 2024 | [Talk3D] [Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior](http://arxiv.org/abs/2403.20153v1) | Arix 2024                                                    | [Code](https://github.com/KU-CVLAB/Talk3D)                   | [Project](https://ku-cvlab.github.io/Talk3D/)                |                                                              |

| 2024 | [AniPortrait] [AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation](https://arxiv.org/abs/2403.17694) | Arix 2024                                                    | [Code](https://github.com/Zejun-Yang/AniPortrait)            |                                                              | 🔥🔥🔥Similar to EMO                                            |

| 2024 | [Make-Your-Anchor] [Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework](http://arxiv.org/abs/2403.17881v1) | CVPR 2024                                                    | [Code](https://github.com/ICTMCG/Make-Your-Anchor)           |                                                              |                                                              |

| 2024 | [Adaptive Super Resolution For One-Shot Talking-Head Generation](http://arxiv.org/abs/2403.15944v1) | ICASSP 2024                                                  | [Code](https://github.com/Songluchuan/AdaSR-TalkingHead/)    |                                                              |                                                              |

| 2024 | [VLOGGER] [VLOGGER: Multimodal Diffusion for Embodied](https://enriccorona.github.io/vlogger/paper.pdf) | Arix 2024                                                    |                                                              | [Project](https://enriccorona.github.io/vlogger/)            | Embodied                                                     |

| 2024 | [EmoVOCA] [EmoVOCA: Speech-Driven Emotional 3D Talking Heads](http://arxiv.org/abs/2403.12886v1) | Arix 2024                                                    |                                                              |                                                              | 3D, VOCA                                                     |

| 2024 | [ScanTalk] [ScanTalk: 3D Talking Heads from Unregistered Scans](https://arxiv.org/abs/2403.10942) | Arix 2024                                                    |                                                              |                                                              | 3D                                                           |

| 2024 | [Style2Talker] [Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style](http://arxiv.org/abs/2403.06365v2) | Arix 2024                                                    |                                                              |                                                              |                                                              |

| 2024 | [EMO] [EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions](https://arxiv.org/abs/2402.17485v1) | Arix 2024                                                    | [Code](https://github.com/HumanAIGC/EMO)                     | [Project](https://humanaigc.github.io/emote-portrait-alive/) | 🔥🔥🔥Amazing, Diffusion                                        |

| 2024 | [G4G] [G4G:A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment](http://arxiv.org/abs/2402.18122v1) | Arix 2024                                                    |                                                              |                                                              | A Generic Framework                                          |

| 2024 | [Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis](http://arxiv.org/abs/2402.17364v1) | CVPR 2024                                                    |                                                              |                                                              | High-Quality                                                 |

| 2024 | [DiffSpeaker] [DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer](http://arxiv.org/abs/2402.05712v1) | Arix 2024                                                    | [Code](https://github.com/theEricMa/DiffSpeaker)             |                                                              | 3D                                                           |

| 2024 | [EmoSpeaker] [EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation](http://arxiv.org/abs/2402.01422v1) | Arix 2024                                                    | [Code](https://github.com/PeterFanFan/Emospeaker_code)       | [Project](https://peterfanfan.github.io/EmoSpeaker/)         | Emotion                                                      |

| 2024 | [NeRF-AD] [NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis](http://arxiv.org/abs/2401.12568v1) | ICASSP 2024                                                  | [Code](https://github.com/Xiaoxingliu02/NeRF-AD_code)        | [Project](https://xiaoxingliu02.github.io/NeRF-AD/)          | AU                                                           |

| 2024 | [Real3D-Portrait] [Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis](http://arxiv.org/abs/2401.08503v2) | ICLR 2024                                                    | [Code](https://github.com/yerfor/Real3DPortrait)             | [Project](https://real3dportrait.github.io/)                 | 3D, One-Shot,Realistic                                       |

| 2024 | [SyncTalk] [SyncTalk: The Devil😈 is in the Synchronization for Talking Head Synthesis](https://arxiv.org/abs/2311.17590) | CVPR 2024                                                    | [Code](https://github.com/ZiqiaoPeng/SyncTalk)               | [Project](https://ziqiaopeng.github.io/synctalk/)            | 😈Talking Head                                                |

| 2024 | [AdaMesh] [AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive   Speech-Driven 3D Facial Animation](http://arxiv.org/abs/2310.07236v2) | Arix 2024                                                    | [Code](https://github.com/adamesh/adamesh)                   | [Project](https://adamesh.github.io)                         | 3D,Mesh                                                      |

| 2024 | [DREAM-Talk] [DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation](http://arxiv.org/abs/2312.13578v1) | Arix 2024                                                    |                                                              | [Project](https://magic-research.github.io/dream-talk/)      | Emotion                                                      |

| 2024 | [AE-NeRF] [AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis](http://arxiv.org/abs/2312.10921v1) | AAAI 2024                                                    |                                                              |                                                              |                                                              |

| 2024 | [R2-Talker] [R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning](http://arxiv.org/abs/2312.05572v1) | Arxiv 2024                                                   | [Code](https://github.com/KylinYee/R2-Talker-code)           |                                                              | based-RAD-NeRF                                               |

| 2024 | [DT-NeRF] [DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis](https://arxiv.org/pdf/2309.07752) | ICASSP 2024                                                  | -                                                            | -                                                            | ER-NeRF                                                      |

| 2023 | [ER-NeRF] [Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Efficient_Region-Aware_Neural_Radiance_Fields_for_High-Fidelity_Talking_Portrait_Synthesis_ICCV_2023_paper.pdf) | ICCV 2023                                                    | [Code](https://github.com/Fictionarry/ER-NeRF)               | [Project](https://fictionarry.github.io/ER-NeRF/)            | Tri-plane                                                    |

| 2023 | [LipNeRF] [LipNeRF: What is the right feature space to lip-sync a NeRF?](https://www.amazon.science/publications/lipnerf-what-is-the-right-feature-space-to-lip-sync-a-nerf) | FG 2023                                                      | [Code](https://github.com/yerfor/GeneFacePlusPlus)           | [Project](https://aggelinacha.github.io/LipNeRF/)            | Wav2lip                                                      |

| 2024 | [VectorTalker] [VectorTalker: SVG Talking Face Generation with Progressive Vectorisation](http://arxiv.org/abs/2312.11568v1) | Arix 2024                                                    |                                                              |                                                              | SVG                                                          |

| 2024 | [Mimic] [Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation](http://arxiv.org/abs/2312.10877v1) | AAAI 2024                                                    |                                                              |                                                              | 3D                                                           |

| 2024 | [DreamTalk] [DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models](http://arxiv.org/abs/2312.09767v1) | Arix 2024                                                    | [Code](https://github.com/damo-vilab/i2vgen-xl)              | [Project](https://dreamtalk-project.github.io)               | Diffusion                                                    |

| 2024 | [FaceTalk] [FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models](http://arxiv.org/abs/2312.08459v1) | Arix 2024                                                    | [Code](https://github.com/shivangi-aneja/FaceTalk)           | [Project](https://shivangi-aneja.github.io/projects/facetalk/) |                                                              |

| 2024 | [GSmoothFace] [GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained   3D Face Guidance](http://arxiv.org/abs/2312.07385v1) | Arix 2024                                                    |                                                              |                                                              | 3D                                                           |

| 2024 | [GMTalker] [GMTalker: Gaussian Mixture based Emotional talking video Portraits](http://arxiv.org/abs/2312.07669v1) | Arix 2024                                                    |                                                              | [Project](https://bob35buaa.github.io/GMTalker)              | Emotion                                                      |

| 2024 | [VividTalk] [VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior](http://arxiv.org/abs/2312.01841v2) | Arix 2024                                                    |                                                              |                                                              | Mesh                                                         |

| 2024 | [GAIA] [GAIA: Zero-shot Talking Avatar Generation](https://arxiv.org/pdf/2311.15230.pdf) | Arix 2024                                                    | Code(coming)                                                 | [Project](https://microsoft.github.io/GAIA/)                 | 😲😲😲                                                          |

| 2023 | [Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation](https://arxiv.org/abs/2307.10008) | ICCV 2023                                                    | [Code](https://github.com/harlanhong/ICCV2023-MCNET)         | [Project](https://harlanhong.github.io/publications/mcnet.html) | -                                                            |

| 2023 | [ToonTalker] [ToonTalker: Cross-Domain Face Reenactment](https://openaccess.thecvf.com/content/ICCV2023/papers/Gong_ToonTalker_Cross-Domain_Face_Reenactment_ICCV_2023_paper.pdf) | ICCV 2023                                                    | -                                                            | -                                                            | -                                                            |

| 2023 | [Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation](https://arxiv.org/abs/2309.04946) | ICCV 2023                                                    | [Code](https://github.com/yuangan/EAT_code)                  | [Project](https://yuangan.github.io/eat/)                    | -                                                            |

| 2023 | [EMMN] [EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation](https://openaccess.thecvf.com/content/ICCV2023/papers/Tan_EMMN_Emotional_Motion_Memory_Network_for_Audio-driven_Emotional_Talking_Face_ICCV_2023_paper.pdf) | ICCV 2023                                                    | -                                                            | -                                                            | Emotion                                                      |

| 2023 | [Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation](https://openaccess.thecvf.com/content/ICCV2023/papers/Song_Emotional_Listener_Portrait_Neural_Listener_Head_Generation_with_Emotion_ICCV_2023_paper.pdf) | ICCV 2023                                                    | -                                                            | -                                                            | Emotion,LHG                                                  |

| 2023 | [MODA] [MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions](https://arxiv.org/abs/2307.10008) | ICCV 2023                                                    | -                                                            | -                                                            | -                                                            |

| 2023 | [Facediffuser] [Facediffuser: Speech-driven 3d facial animation synthesis using diffusion](https://dl.acm.org/doi/abs/10.1145/3623264.3624447) | ACM SIGGRAPH MIG 2023                                        | [Code](https://github.com/uuembodiedsocialai/FaceDiffuser)   | [Project](https://uuembodiedsocialai.github.io/FaceDiffuser/) | 🔥Diffusion,3D                                                |

| 2023 | [Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis](https://arxiv.org/abs/2309.00030) | TCSVT 2023                                                   | -                                                            | -                                                            |                                                              |

| 2023 | [SadTalker] [SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation](https://arxiv.org/pdf/2211.12194.pdf) | CVPR 2023                                                    | [Code](https://github.com/Winfredy/SadTalker)                | [Project](https://sadtalker.github.io/)                      | 3D,Single Image                                              |

| 2023 | [EmoTalk] [EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation](https://arxiv.org/abs/2303.11089) | ICCV 2023                                                    | [Code](https://github.com/ZiqiaoPeng/EmoTalk)                |                                                              | 3D,Emotion                                                   |

| 2023 | [Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks](https://arxiv.org/abs/2306.03594) | InterSpeech 2023                                             |                                                              |                                                              | Emotion                                                      |

| 2023 | [DINet] [DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video](https://fuxivirtualhuman.github.io/pdf/AAAI2023_FaceDubbing.pdf) | AAAI 2023                                                    | [Code](https://github.com/MRzzm/DINet)                       | -                                                            |                                                              |

| 2023 | [StyleTalk] [StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles](https://arxiv.org/abs/2301.01081) | AAAI 2023                                                    | [Code](https://github.com/FuxiVirtualHuman/styletalk)        | -                                                            | Style                                                        |

| 2023 | [High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning](https://arxiv.org/abs/2305.02572) | CVPR 2023                                                    | -                                                            | -                                                            | Emotion                                                      |

| 2023 | [StyleSync] [StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator](https://arxiv.org/pdf/2305.05445.pdf) | CVPR 2023                                                    | [Code](https://github.com/guanjz20/StyleSync)                | [Project](https://hangz-nju-cuhk.github.io/projects/StyleSync) | -                                                            |

| 2023 | [TalkLip] [TalkLip: Seeing What You Said - Talking Face Generation Guided by a Lip Reading Expert](https://arxiv.org/pdf/2303.17480.pdf) | CVPR 2023                                                    | [Code](https://github.com/Sxjdwang/TalkLip)                  | -                                                            | -                                                            |

| 2023 | [CodeTalker] [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior](https://arxiv.org/abs/2301.02379) | CVPR 2023                                                    | [Code](https://github.com/Doubiiu/CodeTalker)                | [Project](https://doubiiu.github.io/projects/codetalker/)    | 3D,codebook                                                  |

| 2023 | [EmoGen] [Emotionally Enhanced Talking Face Generation](https://arxiv.org/pdf/2303.11548.pdf) | Arxiv 2023                                                   | [Code](https://github.com/sahilg06/EmoGen)                   | -                                                            | Emotion                                                      |

| 2023 | [DAE-Talker] [DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder](https://arxiv.org/pdf/2303.17550.pdf) | ACMM 2023                                                    | -                                                            | [Project](https://mstypulkowski.github.io/diffusedheads/)    | 🔥Diffusion                                                   |

| 2023 | [READ Avatars] [READ Avatars: Realistic Emotion-controllable Audio Driven Avatars](https://arxiv.org/abs/2303.00744) | Arxiv 2023                                                   | -                                                            | [Project](https://www.youtube.com/watch?v=QSyMl3vV0pA)       | -                                                            |

| 2023 | [DiffTalk] [DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis](https://arxiv.org/abs/2301.03786) | CVPR 2023                                                    | [Code](https://github.com/sstzal/DiffTalk)                   | [Project](https://sstzal.github.io/DiffTalk/)                | 🔥Diffusion                                                   |

| 2023 | [Diffused Heads] [Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation](https://mstypulkowski.github.io/diffusedheads/diffused_heads.pdf) | Arxiv 2023                                                   | -                                                            | [Project](https://mstypulkowski.github.io/diffusedheads/)    | 🔥Diffusion                                                   |

| 2022 | [VideoReTalking] [VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild](https://arxiv.org/abs/2211.14758) | SIGGRAPH 2022                                                | [Code](https://github.com/OpenTalker/video-retalking)        | [Project](https://vinthony.github.io/video-retalking/)       |                                                              |

| 2022 | [MemFace] [Expressive Talking Head Generation with Granular Audio-Visual Control](https://openaccess.thecvf.com/content/CVPR2022/papers/Liang_Expressive_Talking_Head_Generation_With_Granular_Audio-Visual_Control_CVPR_2022_paper.pdf) | CVPR 2022                                                    | -                                                            | -                                                            | -                                                            |

| 2022 | [Talking Face Generation with Multilingual TTS](https://openaccess.thecvf.com/content/CVPR2022/papers/Song_Talking_Face_Generation_With_Multilingual_TTS_CVPR_2022_paper.pdf) | CVPR 2022                                                    | [Demo Track](https://huggingface.co/spaces/CVPR/ml-talking-face) | -                                                            | -                                                            |

| 2022 | [EAMM] [EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model](https://arxiv.org/pdf/2205.15278.pdf) | SIGGRAPH 2022                                                | -                                                            | -                                                            | Emotion                                                      |

| 2022 | [SPACEx] [SPACEx 🚀: Speech-driven Portrait Animation with Controllable Expression](https://arxiv.org/pdf/2211.09809.pdf) | arXiv 2022                                                   | -                                                            | [Project](https://deepimagination.cc/SPACEx/)                | -                                                            |

| 2022 | [AV-CAT] [Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers](https://arxiv.org/pdf/2212.04970.pdf) | SIGGRAPH Asia 2022                                           | -                                                            | -                                                            | -                                                            |

| 2022 | [MemFace] [Memories are One-to-Many Mapping Alleviators in Talking Face Generation](https://arxiv.org/pdf/2212.05005.pdf) | arXiv 2022                                                   | -                                                            | -                                                            | -                                                            |

| 2021 | [PC-AVS] [PC-AVS: Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation](https://arxiv.org/abs/2104.11116) | CVPR 2021                                                    | [Code](https://github.com/Hangz-nju-cuhk/Talking-Face_PC-AVS) | [Project](https://hangz-nju-cuhk.github.io/projects/PC-AVS)  | -                                                            |

| 2021 | [IATS] [Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis](https://arxiv.org/abs/2111.00203) | ACM MM 2021                                                  | -                                                            | -                                                            | -                                                            |

| 2021 | [Speech2Talking-Face] [Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation](https://www.ijcai.org/proceedings/2021/0141.pdf) | IJCAI 2021                                                   | -                                                            | -                                                            | -                                                            |

| 2021 | [FAU] [Talking Head Generation with Audio and Speech Related Facial Action Units](https://arxiv.org/pdf/2110.09951.pdf) | BMVC 2021                                                    | -                                                            | -                                                            | AU                                                           |

| 2021 | [EVP] [Audio-Driven Emotional Video Portraits](https://openaccess.thecvf.com/content/CVPR2021/papers/Ji_Audio-Driven_Emotional_Video_Portraits_CVPR_2021_paper.pdf) | CVPR 2021                                                    | [Code](https://github.com/jixinya/EVP)                       | -                                                            | Emotion                                                      |

| 2021 | [IATS] [IATS: Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis](https://dl.acm.org/doi/pdf/10.1145/3474085.3475280) | ACM Multimedia 2021                                          | -                                                            | -                                                            | -                                                            |

| 2020 | [Wav2Lip] [A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild](http://arxiv.org/pdf/2008.10010.pdf) | ACM Multimedia 2020                                          | [Code](https://github.com/Rudrabha/Wav2Lip)                  | [Project](http://cvit.iiit.ac.in/research/projects/cvit-projects/a-lip-sync-expert-is-all-you-need-for-speech-to-lip-generation-in-the-wild/) | -                                                            |

| 2020 | [RhythmicHead] [Talking-head Generation with Rhythmic Head Motion](https://arxiv.org/pdf/2007.08547v1.pdf) | ECCV 2020                                                    | [Code](https://github.com/lelechen63/Talking-head-Generation-with-Rhythmic-Head-Motion) | -                                                            | -                                                            |

| 2020 | [MakeItTalk] [Speaker-Aware Talking-Head Animation](https://arxiv.org/pdf/2006.09661.pdf) | SIGGRAPH Asia 2020                                           | [Code](https://github.com/yzhou359/MakeItTalk)               | [Project](https://people.umass.edu/~yangzhou/MakeItTalk/)    | -                                                            |

| 2020 | [Neural Voice Puppetry] [Audio-driven Facial Reenactment](https://arxiv.org/pdf/1912.05566.pdf) | ECCV 2020                                                    | -                                                            | [Project](https://justusthies.github.io/posts/neural-voice-puppetry/) | -                                                            |

| 2020 | [MEAD] [A Large-scale Audio-visual Dataset for Emotional Talking-face Generation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123660698.pdf) | ECCV 2020                                                    | [Code](https://github.com/uniBruce/Mead)                     | [Project](https://wywu.github.io/projects/MEAD/MEAD.html)    | -                                                            |

| 2020 | [Realistic Speech-Driven Facial Animation with GANs](https://arxiv.org/pdf/1906.06337.pdf) | IJCV 2020                                                    | -                                                            | -                                                            | -                                                            |

| 2019 | [DAVS] [Talking Face Generation by Adversarially Disentangled Audio-Visual Representation](https://arxiv.org/pdf/1807.07860.pdf) | AAAI 2019                                                    | [Code](https://github.com/Hangz-nju-cuhk/Talking-Face-Generation-DAVS) | -                                                            | -                                                            |

| 2019 | [ATVGnet] [Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss](https://www.cs.rochester.edu/~cxu22/p/cvpr2019_facegen_paper.pdf) | CVPR 2019                                                    | [Code](https://github.com/lelechen63/ATVGnet)                | -                                                            | -                                                            |

| 2018 | [Lip Movements Generation at a Glance](https://openaccess.thecvf.com/content_ECCV_2018/papers/Lele_Chen_Lip_Movements_Generation_ECCV_2018_paper.pdf) | ECCV 2018                                                    | [Code](https://github.com/lelechen63/3d_gan)                 | -                                                            | -                                                            |

| 2018 | [VisemeNet] [Audio-Driven Animator-Centric Speech Animation](https://arxiv.org/pdf/1805.09488.pdf) | SIGGRAPH 2018                                                | -                                                            | -                                                            | -                                                            |

| 2017 | [Synthesizing Obama] [Learning Lip Sync From Audio](https://grail.cs.washington.edu/projects/AudioToObama/siggraph17_obama.pdf) | SIGGRAPH 2017                                                | -                                                            | [Project](https://grail.cs.washington.edu/projects/AudioToObama/) | -                                                            |

| 2017 | [You Said That?] [Synthesising Talking Faces From Audio](https://arxiv.org/abs/1705.02966) | BMVC 2019                                                    | [Code](https://github.com/joonson/yousaidthat)               | -                                                            | -                                                            |

| 2017 | [Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion](https://users.aalto.fi/~laines9/publications/karras2017siggraph_paper.pdf) | SIGGRAPH 2017                                                | -                                                            | -                                                            | -                                                            |

| 2017 | [A Deep Learning Approach for Generalized Speech Animation](https://home.ttic.edu/~taehwan/taylor_etal_siggraph2017.pdf) | SIGGRAPH 2017                                                | -                                                            | -                                                            | -                                                            |

| 2016 | [LRW] [Lip Reading in the Wild](https://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16/chung16.pdf) | ACCV 2016                                                    | -                                                            | -                                                            | -                                                            |

---

## Text-driven

| Year | Title                                                        | Conference/Journal | Code/Proj                                                    |

| ---- | ------------------------------------------------------------ | ------------------ | ------------------------------------------------------------ |

| 2024 | [FT2TF: First-Person Statement Text-To-Talking Face Generation](http://arxiv.org/abs/2312.05430v2) | WACV 2025          |                                                              |

| 2024 | [HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04681.pdf) | ECCV 2024          | [Code](https://github.com/ZhenglinZhou/HeadStudio) [Project](https://zhenglinzhou.github.io/HeadStudio-ProjectPage/) |

| 2024 | [Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models](https://arxiv.org/abs/2306.16003v2) | ICASSP 2024        |                                                              |

| 2024 | [GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars](http://arxiv.org/abs/2408.13674v1) | Arxiv 2024         |                                                              |

| 2024 | [T3M: Text Guided 3D Human Motion Synthesis from Speech](http://arxiv.org/abs/2408.12885v1) | Arxiv              | [Code](https://github.com/Gloria2tt/T3M)                     |

| 2024 | [STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting](http://arxiv.org/abs/2406.04629v1) | Tech report        | [Project](https://star-avatar.github.io)                     |

| 2023 | [Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation](https://arxiv.org/pdf/2308.06457) | Arxiv              |                                                              |

| 2023 | [TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles](https://arxiv.org/pdf/2304.00334.pdf) | Arxiv              |                                                              |

| 2023 | [FT2TF: First-Person Statement Text-To-Talking Face Generation](https://arxiv.org/abs/2312.05430) | Arxiv              |                                                              |

| 2023 | [Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation](https://arxiv.org/pdf/2308.06457) | Arxiv              |                                                              |

| 2022 | [Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary](https://arxiv.org/pdf/2104.14631.pdf) | ICASSP 2022        | [Project](https://sites.google.com/view/sibozhang/text2video)   [Code](https://github.com/sibozhang/Text2Video) |

| 2021 | [Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation](https://arxiv.org/abs/2104.07995) | AAAI               | [Code](https://github.com/FuxiVirtualHuman/Write-a-Speaker)  |

| 2021 | [Txt2vid: Ultra-low bitrate compression of talking-head videos via text](https://arxiv.org/abs/2106.14014v3) | Arxiv              | [Code](https://github.com/tpulkit/txt2vid)                   |

---

## NeRF & 3D & Gaussian Splatting

| Year | Title                                                        | Conference/Journal | Code                                                         | Project                                                      | Keywords                            |

| ---- | ------------------------------------------------------------ | ------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ----------------------------------- |

| 2024 | [Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters](http://arxiv.org/abs/2412.14333v1) | Arxiv 2024         |                                                              |                                                              | Co-speech                           |

| 2024 | [GraphAvatar] [GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians](http://arxiv.org/abs/2412.13983v1) | AAAI 2025          | [Code](https://github.com/ucwxb/GraphAvatar)                 |                                                              | GNN-Generated, 3DGS                 |

| 2024 | [CAP4D] [CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models](http://arxiv.org/abs/2412.12093v1) | Arxiv 2024         | [Code](https://github.com/felixtaubner/cap4d/)               | [Project](https://felixtaubner.github.io/cap4d/)             | Multi-View Diffusion                |

| 2024 | [3D$^2$-Actor] [3D$^2$-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling](http://arxiv.org/abs/2412.11599v1) | AAAI 2025          | [Code](https://github.com/silence-tang/GaussianActor)        | [Project](https://github.com/silence-tang/GaussianActor)     | 3DGS                                |

| 2024 | [StrandHead] [StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors](http://arxiv.org/abs/2412.11586v1) | Arxiv 2024         |                                                              | [Project](https://xiaokunsun.github.io/StrandHead.github.io) |                                     |

| 2024 | [Barbie] [Barbie: Text to Barbie-Style 3D Avatars](http://arxiv.org/abs/2408.09126v5) | Arxiv 2024         | [Code](https://github.com/XiaokunSun/Barbie)                 | [Project](https://xiaokunsun.github.io/Barbie.github.io)     | Text to Avatars                     |

| 2024 | [Human-3Diffusion] [Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models](http://arxiv.org/abs/2406.08475v2) | NIPS 2024          |                                                              | [Project](https://yuxuan-xue.com/human-3diffusion/)          | Diffusion                           |

| 2024 | [GAF] [GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion](http://arxiv.org/abs/2412.10209v1) | Arxiv 2024         | [Code](https://github.com/tangjiapeng/GAF) [Demo](https://youtu.be/QuIYTljvhyg) | [Project](https://tangjiapeng.github.io/projects/GAF)        | Diffusion                           |

| 2024 | [SimAvatar] [SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing](http://arxiv.org/abs/2412.09545v1) | Arxiv 2024         |                                                              | [Project](https://nvlabs.github.io/SimAvatar/)               | NVIDIA, Hair and Clothing           |

| 2024 | [GASP] [GASP: Gaussian Avatars with Synthetic Priors](http://arxiv.org/abs/2412.07739v1) | Arxiv 2024         |                                                              | [Project](https://microsoft.github.io/GASP/)                 |                                     |

| 2024 | [MixedGaussianAvatar] [MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting](http://arxiv.org/abs/2412.04955v1) | Arxiv 2024         |                                                              | [Project](https://chenvoid.github.io/MGA/)                   | 3DGS, 2D-3D                         |

| 2024 | [PBDyG] [PBDyG: Position Based Dynamic Gaussians for Motion-Aware Clothed Human Avatars](http://arxiv.org/abs/2412.04433v2) | Arxiv 2024         |                                                              |                                                              | Clothed Avatar                      |

| 2024 | [Topology-aware Human Avatars with Semantically-guided Gaussian Splatting](http://arxiv.org/abs/2408.09665v2) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning](http://arxiv.org/abs/2408.09663v3) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [AniFaceDiff] [AniFaceDiff: Animating Stylized Avatars via Parametric Conditioned Diffusion Models](http://arxiv.org/abs/2406.13272v2) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [HHAvatar] [HHAvatar: Gaussian Head Avatar with Dynamic Hairs](http://arxiv.org/abs/2312.03029v3) | Arxiv 2024         |                                                              | [Project](https://liaozhanfeng.github.io/HHAvatar/)          | Hair                                |

| 2024 | [InstantGeoAvatar] [InstantGeoAvatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video](http://arxiv.org/abs/2411.01512v2) | ACCV 2024          | [Code](https://github.com/alvaro-budria/InstantGeoAvatar)    |                                                              |                                     |

| 2024 | [GAST] [GAST: Sequential Gaussian Avatars with Hierarchical Spatio-temporal Context](http://arxiv.org/abs/2411.16768v1) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [Bundle Adjusted Gaussian Avatars Deblurring](http://arxiv.org/abs/2411.16758v1) | Arxiv 2024         | [Code](https://github.com/MyNiuuu/BAGA)                      |                                                              |                                     |

| 2024 | [DynamicAvatars] [DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models](http://arxiv.org/abs/2411.15732v1) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [FATE] [FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video](http://arxiv.org/abs/2411.15604v1) | Arxiv 2024         |                                                              | [Project](https://zjwfufu.github.io/FATE-page/)              |                                     |

| 2024 | [ConsistentAvatar] [ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance](http://arxiv.org/abs/2411.15436v1) | Arxiv 2024         |                                                              | [Project](https://njust-yang.github.io/ConsistentAvatar.github.io/) |                                     |

| 2024 | [DAGSM] [DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh](http://arxiv.org/abs/2411.15205v1) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [DanceFusion] [DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction](https://arxiv.org/abs/2411.04646) | Arxiv 2024         |                                                              | [Project](https://th-mlab.github.io/DanceFusion/)            |                                     |

| 2024 | [Allo-AVA] [Allo-AVA: A Large-Scale Multimodal Conversational AI Dataset for Allocentric Avatar Gesture Animation](http://arxiv.org/abs/2410.16503v1) | Arxiv 2024         |                                                              |                                                              | dataset                             |

| 2024 | [EgoAvatar] [EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars](http://arxiv.org/abs/2410.01835v1) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [Towards Native Generative Model for 3D Head Avatar](http://arxiv.org/abs/2410.01226v1) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality](http://arxiv.org/abs/2408.07041v2) | IEEE 2024          |                                                              |                                                              |                                     |

| 2024 | [Stable Video Portraits](http://arxiv.org/abs/2409.18083v1)  | ECCV 2024          |                                                              | [Project](https://svp.is.tue.mpg.de/)                        | Diffusion                           |

| 2024 | [LightAvatar] [LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field](http://arxiv.org/abs/2409.18057v1) | ECCV’24 CADL       | [Code](https://github.com/MingSun-Tse/LightAvatar-TensorFlow) |                                                              |                                     |

| 2024 | [DreamWaltz-G] [DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion](http://arxiv.org/abs/2409.17145v1) | Arxiv 2024         |                                                              | [Project](https://yukun-huang.github.io/DreamWaltz-G/)       |                                     |

| 2024 | [Gaussian Déjà-vu] [Gaussian Déjà-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities](http://arxiv.org/abs/2409.16147v1) | WACV 2025          | [Code](https://github.com/PeizhiYan/gaussian-dejavu)         | [Project](https://peizhiyan.github.io/docs/dejavu/)          |                                     |

| 2024 | [Barbie] [Barbie: Text to Barbie-Style 3D Avatars](http://arxiv.org/abs/2408.09126v4) | Arxiv 2024         |                                                              | [Project](https://xiaokunsun.github.io/Barbie.github.io/)    |                                     |

| 2024 | [GaussianHeads] [GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations](http://arxiv.org/abs/2409.11951v1) | SIGGRAPH Asia 2024 |                                                              | [Project](https://vcai.mpi-inf.mpg.de/projects/GaussianHeads/) | 🔥Gaussian Splatting                 |

| 2024 | [Avatar Concept Slider] [Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control](http://arxiv.org/abs/2408.13995v2) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [Barbie] [Barbie: Text to Barbie-Style 3D Avatars](https://arxiv.org/abs/2408.09126v3) | Arxiv 2024         | [Code](https://github.com/XiaokunSun/Barbie)                 | [Project](https://xiaokunsun.github.io/Barbie.github.io/)    |                                     |

| 2024 | [GenCA] [GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars](http://arxiv.org/abs/2408.13674v1) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [Barbie] [Barbie: Text to Barbie-Style 3D Avatars](http://arxiv.org/abs/2408.09126v2) | Arxiv 2024         | [Code](https://github.com/XiaokunSun/Barbie)                 | [Project](https://xiaokunsun.github.io/Barbie.github.io/)    | Text                                |

| 2024 | [DEGAS] [DEGAS: Detailed Expressions on Full-Body Gaussian Avatars](http://arxiv.org/abs/2408.10588v1) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [CHASE] [CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning](http://arxiv.org/abs/2408.09663v2) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [Expressive Whole-Body 3D Gaussian Avatar](http://arxiv.org/abs/2407.21686v1) | ECCV 2024          |                                                              | [Project](https://mks0601.github.io/ExAvatar/)               |                                     |

| 2024 | [AvatarPose] [AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos](https://feichilu.github.io/AvatarPose/assets/paper.pdf) | ECCV 2024          | [Code](https://github.com/feichilu/AvatarPose)               | [Project](https://feichilu.github.io/AvatarPose/)            |                                     |

| 2024 | [XHand] [XHand: Real-time Expressive Hand Avatar](http://arxiv.org/abs/2407.21002v1) | Arxiv 2024         | [Code](https://github.com/agnJason/XHand)                    |                                                              | Hand                                |

| 2024 | [Bridging the Gap] [Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture](http://arxiv.org/abs/2407.19593v2) | ECCV 2024          |                                                              | [Project](http://shahrukhathar.github.io/2024/07/22/Bridging.html) |                                     |

| 2024 | [CanonicalFusion] [CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images](http://arxiv.org/abs/2407.04345v2) | ECCV 2024          | [Code](https://github.com/jsshin98/CanonicalFusion)          |                                                              |                                     |

| 2024 | [WildAvatar] [WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation](http://arxiv.org/abs/2407.02165v3) | Arxiv 2024         | [Code](https://github.com/wildavatar/WildAvatar_Toolbox)     | [Project](https://wildavatar.github.io)                      | Dataset                             |

| 2024 | [Instant 3D Human Avatar Generation using Image Diffusion Models](http://arxiv.org/abs/2406.07516v2) | Arxiv 2024         |                                                              | [Project](https://www.nikoskolot.com/avatarpopup/)           |                                     |

| 2024 | [Gaussian Eigen Models for Human Heads](http://arxiv.org/abs/2407.04545v1) | Arxiv 2024         |                                                              | [Project](https://zielon.github.io/gem/)                     |                                     |

| 2024 | [MobilePortrait] [MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices](http://arxiv.org/abs/2407.05712v1) | Arxiv 2024         |                                                              |                                                              | Real-Time                           |

| 2024 | [CanonicalFusion] [CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images](http://arxiv.org/abs/2407.04345v1) | ECCV 2024          | [Code](https://github.com/jsshin98/CanonicalFusion)          |                                                              |                                     |

| 2024 | [Expressive Gaussian Human Avatars from Monocular RGB Video](http://arxiv.org/abs/2407.03204v1) | Arxiv 2024         |                                                              | [Project](https://evahuman.github.io/)                       |                                     |

| 2024 | [WildAvatar] [WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation](http://arxiv.org/abs/2407.02165v1) | Arxiv 2024         |                                                              |                                                              | Dataset                             |

| 2024 | [Human 3Diffusion] [Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models](http://arxiv.org/abs/2406.08475v1) | Arxiv 2024         |                                                              | [Project](https://yuxuan-xue.com/human-3diffusion)           | Diffusion                           |

| 2024 | [Instant 3D Human Avatar Generation using Image Diffusion Models](http://arxiv.org/abs/2406.07516v1) | Arxiv 2024         |                                                              | [Project](https://www.nikoskolot.com/avatarpopup/)           | Diffusion                           |

| 2024 | [Representing Animatable Avatar via Factorized Neural Fields](http://arxiv.org/abs/2406.00637v1) | Arxiv 2024         |                                                              |                                                              |                                     |

| 2024 | [Stratified Avatar Generation from Sparse Observations](http://arxiv.org/abs/2405.20786v2) | CVPR 2024 (Oral)   |                                                              |                                                              |                                     |

| 2024 | [NPGA] [NPGA: Neural Parametric Gaussian Avatars](http://arxiv.org/abs/2405.19331v1) | Arxiv 2024         |                                                              | [Project](https://simongiebenhain.github.io/NPGA/)           |                                     |

| 2024 | [E3Gen] [E3Gen: Efficient, Expressive and Editable Avatars Generation](http://arxiv.org/abs/2405.19203v2) | Arxiv 2024         | [Code](https://github.com/olivia23333/E3Gen)                 | [Project](https://olivia23333.github.io/E3Gen)               |                                     |

| 2024 | [GaussianVTON] [GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting](http://arxiv.org/abs/2405.07472v2) | On going work      |                                                              |                                                              | Try-ON                              |

| 2024 | [X-Oscar] [X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation](http://arxiv.org/abs/2405.00954v1) | ICML 2024          | [Code](https://github.com/LinZhekai/X-Oscar)                 | [Project](https://xmu-xiaoma666.github.io/Projects/X-Oscar/) |                                     |

| 2024 | [MeGA] [MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing](http://arxiv.org/abs/2404.19026v1) | Arxiv 2024         | [Code](https://github.com/conallwang/MeGA)                   | [Project](https://conallwang.github.io/MeGA_Pages/)          | 🔥Gaussian Splatting                 |

| 2024 | [Dynamic Gaussians Mesh] [Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos](http://arxiv.org/abs/2404.12379v2) | Arxiv 2024         | [Code](https://github.com/Isabella98Liu/DG-Mesh)             | [Project](https://www.liuisabella.com/DG-Mesh/)              | 🔥Gaussian Splatting                 |

| 2024 | [GeneAvatar] [GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image](https://arxiv.org/abs/2404.02152v1) | CVPR 2024          | [Code](https://github.com/zju3dv/GeneAvatar)                 | [Project](https://zju3dv.github.io/geneavatar/)              | Editing                             |

| 2024 | [Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes](http://arxiv.org/abs/2404.01543v1) | CVPR 2024          |                                                              | [Project](https://augmentedperception.github.io/monoavatar-plus/) | Blendshapes                         |

| 2024 | [SplattingAvatar] [SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting](https://arxiv.org/abs/2403.05087) | CVPR 2024          | [Code](https://github.com/initialneil/SplattingAvatar)       | [Project](https://initialneil.github.io/SplattingAvatar)     | 🔥Gaussian Splatting                 |

| 2024 | [MagicMirror] [MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space](https://arxiv.org/abs/2404.01296v1) | Arxiv 2024         |                                                              | [Project](https://syntec-research.github.io/MagicMirror/)    |                                     |

| 2024 | [HAHA] [HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior](https://arxiv.org/abs/2404.01053v1) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [UV Gaussians] [UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling](http://arxiv.org/abs/2403.11589v1) | Arxiv 2024         |                                                              | [Project](https://alex-jyj.github.io/UV-Gaussians/ )         | 🔥Gaussian Splatting                 |

| 2024 | [NECA] [NECA: Neural Customizable Human Avatar](http://arxiv.org/abs/2403.10335v1) | CVPR 2024          | [Code](https://github.com/iSEE-Laboratory/NECA)              |                                                              |                                     |

| 2024 | [V3D] [V3D: Video Diffusion Models are Effective 3D Generators](http://arxiv.org/abs/2403.06738v1) | Arxiv 2024         | [Code](https://github.com/heheyas/V3D)                       | [Project](https://heheyas.github.io/V3D/)                    | 🔥Gaussian Splatting, Video          |

| 2024 | [DNGaussian] [DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization](http://arxiv.org/abs/2403.06912v1) | CVPR 2024          | [Code](https://github.com/Fictionarry/DNGaussian)            | [Project](https://fictionarry.github.io/DNGaussian/)         | 🔥Gaussian Splatting, Sparse-View    |

| 2024 | [GEA] [GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video](http://arxiv.org/abs/2402.16607v1) | Arxiv 2024         |                                                              | [Project](https://3d-aigc.github.io/GEA/)                    | 🔥Gaussian Splatting, Avatar         |

| 2024 | [Magic-Me] [Magic-Me: Identity-Specific Video Customized Diffusion](http://arxiv.org/abs/2402.09368v1) | Arxiv 2024         | [Code](https://github.com/Zhen-Dong/Magic-Me)                | [Project](https://magic-me-webpage.github.io/)               |                                     |

| 2024 | [HeadStudio] [HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting](https://arxiv.org/abs/2402.06149) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting, Avatar         |

| 2024 | [GaussianHair] [GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians](https://arxiv.org/abs/2402.10483) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [ImplicitDeepfake] [ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting](https://arxiv.org/abs/2402.06390) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting, Deepfake       |

| 2024 | [Consolidating Attention Features for Multi-view Image Editing](https://arxiv.org/abs/2402.14792) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting, Edit           |

| 2024 | [Rig3DGS] [Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos](http://arxiv.org/abs/2402.03723v1) | Arxiv 2024         |                                                              | [Project](http://shahrukhathar.github.io/2024/02/05/Rig3DGS.html) | Portraits                           |

| 2024 | [4D Gaussian Splatting] [4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes](http://arxiv.org/abs/2402.03307v2) | Arxiv 2024         |                                                              |                                                              | Dynamic Scenes                      |

| 2024 | [ViCA-NeRF] [ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields](http://arxiv.org/abs/2402.00864v1) | NIPS 2023          | [Code](https://github.com/Dongjiahua/VICA-NeRF)              | [Project](https://dongjiahua.github.io/VICA-NeRF/)           | 3D Edit                             |

| 2024 | [CoSSegGaussians] [CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians with Dual Feature Fusion](http://arxiv.org/abs/2401.05925v3) | Arxiv 2024         |                                                              | [Project](https://david-dou.github.io/CoSSegGaussians)       | Segmentic                           |

| 2024 | [Sketch2NeRF] [Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation](http://arxiv.org/abs/2401.14257v1) | Arxiv 2024         |                                                              |                                                              | Text to 3D                          |

| 2024 | [CoSSegGaussians] [CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians with Dual Feature Fusion](http://arxiv.org/abs/2401.05925v2) | Arxiv 2024         |                                                              | [Project](https://david-dou.github.io/CoSSegGaussians/)      | 🔥Gaussian Splatting, Segmentation   |

| 2024 | [UltrAvatar] [UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures](http://arxiv.org/abs/2401.11078v1) | Arxiv 2024         |                                                              | [Project](http://usrc-sea.github.io/UltrAvatar/)             | Diffusion,Avatar                    |

| 2024 | [GaussianBody] [GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting](http://arxiv.org/abs/2401.09720v1) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [FED-NeRF] [FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF](http://arxiv.org/abs/2401.02616v1) | Arxiv 2024         | [Code](https://github.com/ZHANG1023/FED-NeRF)                |                                                              | 4D face video editor                |

| 2024 | [AGG] [AGG: Amortized Generative 3D Gaussians for Single Image to 3D](http://arxiv.org/abs/2401.04099v1) | Arxiv 2024         |                                                              | [Project](https://ir1d.github.io/AGG/)                       | 🔥Gaussian Splatting                 |

| 2024 | [Gaussian Shadow Casting for Neural Characters](http://arxiv.org/abs/2401.06116v1) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [Human101] [Human101: Training 100+FPS Human Gaussians in 100s from 1 View](http://arxiv.org/abs/2312.15258v1) | Arxiv 2024         | [Code](https://github.com/longxiang-ai/Human101)             | [Project](https://longxiang-ai.github.io/Human101/)          | 🔥Gaussian Splatting                 |

| 2024 | [Deformable 3D Gaussian Splatting for Animatable Human Avatars](http://arxiv.org/abs/2312.15059v1) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [4DGen] [4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency](http://arxiv.org/abs/2312.17225v1) | Arxiv 2024         | [Code](https://github.com/VITA-Group/4DGen)                  | [Project](https://vita-group.github.io/4DGen/ )              | 🔥Gaussian Splatting                 |

| 2024 | [3DGAN] [What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs](https://research.nvidia.com/labs/nxp/wysiwyg/media/WYSIWYG.pdf) | Arxiv 2024         |                                                              | [Project](https://research.nvidia.com/labs/nxp/wysiwyg/)     |                                     |

| 2024 | [3DGS-Avatar] [3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting](http://arxiv.org/abs/2312.09228v2) | Arxiv 2024         | [Code](https://github.com/mikeqzy/3dgs-avatar-release)       | [Project](https://neuralbodies.github.io/3DGS-Avatar)        | 🔥Gaussian Splatting                 |

| 2024 | [Learning Dense Correspondence for NeRF-Based Face Reenactment](http://arxiv.org/abs/2312.10422v2) | AAAI 2024          |                                                              |                                                              | one-shot multi-view face reenactmen |

| 2024 | [GaussianHead] [GaussianHead: Impressive 3D Gaussian-based Head Avatars with Dynamic Hybrid Neural Field](https://arxiv.org/abs/2312.01632) | Arxiv 2024         | [Code](https://github.com/chiehwangs/gaussian-head)          |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [MonoGaussianAvatar] [MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar](https://arxiv.org/abs/2312.00846) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [Gaussian Head Avatar] [Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians](http://arxiv.org/abs/2312.03029v1) | Arxiv 2024         | [Code](https://github.com/YuelangX/Gaussian-Head-Avatar)     | [Project]( https://yuelangx.github.io/gaussianheadavatar/ )  |                                     |

| 2024 | [HeadGaS] [HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting](http://arxiv.org/abs/2312.02902v1) | Arxiv 2024         |                                                              |                                                              | 🔥Gaussian Splatting                 |

| 2024 | [GaussianAvatars] [GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians](http://arxiv.org/abs/2312.02069v1) | CVPR 2024          | [Code](https://github.com/ShenhanQian/GaussianAvatars)       | [Project](https://shenhanqian.github.io/gaussian-avatars)    | 🔥Gaussian Splatting                 |

| 2024 | [VAST] [VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer](http://arxiv.org/abs/2308.04830v3) | ICCV2023           |                                                              |                                                              |                                     |

| 2023 | [SD-NeRF] [SD-NeRF: Towards Lifelike Talking Head Animation via Spatially-adaptive Dual-driven NeRFs](https://ieeexplore.ieee.org/document/10229247) | IEEE 2023          | -                                                            | -                                                            |                                     |

| 2023 | [Instruct-NeuralTalker] [Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions](https://arxiv.org/abs/2306.10813) | Arxiv 2023         |                                                              |                                                              |                                     |

| 2023 | [GeneFace++] [Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation](https://arxiv.org/abs/2305.00787) | Arxiv 2023         | [Code](https://github.com/yerfor/GeneFacePlusPlus)           | [Project](https://genefaceplusplus.github.io/)               | -                                   |

| 2023 | [GeneFace] [GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis](https://arxiv.org/abs/2301.13430) | ICLR 2023          | [Code](https://github.com/yerfor/GeneFace)                   | [Project](https://geneface.github.io/)                       | -                                   |

| 2022 | [RAD-NeRF] [RAD-NeRF: Real-time Neural Talking Portrait Synthesis](https://arxiv.org/pdf/2211.12368.pdf) | Arxiv 2022         | [Code](https://github.com/ashawkey/RAD-NeRF)                 | [Project](https://ashawkey.github.io/radnerf/)               | InstantNGP                          |

| 2022 | [DFRF] [DFRF：Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis](https://arxiv.org/abs/2207.11770) | ECCV 2022          | [Code](https://github.com/sstzal/DFRF)                       | [Project](https://sstzal.github.io/DFRF/)                    |                                     |

| 2022 | [DialogueNeRF] [DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation](https://arxiv.org/pdf/2203.07931.pdf) | Arxiv 2022         | -                                                            | -                                                            | -                                   |

| 2022 | [NeRFInvertor] [NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation](https://arxiv.org/pdf/2211.17235.pdf) | Arxiv 2022         | [Code](https://github.com/YuYin1/NeRFInvertor)               | [Project](https://yuyin1.github.io/NeRFInvertor_Homepage/)   | -                                   |

| 2022 | [Next3D] [Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars](https://arxiv.org/pdf/2211.11208.pdf) | Arxiv 2022         | [Code](https://mrtornado24.github.io/Next3D/)                | [Project](https://mrtornado24.github.io/Next3D/)             | -                                   |

| 2022 | [3DFaceShop] [3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation](https://arxiv.org/pdf/2209.05434) | Arxiv 2022         | [Code](https://github.com/junshutang/3DFaceShop)             | [Project](https://junshutang.github.io/control/index.html)   | -                                   |

| 2022 | [FNeVR] [FNeVR: Neural Volume Rendering for Face Animation](https://arxiv.org/abs/2209.10340) | Arxiv 2022         | [Code](https://github.com/zengbohan0217/FNeVR)               | -                                                            | -                                   |

| 2022 | [ROME] [ROME: Realistic One-shot Mesh-based Head Avatars](https://arxiv.org/pdf/2206.08343.pdf) | ECCV 2022          | [Code](https://github.com/SamsungLabs/rome)                  | [Project](https://samsunglabs.github.io/rome/)               | -                                   |

| 2022 | [IMavatar] [IMavatar: Implicit Morphable Head Avatars from Videos](https://openaccess.thecvf.com/content/CVPR2022/papers/Zheng_I_M_Avatar_Implicit_Morphable_Head_Avatars_From_Videos_CVPR_2022_paper.pdf) | CVPR 2022          | [Code](https://ait.ethz.ch/projects/2022/IMavatar/)          | [Project](https://ait.ethz.ch/projects/2022/IMavatar/)       | -                                   |

| 2022 | [HeadNeRF] [HeadNeRF: A Real-time NeRF-based Parametric Head Model](https://openaccess.thecvf.com/content/CVPR2022/papers/Grassal_Neural_Head_Avatars_From_Monocular_RGB_Videos_CVPR_2022_paper.pdf) | CVPR 2022          | [Code](https://github.com/CrisHY1995/headnerf)               | [Project](https://hy1995.top/HeadNeRF-Project/)              | -                                   |

| 2022 | [SSP-NeRF] [Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation](https://arxiv.org/pdf/2201.07786.pdf) | Arxiv 2022         | [Code](https://github.com/alvinliu0/SSP-NeRF)                | [Project](https://alvinliu0.github.io/projects/SSP-NeRF)     | -                                   |

| 2021 | [AD-NeRF] [AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis](https://arxiv.org/abs/2103.11078) | ICCV 2021          | [Code](https://github.com/YudongGuo/AD-NeRF)                 | [Project](https://yudongguo.github.io/ADNeRF/)               | -                                   |

| 2021 | [NerFACE] [NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction](https://arxiv.org/pdf/2012.03065) | CVPR 2021 Oral     | [Code](https://github.com/gafniguy/4D-Facial-Avatars)        | [Project](https://gafniguy.github.io/4D-Facial-Avatars/)     | -                                   |

| 2021 | [DFA-NeRF] [DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering](https://arxiv.org/pdf/2201.00791v1.pdf) | Arxiv 2021         | [Code](https://github.com/ShunyuYao/DFA-NeRF)                | -                                                            | -                                   |

---

## Metrics

| Metrics                                             | Paper                                                        | Link                                                         |

| --------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |

| PSNR (peak signal-to-noise ratio)                   | -                                                            |                                                              |

| SSIM (structural similarity index measure)          | Image quality assessment: from error visibility to structural similarity. |                                                              |

| CPBD(cumulative probability of blur detection)      | A no-reference image blur metric based on the cumulative probability of blur detection |                                                              |

| LPIPS (Learned Perceptual Image Patch Similarity) - | The Unreasonable Effectiveness of Deep Features as a Perceptual Metric | [paper](https://arxiv.org/pdf/1801.03924.pdf)                |

| NIQE (Natural Image Quality Evaluator)              | Making a ‘Completely Blind’ Image Quality Analyzer           | [paper](http://live.ece.utexas.edu/research/Quality/niqe_spl.pdf) |

| FID (Fréchet inception distance)                    | GANs trained by a two time-scale update rule converge to a local nash equilibrium |                                                              |

| LMD (landmark distance error)                       | Lip Movements Generation at a Glance                         |                                                              |

| LRA (lip-reading accuracy)                          | Talking Face Generation by Conditional Recurrent Adversarial Network | [paper](https://arxiv.org/pdf/1804.04786.pdf)                |

| WER(word error rate)                                | Lipnet: end-to-end sentencelevel lipreading.                 |                                                              |

| LSE-D (Lip Sync Error - Distance)                   | Out of time: automated lip sync in the wild                  |                                                              |

| LSE-C (Lip Sync Error - Confidence)                 | Out of time: automated lip sync in the wild                  |                                                              |

| ACD(Average content distance)                       | Facenet: a unified embedding for face recognition and clustering. |                                                              |

| CSIM(cosine similarity)                             | Arcface: additive angular margin loss for deep face recognition. |                                                              |

| EAR(eye aspect ratio)                               | Real-time eye blink detection using facial landmarks. In: Computer Vision Winter Workshop |                                                              |

| ESD(emotion similarity distance)                    | What comprises a good talking-head video generation?: A Survey and Benchmark |                                                              |

---

## Tools & Software

| Tool/Resource                                                | Description                                                  |

| ------------------------------------------------------------ | ------------------------------------------------------------ |

| [LUCIA](https://sourceforge.net/projects/lucia/)             | Development of a MPEG-4 Talking Head Engine. 💻               |

| [Yepic Studio](https://www.g2.com/products/yepic-studio/reviews) | Create and dub talking head-style videos in minutes without expensive equipment. 🎥 |

| [Mel McGee's Talkbots](https://sourceforge.net/projects/talkbots/) | A complete multi-browser, multi-platform talking head application in SVG suitable for web sites or as an avatar. 🗣️ |

| [face3D_chung](https://sourceforge.net/projects/face3dchung/) | Create 3D character avatar head objects with texture from a single photo for your games. 🎮 |

| [CrazyTalk](https://www.g2.com/products/crazytalk/reviews)   | Exciting features for 3D head creation and automation. 🤪     |

| [tts avatar free download - SourceForge](https://sourceforge.net/directory/?q=tts%20avatar) | Mel McGee's Talkbots is a complete multi-browser, multi-platform talking head. (🔧👄) |

| [Verbatim AI - Product Information, Latest Updates, and Reviews 2023](https://www.producthunt.com/products/verbatim-ai) | A simple yet powerful API to generate AI "talking head" videos in near real-time with Verbatim AI. Add interest, intrigue, and dynamism to your chat bots! (🔧👄) |

| [Best Open Source BASIC 3D Modeling Software](https://sourceforge.net/directory/3d-modeling/basic/) | Includes talk3D_chung, a small example using obj models created with face3D_chung, and speak3D_chung_dll, a dll to load and display face3D_chung talking avatars. (🛠️🎭) |

| [DVDStyler / Discussion / Help: ffmpeg-vbr or internal](https://sourceforge.net/p/dvdstyler/discussion/318795/thread/82dcb647/) | Talking heads would get a bitrate which is unnecessarily high while using DVDStyler. (🛠️👄) |

| [puffin web browser free download - SourceForge](https://sourceforge.net/directory/lisp/?q=puffin+web+browser) | Mel McGee's Talkbots is a complete multi-browser, multi-platform talking head. (🔧👄) |

| [12 best AI video generators to use in 2023 Free and paid \|Product ...](https://www.producthunt.com/stories/best-ai-video-generators-free) | Whether you’re an entrepreneur, small business owner, or run a large company, AI video generators make it super easy to create high-quality videos from scratch. (🔧🎥) |

---

## Slides & Presentations

| Presentation Title                                           | Description                                                  |

| ------------------------------------------------------------ | ------------------------------------------------------------ |

| [Few-Shot Adversarial Learning of Realistic Neural Talking Head Models](https://www.slideshare.net/ssuserc9d82a/paper-reviewfewshot-adversarial-learning-of-realistic-neural-talking-head-models) | Presentation reviewing the few-shot adversarial learning of realistic neural talking head models. |

| [Nethania Michelle's Character](https://www.slideshare.net/ZULHICZARARIETINARBU/nethania-michelles-character) | PPT: Presentation discussing the improvement of a 3D talking head for use in an avatar of a virtual meeting room. |

| [Presenting you: Top tips on presenting with Prezi Video – Prezi](https://support.prezi.com/hc/en-us/articles/360036679953-Presenting-you-Top-tips-on-presenting-with-Prezi-Video) | Article providing top tips for presenting with Prezi Video.  |

| [Research Presentation](https://pt.slideshare.net/willg_36/research-presentation-presentation-956726) | PPT: Resident Research Presentation Slide Deck.              |

| [Adding narration to your presentation (using Prezi Video) – Prezi](https://support.prezi.com/hc/en-us/articles/360038281894-Adding-narration-to-your-presentation-using-Prezi-Video-) | Learn how to add narration to your Prezi presentation with Prezi Video. |

---

## References

| Website                                                      | Description                                                  |

| ------------------------------------------------------------ | ------------------------------------------------------------ |

| [arXiv](https://arxiv.org/)                                  | Provides preprints in various academic fields, serving as an important platform for accessing the latest research findings. |

| [CVF Open Access](https://openaccess.thecvf.com/)            | The Computer Vision Foundation's open-access platform, offering open-access papers from top conferences such as CVPR, ICCV, ECCV, and more. |

| [Papers with Code](https://paperswithcode.com/)              | A platform that aggregates research papers with accompanying code implementations, making it convenient to find the latest research findings and their corresponding implementations. |

| [ICCV - International Conference on Computer Vision](https://dblp.uni-trier.de/db/conf/eccv/index.html) | The International Conference on Computer Vision, gathering the latest research findings in the field of computer vision. |

| [ECCV - European Conference on Computer Vision](http://www.informatik.uni-trier.de/~ley/db/conf/eccv/index.html) | The European Conference on Computer Vision, providing the latest research results and related information in the field of computer vision. |

| [CVPR - Conference on Computer Vision and Pattern Recognition](http://dblp.uni-trier.de/db/conf/cvpr/index.html) | The Conference on Computer Vision and Pattern Recognition, one of the top conferences in computer vision, showcasing numerous important research findings. |

---

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Kedreamix/Awesome-Talking-Head-Synthesis&type=Date)](https://star-history.com/#Kedreamix/Awesome-Talking-Head-Synthesis&Date)

[1]: https://github.com/YunjinPark/awesome_talking_face_generation

[2]: https://github.com/LTT-O/Awesome-Talking-Head-Generation

[3]: https://github.com/JosephPai/Awesome-Talking-Face "Greate Project"

[4]: https://github.com/weishida01/Awesome-Talking-Face-Generation

[5]: https://github.com/harlanhong/awesome-talking-head-generation "nice job"

[6]:https://github.com/Curated-Awesome-Lists/awesome-ai-talking-heads "full of tools"
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Kedreamix/Awesome-Talking-Head-Synthesis

Awesome Lists containing this project

README