# Awesome Talking Face [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome#readme)

This is a repository for organizing papers, code, and other resources related to talking face/head generation. Most papers link to a PDF hosted on arXiv or OpenAccess; however, some papers (e.g., those in IEEE, Springer, or Elsevier journals) require an academic license to access.

#### :high_brightness: This project is still ongoing; pull requests are welcome!

If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to edit this list and open a pull request. Even just letting me know the title of a paper is a big contribution. You can do so by opening an issue or contacting me directly via email.

#### :star: If you find this repo useful, please star it!

#### 2022.09 Update!
Thanks for the PRs from everybody! From now on, I'll occasionally include papers on **video-driven** talking face generation, since the community has begun to fold **video-driven** methods into the scope of talking face generation, even though that task was originally termed **Face Reenactment**.

So, if you are looking for **video-driven talking face generation**, I suggest you star this repo and also search for Face Reenactment; you'll find more there :)

One more thing: please correct me if you find that a paper listed here as an arXiv paper has since been accepted to a conference or journal.

#### 2021.11 Update!

I added a batch of papers that appeared over the past few months. This repo was originally intended to cover **audio-driven** talking face generation, but I found several **text-based** works that are also very interesting, so I have included them here. Enjoy!

#### TO DO LIST

- [x] Main paper list
- [x] Add paper link
- [x] Add code links where available
- [x] Add project page links where available
- [x] Datasets and survey

## Papers

### 2D Video - Person independent
#### 2024
- DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2410.13726) [Code](https://github.com/Hanbo-Cheng/DAWN-pytorch) [ProjectPage](https://hanbo-cheng.github.io/DAWN/)
- Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization [arXiv 2024] [Paper](https://arxiv.org/abs/2410.14283)
- MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting [arXiv 2024] [Paper](https://arxiv.org/abs/2410.10122) [Code](https://github.com/TMElyralab/MuseTalk)
- 3D-Aware Text-driven Talking Avatar Generation [ECCV 2024] [Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12305.pdf)
- LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details [arXiv 2024] [Paper](https://arxiv.org/abs/2410.00990)
- TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans [ECCVW 2024] [Paper](https://arxiv.org/abs/2409.16666)
- JoyHallo: Digital human model for Mandarin [arXiv 2024] [Paper](https://arxiv.org/abs/2409.13268) [Code](https://github.com/jdh-algo/JoyHallo)
- JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation [BMVC 2024] [Paper](https://arxiv.org/abs/2409.12156)
- StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads [TPAMI 2024] [Paper](https://www.arxiv.org/abs/2409.09292)
- DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures [CVPRW 2024] [Paper](https://arxiv.org/abs/2409.07649)
- EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion [arXiv 2024] [Paper](https://www.arxiv.org/abs/2409.07255)
- SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model [arXiv 2024] [Paper](https://arxiv.org/abs/2409.03270)
- SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing [arXiv 2024] [Paper](https://www.arxiv.org/abs/2409.03605)
- Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency [arXiv 2024] [Paper](https://arxiv.org/abs/2409.02634) [ProjectPage](https://loopyavatar.github.io/)
- PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2409.02657)
- CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention [arXiv 2024] [Paper](https://arxiv.org/abs/2409.01876) [ProjectPage](https://cyberhost.github.io/)
- TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation [arXiv 2024] [Paper](https://arxiv.org/abs/2408.13714)
- S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis [arXiv 2024] [Paper](https://arxiv.org/abs/2408.09347v1)
- FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model [arXiv 2024] [Paper](https://arxiv.org/abs/2408.09384v1)
- LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control [arXiv 2024] [Paper](https://arxiv.org/abs/2407.03168) [ProjectPage](https://liveportrait.github.io/) [Code](https://github.com/KwaiVGI/LivePortrait)
- High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model [arXiv 2024] [Paper](https://arxiv.org/abs/2408.05416)
- Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2408.01732)
- LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement [arXiv 2024] [Paper](https://arxiv.org/abs/2407.18595) [ProjectPage](https://tencentqqgylab.github.io/LinguaLinker/) [Code](https://github.com/TencentQQGYLab/LinguaLinker?tab=readme-ov-file)
- Learning Online Scale Transformation for Talking Head Video Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2407.09965)
- EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions [arXiv 2024] [Paper](https://arxiv.org/abs/2407.08136) [ProjectPage](https://badtobest.github.io/echomimic.html) [GitHub](https://github.com/BadToBest/EchoMimic)
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation [arXiv 2024] [Paper](https://arxiv.org/abs/2406.08801) [ProjectPage](https://fudan-generative-vision.github.io/hallo/#/) [GitHub](https://github.com/fudan-generative-vision/hallo)
- RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network [arXiv 2024] [Paper](https://arxiv.org/abs/2406.18284)
- Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2406.07895)
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [arXiv 2024] [Paper](https://arxiv.org/abs/2406.07867) [ProjectPage](https://multidialog.github.io/)
- Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement [arXiv 2024] [Paper](https://arxiv.org/abs/2406.08096) [ProjectPage](https://ingrid789.github.io/MyTalk/)
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [arXiv 2024] [Paper](https://arxiv.org/abs/2406.02880)
- InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2405.15758) [ProjectPage](https://wangyuchi369.github.io/InstructAvatar/)
- Faces that Speak: Jointly Synthesising Talking Face and Speech from Text [arXiv 2024] [Paper](https://arxiv.org/abs/2405.10272) [ProjectPage](https://mm.kaist.ac.kr/projects/faces-that-speak/)
- Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2405.07257)
- SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space [arXiv 2024] [Paper](https://arxiv.org/abs/2405.05636) [ProjectPage](http://swaptalk.cc/)
- AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding [arXiv 2024] [Paper](https://arxiv.org/abs/2405.03121) [Code](https://github.com/X-LANCE/AniTalker) [ProjectPage](https://x-lance.github.io/AniTalker/)
- NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [CVPR 2024 Workshop] [Paper](https://arxiv.org/abs/2405.05749) [Code](https://github.com/rlgnswk/NeRFFaceSpeech_Code/) [ProjectPage](https://rlgnswk.github.io/NeRFFaceSpeech_ProjectPage/)
- Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation [CVPR 2024 Workshop] [Paper](https://arxiv.org/abs/2405.04327)
- EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars [arXiv 2024] [Paper](https://arxiv.org/abs/2404.19110) [ProjectPage](https://neeek2303.github.io/EMOPortraits)
- GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting [arXiv 2024] [Paper](https://arxiv.org/abs/2404.19040)
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time [arXiv 2024] [Paper](https://arxiv.org/abs/2404.10667) [ProjectPage](https://www.microsoft.com/en-us/research/project/vasa-1/)
- THQA: A Perceptual Quality Assessment Database for Talking Heads [arXiv 2024] [Paper](https://arxiv.org/abs/2404.09003) [Code](https://github.com/zyj-2000/THQA)
- Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior [arXiv 2024] [Paper](https://arxiv.org/abs/2403.20153) [Code](https://github.com/KU-CVLAB/Talk3D) [ProjectPage](https://ku-cvlab.github.io/Talk3D/)
- EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis [arXiv 2024] [Paper](https://arxiv.org/abs/2404.01647) [Code](https://github.com/tanshuai0219/EDTalk) [ProjectPage](https://tanshuai0219.github.io/EDTalk/)
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations [arXiv 2024] [Paper](https://arxiv.org/abs/2403.17694) [Code](https://github.com/Zejun-Yang/AniPortrait)
- MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation [arXiv 2024] [Paper](https://ku-cvlab.github.io/MoDiTalker/) [ProjectPage](https://ku-cvlab.github.io/MoDiTalker/)
- Superior and Pragmatic Talking Face Generation with Teacher-Student Framework [arXiv 2024] [Paper](https://arxiv.org/abs/2403.17883) [ProjectPage](https://superfacelink.github.io/)
- X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [arXiv 2024] [Paper](https://arxiv.org/abs/2403.15931)
- Adaptive Super Resolution For One-Shot Talking-Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2403.15944)
- Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style [arXiv 2024] [Paper](https://arxiv.org/abs/2403.06365)
- FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization [arXiv 2024] [Paper](https://arxiv.org/abs/2403.06375)
- FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio [arXiv 2024] [Paper](https://arxiv.org/abs/2403.01901) [Code](https://github.com/modelscope/facechain)
- Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis [CVPR 2024] [Paper](https://arxiv.org/abs/2402.17364) [Code](https://github.com/zhangzc21/DynTet)
- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions [arXiv 2024] [Paper](https://arxiv.org/abs/2402.17485) [ProjectPage](https://humanaigc.github.io/emote-portrait-alive/) [Code](https://github.com/HumanAIGC/EMO)
- G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment [arXiv 2024] [Paper](https://arxiv.org/abs/2402.18122)
- Context-aware Talking Face Video Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2402.18092)
- EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2402.01422) [ProjectPage](https://peterfanfan.github.io/EmoSpeaker/) [Code](https://github.com/PeterFanFan/Emospeaker_code)
- GPAvatar: Generalizable and Precise Head Avatar from Image(s) [ICLR 2024] [Paper](https://arxiv.org/abs/2401.10215) [Code](https://github.com/xg-chu/GPAvatar)
- Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [ICLR 2024] [Paper](https://arxiv.org/abs/2401.08503)
- EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model [ICASSP 2024] [Paper](https://arxiv.org/abs/2401.08049)
- CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer [WACV 2024] [Paper](https://arxiv.org/abs/2311.06443) [Code](https://github.com/HowieMa/CVTHead)

#### 2023

- VectorTalker: SVG Talking Face Generation with Progressive Vectorisation [arXiv 2023] [Paper](https://arxiv.org/abs/2312.11568)
- DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models [arXiv 2023] [Paper](https://arxiv.org/abs/2312.09767) [ProjectPage](https://dreamtalk-project.github.io/)
- GMTalker: Gaussian Mixture based Emotional talking video Portraits [arXiv 2023] [Paper](https://arxiv.org/abs/2312.07669) [ProjectPage](https://bob35buaa.github.io/GMTalker)
- DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers [arXiv 2023] [Paper](https://arxiv.org/abs/2312.06400)
- R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning [arXiv 2023] [Paper](https://arxiv.org/abs/2312.05572)
- FT2TF: First-Person Statement Text-To-Talking Face Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2312.05430)
- VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior [arXiv 2023] [Paper](https://arxiv.org/abs/2312.01841) [Code](https://github.com/HumanAIGC/VividTalk) [ProjectPage](https://humanaigc.github.io/vivid-talk/)
- SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis [arXiv 2023] [Paper](https://arxiv.org/abs/2311.17590) [Code](https://github.com/ziqiaopeng/SyncTalk) [ProjectPage](https://ziqiaopeng.github.io/synctalk/)
- GAIA: Zero-shot Talking Avatar Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2311.15230)
- Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/html/Li_Efficient_Region-Aware_Neural_Radiance_Fields_for_High-Fidelity_Talking_Portrait_Synthesis_ICCV_2023_paper.html) [ProjectPage](https://fictionarry.github.io/ER-NeRF/) [Code](https://github.com/Fictionarry/ER-NeRF)
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation [ICCV 2023] [Paper](https://arxiv.org/abs/2307.09906) [ProjectPage](https://harlanhong.github.io/publications/mcnet.html) [Code](https://github.com/harlanhong/ICCV2023-MCNET)
- MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions [ICCV 2023] [Paper](https://arxiv.org/abs/2307.10008) [ProjectPage](https://liuyunfei.net/projects/iccv23-moda/)
- ToonTalker: Cross-Domain Face Reenactment [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Gong_ToonTalker_Cross-Domain_Face_Reenactment_ICCV_2023_paper.pdf)
- Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation [ICCV 2023] [Paper](https://arxiv.org/abs/2309.04946) [ProjectPage](https://yuangan.github.io/eat/) [Code](https://github.com/yuangan/eat_code)
- EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Tan_EMMN_Emotional_Motion_Memory_Network_for_Audio-driven_Emotional_Talking_Face_ICCV_2023_paper.pdf)
- Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Song_Emotional_Listener_Portrait_Neural_Listener_Head_Generation_with_Emotion_ICCV_2023_paper.pdf)
- Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions [arXiv 2023] [Paper](https://arxiv.org/abs/2306.10813)
- Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow [arXiv 2023] [Paper](https://arxiv.org/abs/2307.09368)
- Reprogramming Audio-driven Talking Face Synthesis into Text-driven [arXiv 2023] [Paper](https://arxiv.org/abs/2306.16003)
- Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis [TCSVT 2023] [Paper](https://arxiv.org/abs/2309.00030)
- Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks [arXiv 2023] [Paper](https://arxiv.org/abs/2306.03594)
- Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [arXiv 2023] [Paper](https://arxiv.org/abs/2306.03504)
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation [CVPR 2023] [Paper](https://arxiv.org/abs/2211.12194) [Code](https://github.com/OpenTalker/SadTalker)
- MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation [CVPR 2023] [Paper](https://arxiv.org/abs/2212.08062) [ProjectPage](https://meta-portrait.github.io/) [Code](https://github.com/Meta-Portrait/MetaPortrait)
- Implicit Neural Head Synthesis via Controllable Local Deformation Fields [CVPR 2023] [Paper](https://arxiv.org/abs/2304.11113)
- LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook [CVPR 2023] [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_LipFormer_High-Fidelity_and_Generalizable_Talking_Face_Generation_With_a_Pre-Learned_CVPR_2023_paper.pdf)
- GANHead: Towards Generative Animatable Neural Head Avatars [CVPR 2023] [Paper](https://arxiv.org/abs/2304.03950v1) [ProjectPage](https://wsj-sjtu.github.io/GANHead/) [Code](https://github.com/wsj-sjtu/GANHead)
- Parametric Implicit Face Representation for Audio-Driven Facial Reenactment [CVPR 2023] [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Huang_Parametric_Implicit_Face_Representation_for_Audio-Driven_Facial_Reenactment_CVPR_2023_paper.pdf)
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [CVPR 2023] [Paper](https://arxiv.org/abs/2305.08293) [Code](https://github.com/Weizhi-Zhong/IP_LAP)
- StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator [CVPR 2023] [Paper](https://arxiv.org/pdf/2305.05445.pdf) [ProjectPage](https://hangz-nju-cuhk.github.io/projects/StyleSync) [Code](https://github.com/guanjz20/StyleSync)
- Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos [arXiv 2023] [Paper](https://arxiv.org/abs/2305.03713) [ProjectPage](https://research.nvidia.com/labs/nxp/avatar-fingerprinting/)
- Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model [arXiv 2023] [Paper](https://arxiv.org/abs/2305.02594)
- High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [CVPR 2023] [Paper](https://arxiv.org/abs/2305.02572)
- StyleLipSync: Style-based Personalized Lip-sync Video Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2305.00521) [ProjectPage](https://stylelipsync.github.io) [Code](https://github.com/TaekyungKi/StyleLipSync)
- GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2305.00787) [ProjectPage](https://genefaceplusplus.github.io)
- High-Fidelity and Freely Controllable Talking Head Video Generation [CVPR 2023] [Paper](https://arxiv.org/abs/2304.10168) [Project Page](https://yuegao.me/PECHead/)
- One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field [CVPR 2023] [Paper](https://arxiv.org/abs/2304.05097) [ProjectPage](https://www.waytron.net/hidenerf/)
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert [CVPR 2023] [Paper](https://arxiv.org/abs/2303.17480) [Code](https://github.com/Sxjdwang/TalkLip)
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [arXiv 2023] [Paper](https://arxiv.org/abs/2304.08945)
- That's What I Said: Fully-Controllable Talking Face Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2304.03275) [ProjectPage](https://mm.kaist.ac.kr/projects/FC-TFG/)
- Emotionally Enhanced Talking Face Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2303.11548) [Code](https://github.com/sahilg06/EmoGen) [ProjectPage](https://midas.iiitd.edu.in/emo/)
- A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation [MLSys Workshop 2023] [Paper](https://arxiv.org/abs/2304.00471v1)
- TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles [arXiv 2023] [Paper](https://arxiv.org/abs/2304.00334v1)
- FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions [ICME 2023] [Paper](https://arxiv.org/abs/2303.17789)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [arXiv 2023] [Paper](https://arxiv.org/abs/2303.17550) [ProjectPage](https://daetalker.github.io/)
- OPT: One-Shot Pose-Controllable Talking Head Generation [ICASSP 2023] [Paper](https://arxiv.org/abs/2302.08197)
- DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions [ICASSP 2023] [Paper](https://arxiv.org/abs/2303.07697) [Code](https://github.com/deepbrainai-research/discohead) [ProjectPage](https://deepbrainai-research.github.io/discohead/)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [ICLR 2023] [Paper](https://arxiv.org/abs/2301.13430) [Code](https://github.com/yerfor/GeneFace) [ProjectPage](https://geneface.github.io/)
- OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR 2023] [Paper](https://arxiv.org/abs/2303.14662) [Code](https://github.com/theEricMa/OTAvatar)
- Style Transfer for 2D Talking Head Animation [arXiv 2023] [Paper](https://arxiv.org/abs/2303.09799)
- READ Avatars: Realistic Emotion-controllable Audio Driven Avatars [arXiv 2023] [Paper](https://arxiv.org/abs/2303.00744)
- On the Audio-visual Synchronization for Lip-to-Speech Synthesis [arXiv 2023] [Paper](https://arxiv.org/abs/2303.00502)
- DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis [CVPR 2023] [Paper](https://arxiv.org/abs/2301.03786)
- Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation [arXiv 2023] [Paper](https://mstypulkowski.github.io/diffusedheads/diffused_heads.pdf) [ProjectPage](https://mstypulkowski.github.io/diffusedheads/)
- StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles [AAAI 2023] [Paper](https://arxiv.org/abs/2301.01081) [Code](https://github.com/FuxiVirtualHuman/styletalk)
- Audio-Visual Face Reenactment [WACV 2023] [Paper](https://arxiv.org/abs/2210.02755) [ProjectPage](http://cvit.iiit.ac.in/research/projects/cvit-projects/avfr) [Code](https://github.com/mdv3101/AVFR-Gan/)

#### 2022
- Memories are One-to-Many Mapping Alleviators in Talking Face Generation [arXiv 2022] [Paper](https://arxiv.org/abs/2212.05005) [ProjectPage](https://memoryface.github.io/)
- Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [SIGGRAPH Asia 2022] [Paper](https://arxiv.org/abs/2212.04970)
- Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors [arXiv 2022] [Paper](https://arxiv.org/abs/2212.04248) [ProjectPage](https://zxyin.github.io/TH-PAD/)
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis [CVPR 2022] [Paper](https://arxiv.org/abs/2211.14506) [ProjectPage](https://dorniwang.github.io/PD-FGC/)
- SPACE: Speech-driven Portrait Animation with Controllable Expression [arXiv 2022] [Paper](https://arxiv.org/abs/2211.09809) [ProjectPage](https://deepimagination.cc/SPACEx/)
- Compressing Video Calls using Synthetic Talking Heads [BMVC 2022] [Paper](https://arxiv.org/abs/2210.03692) [Project Page](https://cvit.iiit.ac.in/research/projects/cvit-projects/talking-video-compression)
- Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement [arXiv 2022] [Paper](https://arxiv.org/pdf/2209.01320.pdf)
- StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [arXiv 2022] [Paper](https://arxiv.org/pdf/2208.10922.pdf)
- Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [arXiv 2022] [Paper](https://arxiv.org/pdf/2208.02210.pdf)
- EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022] [Paper](https://arxiv.org/pdf/2205.15278.pdf)
- Talking Head from Speech Audio using a Pre-trained Image Generator [ACM MM 2022] [Paper](https://arxiv.org/pdf/2209.04252.pdf)
- Latent Image Animator: Learning to Animate Images via Latent Space Navigation [ICLR 2022] [Paper](https://openreview.net/pdf?id=7r6kDq0mK_) [ProjectPage(note this page has auto-play music...)](https://wyhsirius.github.io/LIA-project/) [Code](https://github.com/wyhsirius/LIA)
- Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition [arXiv 2022] [Paper](https://arxiv.org/pdf/2211.12368.pdf) [ProjectPage](https://me.kiui.moe/radnerf/) [Code](https://github.com/ashawkey/RAD-NeRF)
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [ECCV 2022] [Paper](https://arxiv.org/pdf/2207.11770.pdf) [ProjectPage](https://sstzal.github.io/DFRF/) [Code](https://github.com/sstzal/DFRF)
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [ECCV 2022] [Paper](https://arxiv.org/pdf/2201.07786.pdf) [ProjectPage](https://alvinliu0.github.io/projects/SSP-NeRF) [Code](https://github.com/alvinliu0/SSP-NeRF)
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [ICASSP 2022] [Paper](https://arxiv.org/pdf/2104.14631.pdf) [ProjectPage](https://sites.google.com/view/sibozhang/text2video) [Code](https://github.com/sibozhang/Text2Video)
- StableFace: Analyzing and Improving Motion Stability for Talking Face Generation [arXiv 2022] [Paper](https://arxiv.org/abs/2208.13717) [ProjectPage](https://stable-face.github.io/)
- Emotion-Controllable Generalized Talking Face Generation [IJCAI 2022] [Paper](https://www.ijcai.org/proceedings/2022/0184.pdf)
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [arXiv 2022] [Paper](https://arxiv.org/abs/2203.04036) [Code](https://github.com/FeiiYin/StyleHEAT) [ProjectPage](https://feiiyin.github.io/StyleHEAT/)
- DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [arXiv 2022] [Paper](https://arxiv.org/pdf/2201.00791.pdf)
- Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions [arXiv 2022] [Paper](https://arxiv.org/pdf/2204.06180.pdf)
- Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [TMM 2022] [Paper](https://arxiv.org/pdf/2201.05986.pdf)
- Depth-Aware Generative Adversarial Network for Talking Head Video Generation [CVPR 2022] [Paper](https://arxiv.org/pdf/2203.06605.pdf) [ProjectPage](https://harlanhong.github.io/publications/dagan.html) [Code](https://github.com/harlanhong/CVPR2022-DaGAN)
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [CVPR 2022] [Paper](https://arxiv.org/pdf/2203.02573.pdf) [Code](https://github.com/snap-research/MMVID) [ProjectPage](https://snap-research.github.io/MMVID/)
- Expressive Talking Head Generation with Granular Audio-Visual Control [CVPR 2022] [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Liang_Expressive_Talking_Head_Generation_With_Granular_Audio-Visual_Control_CVPR_2022_paper.pdf)
- Talking Face Generation with Multilingual TTS [CVPR 2022 Demo] [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Song_Talking_Face_Generation_With_Multilingual_TTS_CVPR_2022_paper.pdf) [DemoPage](https://huggingface.co/spaces/CVPR/ml-talking-face)
- SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [AAAI 2022] [Paper](https://ojs.aaai.org/index.php/AAAI/article/download/20102/19861)

#### 2021
- Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [SIGGRAPH Asia 2021] [Paper](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/resources/SIGGRAPH_Asia_2021__Live_Speech_Portraits__Real_Time_Photorealistic_Talking_Head_Animation.pdf) [Code](https://github.com/YuanxunLu/LiveSpeechPortraits)
- Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [ACMMM 2021] [Paper](https://arxiv.org/abs/2111.00203) [Code](https://github.com/wuhaozhe/style_avatar)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] [Paper](https://arxiv.org/abs/2103.11078) [Code](https://github.com/YudongGuo/AD-NeRF)
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [ICCV 2021] [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhang_FACIAL_Synthesizing_Dynamic_Talking_Face_With_Implicit_Attribute_Learning_ICCV_2021_paper.pdf) [Code](https://github.com/zhangchenxu528/FACIAL)
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [ICCV 2021] [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Meshry_Learned_Spatial_Representations_for_Few-Shot_Talking-Head_Synthesis_ICCV_2021_paper.pdf)
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [CVPR 2021] [Paper](https://arxiv.org/abs/2104.11116) [Code](https://github.com/Hangz-nju-cuhk/Talking-Face_PC-AVS) [ProjectPage](https://hangz-nju-cuhk.github.io/projects/PC-AVS)
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing [CVPR 2021] [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_One-Shot_Free-View_Neural_Talking-Head_Synthesis_for_Video_Conferencing_CVPR_2021_paper.pdf)
- Audio-Driven Emotional Video Portraits [CVPR 2021] [Paper](https://jixinya.github.io/projects/evp/resources/evp.pdf) [Code](https://github.com/jixinya/EVP/)
- AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person [arXiv 2021] [Paper](https://arxiv.org/pdf/2108.04325.pdf)
- Talking Head Generation with Audio and Speech Related Facial Action Units [BMVC 2021] [Paper](https://www.bmvc2021-virtualconference.com/assets/papers/0291.pdf)
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [IJCAI 2021] [Paper](https://www.ijcai.org/proceedings/2021/0152.pdf)
- Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [AAAI 2021] [Paper](https://arxiv.org/abs/2104.07995)
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [arXiv 2021] [Paper](https://arxiv.org/abs/2104.14631) [Code](https://github.com/sibozhang/Text2Video)

#### 2020
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [arXiv 2020] [Paper](http://arxiv.org/abs/2002.10137) [Code](https://github.com/yiranran/Audio-driven-TalkingFace-HeadPose)
- A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild [ACMMM 2020] [Paper](http://arxiv.org/abs/2008.10010) [Code](https://github.com/Rudrabha/Wav2Lip)
- Talking Face Generation with Expression-Tailored Generative Adversarial Network [ACMMM 2020] [Paper](https://dl.acm.org/doi/abs/10.1145/3394171.3413844)
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [arXiv 2020] [Paper](https://arxiv.org/abs/2008.03592) [Code](https://github.com/eeskimez/emotalkingface)
- A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [ICPR 2020] [Paper](https://arxiv.org/abs/2002.08700)
- Everybody's Talkin': Let Me Talk as You Want [arXiv 2020] [Paper](https://arxiv.org/abs/2001.05201)
- HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [arXiv 2020] [Paper](https://arxiv.org/abs/2012.08261)
- Talking-head Generation with Rhythmic Head Motion [ECCV 2020] [Paper](https://arxiv.org/abs/2007.08547)
- Neural Voice Puppetry: Audio-driven Facial Reenactment [ECCV 2020] [Paper](https://arxiv.org/pdf/1912.05566.pdf) [Project](https://justusthies.github.io/posts/neural-voice-puppetry/) [Code](https://github.com/JustusThies/NeuralVoicePuppetry)
- Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [CVPR 2020] [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Prajwal_Learning_Individual_Speaking_Styles_for_Accurate_Lip_to_Speech_Synthesis_CVPR_2020_paper.pdf)
- Robust One Shot Audio to Video Generation [CVPRW 2020] [Paper](https://openaccess.thecvf.com/content_CVPRW_2020/html/w45/Kumar_Robust_One_Shot_Audio_to_Video_Generation_CVPRW_2020_paper.html)
- MakeItTalk: Speaker-Aware Talking Head Animation [SIGGRAPH Asia 2020] [Paper](https://arxiv.org/abs/2004.12992) [Code](https://github.com/adobe-research/MakeItTalk)
- FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis [AAAI 2020] [Paper](https://arxiv.org/abs/1911.09224)
- Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [AAAI 2020] [Paper](https://arxiv.org/abs/2003.12957)
- Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks [arXiv 2020] [Paper](https://arxiv.org/abs/2002.08700)
- Speech-Driven Facial Animation Using Polynomial Fusion of Features [arXiv 2020] [Paper](https://arxiv.org/abs/1912.05833)
- Animating Face using Disentangled Audio Representations [WACV 2020] [Paper](https://arxiv.org/abs/1910.00726)

#### Before 2020
- Realistic Speech-Driven Facial Animation with GANs [IJCV 2019] [Paper](http://arxiv.org/abs/1906.06337) [ProjectPage](https://sites.google.com/view/facial-animation)
- Few-Shot Adversarial Learning of Realistic Neural Talking Head Models [ICCV 2019] [Paper](https://arxiv.org/abs/1905.08233) [Code](https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models)
- Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [CVPR 2019] [Paper](http://www.cs.rochester.edu/u/lchen63/cvpr2019.pdf) [Code](https://github.com/lelechen63/ATVGnet)
- Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [AAAI 2019] [Paper](https://arxiv.org/abs/1807.07860) [Code](https://github.com/Hangz-nju-cuhk/Talking-Face-Generation-DAVS) [ProjectPage](https://liuziwei7.github.io/projects/TalkingFace)
- Lip Movements Generation at a Glance [ECCV 2018] [Paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Lele_Chen_Lip_Movements_Generation_ECCV_2018_paper.pdf)
- X2Face: A network for controlling face generation using images, audio, and pose codes [ECCV 2018] [Paper](https://www.robots.ox.ac.uk/~vgg/publications/2018/Wiles18/wiles18.pdf) [Code](https://github.com/oawiles/X2Face) [ProjectPage](http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/x2face.html)
- Talking Face Generation by Conditional Recurrent Adversarial Network [IJCAI 2019] [Paper](https://arxiv.org/abs/1804.04786) [Code](https://github.com/susanqq/Talking_Face_Generation)
- Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [arXiv 2018] [Paper](https://arxiv.org/abs/1803.07461)
- High-Resolution Talking Face Generation via Mutual Information Approximation [arXiv 2018] [Paper](https://arxiv.org/abs/1812.06589)
- Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [arXiv 2018] [Paper](https://arxiv.org/pdf/1803.07716)
- You said that? [BMVC 2017] [Paper](https://arxiv.org/abs/1705.02966)

### 2D Video - Person dependent

- Continuously Controllable Facial Expression Editing in Talking Face Videos [TAFFC 2023] [Paper](https://doi.org/10.1109/TAFFC.2023.3334511) [Project Page](https://raineggplant.github.io/FEE4TV)
- Synthesizing Obama: Learning Lip Sync from Audio [SIGGRAPH 2017] [Paper](http://grail.cs.washington.edu/projects/AudioToObama/siggraph17_obama.pdf) [Project Page](http://grail.cs.washington.edu/projects/AudioToObama/)
- Photorealistic Adaptation and Interpolation of Facial Expressions Using HMMs and AAMs for Audio-Visual Speech Synthesis [ICIP 2017] [Paper](http://www.researchgate.net/publication/323352468_Photorealistic_adaptation_and_interpolation_of_facial_expressions_using_HMMS_and_AAMS_for_audio-visual_speech_synthesis)
- HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks [Journal of Computer and Communications 2017] [Paper](https://www.scirp.org/pdf/JCC_2017082216385517.pdf)
- ObamaNet: Photo-realistic lip-sync from text [arXiv 2017] [Paper](https://arxiv.org/abs/1801.01442)
- A deep bidirectional LSTM approach for video-realistic talking head [Multimedia Tools Appl 2015] [Paper](https://dl.acm.org/citation.cfm?id=2944665)
- Photo-Realistic Expressive Text to Talking Head Synthesis [Interspeech 2013] [Paper](https://www.researchgate.net/publication/259287794_Photo-Realistic_Expressive_Text_to_Talking_Head_Synthesis)
- Photo-Real Talking Head with Deep Bidirectional LSTM [ICASSP 2015] [Paper](https://www.researchgate.net/publication/272094351_Photo-real_talking_head_with_deep_bidirectional_LSTM)
- Expressive Speech-Driven Facial Animation [TOG 2005] [Paper](https://dl.acm.org/citation.cfm?id=1145094)

### 3D Animation
- MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes [NeurIPS 2024] [Paper](https://arxiv.org/abs/2410.06734) [Code](https://github.com/yerfor/MimicTalk) [ProjectPage](https://mimictalk.github.io/)
- ScanTalk: 3D Talking Heads from Unregistered Scans [ECCV 2024] [Paper](https://arxiv.org/abs/2403.10942) [Code](https://github.com/miccunifi/ScanTalk)
- Audio-Driven Emotional 3D Talking-Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2410.17262v1)
- Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads [arXiv 2024] [Paper](https://arxiv.org/abs/2410.11041)
- 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy [arXiv 2024] [Paper](https://www.arxiv.org/abs/2409.10848)
- ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE [arXiv 2024] [Paper](https://arxiv.org/abs/2409.07966) [Code](https://github.com/uuembodiedsocialai/ProbTalk3D/)
- KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding [ECCV 2024] [Paper](https://arxiv.org/abs/2409.01113) [Code](https://github.com/ffxzh/KMTalk)
- EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention [arXiv 2024] [Paper](https://arxiv.org/abs/2408.11518)
- DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation [arXiv 2024] [Paper](https://arxiv.org/abs/2408.06010)
- JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model [arXiv 2024] [Paper](https://www.arxiv.org/abs/2408.01627)
- GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer [arXiv 2024] [Paper](https://www.arxiv.org/abs/2408.01826)
- UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model [arXiv 2024] [Paper](https://arxiv.org/abs/2408.00762)
- EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head [arXiv 2024] [Paper](https://arxiv.org/abs/2408.00297)
- EmoFace: Audio-driven Emotional 3D Face Animation [arXiv 2024] [Paper](https://arxiv.org/abs/2407.12501) [Code](https://github.com/SJTU-Lucy/EmoFace)
- MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset [Interspeech 2024] [Paper](https://arxiv.org/abs/2406.14272) [ProjectPage](https://multi-talk.github.io/)
- 3D Gaussian Blendshapes for Head Avatar Animation [SIGGRAPH 2024] [Paper](https://arxiv.org/abs/2404.19398)
- CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2404.18604)
- GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting [arXiv 2024] [Paper](https://arxiv.org/abs/2404.14037)
- Learn2Talk: 3D Talking Face Learns from 2D Talking Face [arXiv 2024] [Paper](https://arxiv.org/abs/2404.12888) [ProjectPage](https://lkjkjoiuiu.github.io/Learn2Talk/)
- Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication [arXiv 2024] [Paper](https://arxiv.org/abs/2403.19467)
- AnimateMe: 4D Facial Expressions via Diffusion Models [arXiv 2024] [Paper](https://arxiv.org/abs/2403.17213)
- EmoVOCA: Speech-Driven Emotional 3D Talking Heads [arXiv 2024] [Paper](https://arxiv.org/abs/2403.12886)
- FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [CVPR 2024] [Paper](https://arxiv.org/abs/2312.08459) [Code](https://github.com/shivangi-aneja/FaceTalk) [ProjectPage](https://shivangi-aneja.github.io/projects/facetalk/)
- AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2402.16124)
- DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer [arXiv 2024] [Paper](https://arxiv.org/pdf/2402.05712.pdf) [Code](https://github.com/theEricMa/DiffSpeaker)
- Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance [arXiv 2024] [Paper](https://arxiv.org/abs/2401.15687) [ProjectPage](https://sites.google.com/view/media2face)
- EMOTE: Emotional Speech-Driven Animation with Content-Emotion Disentanglement [SIGGRAPH Asia 2023] [Paper](https://arxiv.org/abs/2306.08990) [ProjectPage](https://emote.is.tue.mpg.de/)
- PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features [arXiv 2023] [Paper](https://arxiv.org/abs/2312.02781)
- 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing [arXiv 2023] [Paper](https://arxiv.org/abs/2312.00870) [Code](https://github.com/bala1144/3DiFACE) [ProjectPage](https://balamuruganthambiraja.github.io/3DiFACE/)
- Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications [arXiv 2023] [Paper](https://arxiv.org/abs/2311.18168)
- DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser [arXiv 2023] [Paper](https://arxiv.org/abs/2311.16565)
- DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models [arXiv 2023] [Paper](https://arxiv.org/abs/2310.00434) [ProjectPage](https://diffposetalk.github.io) [Code](https://github.com/DiffPoseTalk/DiffPoseTalk)
- Imitator: Personalized Speech-driven 3D Facial Animation [ICCV 2023] [Paper](https://arxiv.org/abs/2301.00023) [ProjectPage](https://balamuruganthambiraja.github.io/Imitator/) [Code](https://github.com/bala1144/Imitator)
- Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/He_Speech4Mesh_Speech-Assisted_Monocular_3D_Facial_Reconstruction_for_Speech-Driven_3D_Facial_ICCV_2023_paper.pdf)
- Semi-supervised Speech-driven 3D Facial Animation via Cross-modal Encoding [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Yang_Semi-supervised_Speech-driven_3D_Facial_Animation_via_Cross-modal_Encoding_ICCV_2023_paper.pdf)
- Audio-Driven 3D Facial Animation from In-the-Wild Videos [arXiv 2023] [Paper](https://arxiv.org/abs/2306.11541) [ProjectPage](https://faw3d.github.io/)
- EmoTalk: Speech-driven emotional disentanglement for 3D face animation [ICCV 2023] [Paper](https://arxiv.org/abs/2303.11089) [ProjectPage](https://ziqiaopeng.github.io/emotalk/)
- FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [arXiv 2023] [Paper](https://arxiv.org/abs/2303.05416) [Code](https://github.com/galib360/FaceXHuBERT) [ProjectPage](https://galib360.github.io/FaceXHuBERT/)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertices Attention [arXiv 2023] [Paper](https://arxiv.org/abs/2302.12532)
- Learning Audio-Driven Viseme Dynamics for 3D Face Animation [arXiv 2023] [Paper](https://arxiv.org/abs/2301.06059) [ProjectPage](https://linchaobao.github.io/viseme2023/)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [CVPR 2023] [Paper](https://arxiv.org/abs/2301.02379) [ProjectPage](https://doubiiu.github.io/projects/codetalker/)
- Expressive Speech-driven Facial Animation with controllable emotions [arXiv 2023] [Paper](https://arxiv.org/abs/2301.02008)
- PV3D: A 3D Generative Model for Portrait Video Generation [arXiv 2022] [Paper](https://arxiv.org/abs/2212.06384) [ProjectPage](https://showlab.github.io/pv3d/)
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in β€œin-the-wild” videos [CVPR 2022] [Paper](https://arxiv.org/pdf/2112.00585.pdf) [Code](https://github.com/foivospar/NED)
- FaceFormer: Speech-Driven 3D Facial Animation with Transformers [CVPR 2022] [Paper](https://arxiv.org/pdf/2112.05329.pdf) [Code](https://github.com/EvelynFan/FaceFormer) [ProjectPage](https://evelynfan.github.io/audio2face/)
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization [CVPR 2021] [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Lahiri_LipSync3D_Data-Efficient_Learning_of_Personalized_3D_Talking_Faces_From_Video_CVPR_2021_paper.pdf)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [ICCV 2021] [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Richard_MeshTalk_3D_Face_Animation_From_Speech_Using_Cross-Modality_Disentanglement_ICCV_2021_paper.pdf)
- 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [arXiv 2021] [Paper](https://arxiv.org/abs/2104.12051)
- Modality Dropout for Improved Performance-driven Talking Faces [ICMI 2020] [Paper](https://arxiv.org/abs/2005.13616)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [arXiv 2020] [Paper](https://arxiv.org/abs/2008.05023)
- Capture, Learning, and Synthesis of 3D Speaking Styles [CVPR 2019] [Paper](http://openaccess.thecvf.com/content_CVPR_2019/html/Cudeiro_Capture_Learning_and_Synthesis_of_3D_Speaking_Styles_CVPR_2019_paper.html)
- VisemeNet: Audio-Driven Animator-Centric Speech Animation [TOG 2018] [Paper](http://arxiv.org/abs/1805.09488)
- Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [TAC 2018] [Paper](https://arxiv.org/abs/1806.00154)
- End-to-end Learning for 3D Facial Animation from Speech [ICMI 2018] [Paper](https://dl.acm.org/doi/pdf/10.1145/3242969.3243017)
- Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head [MMAC 2018]
- A Deep Learning Approach for Generalized Speech Animation [SIGGRAPH 2017] [Paper](https://dl.acm.org/citation.cfm?id=3073699)
- Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TOG 2017] [Paper](https://dl.acm.org/citation.cfm?id=3073658)
- Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach [CVPR 2017]
- Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data [Interspeech 2016] [Paper](https://www.researchgate.net/publication/307889314_Expressive_Speech_Driven_Talking_Avatar_Synthesis_with_DBLSTM_Using_Limited_Amount_of_Emotional_Bimodal_Data)
- Real-Time Speech-Driven Face Animation With Expressions Using Neural Networks [TNN 2002] [Paper](https://www.ncbi.nlm.nih.gov/pubmed/18244487)
- Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar [SIST 2010] [Paper](https://link.springer.com/10.1007/978-3-642-12604-8_6)

## Datasets & Benchmark

- Responsive Listening Head Generation: A Benchmark Dataset and Baseline [ECCV 2022] [Paper](https://arxiv.org/abs/2112.13548) [ProjectPage](https://vico.solutions/)
- TalkingHead-1KH [Link](https://github.com/deepimagination/TalkingHead-1KH)
- MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV 2020] [ProjectPage](https://wywu.github.io/projects/MEAD/MEAD.html)
- VoxCeleb [Link](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/)
- LRW [Link](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrw1.html)
- LRS2 [Link](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html)
- GRID [Link](https://spandh.dcs.shef.ac.uk//avlombard/)
- CREMA-D [Link](https://github.com/CheyneyComputerScience/CREMA-D)
- MMFace4D [Link](https://arxiv.org/abs/2303.09797)
- DPCD [Link](https://github.com/Metaverse-AI-Lab-THU/Deep-Personalized-Character-Dataset-DPCD) [Paper](https://arxiv.org/abs/2304.11093)

## Survey

- A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing [arXiv 2024] [Paper](https://arxiv.org/abs/2406.10553v2)
- From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications [arXiv 2023] [Paper](https://arxiv.org/abs/2308.16041)
- Deep Learning for Visual Speech Analysis: A Survey [arXiv 2022] [Paper](https://arxiv.org/abs/2205.10839)
- What comprises a good talking-head video generation?: A Survey and Benchmark [arXiv 2020] [Paper](https://arxiv.org/abs/2005.03201)

## Colabs
- Avatars4All: https://github.com/eyaler/avatars4all