# Awesome Talking Face [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome#readme)

This is a repository for organizing papers, code, and other resources related to talking face/head generation. Most papers link to a PDF hosted on arXiv or OpenAccess; however, some papers (e.g., those in IEEE, Springer, or Elsevier journals) require an academic license to access.

#### :high_brightness: This project is still ongoing; pull requests are welcome!

If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to edit this list and open a pull request. Even just letting me know the title of a paper is a big contribution. You can do so by opening an issue or contacting me directly via email.

#### :star: If you find this repo useful, please star it!

#### 2022.09 Update!
Thanks for the PRs from everybody! From now on, I'll occasionally include papers on **video-driven** talking face generation, since the community has begun to fold **video-driven** methods into the scope of talking face generation, even though that task was originally termed **Face Reenactment**.

So, if you are looking for **video-driven talking face generation**, I suggest you star this repo and also search for Face Reenactment; you'll find more there :)

One more thing: please correct me if you find that a paper listed here as an arXiv paper has since been accepted to a conference or journal.

#### 2021.11 Update!

I added a batch of papers that appeared over the past few months. This repo was originally intended to cover **audio-driven** talking face generation, but I found several **text-based** works that are also very interesting, so I have included them here. Enjoy!

#### TO DO LIST

- [x] Main paper list
- [x] Add paper link
- [x] Add code links where available
- [x] Add project page links where available
- [x] Datasets and survey

## Papers

### 2D Video - Person independent
#### 2024
- DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2410.13726) [Code](https://github.com/Hanbo-Cheng/DAWN-pytorch) [ProjectPage](https://hanbo-cheng.github.io/DAWN/)
- Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization [arXiv 2024] [Paper](https://arxiv.org/abs/2410.14283)
- MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting [arXiv 2024] [Paper](https://arxiv.org/abs/2410.10122) [Code](https://github.com/TMElyralab/MuseTalk)
- 3D-Aware Text-driven Talking Avatar Generation [ECCV 2024] [Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12305.pdf)
- LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details [arXiv 2024] [Paper](https://arxiv.org/abs/2410.00990)
- TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans [ECCVW 2024] [Paper](https://arxiv.org/abs/2409.16666)
- JoyHallo: Digital human model for Mandarin [arXiv 2024] [Paper](https://arxiv.org/abs/2409.13268) [Code](https://github.com/jdh-algo/JoyHallo)
- JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation [BMVC 2024] [Paper](https://arxiv.org/abs/2409.12156)
- StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads [TPAMI 2024] [Paper](https://www.arxiv.org/abs/2409.09292)
- DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures [CVPRW 2024] [Paper](https://arxiv.org/abs/2409.07649)
- EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion [arXiv 2024] [Paper](https://www.arxiv.org/abs/2409.07255)
- SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model [arXiv 2024] [Paper](https://arxiv.org/abs/2409.03270)
- SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing [arXiv 2024] [Paper](https://www.arxiv.org/abs/2409.03605)
- Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency [arXiv 2024] [Paper](https://arxiv.org/abs/2409.02634) [ProjectPage](https://loopyavatar.github.io/)
- PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2409.02657)
- CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention [arXiv 2024] [Paper](https://arxiv.org/abs/2409.01876) [ProjectPage](https://cyberhost.github.io/)
- TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation [arXiv 2024] [Paper](https://arxiv.org/abs/2408.13714)
- S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis [arXiv 2024] [Paper](https://arxiv.org/abs/2408.09347v1)
- FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model [arXiv 2024] [Paper](https://arxiv.org/abs/2408.09384v1)
- LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control [arXiv 2024] [Paper](https://arxiv.org/abs/2407.03168) [ProjectPage](https://liveportrait.github.io/) [Code](https://github.com/KwaiVGI/LivePortrait)
- High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model [arXiv 2024] [Paper](https://arxiv.org/abs/2408.05416)
- Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2408.01732)
- LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement [arXiv 2024] [Paper](https://arxiv.org/abs/2407.18595) [ProjectPage](https://tencentqqgylab.github.io/LinguaLinker/) [Code](https://github.com/TencentQQGYLab/LinguaLinker?tab=readme-ov-file)
- Learning Online Scale Transformation for Talking Head Video Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2407.09965)
- EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions [arXiv 2024] [Paper](https://arxiv.org/abs/2407.08136) [ProjectPage](https://badtobest.github.io/echomimic.html) [GitHub](https://github.com/BadToBest/EchoMimic)
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation [arXiv 2024] [Paper](https://arxiv.org/abs/2406.08801) [ProjectPage](https://fudan-generative-vision.github.io/hallo/#/) [GitHub](https://github.com/fudan-generative-vision/hallo)
- RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network [arXiv 2024] [Paper](https://arxiv.org/abs/2406.18284)
- Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2406.07895)
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [arXiv 2024] [Paper](https://arxiv.org/abs/2406.07867) [ProjectPage](https://multidialog.github.io/)
- Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement [arXiv 2024] [Paper](https://arxiv.org/abs/2406.08096) [ProjectPage](https://ingrid789.github.io/MyTalk/)
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [arXiv 2024] [Paper](https://arxiv.org/abs/2406.02880)
- InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2405.15758) [ProjectPage](https://wangyuchi369.github.io/InstructAvatar/)
- Faces that Speak: Jointly Synthesising Talking Face and Speech from Text [arXiv 2024] [Paper](https://arxiv.org/abs/2405.10272) [ProjectPage](https://mm.kaist.ac.kr/projects/faces-that-speak/)
- Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2405.07257)
- SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space [arXiv 2024] [Paper](https://arxiv.org/abs/2405.05636) [ProjectPage](http://swaptalk.cc/)
- AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding [arXiv 2024] [Paper](https://arxiv.org/abs/2405.03121) [Code](https://github.com/X-LANCE/AniTalker) [ProjectPage](https://x-lance.github.io/AniTalker/)
- NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [CVPR 2024 Workshop] [Paper](https://arxiv.org/abs/2405.05749) [Code](https://github.com/rlgnswk/NeRFFaceSpeech_Code/) [ProjectPage](https://rlgnswk.github.io/NeRFFaceSpeech_ProjectPage/)
- Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation [CVPR 2024 Workshop] [Paper](https://arxiv.org/abs/2405.04327)
- EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars [arXiv 2024] [Paper](https://arxiv.org/abs/2404.19110) [ProjectPage](https://neeek2303.github.io/EMOPortraits)
- GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting [arXiv 2024] [Paper](https://arxiv.org/abs/2404.19040)
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time [arXiv 2024] [Paper](https://arxiv.org/abs/2404.10667) [ProjectPage](https://www.microsoft.com/en-us/research/project/vasa-1/)
- THQA: A Perceptual Quality Assessment Database for Talking Heads [arXiv 2024] [Paper](https://arxiv.org/abs/2404.09003) [Code](https://github.com/zyj-2000/THQA)
- Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior [arXiv 2024] [Paper](https://arxiv.org/abs/2403.20153) [Code](https://github.com/KU-CVLAB/Talk3D) [ProjectPage](https://ku-cvlab.github.io/Talk3D/)
- EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis [arXiv 2024] [Paper](https://arxiv.org/abs/2404.01647) [Code](https://github.com/tanshuai0219/EDTalk) [ProjectPage](https://tanshuai0219.github.io/EDTalk/)
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations [arXiv 2024] [Paper](https://arxiv.org/abs/2403.17694) [Code](https://github.com/Zejun-Yang/AniPortrait)
- MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation [arXiv 2024] [Paper](https://ku-cvlab.github.io/MoDiTalker/) [ProjectPage](https://ku-cvlab.github.io/MoDiTalker/)
- Superior and Pragmatic Talking Face Generation with Teacher-Student Framework [arXiv 2024] [Paper](https://arxiv.org/abs/2403.17883) [ProjectPage](https://superfacelink.github.io/)
- X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [arXiv 2024] [Paper](https://arxiv.org/abs/2403.15931)
- Adaptive Super Resolution For One-Shot Talking-Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2403.15944)
- Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style [arXiv 2024] [Paper](https://arxiv.org/abs/2403.06365)
- FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization [arXiv 2024] [Paper](https://arxiv.org/abs/2403.06375)
- FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio [arXiv 2024] [Paper](https://arxiv.org/abs/2403.01901) [Code](https://github.com/modelscope/facechain)
- Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis [CVPR 2024] [Paper](https://arxiv.org/abs/2402.17364) [Code](https://github.com/zhangzc21/DynTet)
- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions [arXiv 2024] [Paper](https://arxiv.org/abs/2402.17485) [ProjectPage](https://humanaigc.github.io/emote-portrait-alive/) [Code](https://github.com/HumanAIGC/EMO)
- G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment [arXiv 2024] [Paper](https://arxiv.org/abs/2402.18122)
- Context-aware Talking Face Video Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2402.18092)
- EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2402.01422) [ProjectPage](https://peterfanfan.github.io/EmoSpeaker/) [Code](https://github.com/PeterFanFan/Emospeaker_code)
- GPAvatar: Generalizable and Precise Head Avatar from Image(s) [ICLR 2024] [Paper](https://arxiv.org/abs/2401.10215) [Code](https://github.com/xg-chu/GPAvatar)
- Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [ICLR 2024] [Paper](https://arxiv.org/abs/2401.08503)
- EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model [ICASSP 2024] [Paper](https://arxiv.org/abs/2401.08049)
- CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer [WACV 2024] [Paper](https://arxiv.org/abs/2311.06443) [Code](https://github.com/HowieMa/CVTHead)

#### 2023

- VectorTalker: SVG Talking Face Generation with Progressive Vectorisation [arXiv 2023] [Paper](https://arxiv.org/abs/2312.11568)
- DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models [arXiv 2023] [Paper](https://arxiv.org/abs/2312.09767) [ProjectPage](https://dreamtalk-project.github.io/)
- GMTalker: Gaussian Mixture based Emotional talking video Portraits [arXiv 2023] [Paper](https://arxiv.org/abs/2312.07669) [ProjectPage](https://bob35buaa.github.io/GMTalker)
- DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers [arXiv 2023] [Paper](https://arxiv.org/abs/2312.06400)
- R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning [arXiv 2023] [Paper](https://arxiv.org/abs/2312.05572)
- FT2TF: First-Person Statement Text-To-Talking Face Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2312.05430)
- VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior [arXiv 2023] [Paper](https://arxiv.org/abs/2312.01841) [Code](https://github.com/HumanAIGC/VividTalk) [ProjectPage](https://humanaigc.github.io/vivid-talk/)
- SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis [arXiv 2023] [Paper](https://arxiv.org/abs/2311.17590) [Code](https://github.com/ziqiaopeng/SyncTalk) [ProjectPage](https://ziqiaopeng.github.io/synctalk/)
- GAIA: Zero-shot Talking Avatar Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2311.15230)
- Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/html/Li_Efficient_Region-Aware_Neural_Radiance_Fields_for_High-Fidelity_Talking_Portrait_Synthesis_ICCV_2023_paper.html) [ProjectPage](https://fictionarry.github.io/ER-NeRF/) [Code](https://github.com/Fictionarry/ER-NeRF)
- Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation [ICCV 2023] [Paper](https://arxiv.org/abs/2307.09906) [ProjectPage](https://harlanhong.github.io/publications/mcnet.html) [Code](https://github.com/harlanhong/ICCV2023-MCNET)
- MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions [ICCV 2023] [Paper](https://arxiv.org/abs/2307.10008) [ProjectPage](https://liuyunfei.net/projects/iccv23-moda/)
- ToonTalker: Cross-Domain Face Reenactment [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Gong_ToonTalker_Cross-Domain_Face_Reenactment_ICCV_2023_paper.pdf)
- Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation [ICCV 2023] [Paper](https://arxiv.org/abs/2309.04946) [ProjectPage](https://yuangan.github.io/eat/) [Code](https://github.com/yuangan/eat_code)
- EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Tan_EMMN_Emotional_Motion_Memory_Network_for_Audio-driven_Emotional_Talking_Face_ICCV_2023_paper.pdf)
- Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Song_Emotional_Listener_Portrait_Neural_Listener_Head_Generation_with_Emotion_ICCV_2023_paper.pdf)
- Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions [arXiv 2023] [Paper](https://arxiv.org/abs/2306.10813)
- Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow [arXiv 2023] [Paper](https://arxiv.org/abs/2307.09368)
- Reprogramming Audio-driven Talking Face Synthesis into Text-driven [arXiv 2023] [Paper](https://arxiv.org/abs/2306.16003)
- Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis [TCSVT 2023] [Paper](https://arxiv.org/abs/2309.00030)
- Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks [arXiv 2023] [Paper](https://arxiv.org/abs/2306.03594)
- Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [arXiv 2023] [Paper](https://arxiv.org/abs/2306.03504)
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation [CVPR 2023] [Paper](https://arxiv.org/abs/2211.12194) [Code](https://github.com/OpenTalker/SadTalker)
- MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation [CVPR 2023] [Paper](https://arxiv.org/abs/2212.08062) [ProjectPage](https://meta-portrait.github.io/) [Code](https://github.com/Meta-Portrait/MetaPortrait)
- Implicit Neural Head Synthesis via Controllable Local Deformation Fields [CVPR 2023] [Paper](https://arxiv.org/abs/2304.11113)
- LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook [CVPR 2023] [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_LipFormer_High-Fidelity_and_Generalizable_Talking_Face_Generation_With_a_Pre-Learned_CVPR_2023_paper.pdf)
- GANHead: Towards Generative Animatable Neural Head Avatars [CVPR 2023] [Paper](https://arxiv.org/abs/2304.03950v1) [ProjectPage](https://wsj-sjtu.github.io/GANHead/) [Code](https://github.com/wsj-sjtu/GANHead)
- Parametric Implicit Face Representation for Audio-Driven Facial Reenactment [CVPR 2023] [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Huang_Parametric_Implicit_Face_Representation_for_Audio-Driven_Facial_Reenactment_CVPR_2023_paper.pdf)
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [CVPR 2023] [Paper](https://arxiv.org/abs/2305.08293) [Code](https://github.com/Weizhi-Zhong/IP_LAP)
- StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator [CVPR 2023] [Paper](https://arxiv.org/pdf/2305.05445.pdf) [ProjectPage](https://hangz-nju-cuhk.github.io/projects/StyleSync) [Code](https://github.com/guanjz20/StyleSync)
- Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos [arXiv 2023] [Paper](https://arxiv.org/abs/2305.03713) [ProjectPage](https://research.nvidia.com/labs/nxp/avatar-fingerprinting/)
- Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model [arXiv 2023] [Paper](https://arxiv.org/abs/2305.02594)
- High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [CVPR 2023] [Paper](https://arxiv.org/abs/2305.02572)
- StyleLipSync: Style-based Personalized Lip-sync Video Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2305.00521) [ProjectPage](https://stylelipsync.github.io) [Code](https://github.com/TaekyungKi/StyleLipSync)
- GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2305.00787) [ProjectPage](https://genefaceplusplus.github.io)
- High-Fidelity and Freely Controllable Talking Head Video Generation [CVPR 2023] [Paper](https://arxiv.org/abs/2304.10168) [Project Page](https://yuegao.me/PECHead/)
- One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field [CVPR 2023] [Paper](https://arxiv.org/abs/2304.05097) [ProjectPage](https://www.waytron.net/hidenerf/)
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert [CVPR 2023] [Paper](https://arxiv.org/abs/2303.17480) [Code](https://github.com/Sxjdwang/TalkLip)
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [arXiv 2023] [Paper](https://arxiv.org/abs/2304.08945)
- That's What I Said: Fully-Controllable Talking Face Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2304.03275) [ProjectPage](https://mm.kaist.ac.kr/projects/FC-TFG/)
- Emotionally Enhanced Talking Face Generation [arXiv 2023] [Paper](https://arxiv.org/abs/2303.11548) [Code](https://github.com/sahilg06/EmoGen) [ProjectPage](https://midas.iiitd.edu.in/emo/)
- A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation [MLSys Workshop 2023] [Paper](https://arxiv.org/abs/2304.00471v1)
- TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles [arXiv 2023] [Paper](https://arxiv.org/abs/2304.00334v1)
- FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions [ICME 2023] [Paper](https://arxiv.org/abs/2303.17789)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [arXiv 2023] [Paper](https://arxiv.org/abs/2303.17550) [ProjectPage](https://daetalker.github.io/)
- OPT: One-Shot Pose-Controllable Talking Head Generation [ICASSP 2023] [Paper](https://arxiv.org/abs/2302.08197)
- DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions [ICASSP 2023] [Paper](https://arxiv.org/abs/2303.07697) [Code](https://github.com/deepbrainai-research/discohead) [ProjectPage](https://deepbrainai-research.github.io/discohead/)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [ICLR 2023] [Paper](https://arxiv.org/abs/2301.13430) [Code](https://github.com/yerfor/GeneFace) [ProjectPage](https://geneface.github.io/)
- OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR 2023] [Paper](https://arxiv.org/abs/2303.14662) [Code](https://github.com/theEricMa/OTAvatar)
- Style Transfer for 2D Talking Head Animation [arXiv 2023] [Paper](https://arxiv.org/abs/2303.09799)
- READ Avatars: Realistic Emotion-controllable Audio Driven Avatars [arXiv 2023] [Paper](https://arxiv.org/abs/2303.00744)
- On the Audio-visual Synchronization for Lip-to-Speech Synthesis [arXiv 2023] [Paper](https://arxiv.org/abs/2303.00502)
- DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis [CVPR 2023] [Paper](https://arxiv.org/abs/2301.03786)
- Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation [arXiv 2023] [Paper](https://mstypulkowski.github.io/diffusedheads/diffused_heads.pdf) [ProjectPage](https://mstypulkowski.github.io/diffusedheads/)
- StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles [AAAI 2023] [Paper](https://arxiv.org/abs/2301.01081) [Code](https://github.com/FuxiVirtualHuman/styletalk)
- Audio-Visual Face Reenactment [WACV 2023] [Paper](https://arxiv.org/abs/2210.02755) [ProjectPage](http://cvit.iiit.ac.in/research/projects/cvit-projects/avfr) [Code](https://github.com/mdv3101/AVFR-Gan/)

#### 2022
- Memories are One-to-Many Mapping Alleviators in Talking Face Generation [arXiv 2022] [Paper](https://arxiv.org/abs/2212.05005) [ProjectPage](https://memoryface.github.io/)
- Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [SIGGRAPH Asia 2022] [Paper](https://arxiv.org/abs/2212.04970)
- Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors [arXiv 2022] [Paper](https://arxiv.org/abs/2212.04248) [ProjectPage](https://zxyin.github.io/TH-PAD/)
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis [CVPR 2022] [Paper](https://arxiv.org/abs/2211.14506) [ProjectPage](https://dorniwang.github.io/PD-FGC/)
- SPACE: Speech-driven Portrait Animation with Controllable Expression [arXiv 2022] [Paper](https://arxiv.org/abs/2211.09809) [ProjectPage](https://deepimagination.cc/SPACEx/)
- Compressing Video Calls using Synthetic Talking Heads [BMVC 2022] [Paper](https://arxiv.org/abs/2210.03692) [Project Page](https://cvit.iiit.ac.in/research/projects/cvit-projects/talking-video-compression)
- Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement [arXiv 2022] [Paper](https://arxiv.org/pdf/2209.01320.pdf)
- StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [arXiv 2022] [Paper](https://arxiv.org/pdf/2208.10922.pdf)
- Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [arXiv 2022] [Paper](https://arxiv.org/pdf/2208.02210.pdf)
- EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022] [Paper](https://arxiv.org/pdf/2205.15278.pdf)
- Talking Head from Speech Audio using a Pre-trained Image Generator [ACM MM 2022] [Paper](https://arxiv.org/pdf/2209.04252.pdf)
- Latent Image Animator: Learning to Animate Images via Latent Space Navigation [ICLR 2022] [Paper](https://openreview.net/pdf?id=7r6kDq0mK_) [ProjectPage(note this page has auto-play music...)](https://wyhsirius.github.io/LIA-project/) [Code](https://github.com/wyhsirius/LIA)
- Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition [arXiv 2022] [Paper](https://arxiv.org/pdf/2211.12368.pdf) [ProjectPage](https://me.kiui.moe/radnerf/) [Code](https://github.com/ashawkey/RAD-NeRF)
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [ECCV 2022] [Paper](https://arxiv.org/pdf/2207.11770.pdf) [ProjectPage](https://sstzal.github.io/DFRF/) [Code](https://github.com/sstzal/DFRF)
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [ECCV 2022] [Paper](https://arxiv.org/pdf/2201.07786.pdf) [ProjectPage](https://alvinliu0.github.io/projects/SSP-NeRF) [Code](https://github.com/alvinliu0/SSP-NeRF)
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [ICASSP 2022] [Paper](https://arxiv.org/pdf/2104.14631.pdf) [ProjectPage](https://sites.google.com/view/sibozhang/text2video) [Code](https://github.com/sibozhang/Text2Video)
- StableFace: Analyzing and Improving Motion Stability for Talking Face Generation [arXiv 2022] [Paper](https://arxiv.org/abs/2208.13717) [ProjectPage](https://stable-face.github.io/)
- Emotion-Controllable Generalized Talking Face Generation [IJCAI 2022] [Paper](https://www.ijcai.org/proceedings/2022/0184.pdf)
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [arXiv 2022] [Paper](https://arxiv.org/abs/2203.04036) [Code](https://github.com/FeiiYin/StyleHEAT) [ProjectPage](https://feiiyin.github.io/StyleHEAT/)
- DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [arXiv 2022] [Paper](https://arxiv.org/pdf/2201.00791.pdf)
- Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions [arXiv 2022] [Paper](https://arxiv.org/pdf/2204.06180.pdf)
- Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [TMM 2022] [Paper](https://arxiv.org/pdf/2201.05986.pdf)
- Depth-Aware Generative Adversarial Network for Talking Head Video Generation [CVPR 2022] [Paper](https://arxiv.org/pdf/2203.06605.pdf) [ProjectPage](https://harlanhong.github.io/publications/dagan.html) [Code](https://github.com/harlanhong/CVPR2022-DaGAN)
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [CVPR 2022] [Paper](https://arxiv.org/pdf/2203.02573.pdf) [Code](https://github.com/snap-research/MMVID) [ProjectPage](https://snap-research.github.io/MMVID/)
- Expressive Talking Head Generation with Granular Audio-Visual Control [CVPR 2022] [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Liang_Expressive_Talking_Head_Generation_With_Granular_Audio-Visual_Control_CVPR_2022_paper.pdf)
- Talking Face Generation with Multilingual TTS [CVPR 2022 Demo] [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Song_Talking_Face_Generation_With_Multilingual_TTS_CVPR_2022_paper.pdf) [DemoPage](https://huggingface.co/spaces/CVPR/ml-talking-face)
- SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [AAAI 2022] [Paper](https://ojs.aaai.org/index.php/AAAI/article/download/20102/19861)

#### 2021
- Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [SIGGRAPH Asia 2021] [Paper](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/resources/SIGGRAPH_Asia_2021__Live_Speech_Portraits__Real_Time_Photorealistic_Talking_Head_Animation.pdf) [Code](https://github.com/YuanxunLu/LiveSpeechPortraits)
- Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [ACMMM 2021] [Paper](https://arxiv.org/abs/2111.00203) [Code](https://github.com/wuhaozhe/style_avatar)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] [Paper](https://arxiv.org/abs/2103.11078) [Code](https://github.com/YudongGuo/AD-NeRF)
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [ICCV 2021] [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhang_FACIAL_Synthesizing_Dynamic_Talking_Face_With_Implicit_Attribute_Learning_ICCV_2021_paper.pdf) [Code](https://github.com/zhangchenxu528/FACIAL)
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [ICCV 2021] [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Meshry_Learned_Spatial_Representations_for_Few-Shot_Talking-Head_Synthesis_ICCV_2021_paper.pdf)
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [CVPR 2021] [Paper](https://arxiv.org/abs/2104.11116) [Code](https://github.com/Hangz-nju-cuhk/Talking-Face_PC-AVS) [ProjectPage](https://hangz-nju-cuhk.github.io/projects/PC-AVS)
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing [CVPR 2021] [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_One-Shot_Free-View_Neural_Talking-Head_Synthesis_for_Video_Conferencing_CVPR_2021_paper.pdf)
- Audio-Driven Emotional Video Portraits [CVPR 2021] [Paper](https://jixinya.github.io/projects/evp/resources/evp.pdf) [Code](https://github.com/jixinya/EVP/)
- AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person [arXiv 2021] [Paper](https://arxiv.org/pdf/2108.04325.pdf)
- Talking Head Generation with Audio and Speech Related Facial Action Units [BMVC 2021] [Paper](https://www.bmvc2021-virtualconference.com/assets/papers/0291.pdf)
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [IJCAI 2021] [Paper](https://www.ijcai.org/proceedings/2021/0152.pdf)
- Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [AAAI 2021] [Paper](https://arxiv.org/abs/2104.07995)
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [arXiv 2021] [Paper](https://arxiv.org/abs/2104.14631) [Code](https://github.com/sibozhang/Text2Video)

#### 2020
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [arXiv 2020] [Paper](http://arxiv.org/abs/2002.10137) [Code](https://github.com/yiranran/Audio-driven-TalkingFace-HeadPose)
- A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild [ACMMM 2020] [Paper](http://arxiv.org/abs/2008.10010) [Code](https://github.com/Rudrabha/Wav2Lip)
- Talking Face Generation with Expression-Tailored Generative Adversarial Network [ACMMM 2020] [Paper](https://dl.acm.org/doi/abs/10.1145/3394171.3413844)
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [arXiv 2020] [Paper](https://arxiv.org/abs/2008.03592) [Code](https://github.com/eeskimez/emotalkingface)
- A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [ICPR 2020] [Paper](https://arxiv.org/abs/2002.08700)
- Everybody's Talkin': Let Me Talk as You Want [arXiv 2020] [Paper](https://arxiv.org/abs/2001.05201)
- HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [arXiv 2020] [Paper](https://arxiv.org/abs/2012.08261)
- Talking-head Generation with Rhythmic Head Motion [ECCV 2020] [Paper](https://arxiv.org/abs/2007.08547)
- Neural Voice Puppetry: Audio-driven Facial Reenactment [ECCV 2020] [Paper](https://arxiv.org/pdf/1912.05566.pdf) [Project](https://justusthies.github.io/posts/neural-voice-puppetry/) [Code](https://github.com/JustusThies/NeuralVoicePuppetry)
- Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [CVPR 2020] [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Prajwal_Learning_Individual_Speaking_Styles_for_Accurate_Lip_to_Speech_Synthesis_CVPR_2020_paper.pdf)
- Robust One Shot Audio to Video Generation [CVPRW 2020] [Paper](https://openaccess.thecvf.com/content_CVPRW_2020/html/w45/Kumar_Robust_One_Shot_Audio_to_Video_Generation_CVPRW_2020_paper.html)
- MakeItTalk: Speaker-Aware Talking Head Animation [SIGGRAPH Asia 2020] [Paper](https://arxiv.org/abs/2004.12992) [Code](https://github.com/adobe-research/MakeItTalk)
- FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis [AAAI 2020] [Paper](https://arxiv.org/abs/1911.09224)
- Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [AAAI 2020] [Paper](https://arxiv.org/abs/2003.12957)
- Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks [arXiv 2020] [Paper](https://arxiv.org/abs/2002.08700)
- Speech-Driven Facial Animation Using Polynomial Fusion of Features [arXiv 2020] [Paper](https://arxiv.org/abs/1912.05833)
- Animating Face using Disentangled Audio Representations [WACV 2020] [Paper](https://arxiv.org/abs/1910.00726)

#### Before 2020
- Realistic Speech-Driven Facial Animation with GANs [IJCV 2019] [Paper](http://arxiv.org/abs/1906.06337) [ProjectPage](https://sites.google.com/view/facial-animation)
- Few-Shot Adversarial Learning of Realistic Neural Talking Head Models [ICCV 2019] [Paper](https://arxiv.org/abs/1905.08233) [Code](https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models)
- Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [CVPR 2019] [Paper](http://www.cs.rochester.edu/u/lchen63/cvpr2019.pdf) [Code](https://github.com/lelechen63/ATVGnet)
- Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [AAAI 2019] [Paper](https://arxiv.org/abs/1807.07860) [Code](https://github.com/Hangz-nju-cuhk/Talking-Face-Generation-DAVS) [ProjectPage](https://liuziwei7.github.io/projects/TalkingFace)
- Lip Movements Generation at a Glance [ECCV 2018] [Paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Lele_Chen_Lip_Movements_Generation_ECCV_2018_paper.pdf)
- X2Face: A network for controlling face generation using images, audio, and pose codes [ECCV 2018] [Paper](https://www.robots.ox.ac.uk/~vgg/publications/2018/Wiles18/wiles18.pdf) [Code](https://github.com/oawiles/X2Face) [ProjectPage](http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/x2face.html)
- Talking Face Generation by Conditional Recurrent Adversarial Network [IJCAI 2019] [Paper](https://arxiv.org/abs/1804.04786) [Code](https://github.com/susanqq/Talking_Face_Generation)
- Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [arXiv 2018] [Paper](https://arxiv.org/abs/1803.07461)
- High-Resolution Talking Face Generation via Mutual Information Approximation [arXiv 2018] [Paper](https://arxiv.org/abs/1812.06589)
- Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [arXiv 2018] [Paper](https://arxiv.org/pdf/1803.07716)
- You said that? [BMVC 2017] [Paper](https://arxiv.org/abs/1705.02966)

### 2D Video - Person dependent

- Continuously Controllable Facial Expression Editing in Talking Face Videos [TAFFC 2023] [Paper](https://doi.org/10.1109/TAFFC.2023.3334511) [Project Page](https://raineggplant.github.io/FEE4TV)
- Synthesizing Obama: Learning Lip Sync from Audio [SIGGRAPH 2017] [Paper](http://grail.cs.washington.edu/projects/AudioToObama/siggraph17_obama.pdf) [Project Page](http://grail.cs.washington.edu/projects/AudioToObama/)
- Photorealistic Adaptation and Interpolation of Facial Expressions Using HMMs and AAMs for Audio-Visual Speech Synthesis [ICIP 2017] [Paper](http://www.researchgate.net/publication/323352468_Photorealistic_adaptation_and_interpolation_of_facial_expressions_using_HMMS_and_AAMS_for_audio-visual_speech_synthesis)
- HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks [Journal of Computer and Communications 2017] [Paper](https://www.scirp.org/pdf/JCC_2017082216385517.pdf)
- ObamaNet: Photo-realistic lip-sync from text [arXiv 2017] [Paper](https://arxiv.org/abs/1801.01442)
- A deep bidirectional LSTM approach for video-realistic talking head [Multimedia Tools Appl 2015] [Paper](https://dl.acm.org/citation.cfm?id=2944665)
- Photo-Realistic Expressive Text to Talking Head Synthesis [Interspeech 2013] [Paper](https://www.researchgate.net/publication/259287794_Photo-Realistic_Expressive_Text_to_Talking_Head_Synthesis)
- Photo-Real Talking Head with Deep Bidirectional LSTM [ICASSP 2015] [Paper](https://www.researchgate.net/publication/272094351_Photo-real_talking_head_with_deep_bidirectional_LSTM)
- Expressive Speech-Driven Facial Animation [TOG 2005] [Paper](https://dl.acm.org/citation.cfm?id=1145094)

### 3D Animation
- MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes [NeurIPS 2024] [Paper](https://arxiv.org/abs/2410.06734) [Code](https://github.com/yerfor/MimicTalk) [ProjectPage](https://mimictalk.github.io/)
- ScanTalk: 3D Talking Heads from Unregistered Scans [ECCV 2024] [Paper](https://arxiv.org/abs/2403.10942) [Code](https://github.com/miccunifi/ScanTalk)
- Audio-Driven Emotional 3D Talking-Head Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2410.17262v1)
- Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads [arXiv 2024] [Paper](https://arxiv.org/abs/2410.11041)
- 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy [arXiv 2024] [Paper](https://www.arxiv.org/abs/2409.10848)
- ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE [arXiv 2024] [Paper](https://arxiv.org/abs/2409.07966) [Code](https://github.com/uuembodiedsocialai/ProbTalk3D/)
- KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding [ECCV 2024] [Paper](https://arxiv.org/abs/2409.01113) [Code](https://github.com/ffxzh/KMTalk)
- EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention [arXiv 2024] [Paper](https://arxiv.org/abs/2408.11518)
- DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation [arXiv 2024] [Paper](https://arxiv.org/abs/2408.06010)
- JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model [arXiv 2024] [Paper](https://www.arxiv.org/abs/2408.01627)
- GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer [arXiv 2024] [Paper](https://www.arxiv.org/abs/2408.01826)
- UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model [arXiv 2024] [Paper](https://arxiv.org/abs/2408.00762)
- EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head [arXiv 2024] [Paper](https://arxiv.org/abs/2408.00297)
- EmoFace: Audio-driven Emotional 3D Face Animation [arXiv 2024] [Paper](https://arxiv.org/abs/2407.12501) [Code](https://github.com/SJTU-Lucy/EmoFace)
- MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset [Interspeech 2024] [Paper](https://arxiv.org/abs/2406.14272) [ProjectPage](https://multi-talk.github.io/)
- 3D Gaussian Blendshapes for Head Avatar Animation [SIGGRAPH 2024] [Paper](https://arxiv.org/abs/2404.19398)
- CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2404.18604)
- GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting [arXiv 2024] [Paper](https://arxiv.org/abs/2404.14037)
- Learn2Talk: 3D Talking Face Learns from 2D Talking Face [arXiv 2024] [Paper](https://arxiv.org/abs/2404.12888) [ProjectPage](https://lkjkjoiuiu.github.io/Learn2Talk/)
- Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication [arXiv 2024] [Paper](https://arxiv.org/abs/2403.19467)
- AnimateMe: 4D Facial Expressions via Diffusion Models [arXiv 2024] [Paper](https://arxiv.org/abs/2403.17213)
- EmoVOCA: Speech-Driven Emotional 3D Talking Heads [arXiv 2024] [Paper](https://arxiv.org/abs/2403.12886)
- FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [CVPR 2024] [Paper](https://arxiv.org/abs/2312.08459) [Code](https://github.com/shivangi-aneja/FaceTalk) [ProjectPage](https://shivangi-aneja.github.io/projects/facetalk/)
- AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation [arXiv 2024] [Paper](https://arxiv.org/abs/2402.16124)
- DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer [arXiv 2024] [Paper](https://arxiv.org/pdf/2402.05712.pdf) [Code](https://github.com/theEricMa/DiffSpeaker)
- Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance [arXiv 2024] [Paper](https://arxiv.org/abs/2401.15687) [ProjectPage](https://sites.google.com/view/media2face)
- EMOTE: Emotional Speech-Driven Animation with Content-Emotion Disentanglement [SIGGRAPH Asia 2023] [Paper](https://arxiv.org/abs/2306.08990) [ProjectPage](https://emote.is.tue.mpg.de/)
- PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features [arXiv 2023] [Paper](https://arxiv.org/abs/2312.02781)
- 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing [arXiv 2023] [Paper](https://arxiv.org/abs/2312.00870) [Code](https://github.com/bala1144/3DiFACE) [ProjectPage](https://balamuruganthambiraja.github.io/3DiFACE/)
- Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications [arXiv 2023] [Paper](https://arxiv.org/abs/2311.18168)
- DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser [arXiv 2023] [Paper](https://arxiv.org/abs/2311.16565)
- DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models [arXiv 2023] [Paper](https://arxiv.org/abs/2310.00434) [ProjectPage](https://diffposetalk.github.io) [Code](https://github.com/DiffPoseTalk/DiffPoseTalk)
- Imitator: Personalized Speech-driven 3D Facial Animation [ICCV 2023] [Paper](https://arxiv.org/abs/2301.00023) [ProjectPage](https://balamuruganthambiraja.github.io/Imitator/) [Code](https://github.com/bala1144/Imitator)
- Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/He_Speech4Mesh_Speech-Assisted_Monocular_3D_Facial_Reconstruction_for_Speech-Driven_3D_Facial_ICCV_2023_paper.pdf)
- Semi-supervised Speech-driven 3D Facial Animation via Cross-modal Encoding [ICCV 2023] [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Yang_Semi-supervised_Speech-driven_3D_Facial_Animation_via_Cross-modal_Encoding_ICCV_2023_paper.pdf)
- Audio-Driven 3D Facial Animation from In-the-Wild Videos [arXiv 2023] [Paper](https://arxiv.org/abs/2306.11541) [ProjectPage](https://faw3d.github.io/)
- EmoTalk: Speech-driven emotional disentanglement for 3D face animation [ICCV 2023] [Paper](https://arxiv.org/abs/2303.11089) [ProjectPage](https://ziqiaopeng.github.io/emotalk/)
- FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [arXiv 2023] [Paper](https://arxiv.org/abs/2303.05416) [Code](https://github.com/galib360/FaceXHuBERT) [ProjectPage](https://galib360.github.io/FaceXHuBERT/)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertices Attention [arXiv 2023] [Paper](https://arxiv.org/abs/2302.12532)
- Learning Audio-Driven Viseme Dynamics for 3D Face Animation [arXiv 2023] [Paper](https://arxiv.org/abs/2301.06059) [ProjectPage](https://linchaobao.github.io/viseme2023/)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [CVPR 2023] [Paper](https://arxiv.org/abs/2301.02379) [ProjectPage](https://doubiiu.github.io/projects/codetalker/)
- Expressive Speech-driven Facial Animation with controllable emotions [arXiv 2023] [Paper](https://arxiv.org/abs/2301.02008)
- PV3D: A 3D Generative Model for Portrait Video Generation [arXiv 2022] [Paper](https://arxiv.org/abs/2212.06384) [ProjectPage](https://showlab.github.io/pv3d/)
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in β€œin-the-wild” videos [CVPR 2022] [Paper](https://arxiv.org/pdf/2112.00585.pdf) [Code](https://github.com/foivospar/NED)
- FaceFormer: Speech-Driven 3D Facial Animation with Transformers [CVPR 2022] [Paper](https://arxiv.org/pdf/2112.05329.pdf) [Code](https://github.com/EvelynFan/FaceFormer) [ProjectPage](https://evelynfan.github.io/audio2face/)
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization [CVPR 2021] [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Lahiri_LipSync3D_Data-Efficient_Learning_of_Personalized_3D_Talking_Faces_From_Video_CVPR_2021_paper.pdf)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [ICCV 2021] [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Richard_MeshTalk_3D_Face_Animation_From_Speech_Using_Cross-Modality_Disentanglement_ICCV_2021_paper.pdf)
- 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [arXiv 2021] [Paper](https://arxiv.org/abs/2104.12051)
- Modality Dropout for Improved Performance-driven Talking Faces [ICMI 2020] [Paper](https://arxiv.org/abs/2005.13616)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [arXiv 2020] [Paper](https://arxiv.org/abs/2008.05023)
- Capture, Learning, and Synthesis of 3D Speaking Styles [CVPR 2019] [Paper](http://openaccess.thecvf.com/content_CVPR_2019/html/Cudeiro_Capture_Learning_and_Synthesis_of_3D_Speaking_Styles_CVPR_2019_paper.html)
- VisemeNet: Audio-Driven Animator-Centric Speech Animation [TOG 2018] [Paper](http://arxiv.org/abs/1805.09488)
- Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [TAC 2018] [Paper](https://arxiv.org/abs/1806.00154)
- End-to-end Learning for 3D Facial Animation from Speech [ICMI 2018] [Paper](https://dl.acm.org/doi/pdf/10.1145/3242969.3243017)
- Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head [MMAC 2018]
- A Deep Learning Approach for Generalized Speech Animation [SIGGRAPH 2017] [Paper](https://dl.acm.org/citation.cfm?id=3073699)
- Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TOG 2017] [Paper](https://dl.acm.org/citation.cfm?id=3073658)
- Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach [CVPR 2017]
- Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data [Interspeech 2016] [Paper](https://www.researchgate.net/publication/307889314_Expressive_Speech_Driven_Talking_Avatar_Synthesis_with_DBLSTM_Using_Limited_Amount_of_Emotional_Bimodal_Data)
- Real-Time Speech-Driven Face Animation With Expressions Using Neural Networks [TNN 2002] [Paper](https://www.ncbi.nlm.nih.gov/pubmed/18244487)
- Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar [SIST 2010] [Paper](https://link.springer.com/10.1007/978-3-642-12604-8_6)

## Datasets & Benchmark

- Responsive Listening Head Generation: A Benchmark Dataset and Baseline [ECCV 2022] [Paper](https://arxiv.org/abs/2112.13548) [ProjectPage](https://vico.solutions/)
- TalkingHead-1KH [Link](https://github.com/deepimagination/TalkingHead-1KH)
- MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV 2020] [ProjectPage](https://wywu.github.io/projects/MEAD/MEAD.html)
- VoxCeleb [Link](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/)
- LRW [Link](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrw1.html)
- LRS2 [Link](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html)
- GRID [Link](https://spandh.dcs.shef.ac.uk//avlombard/)
- CREMA-D [Link](https://github.com/CheyneyComputerScience/CREMA-D)
- MMFace4D [Link](https://arxiv.org/abs/2303.09797)
- DPCD [Link](https://github.com/Metaverse-AI-Lab-THU/Deep-Personalized-Character-Dataset-DPCD) [Paper](https://arxiv.org/abs/2304.11093)

## Survey

- A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing [arXiv 2024] [Paper](https://arxiv.org/abs/2406.10553v2)
- From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications [arXiv 2023] [Paper](https://arxiv.org/abs/2308.16041)
- Deep Learning for Visual Speech Analysis: A Survey [arXiv 2022] [Paper](https://arxiv.org/abs/2205.10839)
- What comprises a good talking-head video generation?: A Survey and Benchmark [arXiv 2020] [Paper](https://arxiv.org/abs/2005.03201)

## Colabs
- Avatars4All: https://github.com/eyaler/avatars4all