https://github.com/netease-media/controltalk
Official code for "Controllable Talking Face Generation by Implicit Facial Keypoints Editing"
- Host: GitHub
- URL: https://github.com/netease-media/controltalk
- Owner: NetEase-Media
- License: MIT
- Created: 2024-10-31T02:47:03.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-10-31T09:27:24.000Z (7 months ago)
- Last Synced: 2024-12-07T07:16:41.243Z (6 months ago)
- Language: Python
- Homepage: https://netease-media.github.io/ControlTalk/
- Size: 49.7 MB
- Stars: 12
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ControlTalk
[Code](https://github.com/NetEase-Media/ControlTalk)
[Paper](https://arxiv.org/pdf/2406.02880)

Official code for "Controllable Talking Face Generation by Implicit Facial Keypoints Editing"
**ControlTalk**: a talking face generation method that controls facial expression deformation from driving audio, constructing the head pose and facial expression (lip motion) for both single-image and sequential-video inputs in a unified manner.
## News
- **2024/10/31**: Inference code is now available!

## Installation
```bash
# Create a Python 3.10 conda env (you could also use virtualenv)
conda env create -f environment.yml
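# (Assumption) Activate the environment before use; the name
# "controltalk" is hypothetical -- the actual environment name
# is set in environment.yml.
conda activate controltalk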
```
## Inference
### 1. Download checkpoints
- Download the pretrained models from Hugging Face ([detailed guidance](https://huggingface.co/docs/huggingface_hub/guides/download)); a command-line sketch follows the file layout below.
```bash
# Download hubert model
https://huggingface.co/TencentGameMate/chinese-hubert-large

# Download our pretrained model
https://huggingface.co/Lavivis/ControlTalk
```
- Put all pretrained models in `./checkpoints`; the file structure should look like:
```
checkpoints
├── audio_encoder.pt
├── lipControlNet.pt
├── 20231128_210236_337a_e0362-checkpoint.pth.tar
└── TencentGameMate
    └── chinese-hubert-large
        ├── config.json
        ├── pytorch_model.bin
        ├── preprocessor_config.json
        └── chinese-hubert-large-fairseq-ckpt.pt
```
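For example, using the `huggingface-cli` tool from `huggingface_hub` (a minimal sketch; the target directories are assumptions chosen to match the layout above):
```bash
pip install -U "huggingface_hub[cli]"

# HuBERT audio model
huggingface-cli download TencentGameMate/chinese-hubert-large \
    --local-dir checkpoints/TencentGameMate/chinese-hubert-large

# ControlTalk pretrained weights
huggingface-cli download Lavivis/ControlTalk --local-dir checkpoints
```
After downloading, verify that the files match the layout shown above before running inference.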
### 2. Inference
```bash
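# (Assumption) HuBERT-family audio encoders typically expect 16 kHz
# mono WAV input; if your source audio differs, resample it first, e.g.:
#   ffmpeg -i input.mp3 -ar 16000 -ac 1 ./data/drive_audio.wav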
python inference.py \
--source_video './data/drive_video.mp4' \
--source_img_path './data/example.png' \
--audio './data/drive_audio.wav' \
--save_as_video \
--box -1 0 0 0
# add --img_mode if you only want to control the facial expression
```
## Training
Coming soon!
## Acknowledgements
- [chinese_speech_pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain)
- [face-vid2vid](https://nvlabs.github.io/face-vid2vid/)
- [face-vid2vid (Unofficial implementation)](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis)
## Citation
If our work and codebase are useful to you, please cite:
```
@article{zhao2024controllable,
title={Controllable Talking Face Generation by Implicit Facial Keypoints Editing},
author={Zhao, Dong and Shi, Jiaying and Li, Wenjun and Wang, Shudong and Xu, Shenghui and Pan, Zhaoming},
journal={arXiv preprint arXiv:2406.02880},
year={2024}
}
```
## License
Our code is released under the MIT License.