Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/DinoMan/speech-driven-animation
- Host: GitHub
- URL: https://github.com/DinoMan/speech-driven-animation
- Owner: DinoMan
- Created: 2019-01-29T16:08:41.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-09-10T11:15:52.000Z (about 1 year ago)
- Last Synced: 2024-08-01T18:37:17.622Z (3 months ago)
- Language: Python
- Size: 1.19 MB
- Stars: 946
- Watchers: 56
- Forks: 290
- Open Issues: 24
Metadata Files:
- Readme: README.md
README
# Speech-Driven Animation
This library implements the end-to-end facial synthesis model described in this [paper](https://sites.google.com/view/facialsynthesis/home).
This library is maintained by Konstantinos Vougioukas, Honglie Chen and Pingchuan Ma.
![speech-driven-animation](example.gif)
## Downloading the models
The models were originally hosted on Git LFS. However, the demand was so high that I reached the free Git LFS storage quota, so I have moved the models to Google Drive. They can be found [here](https://drive.google.com/drive/folders/17Dc2keVoNSrlrOdLL3kXdM8wjb20zkbF?usp=sharing).
Place the model file(s) under *`sda/data/`*.
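One way to fetch the whole folder programmatically is with the third-party `gdown` package; this is only a sketch (gdown is not a dependency of this library, and any other download method works just as well):

```
import gdown

# Download the pretrained models from the Google Drive folder linked above
# into the package data directory (path assumed relative to the repo root).
gdown.download_folder(
    url="https://drive.google.com/drive/folders/17Dc2keVoNSrlrOdLL3kXdM8wjb20zkbF?usp=sharing",
    output="sda/data",
)
```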
## Installing
To install the library, run:
```
$ pip install .
```

## Running the example
To create an animation, instantiate the `VideoAnimator` class, then provide an image and an audio clip (or the paths to the files) and a video will be produced.
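For instance, a minimal end-to-end sketch combining the calls shown in the examples below:

```
import sda

va = sda.VideoAnimator(gpu=0)                            # Instantiate the animator
vid, aud = va("example/image.bmp", "example/audio.wav")  # Generate the animation
va.save_video(vid, aud, "generated.mp4")                 # Save it together with the audio track
```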
## Choosing the model
The model has been trained on the GRID, TCD-TIMIT, CREMA-D and LRW datasets. The default model is GRID. To load another pretrained model, instantiate the VideoAnimator with the following arguments:
```
import sda
va = sda.VideoAnimator(gpu=0, model_path="crema") # Instantiate the animator
```

The models that are currently uploaded are:
- [x] GRID
- [x] TIMIT
- [x] CREMA
- [ ] LRW

### Example with image and audio paths
```
import sda
va = sda.VideoAnimator(gpu=0) # Instantiate the animator
vid, aud = va("example/image.bmp", "example/audio.wav")
```

### Example with numpy arrays
```
import sda
import scipy.io.wavfile as wav
from PIL import Image

va = sda.VideoAnimator(gpu=0) # Instantiate the animator
fs, audio_clip = wav.read("example/audio.wav")
still_frame = Image.open("example/image.bmp")
vid, aud = va(still_frame, audio_clip, fs=fs)
```

### Saving video with audio
```
va.save_video(vid, aud, "generated.mp4")
```

## Using the encodings
The encoders for audio and video are made available so that they can be used to produce features for classification tasks.

### Audio encoder
The audio encoder (made up of an audio-frame encoder and an RNN) is provided along with a dictionary containing information such as the feature length (in seconds) required by the audio-frame encoder and the overlap between audio frames.
```
import sda
encoder, info = sda.get_audio_feature_extractor(gpu=0)
```
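As a rough sketch of how the extractor might be used on a clip: the frame length and overlap below are made-up illustrative values (the real ones are reported in `info`), and the exact tensor shape the encoder expects is not shown here, so the final encoding step is only described in a comment.

```
import sda
import scipy.io.wavfile as wav

encoder, info = sda.get_audio_feature_extractor(gpu=0)
print(info)  # reports the audio-frame length (in seconds), the overlap between frames, etc.

fs, audio_clip = wav.read("example/audio.wav")

# Illustrative values only; take the real frame length and overlap from `info`.
frame_len = int(0.2 * fs)        # assumed 0.2 s per audio frame
step = int((0.2 - 0.04) * fs)    # assumed 0.04 s overlap between consecutive frames

frames = [audio_clip[i:i + frame_len]
          for i in range(0, len(audio_clip) - frame_len + 1, step)]
# Each frame would then be converted to a tensor and passed through `encoder`
# to obtain a feature vector for a downstream classification task.
```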
## Citation
If you find this code useful in your research, please consider citing the following paper:
```bibtex
@inproceedings{vougioukas2019end,
title={End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs.},
author={Vougioukas, Konstantinos and Petridis, Stavros and Pantic, Maja},
booktitle={CVPR Workshops},
pages={37--40},
year={2019}
}
```