# Speech-Driven Animation

This library implements the end-to-end facial synthesis model described in this [paper](https://sites.google.com/view/facialsynthesis/home).

This library is maintained by Konstantinos Vougioukas, Honglie Chen and Pingchuan Ma.

![speech-driven-animation](example.gif)

## Downloading the models
The models were originally hosted on Git LFS. However, demand was so high that the free Git LFS storage quota was reached, so the models have been moved to Google Drive. They can be found [here](https://drive.google.com/drive/folders/17Dc2keVoNSrlrOdLL3kXdM8wjb20zkbF?usp=sharing).
Place the model file(s) under *`sda/data/`*.

## Installing

To install the library, run:
```
$ pip install .
```

## Running the example

To create an animation, instantiate the VideoAnimator class, then provide an image and an audio clip (or the paths to the files), and a video will be produced.

## Choosing the model
The model has been trained on the GRID, TCD-TIMIT, CREMA-D and LRW datasets. The default model is GRID. To load another pretrained model, simply instantiate the VideoAnimator with the following arguments:

```
import sda
va = sda.VideoAnimator(gpu=0, model_path="crema")  # Instantiate the animator
```

The models that are currently uploaded are:
- [x] GRID
- [x] TIMIT
- [x] CREMA
- [ ] LRW

### Example with image and audio paths
```
import sda
va = sda.VideoAnimator(gpu=0)  # Instantiate the animator
vid, aud = va("example/image.bmp", "example/audio.wav")
```

### Example with numpy arrays
```
import sda
import scipy.io.wavfile as wav
from PIL import Image

va = sda.VideoAnimator(gpu=0)  # Instantiate the animator
fs, audio_clip = wav.read("example/audio.wav")
still_frame = Image.open("example/image.bmp")
vid, aud = va(still_frame, audio_clip, fs=fs)
```

### Saving video with audio
```
va.save_video(vid, aud, "generated.mp4")
```

## Using the encodings
The encoders for audio and video are made available so that they can be used to produce features for classification tasks.

### Audio encoder
The audio encoder (composed of an audio-frame encoder and an RNN) is provided along with a dictionary containing information such as the feature length (in seconds) required by the audio-frame encoder and the overlap between audio frames.
```
import sda
encoder, info = sda.get_audio_feature_extractor(gpu=0)
```

## Citation

If you find this code useful in your research, please consider citing the following paper:

```bibtex
@inproceedings{vougioukas2019end,
  title={End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs.},
  author={Vougioukas, Konstantinos and Petridis, Stavros and Pantic, Maja},
  booktitle={CVPR Workshops},
  pages={37--40},
  year={2019}
}
```
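As a side note on the feature length and overlap reported by the audio encoder's `info` dictionary: these two values determine how many overlapping audio frames a clip is split into. The sketch below shows that arithmetic; the function name, the example values (0.2 s frames, 0.1 s overlap, 16 kHz audio), and the way the parameters are passed are illustrative assumptions, not the library's actual API.

```python
def count_audio_frames(n_samples, fs, feature_len_s, overlap_s):
    """Number of overlapping audio frames that fit in a clip.

    Works in integer sample counts to avoid floating-point drift.
    NOTE: helper and example values are hypothetical, for illustration only.
    """
    frame_len = int(feature_len_s * fs)  # samples per audio frame
    overlap = int(overlap_s * fs)        # samples shared between consecutive frames
    step = frame_len - overlap           # hop between frame starts
    if n_samples < frame_len:
        return 0                         # clip shorter than a single frame
    return 1 + (n_samples - frame_len) // step

# A 1-second clip at 16 kHz, with 0.2 s frames and 0.1 s overlap:
print(count_audio_frames(16000, 16000, 0.2, 0.1))  # -> 9
```

Working in sample counts rather than seconds keeps the result exact; the same computation in floating-point seconds (e.g. `0.8 / 0.1`) can round down and miscount by one frame.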