{"id":13577275,"url":"https://github.com/Rudrabha/Wav2Lip","last_synced_at":"2025-04-05T11:32:03.636Z","repository":{"id":37478827,"uuid":"285774113","full_name":"Rudrabha/Wav2Lip","owner":"Rudrabha","description":"This repository contains the codes of \"A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild\", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs ","archived":false,"fork":false,"pushed_at":"2024-09-24T21:37:09.000Z","size":542,"stargazers_count":10646,"open_issues_count":322,"forks_count":2270,"subscribers_count":172,"default_branch":"master","last_synced_at":"2024-10-29T11:24:05.242Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://synclabs.so","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Rudrabha.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-07T08:06:38.000Z","updated_at":"2024-10-29T10:13:01.000Z","dependencies_parsed_at":"2023-02-19T02:15:41.323Z","dependency_job_id":"282f56dc-a6a3-4ea8-b409-f9ad9b95d10c","html_url":"https://github.com/Rudrabha/Wav2Lip","commit_stats":{"total_commits":92,"total_committers":9,"mean_commits":"10.222222222222221","dds":"0.40217391304347827","last_synced_commit":"d2bc3ac51dd6b758b52370f253e5115248c22090"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rudrabha%2FWav2Lip","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rudrabha%2FWav2Lip/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rudrabha%2FWav2Lip/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rudrabha%2FWav2Lip/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Rudrabha","download_url":"https://codeload.github.com/Rudrabha/Wav2Lip/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223186501,"owners_count":17102476,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:01:20.010Z","updated_at":"2025-04-05T11:32:03.623Z","avatar_url":"https://github.com/Rudrabha.png","language":"Python","funding_links":[],"categories":["Python","\u003cspan id=\"animation\"\u003eAnimation\u003c/span\u003e","视频剪辑","HarmonyOS","语音识别与合成_其他","Video Generation and Editing","Repos","Media Tools"],"sub_categories":["\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","Windows Manager","网络服务_其他","Coding","FFmpeg-Based Tools"],"readme":"# **Wav2Lip**: *Accurately Lip-syncing Videos In The Wild* \n### A commercial version of Wav2Lip can be directly accessed at https://sync.so\nAre you looking to integrate this into a product? 
We have a turn-key hosted API with new and improved lip-syncing models here: https://sync.so/\nFor any other commercial / enterprise requests, please contact us at pavan@sync.so and prady@sync.so\nTo reach the authors directly, contact us at prajwal@sync.so, rudrabha@sync.so.\nThis code is part of the paper: _A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild_ published at ACM Multimedia 2020. \n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-lip-sync-expert-is-all-you-need-for-speech/lip-sync-on-lrs2)](https://paperswithcode.com/sota/lip-sync-on-lrs2?p=a-lip-sync-expert-is-all-you-need-for-speech)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-lip-sync-expert-is-all-you-need-for-speech/lip-sync-on-lrs3)](https://paperswithcode.com/sota/lip-sync-on-lrs3?p=a-lip-sync-expert-is-all-you-need-for-speech)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-lip-sync-expert-is-all-you-need-for-speech/lip-sync-on-lrw)](https://paperswithcode.com/sota/lip-sync-on-lrw?p=a-lip-sync-expert-is-all-you-need-for-speech)\n|📑 Original Paper|📰 Project Page|🌀 Demo|⚡ Live Testing|📔 Colab Notebook\n|:-:|:-:|:-:|:-:|:-:|\n[Paper](http://arxiv.org/abs/2008.10010) | [Project Page](http://cvit.iiit.ac.in/research/projects/cvit-projects/a-lip-sync-expert-is-all-you-need-for-speech-to-lip-generation-in-the-wild/) | [Demo Video](https://youtu.be/0fXaDCZNOJc) | [Interactive Demo](https://synclabs.so/) | [Colab Notebook](https://colab.research.google.com/drive/1tZpDWXz49W6wDcTprANRGLo2D_EbD5J8?usp=sharing) /[Updated Colab Notebook](https://colab.research.google.com/drive/1IjFW1cLevs6Ouyu4Yht4mnR4yeuMqO7Y#scrollTo=MH1m608OymLH)\n \n![Logo](https://drive.google.com/uc?export=view\u0026id=1Wn0hPmpo4GRbCIJR8Tf20Akzdi1qjjG9)\n----------\n**Highlights**\n----------\n - Weights of the visual quality discriminator have been updated in the README!\n - Lip-sync videos to any 
target speech with high accuracy :100:. Try our [interactive demo](https://sync.so/).\n - :sparkles: Works for any identity, voice, and language. Also works for CGI faces and synthetic voices.\n - Complete training code, inference code, and pretrained models are available :boom:\n - Or, quick-start with the Google Colab Notebook: [Link](https://colab.research.google.com/drive/1tZpDWXz49W6wDcTprANRGLo2D_EbD5J8?usp=sharing). Checkpoints and samples are available in a Google Drive [folder](https://drive.google.com/drive/folders/1I-0dNLfFOSFwrfqjNa-SXuwaURHE5K4k?usp=sharing) as well. There is also a [tutorial video](https://www.youtube.com/watch?v=Ic0TBhfuOrA) on this, courtesy of [What Make Art](https://www.youtube.com/channel/UCmGXH-jy0o2CuhqtpxbaQgA). Also, thanks to [Eyal Gruss](https://eyalgruss.com), there is a more accessible [Google Colab notebook](https://j.mp/wav2lip) with more useful features. A tutorial Colab notebook is available at this [link](https://colab.research.google.com/drive/1IjFW1cLevs6Ouyu4Yht4mnR4yeuMqO7Y#scrollTo=MH1m608OymLH).  \n - :fire: :fire: Several new, reliable evaluation benchmarks and metrics [[`evaluation/` folder of this repo]](https://github.com/Rudrabha/Wav2Lip/tree/master/evaluation) have been released. Instructions to calculate the metrics reported in the paper are also present.\n--------\n**Disclaimer**\n--------\nAll results from this open-source code or our [demo website](https://bhaasha.iiit.ac.in/lipsync) should be used for research/academic/personal purposes only. As the models are trained on the \u003ca href=\"http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html\"\u003eLRS2 dataset\u003c/a\u003e, any form of commercial use is strictly prohibited. For commercial requests, please contact us directly!\nPrerequisites\n-------------\n- `Python 3.6` \n- ffmpeg: `sudo apt-get install ffmpeg`\n- Install necessary packages using `pip install -r requirements.txt`. 
Alternatively, instructions for using a Docker image are provided [here](https://gist.github.com/xenogenesi/e62d3d13dadbc164124c830e9c453668). Have a look at [this comment](https://github.com/Rudrabha/Wav2Lip/issues/131#issuecomment-725478562) and comment on [the gist](https://gist.github.com/xenogenesi/e62d3d13dadbc164124c830e9c453668) if you encounter any issues. \n- Face detection [pre-trained model](https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth) should be downloaded to `face_detection/detection/sfd/s3fd.pth`. Use this alternative [link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/prajwal_k_research_iiit_ac_in/EZsy6qWuivtDnANIG73iHjIBjMSoojcIV0NULXV-yiuiIg?e=qTasa8) if the above does not work.\nGetting the weights\n----------\n| Model  | Description |  Link to the model | \n| :-------------: | :---------------: | :---------------: |\n| Wav2Lip  | Highly accurate lip-sync | [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/Eb3LEzbfuKlJiR600lQWRxgBIY27JZg80f7V9jtMfbNDaQ?e=TBFBVW)  |\n| Wav2Lip + GAN  | Slightly inferior lip-sync, but better visual quality | [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA?e=n9ljGW) |\n| Expert Discriminator  | Weights of the expert discriminator | [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EQRvmiZg-HRAjvI6zqN9eTEBP74KefynCwPWVmF57l-AYA?e=ZRPHKP) |\n| Visual Quality Discriminator  | Weights of the visual quality discriminator trained in a GAN setup | [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EQVqH88dTm1HjlK11eNba5gBbn15WMS0B0EZbDBttqrqkg?e=ic0ljo) |\nLip-syncing videos using the pre-trained models (Inference)\n-------\nYou can lip-sync any video to any audio:\n```bash\npython inference.py --checkpoint_path \u003cckpt\u003e --face \u003cvideo.mp4\u003e --audio \u003can-audio-source\u003e \n```\nThe result is saved 
(by default) in `results/result_voice.mp4`. You can specify the output path as an argument, as with several other available options. The audio source can be any file supported by `FFMPEG` containing audio data: `*.wav`, `*.mp3` or even a video file, from which the code will automatically extract the audio.\n##### Tips for better results:\n- Experiment with the `--pads` argument to adjust the detected face bounding box. This often leads to improved results. You might need to increase the bottom padding to include the chin region. E.g. `--pads 0 20 0 0`.\n- If you see the mouth position dislocated or some weird artifacts such as two mouths, it may be caused by over-smoothing the face detections. Use the `--nosmooth` argument and give it another try. \n- Experiment with the `--resize_factor` argument to get a lower-resolution video. Why? The models are trained on faces that were at a lower resolution. You might get better, visually pleasing results for 720p videos than for 1080p videos (in many cases, the latter works well too). \n- The Wav2Lip model without GAN usually needs more experimentation with the above two to get the best results, and can sometimes give you a better result as well.\nPreparing LRS2 for training\n----------\nOur models are trained on LRS2. 
See [here](#training-on-datasets-other-than-lrs2) for a few suggestions regarding training on other datasets.\n##### LRS2 dataset folder structure\n```\ndata_root (mvlrs_v1)\n├── main, pretrain (we use only main folder in this work)\n|\t├── list of folders\n|\t│   ├── five-digit numbered video IDs ending with (.mp4)\n```\nPlace the LRS2 filelists (train, val, test) `.txt` files in the `filelists/` folder.\n##### Preprocess the dataset for fast training\n```bash\npython preprocess.py --data_root data_root/main --preprocessed_root lrs2_preprocessed/\n```\nAdditional options like `batch_size` and the number of GPUs to use in parallel can also be set.\n##### Preprocessed LRS2 folder structure\n```\npreprocessed_root (lrs2_preprocessed)\n├── list of folders\n|\t├── Folders with five-digit numbered video IDs\n|\t│   ├── *.jpg\n|\t│   ├── audio.wav\n```\nTrain!\n----------\nThere are two major steps: (i) Train the expert lip-sync discriminator, (ii) Train the Wav2Lip model(s).\n##### Training the expert discriminator\nYou can download [the pre-trained weights](#getting-the-weights) if you want to skip this step. To train it:\n```bash\npython color_syncnet_train.py --data_root lrs2_preprocessed/ --checkpoint_dir \u003cfolder_to_save_checkpoints\u003e\n```\n##### Training the Wav2Lip models\nYou can either train the model without the additional visual quality discriminator (\u003c 1 day of training) or use the discriminator (~2 days). For the former, run: \n```bash\npython wav2lip_train.py --data_root lrs2_preprocessed/ --checkpoint_dir \u003cfolder_to_save_checkpoints\u003e --syncnet_checkpoint_path \u003cpath_to_expert_disc_checkpoint\u003e\n```\nTo train with the visual quality discriminator, you should run `hq_wav2lip_train.py` instead. The arguments for both files are similar. In both cases, you can resume training as well. Look at `python wav2lip_train.py --help` for more details. 
You can also set additional, less commonly used hyper-parameters at the bottom of the `hparams.py` file.\nTraining on datasets other than LRS2\n------------------------------------\nTraining on other datasets might require modifications to the code. Please read the following before you raise an issue:\n- You might not get good results by training/fine-tuning on a few minutes of a single speaker. This is a separate research problem, to which we do not have a solution yet. Thus, we would most likely not be able to resolve your issue. \n- You must train the expert discriminator for your own dataset before training Wav2Lip.\n- If it is your own dataset downloaded from the web, it will, in most cases, need to be sync-corrected.\n- Be mindful of the FPS of the videos of your dataset. Changing the FPS would require significant code changes. \n- The expert discriminator's eval loss should go down to ~0.25 and the Wav2Lip eval sync loss should go down to ~0.2 to get good results. \nWhen raising an issue on this topic, please let us know that you are aware of all these points.\nWe have an HD model trained on a dataset allowing commercial usage. The size of the generated face will be 192 x 288 in our new model.\nEvaluation\n----------\nPlease check the `evaluation/` folder for the instructions.\nLicense and Citation\n----------\nThis repository can only be used for personal/research/non-commercial purposes. For commercial requests, please contact us directly at rudrabha@synclabs.so or prajwal@synclabs.so. We have a turn-key hosted API with new and improved lip-syncing models here: https://synclabs.so/\nPlease cite the following paper if you use this repository:\n```\n@inproceedings{10.1145/3394171.3413532,\nauthor = {Prajwal, K R and Mukhopadhyay, Rudrabha and Namboodiri, Vinay P. 
and Jawahar, C.V.},\ntitle = {A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild},\nyear = {2020},\nisbn = {9781450379885},\npublisher = {Association for Computing Machinery},\naddress = {New York, NY, USA},\nurl = {https://doi.org/10.1145/3394171.3413532},\ndoi = {10.1145/3394171.3413532},\nbooktitle = {Proceedings of the 28th ACM International Conference on Multimedia},\npages = {484–492},\nnumpages = {9},\nkeywords = {lip sync, talking face generation, video generation},\nlocation = {Seattle, WA, USA},\nseries = {MM '20}\n}\n```\nAcknowledgments\n----------\nParts of the code structure are inspired by this [TTS repository](https://github.com/r9y9/deepvoice3_pytorch). We thank the author for this wonderful code. The code for Face Detection has been taken from the [face_alignment](https://github.com/1adrianb/face-alignment) repository. We thank the authors for releasing their code and models. We thank [zabique](https://github.com/zabique) for the tutorial Colab notebook.\n## Acknowledgements\n - [Awesome Readme Templates](https://awesomeopensource.com/project/elangosundar/awesome-README-templates)\n - [Awesome README](https://github.com/matiassingers/awesome-readme)\n - [How to write a Good readme](https://bulldogjob.com/news/449-how-to-write-a-good-readme-for-your-github-project)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRudrabha%2FWav2Lip","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRudrabha%2FWav2Lip","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRudrabha%2FWav2Lip/lists"}