# Tacotron 2 (without WaveNet)

PyTorch implementation of [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/pdf/1712.05884.pdf).

This implementation includes **distributed** and **automatic mixed precision** support and uses the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/).

Distributed and automatic mixed precision support relies on NVIDIA's [Apex] and [AMP].

Visit our [website] for audio samples using our published [Tacotron 2] and [WaveGlow] models.

![Alignment, Predicted Mel Spectrogram, Target Mel Spectrogram](tensorboard.png)

## Pre-requisites
1. NVIDIA GPU + CUDA + cuDNN

## Setup
1. Download and extract the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
2. Clone this repo: `git clone https://github.com/NVIDIA/tacotron2.git`
3. Enter the repo: `cd tacotron2`
4. Initialize the submodule: `git submodule init; git submodule update`
5. Update the `.wav` paths: `sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' filelists/*.txt`
    - Alternatively, set `load_mel_from_disk=True` in `hparams.py` and update the mel-spectrogram paths
6. Install [PyTorch 1.0]
7. Install [Apex]
8. Install the Python requirements (`pip install -r requirements.txt`) or build the Docker image

## Training
1. `python train.py --output_directory=outdir --log_directory=logdir`
2. (Optional) `tensorboard --logdir=outdir/logdir`

## Training using a pre-trained model
Training from a pre-trained model can lead to faster convergence. By default, the dataset-dependent text embedding layers are [ignored].

1. Download our published [Tacotron 2] model
2. `python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start`

## Multi-GPU (distributed) and automatic mixed precision training
1. `python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True`

## Inference demo
1. Download our published [Tacotron 2] model
2. Download our published [WaveGlow] model
3. `jupyter notebook --ip=127.0.0.1 --port=31337`
4. Load `inference.ipynb`

N.B. When performing mel-spectrogram-to-audio synthesis, make sure Tacotron 2 and the mel decoder were trained on the same mel-spectrogram representation.
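The compatibility note above can be checked mechanically before synthesis. A minimal sketch, assuming the mel-related hyperparameter names from this repo's `hparams.py` (`filter_length`, `hop_length`, `win_length`, `n_mel_channels`, `sampling_rate`, `mel_fmin`, `mel_fmax`); the helper `mels_compatible` is ours, not part of the repo:

```python
# Hedged sketch: compare the mel-spectrogram front-end settings of the
# acoustic model (Tacotron 2) and the vocoder before running synthesis.
# The key names follow this repo's hparams.py; adapt them if your vocoder
# stores its audio configuration under different names.
MEL_KEYS = (
    "filter_length", "hop_length", "win_length",
    "n_mel_channels", "sampling_rate", "mel_fmin", "mel_fmax",
)

def mels_compatible(acoustic: dict, vocoder: dict) -> bool:
    """True iff both models use the same mel-spectrogram representation."""
    return all(acoustic.get(k) == vocoder.get(k) for k in MEL_KEYS)
```

With the LJ Speech defaults, both sides should agree on values such as `hop_length=256`, `n_mel_channels=80`, and `sampling_rate=22050`; any mismatch (e.g. a different hop length) will produce garbled or mis-paced audio.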
## Related repos
[WaveGlow](https://github.com/NVIDIA/WaveGlow): a faster-than-real-time flow-based generative network for speech synthesis.

[nv-wavenet](https://github.com/NVIDIA/nv-wavenet/): a faster-than-real-time WaveNet.

## Acknowledgements
This implementation uses code from the following repos, as described in our code: [Keith Ito](https://github.com/keithito/tacotron/) and [Prem Seetharaman](https://github.com/pseeth/pytorch-stft).

We are inspired by [Ryuichi Yamamoto's](https://github.com/r9y9/tacotron_pytorch) Tacotron PyTorch implementation.

We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang and Zongheng Yang.


[WaveGlow]: https://drive.google.com/open?id=1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF
[Tacotron 2]: https://drive.google.com/file/d/1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA/view?usp=sharing
[PyTorch 1.0]: https://github.com/pytorch/pytorch#installation
[website]: https://nv-adlr.github.io/WaveGlow
[ignored]: https://github.com/NVIDIA/tacotron2/blob/master/hparams.py#L22
[Apex]: https://github.com/nvidia/apex
[AMP]: https://github.com/NVIDIA/apex/tree/master/apex/amp