{"id":26753141,"url":"https://github.com/mahshid1378/tacotron2","last_synced_at":"2026-05-03T06:41:44.929Z","repository":{"id":284154430,"uuid":"954001890","full_name":"mahshid1378/tacotron2","owner":"mahshid1378","description":"Tacotron 2 - PyTorch implementation with faster-than-realtime inference","archived":false,"fork":false,"pushed_at":"2025-03-24T12:50:50.000Z","size":1383,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-24T13:27:04.637Z","etag":null,"topics":["english","pytorch","speech-synthesis","text-to-speech","tts","vae"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mahshid1378.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-24T12:16:15.000Z","updated_at":"2025-03-24T12:50:54.000Z","dependencies_parsed_at":"2025-03-24T13:37:05.211Z","dependency_job_id":null,"html_url":"https://github.com/mahshid1378/tacotron2","commit_stats":null,"previous_names":["mahshid1378/tacotron2"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mahshid1378/tacotron2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahshid1378%2Ftacotron2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahshid1378%2Ftacotron2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahshid1378%2Ftacotron2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
mahshid1378%2Ftacotron2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mahshid1378","download_url":"https://codeload.github.com/mahshid1378/tacotron2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahshid1378%2Ftacotron2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32560914,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["english","pytorch","speech-synthesis","text-to-speech","tts","vae"],"created_at":"2025-03-28T13:17:58.688Z","updated_at":"2026-05-03T06:41:44.913Z","avatar_url":"https://github.com/mahshid1378.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tacotron 2 (without wavenet)\n\nPyTorch implementation of [Natural TTS Synthesis By Conditioning\nWavenet On Mel Spectrogram Predictions](https://arxiv.org/pdf/1712.05884.pdf). 
\n\nThis implementation includes **distributed** and **automatic mixed precision** support\nand uses the [LJSpeech dataset](https://keithito.com/LJ-Speech-Dataset/).\n\nDistributed and Automatic Mixed Precision support relies on NVIDIA's [Apex] and [AMP].\n\nVisit our [website] for audio samples using our published [Tacotron 2] and\n[WaveGlow] models.\n\n![Alignment, Predicted Mel Spectrogram, Target Mel Spectrogram](tensorboard.png)\n\n\n## Prerequisites\n1. NVIDIA GPU with CUDA and cuDNN\n\n## Setup\n1. Download and extract the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)\n2. Clone this repo: `git clone https://github.com/NVIDIA/tacotron2.git`\n3. Change into the repo directory: `cd tacotron2`\n4. Initialize the submodules: `git submodule init; git submodule update`\n5. Update the .wav paths: `sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' filelists/*.txt`\n    - Alternatively, set `load_mel_from_disk=True` in `hparams.py` and update the mel-spectrogram paths\n6. Install [PyTorch 1.0]\n7. Install [Apex]\n8. Install the Python requirements or build the Docker image\n    - Install the Python requirements: `pip install -r requirements.txt`\n\n## Training\n1. `python train.py --output_directory=outdir --log_directory=logdir`\n2. (OPTIONAL) `tensorboard --logdir=outdir/logdir`\n\n## Training using a pre-trained model\nTraining from a pre-trained model can lead to faster convergence.\nBy default, the dataset-dependent text embedding layers are [ignored].\n\n1. Download our published [Tacotron 2] model\n2. `python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start`\n\n## Multi-GPU (distributed) and Automatic Mixed Precision Training\n1. `python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True`\n\n## Inference demo\n1. Download our published [Tacotron 2] model\n2. Download our published [WaveGlow] model\n3. `jupyter notebook --ip=127.0.0.1 --port=31337`\n4. 
Load `inference.ipynb`\n\nN.b. When performing mel-spectrogram-to-audio synthesis, make sure Tacotron 2\nand the mel decoder were trained on the same mel-spectrogram representation.\n\n\n## Related repos\n[WaveGlow](https://github.com/NVIDIA/WaveGlow): a faster-than-real-time flow-based\ngenerative network for speech synthesis.\n\n[nv-wavenet](https://github.com/NVIDIA/nv-wavenet/): a faster-than-real-time\nWaveNet.\n\n## Acknowledgements\nThis implementation uses code from the following repos: [Keith\nIto](https://github.com/keithito/tacotron/) and [Prem\nSeetharaman](https://github.com/pseeth/pytorch-stft), as described in our code.\n\nWe are inspired by [Ryuichi Yamamoto's](https://github.com/r9y9/tacotron_pytorch)\nTacotron PyTorch implementation.\n\nWe are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan\nWang and Zongheng Yang.\n\n\n[WaveGlow]: https://drive.google.com/open?id=1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF\n[Tacotron 2]: https://drive.google.com/file/d/1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA/view?usp=sharing\n[pytorch 1.0]: https://github.com/pytorch/pytorch#installation\n[website]: https://nv-adlr.github.io/WaveGlow\n[ignored]: https://github.com/NVIDIA/tacotron2/blob/master/hparams.py#L22\n[Apex]: https://github.com/nvidia/apex\n[AMP]: https://github.com/NVIDIA/apex/tree/master/apex/amp","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmahshid1378%2Ftacotron2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmahshid1378%2Ftacotron2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmahshid1378%2Ftacotron2/lists"}