{"id":22109616,"url":"https://github.com/xcmyz/fastspeech","last_synced_at":"2025-05-16T19:07:42.570Z","repository":{"id":35533645,"uuid":"189590197","full_name":"xcmyz/FastSpeech","owner":"xcmyz","description":"The Implementation of FastSpeech based on pytorch.","archived":false,"fork":false,"pushed_at":"2023-07-06T22:00:13.000Z","size":18189,"stargazers_count":868,"open_issues_count":14,"forks_count":213,"subscribers_count":35,"default_branch":"master","last_synced_at":"2025-04-12T17:50:17.200Z","etag":null,"topics":["deep-learning","pytorch","speech-synthesis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xcmyz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-05-31T12:27:47.000Z","updated_at":"2025-04-10T03:05:46.000Z","dependencies_parsed_at":"2023-01-15T23:01:08.263Z","dependency_job_id":"bdb65536-f1e8-41b2-b1e2-52e4d45bda87","html_url":"https://github.com/xcmyz/FastSpeech","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xcmyz%2FFastSpeech","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xcmyz%2FFastSpeech/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xcmyz%2FFastSpeech/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xcmyz%2FFastSpeech/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xcmyz","download_url":"https://codeload.github.com/xcmyz/FastSpeech/tar.gz/refs/heads/master","host":{"name
":"GitHub","url":"https://github.com","kind":"github","repositories_count":254592395,"owners_count":22097013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","pytorch","speech-synthesis"],"created_at":"2024-12-01T09:35:08.593Z","updated_at":"2025-05-16T19:07:42.531Z","avatar_url":"https://github.com/xcmyz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FastSpeech-Pytorch\nThe Implementation of FastSpeech Based on Pytorch.\n\n## Update (2020/07/20)\n1. Optimize the training process.\n2. Optimize the implementation of length regulator.\n3. Use the same hyper parameter as FastSpeech2.\n4. **The measures of the 1, 2 and 3 make the training process 3 times faster than before.**\n5. **Better speech quality.**\n\n## Model\n\u003cdiv style=\"text-align: center\"\u003e\n    \u003cimg src=\"img/fastspeech_structure.png\" style=\"max-width:100%;\"\u003e\n\u003c/div\u003e\n\n## My Blog\n- [FastSpeech Reading Notes](https://zhuanlan.zhihu.com/p/67325775)\n- [Details and Rethinking of this Implementation](https://zhuanlan.zhihu.com/p/67939482)\n\n## Prepare Dataset\n1. Download and extract [LJSpeech dataset](https://keithito.com/LJ-Speech-Dataset/).\n2. Put LJSpeech dataset in `data`.\n3. Unzip `alignments.zip`.\n4. Put [Nvidia pretrained waveglow model](https://drive.google.com/file/d/1WsibBTsuRg_SF2Z6L6NFRTT-NjEy1oTx/view?usp=sharing) in the `waveglow/pretrained_model` and rename as `waveglow_256channels.pt`;\n5. 
Run `python3 preprocess.py`.\n\n## Training\nRun `python3 train.py`.\n\n## Evaluation\nRun `python3 eval.py`.\n\n## Notes\n- In the FastSpeech paper, the authors use a pre-trained Transformer-TTS model to provide the alignment targets. I didn't have a well-trained Transformer-TTS model, so I used Tacotron2 instead.\n- I use the same hyperparameters as [FastSpeech2](https://arxiv.org/abs/2006.04558).\n- Audio examples are in `sample`.\n- [Pretrained model](https://drive.google.com/file/d/1vMrKtbjPj9u_o3Y-8prE6hHCc6Yj4Nqk/view?usp=sharing).\n\n## Reference\n\n### Repository\n- [The Implementation of Tacotron Based on Tensorflow](https://github.com/keithito/tacotron)\n- [The Implementation of Transformer Based on Pytorch](https://github.com/jadore801120/attention-is-all-you-need-pytorch)\n- [The Implementation of Transformer-TTS Based on Pytorch](https://github.com/xcmyz/Transformer-TTS)\n- [The Implementation of Tacotron2 Based on Pytorch](https://github.com/NVIDIA/tacotron2)\n- [The Implementation of FastSpeech2 Based on Pytorch](https://github.com/ming024/FastSpeech2)\n\n### Paper\n- [Tacotron2](https://arxiv.org/abs/1712.05884)\n- [Transformer](https://arxiv.org/abs/1706.03762)\n- [FastSpeech](https://arxiv.org/abs/1905.09263)\n- [FastSpeech2](https://arxiv.org/abs/2006.04558)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxcmyz%2Ffastspeech","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxcmyz%2Ffastspeech","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxcmyz%2Ffastspeech/lists"}