{"id":15037984,"url":"https://github.com/tensorspeech/tensorflowtts","last_synced_at":"2025-04-09T01:21:51.135Z","repository":{"id":37472609,"uuid":"249107601","full_name":"TensorSpeech/TensorFlowTTS","owner":"TensorSpeech","description":":stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)","archived":false,"fork":false,"pushed_at":"2024-07-05T07:24:49.000Z","size":136523,"stargazers_count":3912,"open_issues_count":1,"forks_count":813,"subscribers_count":79,"default_branch":"master","last_synced_at":"2025-04-02T00:17:32.578Z","etag":null,"topics":["chinese-tts","fastspeech","fastspeech2","german-tts","japanese-tts","korea-tts","melgan","mobile-tts","multi-speaker-tts","multiband-melgan","parallel-wavegan","real-time","speech-synthesis","tacotron2","tensorflow2","text-to-speech","tflite","tts","vocoder","zh-tts"],"latest_commit_sha":null,"homepage":"https://tensorspeech.github.io/TensorFlowTTS/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TensorSpeech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-22T03:44:10.000Z","updated_at":"2025-03-29T18:44:00.000Z","dependencies_parsed_at":"2024-10-29T08:31:14.586Z","dependency_job_id":"6f63ff60-ace5-4427-881e-ef5679fadebc","html_url":"https://github.com/TensorSpeech/TensorFlowTTS","commit_stats":{"total_commits":607,"total_committers":42,"mean_commits":"14.452380952380953","dds":"0.44975288303130145"
,"last_synced_commit":"136877136355c82d7ba474ceb7a8f133bd84767e"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TensorSpeech%2FTensorFlowTTS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TensorSpeech%2FTensorFlowTTS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TensorSpeech%2FTensorFlowTTS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TensorSpeech%2FTensorFlowTTS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TensorSpeech","download_url":"https://codeload.github.com/TensorSpeech/TensorFlowTTS/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247953860,"owners_count":21024102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese-tts","fastspeech","fastspeech2","german-tts","japanese-tts","korea-tts","melgan","mobile-tts","multi-speaker-tts","multiband-melgan","parallel-wavegan","real-time","speech-synthesis","tacotron2","tensorflow2","text-to-speech","tflite","tts","vocoder","zh-tts"],"created_at":"2024-09-24T20:36:41.434Z","updated_at":"2025-04-09T01:21:51.120Z","avatar_url":"https://github.com/TensorSpeech.png","language":"Python","readme":"\u003ch2 align=\"center\"\u003e\n\u003cp\u003e :yum: TensorFlowTTS\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/tensorspeech/TensorFlowTTS/actions\"\u003e\n        \u003cimg alt=\"Build\" 
src=\"https://github.com/tensorspeech/TensorFlowTTS/workflows/CI/badge.svg?branch=master\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/tensorspeech/TensorFlowTTS/blob/master/LICENSE\"\u003e\n        \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/tensorspeech/TensorflowTTS?color=red\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://colab.research.google.com/drive/1akxtrLZHKuMiQup00tzO2olCaN-y3KiD?usp=sharing\"\u003e\n        \u003cimg alt=\"Colab\" src=\"https://colab.research.google.com/assets/colab-badge.svg\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\u003c/h2\u003e\n\u003ch2 align=\"center\"\u003e\n\u003cp\u003eReal-Time State-of-the-art Speech Synthesis for Tensorflow 2\n\u003c/h2\u003e\n\n:zany_face: TensorFlowTTS provides real-time, state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multiband-MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training and inference, optimize models further using [quantization-aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), and make TTS models run faster than real time and deployable on mobile devices or embedded systems.\n\n## What's new\n- 2021/08/18 (**NEW!**) Integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See [Gradio Web Demo](https://huggingface.co/spaces/akhaliq/TensorFlowTTS).\n- 2021/08/12 (**NEW!**) Support French TTS (Tacotron2, Multiband MelGAN). Pls see the [colab](https://colab.research.google.com/drive/1jd3u46g-fGQw0rre8fIwWM9heJvrV1c0?usp=sharing). Many Thanks [Samuel Delalez](https://github.com/samuel-lunii)\n- 2021/06/01 Integrated with [Huggingface Hub](https://huggingface.co/tensorspeech). See the [PR](https://github.com/TensorSpeech/TensorFlowTTS/pull/555). 
Thanks [patrickvonplaten](https://github.com/patrickvonplaten) and [osanseviero](https://github.com/osanseviero)\n- 2021/03/18  Support IOS for FastSpeech2 and MB MelGAN. Thanks [kewlbear](https://github.com/kewlbear). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/ios)\n- 2021/01/18 Support TFLite C++ inference. Thanks [luan78zaoha](https://github.com/luan78zaoha). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cpptflite)\n- 2020/12/02 Support German TTS with [Thorsten dataset](https://github.com/thorstenMueller/deep-learning-german-tts). See the [Colab](https://colab.research.google.com/drive/1W0nSFpsz32M0OcIkY9uMOiGrLTPKVhTy?usp=sharing). Thanks [thorstenMueller](https://github.com/thorstenMueller) and [monatis](https://github.com/monatis)\n- 2020/11/24 Add HiFi-GAN vocoder. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/hifigan)\n- 2020/11/19 Add Multi-GPU gradient accumulator. See [here](https://github.com/TensorSpeech/TensorFlowTTS/pull/377)\n- 2020/08/23 Add Parallel WaveGAN tensorflow implementation. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/parallel_wavegan)\n- 2020/08/20 Add C++ inference code. Thank [@ZDisket](https://github.com/ZDisket). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cppwin)\n- 2020/08/18 Update [new base processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/base_processor.py). Add [AutoProcessor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/inference/auto_processor.py) and [pretrained processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/pretrained/) json file\n- 2020/08/14 Support Chinese TTS. Pls see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). 
Thank [@azraelkuan](https://github.com/azraelkuan)\n- 2020/08/05 Support Korean TTS. Pls see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thank [@crux153](https://github.com/crux153)\n- 2020/07/17 Support MultiGPU for all Trainer\n- 2020/07/05 Support Convert Tacotron-2, FastSpeech to Tflite. Pls see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thank @jaeyoo from the TFlite team for his support\n- 2020/06/20 [FastSpeech2](https://arxiv.org/abs/2006.04558) implementation with Tensorflow is supported.\n- 2020/06/07 [Multi-band MelGAN (MB MelGAN)](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/multiband_melgan/) implementation with Tensorflow is supported\n\n\n## Features\n- High performance on Speech Synthesis.\n- Can be fine-tuned on other languages.\n- Fast, Scalable, and Reliable.\n- Suitable for deployment.\n- Easy to implement a new model based on the abstract classes.\n- Mixed precision to speed up training where possible.\n- Support single/multi-GPU gradient accumulation.\n- Support both single and multi GPU in the base trainer class.\n- TFlite conversion for all supported models.\n- Android example.\n- Support many languages (currently, we support Chinese, Korean, English, French and German)\n- Support C++ inference.\n- Support converting weights for some models from PyTorch to TensorFlow to accelerate speed.\n\n## Requirements\nThis repository is tested on Ubuntu 18.04 with:\n\n- Python 3.7+\n- Cuda 10.1\n- CuDNN 7.6.5\n- Tensorflow 2.2/2.3/2.4/2.5/2.6\n- [Tensorflow Addons](https://github.com/tensorflow/addons) \u003e= 0.10.0\n\nOther TensorFlow versions should work but have not been tested yet. This repo will try to work with the latest stable TensorFlow version. 
**We recommend installing TensorFlow 2.6.0 for training if you want to use MultiGPU.**\n\n## Installation\n### With pip\n```bash\n$ pip install TensorFlowTTS\n```\n### From source\nExamples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source as shown below.\n```bash\n$ git clone https://github.com/TensorSpeech/TensorFlowTTS.git\n$ cd TensorFlowTTS\n$ pip install .\n```\nIf you want to upgrade the repository and its dependencies:\n```bash\n$ git pull\n$ pip install --upgrade .\n```\n\n# Supported Model architectures\nTensorFlowTTS currently provides the following architectures:\n\n1. **MelGAN** released with the paper [MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis](https://arxiv.org/abs/1910.06711) by Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville.\n2. **Tacotron-2** released with the paper [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884) by Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu.\n3. **FastSpeech** released with the paper [FastSpeech: Fast, Robust, and Controllable Text to Speech](https://arxiv.org/abs/1905.09263) by Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.\n4. **Multi-band MelGAN** released with the paper [Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106) by Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie.\n5. 
**FastSpeech2** released with the paper [FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558) by Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.\n6. **Parallel WaveGAN** released with the paper [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480) by Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim.\n7. **HiFi-GAN** released with the paper [HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis](https://arxiv.org/abs/2010.05646) by Jungil Kong, Jaehyeon Kim, Jaekyoung Bae.\n\nWe are also implementing some techniques to improve quality and convergence speed from the following papers:\n\n1. **Guided Attention Loss** released with the paper [Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention](https://arxiv.org/abs/1710.08969) by Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara.\n\n\n# Audio Samples\nHere are some audio samples from the validation set: [tacotron-2](https://drive.google.com/open?id=1kaPXRdLg9gZrll9KtvH3-feOBMM8sn3_), [fastspeech](https://drive.google.com/open?id=1f69ujszFeGnIy7PMwc8AkUckhIaT2OD0), [melgan](https://drive.google.com/open?id=1mBwGVchwtNkgFsURl7g4nMiqx4gquAC2), [melgan.stft](https://drive.google.com/open?id=1xUkDjbciupEkM3N4obiJAYySTo6J9z6b), [fastspeech2](https://drive.google.com/drive/u/1/folders/1NG7oOfNuXSh7WyAoM1hI8P5BxDALY_mU), [multiband_melgan](https://drive.google.com/drive/folders/1DCV3sa6VTyoJzZmKATYvYVDUAFXlQ_Zp)\n\n# Tutorial End-to-End\n\n## Prepare Dataset\n\nPrepare a dataset in the following format:\n```\n|- [NAME_DATASET]/\n|   |- metadata.csv\n|   |- wavs/\n|       |- file1.wav\n|       |- ...\n```\n\nWhere `metadata.csv` has the following format: `id|transcription`. 
This is a ljspeech-like format; you can ignore preprocessing steps if you have other format datasets.\n\nNote that `NAME_DATASET` should be `[ljspeech/kss/baker/libritts/synpaflex]` for example.\n\n## Preprocessing\n\nThe preprocessing has two steps:\n\n1. Preprocess audio features\n    - Convert characters to IDs\n    - Compute mel spectrograms\n    - Normalize mel spectrograms to [-1, 1] range\n    - Split the dataset into train and validation\n    - Compute the mean and standard deviation of multiple features from the **training** split\n2. Standardize mel spectrogram based on computed statistics\n\nTo reproduce the steps above:\n```\ntensorflow-tts-preprocess --rootdir ./[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]\ntensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/libritts/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]\n```\n\nRight now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar), [`libritts`](http://www.openslr.org/60/), [`thorsten`](https://github.com/thorstenMueller/deep-learning-german-tts) and\n[`synpaflex`](https://www.ortolang.fr/market/corpora/synpaflex-corpus/) for dataset argument. In the future, we intend to support more datasets.\n\n**Note**: To run `libritts` preprocessing, please first read the instruction in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). 
The dataset needs to be reformatted before running preprocessing.\n\n**Note**: To run `synpaflex` preprocessing, please first run the notebook [notebooks/prepare_synpaflex.ipynb](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/notebooks/prepare_synpaflex.ipynb). The dataset needs to be reformatted before running preprocessing.\n\nAfter preprocessing, the structure of the project folder should be:\n```\n|- [NAME_DATASET]/\n|   |- metadata.csv\n|   |- wavs/\n|       |- file1.wav\n|       |- ...\n|- dump_[ljspeech/kss/baker/libritts/thorsten]/\n|   |- train/\n|       |- ids/\n|           |- LJ001-0001-ids.npy\n|           |- ...\n|       |- raw-feats/\n|           |- LJ001-0001-raw-feats.npy\n|           |- ...\n|       |- raw-f0/\n|           |- LJ001-0001-raw-f0.npy\n|           |- ...\n|       |- raw-energies/\n|           |- LJ001-0001-raw-energy.npy\n|           |- ...\n|       |- norm-feats/\n|           |- LJ001-0001-norm-feats.npy\n|           |- ...\n|       |- wavs/\n|           |- LJ001-0001-wave.npy\n|           |- ...\n|   |- valid/\n|       |- ids/\n|           |- LJ001-0009-ids.npy\n|           |- ...\n|       |- raw-feats/\n|           |- LJ001-0009-raw-feats.npy\n|           |- ...\n|       |- raw-f0/\n|           |- LJ001-0009-raw-f0.npy\n|           |- ...\n|       |- raw-energies/\n|           |- LJ001-0009-raw-energy.npy\n|           |- ...\n|       |- norm-feats/\n|           |- LJ001-0009-norm-feats.npy\n|           |- ...\n|       |- wavs/\n|           |- LJ001-0009-wave.npy\n|           |- ...\n|   |- stats.npy\n|   |- stats_f0.npy\n|   |- stats_energy.npy\n|   |- train_utt_ids.npy\n|   |- valid_utt_ids.npy\n|- examples/\n|   |- melgan/\n|   |- fastspeech/\n|   |- tacotron2/\n|   ...\n```\n\n- `stats.npy` contains the mean and std of the mel spectrograms from the training split\n- `stats_energy.npy` contains the mean and std of energy values from the training split\n- `stats_f0.npy` contains the mean and std of F0 values from the training split\n- 
`train_utt_ids.npy` / `valid_utt_ids.npy` contain the training and validation utterance IDs, respectively\n\nWe use a suffix (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats`, and `wave`) for each input type.\n\n\n**IMPORTANT NOTES**:\n- This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet), so you can combine all models here with other models from the ESPnet repository.\n- Regardless of how your dataset is formatted, the final structure of the `dump` folder **SHOULD** follow the above structure to be able to use the training script, or you can modify it by yourself 😄.\n\n## Training models\n\nTo learn how to train a model from scratch or fine-tune it on other datasets/languages, please see the details in the example directories.\n\n- For Tacotron-2 tutorial, pls see [examples/tacotron2](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2)\n- For FastSpeech tutorial, pls see [examples/fastspeech](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/fastspeech)\n- For FastSpeech2 tutorial, pls see [examples/fastspeech2](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/fastspeech2)\n- For FastSpeech2 + MFA tutorial, pls see [examples/fastspeech2_libritts](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts)\n- For MelGAN tutorial, pls see [examples/melgan](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/melgan)\n- For MelGAN + STFT Loss tutorial, pls see [examples/melgan.stft](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/melgan.stft)\n- For Multiband-MelGAN tutorial, pls see [examples/multiband_melgan](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan)\n- For Parallel WaveGAN tutorial, pls see [examples/parallel_wavegan](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/parallel_wavegan)\n- For Multiband-MelGAN Generator + HiFi-GAN tutorial, pls see 
[examples/multiband_melgan_hf](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan_hf)\n- For HiFi-GAN tutorial, pls see [examples/hifigan](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/hifigan)\n\n# Abstract Class Explanation\n\n## Abstract DataLoader Tensorflow-based dataset\n\nA detailed implementation of the abstract dataset class is in [tensorflow_tts/datasets/abstract_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/tensorflow_tts/datasets/abstract_dataset.py). There are some functions you need to override and understand:\n\n1. **get_args**: This function returns the arguments for the **generator** function, normally utt_ids.\n2. **generator**: This function takes its inputs from the **get_args** function and returns the inputs for the model. **Note that all generator functions return a dictionary whose keys exactly match the model's parameters, because base_trainer uses model(\\*\\*batch) for the forward step.**\n3. **get_output_dtypes**: This function needs to return the dtype of each element returned by the **generator** function.\n4. 
**get_len_dataset**: Returns the length of the dataset, normally len(utt_ids).\n\n**IMPORTANT NOTES**:\n\n- The pipeline for creating a dataset should be: cache -\u003e shuffle -\u003e map_fn -\u003e get_batch -\u003e prefetch.\n- If you shuffle before cache, the dataset won't be reshuffled when it is iterated over again.\n- You should apply map_fn to make each element returned from the **generator** function have the same length before batching and feeding it into a model.\n\nSome examples that use this **abstract_dataset** are [tacotron_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/tacotron2/tacotron_dataset.py), [fastspeech_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/fastspeech/fastspeech_dataset.py), [audio_mel_dataset.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/melgan/audio_mel_dataset.py), [fastspeech2_dataset.py](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/fastspeech2_dataset.py)\n\n\n## Abstract Trainer Class\n\nA detailed implementation of base_trainer is in [tensorflow_tts/trainers/base_trainer.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py). It includes [Seq2SeqBasedTrainer](https://github.com/tensorspeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L265) and [GanBasedTrainer](https://github.com/tensorspeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L149), which inherit from [BasedTrainer](https://github.com/tensorspeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L16). All trainers support both single and multi GPU. 
There are some functions you **MUST** override when implementing a new trainer:\n\n- **compile**: This function defines the models and losses.\n- **generate_and_save_intermediate_result**: This function saves intermediate results such as alignment plots, generated audio, and mel-spectrogram plots.\n- **compute_per_example_losses**: This function computes the per-example loss for the model; note that every element of the loss **MUST** have shape [batch_size].\n\nAll models in this repo are trained with **GanBasedTrainer** (see [train_melgan.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/melgan/train_melgan.py), [train_melgan_stft.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/melgan.stft/train_melgan_stft.py), [train_multiband_melgan.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/multiband_melgan/train_multiband_melgan.py)) and **Seq2SeqBasedTrainer** (see [train_tacotron2.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/tacotron2/train_tacotron2.py), [train_fastspeech.py](https://github.com/tensorspeech/TensorFlowTTS/blob/master/examples/fastspeech/train_fastspeech.py)).\n\n# End-to-End Examples\nYou can learn how to run inference with each model in the [notebooks](https://github.com/tensorspeech/TensorFlowTTS/tree/master/notebooks) or see a [colab](https://colab.research.google.com/drive/1akxtrLZHKuMiQup00tzO2olCaN-y3KiD?usp=sharing) (for English), [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing) (for Korean), [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing) (for Chinese), [colab](https://colab.research.google.com/drive/1jd3u46g-fGQw0rre8fIwWM9heJvrV1c0?usp=sharing) (for French), [colab](https://colab.research.google.com/drive/1W0nSFpsz32M0OcIkY9uMOiGrLTPKVhTy?usp=sharing) (for German). Here is example code for end-to-end inference with FastSpeech2 and Multi-band MelGAN. 
We have uploaded all our pretrained models to the [HuggingFace Hub](https://huggingface.co/tensorspeech).\n\n```python\nimport soundfile as sf\n\nimport tensorflow as tf\n\nfrom tensorflow_tts.inference import TFAutoModel\nfrom tensorflow_tts.inference import AutoProcessor\n\n# initialize the fastspeech2 model\nfastspeech2 = TFAutoModel.from_pretrained(\"tensorspeech/tts-fastspeech2-ljspeech-en\")\n\n\n# initialize the mb_melgan model\nmb_melgan = TFAutoModel.from_pretrained(\"tensorspeech/tts-mb_melgan-ljspeech-en\")\n\n\n# initialize the text processor and convert text to input IDs\nprocessor = AutoProcessor.from_pretrained(\"tensorspeech/tts-fastspeech2-ljspeech-en\")\n\ninput_ids = processor.text_to_sequence(\"Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.\")\n\n# fastspeech2 inference\nmel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(\n    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),\n    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),\n    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),\n    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),\n    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),\n)\n\n# mb_melgan inference\naudio_before = mb_melgan.inference(mel_before)[0, :, 0]\naudio_after = mb_melgan.inference(mel_after)[0, :, 0]\n\n# save to file\nsf.write('./audio_before.wav', audio_before, 22050, \"PCM_16\")\nsf.write('./audio_after.wav', audio_after, 22050, \"PCM_16\")\n```\n\n# Contact\n- [Minh Nguyen Quan Anh](https://github.com/tensorspeech): nguyenquananhminh@gmail.com\n- [erogol](https://github.com/erogol): erengolge@gmail.com\n- [Kuan Chen](https://github.com/azraelkuan): azraelkuan@gmail.com\n- [Dawid Kobus](https://github.com/machineko): machineko@protonmail.com\n- [Takuya Ebata](https://github.com/MokkeMeguru): meguru.mokke@gmail.com\n- [Trinh Le 
Quang](https://github.com/l4zyf9x): trinhle.cse@gmail.com\n- [Yunchao He](https://github.com/candlewill): yunchaohe@gmail.com\n- [Alejandro Miguel Velasquez](https://github.com/ZDisket): xml506ok@gmail.com\n\n# License\nAll models here are licensed under the [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) license.\n\n# Acknowledgement\nWe want to thank [Tomoki Hayashi](https://github.com/kan-bayashi), who discussed Melgan, Multi-band melgan, Fastspeech, and Tacotron with us at length. This framework is based on his great open-source [ParallelWaveGan](https://github.com/kan-bayashi/ParallelWaveGAN) project.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorspeech%2Ftensorflowtts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftensorspeech%2Ftensorflowtts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorspeech%2Ftensorflowtts/lists"}