https://github.com/Vaibhavs10/open-tts-tracker

Last synced: 3 months ago
Host: GitHub
URL: https://github.com/Vaibhavs10/open-tts-tracker
Owner: Vaibhavs10
Created: 2024-01-15T12:35:45.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-06-21T10:33:25.000Z (11 months ago)
Last Synced: 2024-10-14T00:05:46.508Z (7 months ago)
Size: 98.6 KB
Stars: 1,084
Watchers: 65
Forks: 69
Open Issues: 6
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

awesome-osml-for-devs - Open TTS Tracker - source text-to-speech models. (Audio / Models and Demos)
README

# 🗣️ Open TTS Tracker

A one-stop shop to track all open-access/open-source TTS models as they come out. Feel free to open a PR for any that aren't linked here.

This list is meant to raise awareness of these models and to make it easier for researchers, developers, and enthusiasts to stay informed about the latest advancements in the field.

> [!NOTE]
> This repo will only track open source/access codebase TTS models. More motivation for everyone to open-source! 🤗

| Name | GitHub | Weights | License | Fine-tune | Languages | Paper | Demo | Issues |
|---|---|---|---|---|---|---|---|---|
| Amphion | [Repo](https://github.com/open-mmlab/Amphion) | [🤗 Hub](https://huggingface.co/amphion) | [MIT](https://github.com/open-mmlab/Amphion/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2312.09911) | [🤗 Space](https://huggingface.co/amphion) |  |
| AI4Bharat | [Repo](https://github.com/AI4Bharat/Indic-TTS) | [🤗 Hub](https://huggingface.co/ai4bharat) | [MIT](https://github.com/AI4Bharat/Indic-TTS/blob/master/LICENSE.txt) | [Yes](https://github.com/AI4Bharat/Indic-TTS?tab=readme-ov-file#training-steps) | Indic | [Paper](https://arxiv.org/abs/2211.09536) | [Demo](https://models.ai4bharat.org/#/tts) |  |
| Bark | [Repo](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bark) | [🤗 Hub](https://huggingface.co/suno/bark) | [MIT](https://github.com/suno-ai/bark/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2209.03143) | [🤗 Space](https://huggingface.co/spaces/suno/bark) |  |
| EmotiVoice | [Repo](https://github.com/netease-youdao/EmotiVoice) | [GDrive](https://drive.google.com/drive/folders/1y6Xwj_GG9ulsAonca_unSGbJ4lxbNymM) | [Apache 2.0](https://github.com/netease-youdao/EmotiVoice/blob/main/LICENSE) | [Yes](https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Cloning-with-your-personal-data) | ZH + EN | Not Available | Not Available | Separate [GUI agreement](https://github.com/netease-youdao/EmotiVoice/blob/main/EmotiVoice_UserAgreement_%E6%98%93%E9%AD%94%E5%A3%B0%E7%94%A8%E6%88%B7%E5%8D%8F%E8%AE%AE.pdf) |
| Fish-speech | [Repo](https://github.com/fishaudio/fish-speech) | [🤗 Hub](https://huggingface.co/fishaudio) | [Apache 2.0](https://github.com/fishaudio/fish-speech/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2411.01156) | [🤗 Space](https://huggingface.co/spaces/fishaudio/fish-speech-1) | |
| Glow-TTS | [Repo](https://github.com/jaywalnut310/glow-tts) | [GDrive](https://drive.google.com/file/d/1JiCMBVTG4BMREK8cT3MYck1MgYvwASL0/view) | [MIT](https://github.com/jaywalnut310/glow-tts/blob/master/LICENSE) | [Yes](https://github.com/jaywalnut310/glow-tts?tab=readme-ov-file#2-pre-requisites) | English | [Paper](https://arxiv.org/abs/2005.11129) | [GH Pages](https://jaywalnut310.github.io/glow-tts-demo/index.html) |  |
| GPT-SoVITS | [Repo](https://github.com/RVC-Boss/GPT-SoVITS) | [🤗 Hub](https://huggingface.co/lj1995/GPT-SoVITS) | [MIT](https://github.com/RVC-Boss/GPT-SoVITS/blob/main/LICENSE) | [Yes](https://github.com/RVC-Boss/GPT-SoVITS?tab=readme-ov-file#pretrained-models) | Multilingual | Not Available | Not Available | |
| HierSpeech++ | [Repo](https://github.com/sh-lee-prml/HierSpeechpp) | [GDrive](https://drive.google.com/drive/folders/1-L_90BlCkbPyKWWHTUjt5Fsu3kz0du0w) | [MIT](https://github.com/sh-lee-prml/HierSpeechpp/blob/main/LICENSE) | No | KR + EN | [Paper](https://arxiv.org/abs/2311.12454) | [🤗 Space](https://huggingface.co/spaces/LeeSangHoon/HierSpeech_TTS) | |
| IMS-Toucan | [Repo](https://github.com/DigitalPhonetics/IMS-Toucan) | [GH release](https://github.com/DigitalPhonetics/IMS-Toucan/tags) | [Apache 2.0](https://github.com/DigitalPhonetics/IMS-Toucan/blob/ToucanTTS/LICENSE) | [Yes](https://github.com/DigitalPhonetics/IMS-Toucan#build-a-toucantts-pipeline) | Multilingual | [Paper](https://arxiv.org/abs/2206.12229) | [🤗 Space](https://huggingface.co/spaces/Flux9665/IMS-Toucan) |  |
| Kokoro | [Repo](https://github.com/hexgrad/kokoro) | [🤗 Hub](https://huggingface.co/hexgrad) | [Apache 2.0](https://github.com/hexgrad/kokoro/blob/main/LICENSE) | Yes | Multilingual | | [🤗 Space](https://huggingface.co/spaces/hexgrad/Kokoro-TTS) | |
| Llasa | [Repo](https://github.com/zhenye234/LLaSA_training) | [🤗 Hub](https://huggingface.co/HKUSTAudio) | [CC-BY-NC 4.0](https://github.com/zhenye234/LLaSA_training/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2502.04128) | [🤗 Space](https://huggingface.co/spaces/zhenye234/LLaSA_TTS) | |
| MahaTTS | [Repo](https://github.com/dubverse-ai/MahaTTS) | [🤗 Hub](https://huggingface.co/Dubverse/MahaTTS) | [Apache 2.0](https://github.com/dubverse-ai/MahaTTS/blob/main/LICENSE) | No | English + Indic | Not Available | [Recordings](https://github.com/dubverse-ai/MahaTTS/blob/main/README.md#sample-outputs), [Colab](https://colab.research.google.com/drive/1qkZz2km-PX75P0f6mUb2y5e-uzub27NW?usp=sharing) | |
| Matcha-TTS | [Repo](https://github.com/shivammehta25/Matcha-TTS) | [GDrive](https://drive.google.com/drive/folders/17C_gYgEHOxI5ZypcfE_k1piKCtyR0isJ) | [MIT](https://github.com/shivammehta25/Matcha-TTS/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/Matcha-TTS/tree/main#train-with-your-own-dataset) | English | [Paper](https://arxiv.org/abs/2309.03199) | [🤗 Space](https://huggingface.co/spaces/shivammehta25/Matcha-TTS) | GPL-licensed phonemizer |
| MeloTTS | [Repo](https://github.com/myshell-ai/MeloTTS) | [🤗 Hub](https://huggingface.co/myshell-ai/) | [MIT](https://github.com/myshell-ai/MeloTTS/blob/main/LICENSE) | Yes | Multilingual | | [🤗 Space](https://huggingface.co/spaces/mrfakename/MeloTTS) | |
| MetaVoice-1B | [Repo](https://github.com/metavoiceio/metavoice-src) | [🤗 Hub](https://huggingface.co/metavoiceio/metavoice-1B-v0.1/tree/main) | [Apache 2.0](https://github.com/metavoiceio/metavoice-src/blob/main/LICENSE) | [Yes](https://github.com/metavoiceio/metavoice-src?tab=readme-ov-file) | Multilingual | Not Available | [🤗 Space](https://ttsdemo.themetavoice.xyz/) |  |
| Neural-HMM TTS | [Repo](https://github.com/shivammehta25/Neural-HMM) | [GitHub](https://github.com/shivammehta25/Neural-HMM/releases) | [MIT](https://github.com/shivammehta25/Neural-HMM/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/Neural-HMM?tab=readme-ov-file#setup-and-training-using-lj-speech) | English | [Paper](https://arxiv.org/abs/2108.13320) | [GH Pages](https://shivammehta25.github.io/Neural-HMM/) |  |
| OpenVoice | [Repo](https://github.com/myshell-ai/OpenVoice) | [🤗 Hub](https://huggingface.co/myshell-ai/OpenVoice) | [CC-BY-NC 4.0](https://github.com/myshell-ai/OpenVoice/blob/main/LICENSE) | No | ZH + EN | [Paper](https://arxiv.org/abs/2312.01479) | [🤗 Space](https://huggingface.co/spaces/myshell-ai/OpenVoice) | Non Commercial |
| OuteTTS | [Repo](https://github.com/edwko/OuteTTS) | [🤗 Hub](https://huggingface.co/OuteAI/) | [Apache 2.0](https://github.com/edwko/OuteTTS/blob/main/LICENSE) | No | Multilingual | | [🤗 Space](https://huggingface.co/spaces/OuteAI/OuteTTS-0.3-1B-Demo) | |
| OverFlow TTS | [Repo](https://github.com/shivammehta25/OverFlow) | [GitHub](https://github.com/shivammehta25/OverFlow/releases) | [MIT](https://github.com/shivammehta25/OverFlow/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/OverFlow/tree/main?tab=readme-ov-file#setup-and-training-using-lj-speech) | English | [Paper](https://arxiv.org/abs/2211.06892) | [GH Pages](https://shivammehta25.github.io/OverFlow/) |  |
| Parler TTS | [Repo](https://github.com/huggingface/parler-tts) | [🤗 Hub](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) | [Apache 2.0](https://github.com/huggingface/parler-tts/blob/main/LICENSE) | [Yes](https://github.com/huggingface/parler-tts/tree/main/training) | English | Not Available | Not Available | |
| pflowTTS | [Unofficial Repo](https://github.com/p0p4k/pflowtts_pytorch) | [GDrive](https://drive.google.com/drive/folders/1x-A2Ezmmiz01YqittO_GLYhngJXazaF0) | [MIT](https://github.com/p0p4k/pflowtts_pytorch/blob/master/LICENSE) | [Yes](https://github.com/p0p4k/pflowtts_pytorch#instructions-to-run) | English | [Paper](https://openreview.net/pdf?id=zNA7u7wtIN) | Not Available | GPL-licensed phonemizer |
| Piper | [Repo](https://github.com/rhasspy/piper) | [🤗 Hub](https://huggingface.co/datasets/rhasspy/piper-checkpoints/) | [MIT](https://github.com/rhasspy/piper/blob/master/LICENSE.md) | [Yes](https://github.com/rhasspy/piper/blob/master/TRAINING.md) | Multilingual | Not Available | Not Available | [GPL-licensed phonemizer](https://github.com/rhasspy/piper/issues/93) |
| Pheme | [Repo](https://github.com/PolyAI-LDN/pheme) | [🤗 Hub](https://huggingface.co/PolyAI/pheme) | [CC-BY](https://github.com/PolyAI-LDN/pheme/blob/main/LICENSE) | [Yes](https://github.com/PolyAI-LDN/pheme#training) | English | [Paper](https://arxiv.org/abs/2401.02839) | [🤗 Space](https://huggingface.co/spaces/PolyAI/pheme) |  |
| RAD-MMM | [Repo](https://github.com/NVIDIA/RAD-MMM) | [GDrive](https://drive.google.com/file/d/1p8SEVHRlyLQpQnVP2Dc66RlqJVVRDCsJ/view) | [MIT](https://github.com/NVIDIA/RAD-MMM/blob/main/LICENSE) | [Yes](https://github.com/NVIDIA/RAD-MMM?tab=readme-ov-file#training) | Multilingual | [Paper](https://arxiv.org/pdf/2301.10335.pdf) | [Jupyter Notebook](https://github.com/NVIDIA/RAD-MMM/blob/main/inference.ipynb), [Webpage](https://research.nvidia.com/labs/adlr/projects/radmmm/) |  |
| RAD-TTS | [Repo](https://github.com/NVIDIA/radtts) | [GDrive](https://drive.google.com/file/d/1Rb2VMUwQahGrnpFSlAhCPh7OpDN3xgOr/view?usp=sharing) | [MIT](https://github.com/NVIDIA/radtts/blob/main/LICENSE) | [Yes](https://github.com/NVIDIA/radtts#training-radtts-without-pitch-and-energy-conditioning) | English | [Paper](https://openreview.net/pdf?id=0NQwnnwAORi) | [GH Pages](https://nv-adlr.github.io/RADTTS) |  |
| Silero | [Repo](https://github.com/snakers4/silero-models) | [GH links](https://github.com/snakers4/silero-models/blob/master/models.yml) | [CC BY-NC-SA](https://github.com/snakers4/silero-models/blob/master/LICENSE) | [No](https://github.com/snakers4/silero-models/discussions/78) | EM + DE + ES + EA | Not Available | Not Available | [Non Commercial](https://github.com/snakers4/silero-models/wiki/Licensing-and-Tiers) |
| StyleTTS 2 | [Repo](https://github.com/yl4579/StyleTTS2) | [🤗 Hub](https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main) | [MIT](https://github.com/yl4579/StyleTTS2/blob/main/LICENSE) | [Yes](https://github.com/yl4579/StyleTTS2#finetuning) | English | [Paper](https://arxiv.org/abs/2306.07691) | [🤗 Space](https://huggingface.co/spaces/styletts2/styletts2) | GPL-licensed phonemizer |
| Tacotron 2 | [Unofficial Repo](https://github.com/NVIDIA/tacotron2) | [GDrive](https://drive.google.com/file/d/1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA/view) | [BSD-3](https://github.com/NVIDIA/tacotron2/blob/master/LICENSE) | [Yes](https://github.com/NVIDIA/tacotron2/tree/master?tab=readme-ov-file#training) | English | [Paper](https://arxiv.org/abs/1712.05884) | [Webpage](https://google.github.io/tacotron/publications/tacotron2/) |  |
| TorToiSe TTS | [Repo](https://github.com/neonbjb/tortoise-tts) | [🤗 Hub](https://huggingface.co/jbetker/tortoise-tts-v2) | [Apache 2.0](https://github.com/neonbjb/tortoise-tts/blob/main/LICENSE) | [Yes](https://git.ecker.tech/mrq/tortoise-tts) | English | [Technical report](https://arxiv.org/abs/2305.07243) | [🤗 Space](https://huggingface.co/spaces/Manmay/tortoise-tts) |  |
| TTTS | [Repo](https://github.com/adelacvg/ttts) | [🤗 Hub](https://huggingface.co/adelacvg/TTTS) | [MPL 2.0](https://github.com/adelacvg/ttts/blob/master/LICENSE) | No | ZH | Not Available | [Colab](https://colab.research.google.com/github/adelacvg/ttts/blob/master/demo.ipynb), [🤗 Space](https://huggingface.co/spaces/mrfakename/TTTS) | |
| VALL-E | [Unofficial Repo](https://github.com/enhuiz/vall-e) | Not Available | [MIT](https://github.com/enhuiz/vall-e/blob/main/LICENSE) | [Yes](https://github.com/enhuiz/vall-e#get-started) | NA | [Paper](https://arxiv.org/abs/2301.02111) | Not Available |  |
| VITS/ MMS-TTS | [Repo](https://github.com/huggingface/transformers/tree/7142bdfa90a3526cfbed7483ede3afbef7b63939/src/transformers/models/vits) | [🤗 Hub](https://huggingface.co/kakao-enterprise) / [MMS](https://huggingface.co/models?search=mms-tts) | [Apache 2.0](https://github.com/huggingface/transformers/blob/main/LICENSE) | [Yes](https://github.com/ylacombe/finetune-hf-vits) | English | [Paper](https://arxiv.org/abs/2106.06103) | [🤗 Space](https://huggingface.co/spaces/kakao-enterprise/vits) | GPL-licensed phonemizer |
| WhisperSpeech | [Repo](https://github.com/collabora/WhisperSpeech) | [🤗 Hub](https://huggingface.co/collabora/whisperspeech) | [MIT](https://github.com/collabora/WhisperSpeech/blob/main/LICENSE) | No | English, Polish | Not Available | [🤗 Space](https://huggingface.co/spaces/collabora/WhisperSpeech), [Recordings](https://github.com/collabora/WhisperSpeech/blob/main/README.md), [Colab](https://colab.research.google.com/github/collabora/WhisperSpeech/blob/8168a30f26627fcd15076d10c85d9e33c52204cf/Inference%20example.ipynb) | |
| XTTS | [Repo](https://github.com/coqui-ai/TTS) | [🤗 Hub](https://huggingface.co/coqui/XTTS-v2) | [CPML](https://coqui.ai/cpml) | [Yes](https://docs.coqui.ai/en/latest/models/xtts.html#training) | Multilingual | [Paper](https://arxiv.org/abs/2406.04904) | [🤗 Space](https://huggingface.co/spaces/coqui/xtts) | Non Commercial |
| xVASynth | [Repo](https://github.com/DanRuta/xVA-Synth) | [🤗 Hub](https://huggingface.co/Pendrokar/xvapitch_nvidia) | [GPL-3.0](https://github.com/DanRuta/xVA-Synth/blob/master/LICENSE.md) | [Yes](https://github.com/DanRuta/xva-trainer) | Multilingual | [Paper](https://arxiv.org/abs/2009.14153) | [🤗 Space](https://huggingface.co/spaces/Pendrokar/xVASynth) | Copyrighted materials used for training. |
| Zonos | [Repo](https://github.com/Zyphra/Zonos/) | [🤗 Hub](https://huggingface.co/Zyphra/) | [Apache 2.0](https://github.com/Zyphra/Zonos/blob/main/LICENSE) | No | Multilingual | | [🤗 Space](https://huggingface.co/spaces/Steveeeeeeen/Zonos) | |
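
If you want to shortlist models from the table programmatically, the sketch below shows one possible way to parse a Markdown table into rows and filter them. It is only an illustration: `SAMPLE` is a trimmed excerpt of the table (four of the nine columns), and the `parse_table` helper and the `PERMISSIVE` set are assumptions made for this example, not part of the tracker.

```python
import re

# Trimmed excerpt of the tracker table (only four columns kept for brevity).
SAMPLE = """\
| Name | License | Fine-tune | Languages |
|---|---|---|---|
| Bark | [MIT](https://github.com/suno-ai/bark/blob/main/LICENSE) | No | Multilingual |
| Piper | [MIT](https://github.com/rhasspy/piper/blob/master/LICENSE.md) | [Yes](https://github.com/rhasspy/piper/blob/master/TRAINING.md) | Multilingual |
| OpenVoice | [CC-BY-NC 4.0](https://github.com/myshell-ai/OpenVoice/blob/main/LICENSE) | No | ZH + EN |
| XTTS | [CPML](https://coqui.ai/cpml) | [Yes](https://docs.coqui.ai/en/latest/models/xtts.html#training) | Multilingual |
"""

# Matches a Markdown link and captures its visible text.
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")

# Assumption for this example: licenses treated as permissive.
PERMISSIVE = {"MIT", "Apache 2.0", "BSD-3"}


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Turn a Markdown table into a list of dicts keyed by the header cells."""
    header = None
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip blank lines and anything that is not a table row
        inner = line.removeprefix("|").removesuffix("|")
        # Replace each link with its visible text, then trim whitespace.
        cells = [LINK.sub(r"\1", cell).strip() for cell in inner.split("|")]
        if header is None:
            header = cells
            continue
        if set("".join(cells)) <= set("-: "):
            continue  # separator row such as |---|---|
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_table(SAMPLE)
shortlist = [
    row["Name"]
    for row in rows
    if row["License"] in PERMISSIVE and row["Fine-tune"] == "Yes"
]
print(shortlist)  # ['Piper']
```

A filter like this only looks at the License and Fine-tune columns. The Issues column still needs a manual check, since several permissively licensed entries depend on a GPL-licensed phonemizer.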

### Capability specifics


| Name | Processor ⚡ | Phonetic alphabet 🔤 | Insta-clone 👥 | Emotional control 🎭 | Prompting 📖 | Speech control 🎚 | Streaming support 🌊 | S2S support 🦜 | Longform synthesis |
|---|---|---|---|---|---|---|---|---|---|
| Amphion | CUDA |  | 👥 | 🎭👥 | ❌ |  |  |  |  |
| Bark | CUDA |  | ❌ | 🎭 tags | ❌ |  |  |  |  |
| EmotiVoice |  |  |  |  |  |  |  |  |  |
| Fish-speech | CUDA | ❌ | 👥 | 🎭👥 | ❌ | speed / stability 🎚 | 🌊 | 🦜 | Yes |
| Glow-TTS |  |  |  |  |  |  |  |  |  |
| GPT-SoVITS |  |  |  |  |  |  |  |  |  |
| HierSpeech++ |  | ❌ | 👥 | 🎭👥 | ❌ | speed / stability 🎚 |  | 🦜 |  |
| IMS-Toucan | CUDA | ❌ | ❌ | ❌ | ❌ |  |  |  |  |
| Kokoro | CPU / CUDA | 👥 | ❌ | ❌ | ❌ | speed 🎚 | 🌊 | ❌ |  |
| Llasa | CUDA | ❌ | 👥 | 🎭 | ❌ | ❌ | ❌ | ❌ |  |
| MahaTTS |  |  |  |  |  |  |  |  |  |
| Matcha-TTS |  | IPA | ❌ | ❌ | ❌ | speed / stability 🎚 |  |  |  |
| MetaVoice-1B | CUDA |  | 👥 | 🎭👥 | ❌ | stability / similarity 🎚 |  |  | Yes |
| Neural-HMM TTS |  |  |  |  |  |  |  |  |  |
| OpenVoice | CUDA | ❌ | 👥 | 6-type 🎭 😡😃😭😯🤫😊 | ❌ |  |  |  |  |
| OuteTTS | CPU / CUDA | ❌ | 👥 | ❌ | ❌ | speed 🎚 | 🌊 | ❌ |  |
| OverFlow TTS |  |  |  |  |  |  |  |  |  |
| pflowTTS |  |  |  |  |  |  |  |  |  |
| Piper |  |  |  |  |  |  |  |  |  |
| Pheme | CUDA | ❌ | 👥 | 🎭👥 | ❌ | stability 🎚 |  |  |  |
| RAD-TTS |  |  |  |  |  |  |  |  |  |
| Silero |  |  |  |  |  |  |  |  |  |
| StyleTTS 2 | CPU / CUDA | IPA | 👥 | 🎭👥 | ❌ |  | 🌊 |  | Yes |
| Tacotron 2 |  |  |  |  |  |  |  |  |  |
| TorToiSe TTS |  | ❌ | ❌ | ❌ | 📖 |  | 🌊 |  |  |
| TTTS | CPU/CUDA | ❌ | 👥 |  |  |  |  |  |  |
| VALL-E |  |  |  |  |  |  |  |  |  |
| VITS/ MMS-TTS | CUDA | ❌ | ❌ | ❌ | ❌ | speed 🎚 |  |  |  |
| WhisperSpeech | CUDA | ❌ | 👥 | 🎭👥 | ❌ | speed 🎚 |  |  |  |
| XTTS | CUDA | ❌ | 👥 | 🎭👥 | ❌ | speed / stability 🎚 | 🌊 | ❌ |  |
| xVASynth | CPU / CUDA | ARPAbet+ | ❌ | 4-type 🎭 😡😃😭😯 per‑phoneme | ❌ | speed / pitch / energy / 🎭 🎚 per‑phoneme | ❌ | 🦜 |  |
| Zonos | CUDA | eSpeak | 👥 | 🎭 | ❌ | speed / pitch / quality / emotion 🎚 | ❌ | ❌ |  |

* Processor - CPU/CUDA/ROCm (single/multi used for inference; the real-time factor should be below 2.0 to qualify for CPU, though some leeway can be given if the model supports audio streaming)

* Phonetic alphabet - None/[IPA](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet)/[ARPAbet](https://en.wikipedia.org/wiki/ARPABET) (phonetic transcription that allows controlling the pronunciation of certain words during inference)

* Insta-clone - Yes/No (zero-shot model for quick voice cloning)

* Emotional control - Yes🎭/Strict (Strict means the model has no ability to go in between states; 🎭👥 marks an insta-clone switch)

* Prompting - Yes/No (a side effect of narrator-based datasets and a way to affect the emotional state; see the [ElevenLabs docs](https://elevenlabs.io/docs/speech-synthesis/prompting#emotion))

* Speech control - speed/pitch/etc. (ability to change the pitch, duration, energy and/or emotion of generated speech)

* Streaming support - Yes/No (whether audio can be played back while it is still being generated)

* S2S (Speech-To-Speech) support - Yes/No (streaming support implies real-time S2S; S2T=>T2S does not count)
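
The CPU criterion in the legend can be made concrete with a little arithmetic. The sketch below assumes the usual definition of real-time factor (processing time divided by the duration of the generated audio). The function names and the `leeway` value of 0.5 are hypothetical, since the tracker does not say how much leeway streaming models get.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def qualifies_for_cpu(
    rtf: float,
    supports_streaming: bool = False,
    threshold: float = 2.0,  # limit stated in the legend
    leeway: float = 0.5,     # hypothetical: the tracker gives no number
) -> bool:
    """Apply the rule of thumb: RTF below 2.0, with some leeway when streaming."""
    limit = threshold + leeway if supports_streaming else threshold
    return rtf < limit


# Example: 12 seconds of CPU time to synthesize 8 seconds of audio.
rtf = real_time_factor(processing_seconds=12.0, audio_seconds=8.0)
print(rtf)                     # 1.5
print(qualifies_for_cpu(rtf))  # True
```

Under this definition a lower value is better, and an RTF below 1.0 means the model generates audio faster than it plays back.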

## How can you help?

Help make this list more complete. Create demos on the Hugging Face Hub and link them here :)

Got any questions? Drop me a DM on Twitter [@reach_vb](https://twitter.com/reach_vb).