
https://github.com/Vaibhavs10/open-tts-tracker



# πŸ—£οΈ Open TTS Tracker

A one-stop shop to track all open-access/open-source TTS models as they come out. Feel free to open a PR for any that aren't linked here.

This is intended as a resource to raise awareness of these models and to make it easier for researchers, developers, and enthusiasts to stay informed about the latest advancements in the field.

> [!NOTE]
> This repo only tracks TTS models with an open-source or open-access codebase. More motivation for everyone to open-source! 🤗

| Name | GitHub | Weights | License | Fine-tune | Languages | Paper | Demo | Issues |
|---|---|---|---|---|---|---|---|---|
| Amphion | [Repo](https://github.com/open-mmlab/Amphion) | [🤗 Hub](https://huggingface.co/amphion) | [MIT](https://github.com/open-mmlab/Amphion/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2312.09911) | [🤗 Space](https://huggingface.co/amphion) | |
| AI4Bharat | [Repo](https://github.com/AI4Bharat/Indic-TTS) | [🤗 Hub](https://huggingface.co/ai4bharat) | [MIT](https://github.com/AI4Bharat/Indic-TTS/blob/master/LICENSE.txt) | [Yes](https://github.com/AI4Bharat/Indic-TTS?tab=readme-ov-file#training-steps) | Indic | [Paper](https://arxiv.org/abs/2211.09536) | [Demo](https://models.ai4bharat.org/#/tts) | |
| Bark | [Repo](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bark) | [🤗 Hub](https://huggingface.co/suno/bark) | [MIT](https://github.com/suno-ai/bark/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2209.03143) | [🤗 Space](https://huggingface.co/spaces/suno/bark) | |
| EmotiVoice | [Repo](https://github.com/netease-youdao/EmotiVoice) | [GDrive](https://drive.google.com/drive/folders/1y6Xwj_GG9ulsAonca_unSGbJ4lxbNymM) | [Apache 2.0](https://github.com/netease-youdao/EmotiVoice/blob/main/LICENSE) | [Yes](https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Cloning-with-your-personal-data) | ZH + EN | Not Available | Not Available | Separate [GUI agreement](https://github.com/netease-youdao/EmotiVoice/blob/main/EmotiVoice_UserAgreement_%E6%98%93%E9%AD%94%E5%A3%B0%E7%94%A8%E6%88%B7%E5%8D%8F%E8%AE%AE.pdf) |
| Glow-TTS | [Repo](https://github.com/jaywalnut310/glow-tts) | [GDrive](https://drive.google.com/file/d/1JiCMBVTG4BMREK8cT3MYck1MgYvwASL0/view) | [MIT](https://github.com/jaywalnut310/glow-tts/blob/master/LICENSE) | [Yes](https://github.com/jaywalnut310/glow-tts?tab=readme-ov-file#2-pre-requisites) | English | [Paper](https://arxiv.org/abs/2005.11129) | [GH Pages](https://jaywalnut310.github.io/glow-tts-demo/index.html) | |
| GPT-SoVITS | [Repo](https://github.com/RVC-Boss/GPT-SoVITS) | [🤗 Hub](https://huggingface.co/lj1995/GPT-SoVITS) | [MIT](https://github.com/RVC-Boss/GPT-SoVITS/blob/main/LICENSE) | [Yes](https://github.com/RVC-Boss/GPT-SoVITS?tab=readme-ov-file#pretrained-models) | Multilingual | Not Available | Not Available | |
| HierSpeech++ | [Repo](https://github.com/sh-lee-prml/HierSpeechpp) | [GDrive](https://drive.google.com/drive/folders/1-L_90BlCkbPyKWWHTUjt5Fsu3kz0du0w) | [MIT](https://github.com/sh-lee-prml/HierSpeechpp/blob/main/LICENSE) | No | KR + EN | [Paper](https://arxiv.org/abs/2311.12454) | [🤗 Space](https://huggingface.co/spaces/LeeSangHoon/HierSpeech_TTS) | |
| IMS-Toucan | [Repo](https://github.com/DigitalPhonetics/IMS-Toucan) | [GH release](https://github.com/DigitalPhonetics/IMS-Toucan/tags) | [Apache 2.0](https://github.com/DigitalPhonetics/IMS-Toucan/blob/ToucanTTS/LICENSE) | [Yes](https://github.com/DigitalPhonetics/IMS-Toucan#build-a-toucantts-pipeline) | Multilingual | [Paper](https://arxiv.org/abs/2206.12229) | [🤗 Space](https://huggingface.co/spaces/Flux9665/IMS-Toucan) | |
| MahaTTS | [Repo](https://github.com/dubverse-ai/MahaTTS) | [🤗 Hub](https://huggingface.co/Dubverse/MahaTTS) | [Apache 2.0](https://github.com/dubverse-ai/MahaTTS/blob/main/LICENSE) | No | English + Indic | Not Available | [Recordings](https://github.com/dubverse-ai/MahaTTS/blob/main/README.md#sample-outputs), [Colab](https://colab.research.google.com/drive/1qkZz2km-PX75P0f6mUb2y5e-uzub27NW?usp=sharing) | |
| Matcha-TTS | [Repo](https://github.com/shivammehta25/Matcha-TTS) | [GDrive](https://drive.google.com/drive/folders/17C_gYgEHOxI5ZypcfE_k1piKCtyR0isJ) | [MIT](https://github.com/shivammehta25/Matcha-TTS/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/Matcha-TTS/tree/main#train-with-your-own-dataset) | English | [Paper](https://arxiv.org/abs/2309.03199) | [🤗 Space](https://huggingface.co/spaces/shivammehta25/Matcha-TTS) | GPL-licensed phonemizer |
| MetaVoice-1B | [Repo](https://github.com/metavoiceio/metavoice-src) | [🤗 Hub](https://huggingface.co/metavoiceio/metavoice-1B-v0.1/tree/main) | [Apache 2.0](https://github.com/metavoiceio/metavoice-src/blob/main/LICENSE) | [Yes](https://github.com/metavoiceio/metavoice-src?tab=readme-ov-file) | Multilingual | Not Available | [🤗 Space](https://ttsdemo.themetavoice.xyz/) | |
| Neural-HMM TTS | [Repo](https://github.com/shivammehta25/Neural-HMM) | [GitHub](https://github.com/shivammehta25/Neural-HMM/releases) | [MIT](https://github.com/shivammehta25/Neural-HMM/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/Neural-HMM?tab=readme-ov-file#setup-and-training-using-lj-speech) | English | [Paper](https://arxiv.org/abs/2108.13320) | [GH Pages](https://shivammehta25.github.io/Neural-HMM/) | |
| OpenVoice | [Repo](https://github.com/myshell-ai/OpenVoice) | [🤗 Hub](https://huggingface.co/myshell-ai/OpenVoice) | [CC-BY-NC 4.0](https://github.com/myshell-ai/OpenVoice/blob/main/LICENSE) | No | ZH + EN | [Paper](https://arxiv.org/abs/2312.01479) | [🤗 Space](https://huggingface.co/spaces/myshell-ai/OpenVoice) | Non Commercial |
| OverFlow TTS | [Repo](https://github.com/shivammehta25/OverFlow) | [GitHub](https://github.com/shivammehta25/OverFlow/releases) | [MIT](https://github.com/shivammehta25/OverFlow/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/OverFlow/tree/main?tab=readme-ov-file#setup-and-training-using-lj-speech) | English | [Paper](https://arxiv.org/abs/2211.06892) | [GH Pages](https://shivammehta25.github.io/OverFlow/) | |
| Parler TTS | [Repo](https://github.com/huggingface/parler-tts) | [🤗 Hub](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) | [Apache 2.0](https://github.com/huggingface/parler-tts/blob/main/LICENSE) | [Yes](https://github.com/huggingface/parler-tts/tree/main/training) | English | Not Available | Not Available | |
| pflowTTS | [Unofficial Repo](https://github.com/p0p4k/pflowtts_pytorch) | [GDrive](https://drive.google.com/drive/folders/1x-A2Ezmmiz01YqittO_GLYhngJXazaF0) | [MIT](https://github.com/p0p4k/pflowtts_pytorch/blob/master/LICENSE) | [Yes](https://github.com/p0p4k/pflowtts_pytorch#instructions-to-run) | English | [Paper](https://openreview.net/pdf?id=zNA7u7wtIN) | Not Available | GPL-licensed phonemizer |
| Piper | [Repo](https://github.com/rhasspy/piper) | [🤗 Hub](https://huggingface.co/datasets/rhasspy/piper-checkpoints/) | [MIT](https://github.com/rhasspy/piper/blob/master/LICENSE.md) | [Yes](https://github.com/rhasspy/piper/blob/master/TRAINING.md) | Multilingual | Not Available | Not Available | [GPL-licensed phonemizer](https://github.com/rhasspy/piper/issues/93) |
| Pheme | [Repo](https://github.com/PolyAI-LDN/pheme) | [🤗 Hub](https://huggingface.co/PolyAI/pheme) | [CC-BY](https://github.com/PolyAI-LDN/pheme/blob/main/LICENSE) | [Yes](https://github.com/PolyAI-LDN/pheme#training) | English | [Paper](https://arxiv.org/abs/2401.02839) | [🤗 Space](https://huggingface.co/spaces/PolyAI/pheme) | |
| RAD-MMM | [Repo](https://github.com/NVIDIA/RAD-MMM) | [GDrive](https://drive.google.com/file/d/1p8SEVHRlyLQpQnVP2Dc66RlqJVVRDCsJ/view) | [MIT](https://github.com/NVIDIA/RAD-MMM/blob/main/LICENSE) | [Yes](https://github.com/NVIDIA/RAD-MMM?tab=readme-ov-file#training) | Multilingual | [Paper](https://arxiv.org/pdf/2301.10335.pdf) | [Jupyter Notebook](https://github.com/NVIDIA/RAD-MMM/blob/main/inference.ipynb), [Webpage](https://research.nvidia.com/labs/adlr/projects/radmmm/) | |
| RAD-TTS | [Repo](https://github.com/NVIDIA/radtts) | [GDrive](https://drive.google.com/file/d/1Rb2VMUwQahGrnpFSlAhCPh7OpDN3xgOr/view?usp=sharing) | [MIT](https://github.com/NVIDIA/radtts/blob/main/LICENSE) | [Yes](https://github.com/NVIDIA/radtts#training-radtts-without-pitch-and-energy-conditioning) | English | [Paper](https://openreview.net/pdf?id=0NQwnnwAORi) | [GH Pages](https://nv-adlr.github.io/RADTTS) | |
| Silero | [Repo](https://github.com/snakers4/silero-models) | [GH links](https://github.com/snakers4/silero-models/blob/master/models.yml) | [CC BY-NC-SA](https://github.com/snakers4/silero-models/blob/master/LICENSE) | [No](https://github.com/snakers4/silero-models/discussions/78) | EN + DE + ES + EA | Not Available | Not Available | [Non Commercial](https://github.com/snakers4/silero-models/wiki/Licensing-and-Tiers) |
| StyleTTS 2 | [Repo](https://github.com/yl4579/StyleTTS2) | [🤗 Hub](https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main) | [MIT](https://github.com/yl4579/StyleTTS2/blob/main/LICENSE) | [Yes](https://github.com/yl4579/StyleTTS2#finetuning) | English | [Paper](https://arxiv.org/abs/2306.07691) | [🤗 Space](https://huggingface.co/spaces/styletts2/styletts2) | GPL-licensed phonemizer |
| Tacotron 2 | [Unofficial Repo](https://github.com/NVIDIA/tacotron2) | [GDrive](https://drive.google.com/file/d/1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA/view) | [BSD-3](https://github.com/NVIDIA/tacotron2/blob/master/LICENSE) | [Yes](https://github.com/NVIDIA/tacotron2/tree/master?tab=readme-ov-file#training) | English | [Paper](https://arxiv.org/abs/1712.05884) | [Webpage](https://google.github.io/tacotron/publications/tacotron2/) | |
| TorToiSe TTS | [Repo](https://github.com/neonbjb/tortoise-tts) | [🤗 Hub](https://huggingface.co/jbetker/tortoise-tts-v2) | [Apache 2.0](https://github.com/neonbjb/tortoise-tts/blob/main/LICENSE) | [Yes](https://git.ecker.tech/mrq/tortoise-tts) | English | [Technical report](https://arxiv.org/abs/2305.07243) | [🤗 Space](https://huggingface.co/spaces/Manmay/tortoise-tts) | |
| TTTS | [Repo](https://github.com/adelacvg/ttts) | [🤗 Hub](https://huggingface.co/adelacvg/TTTS) | [MPL 2.0](https://github.com/adelacvg/ttts/blob/master/LICENSE) | No | ZH | Not Available | [Colab](https://colab.research.google.com/github/adelacvg/ttts/blob/master/demo.ipynb), [🤗 Space](https://huggingface.co/spaces/mrfakename/TTTS) | |
| VALL-E | [Unofficial Repo](https://github.com/enhuiz/vall-e) | Not Available | [MIT](https://github.com/enhuiz/vall-e/blob/main/LICENSE) | [Yes](https://github.com/enhuiz/vall-e#get-started) | NA | [Paper](https://arxiv.org/abs/2301.02111) | Not Available | |
| VITS/ MMS-TTS | [Repo](https://github.com/huggingface/transformers/tree/7142bdfa90a3526cfbed7483ede3afbef7b63939/src/transformers/models/vits) | [🤗 Hub](https://huggingface.co/kakao-enterprise) / [MMS](https://huggingface.co/models?search=mms-tts) | [Apache 2.0](https://github.com/huggingface/transformers/blob/main/LICENSE) | [Yes](https://github.com/ylacombe/finetune-hf-vits) | English | [Paper](https://arxiv.org/abs/2106.06103) | [🤗 Space](https://huggingface.co/spaces/kakao-enterprise/vits) | GPL-licensed phonemizer |
| WhisperSpeech | [Repo](https://github.com/collabora/WhisperSpeech) | [🤗 Hub](https://huggingface.co/collabora/whisperspeech) | [MIT](https://github.com/collabora/WhisperSpeech/blob/main/LICENSE) | No | English, Polish | Not Available | [🤗 Space](https://huggingface.co/spaces/collabora/WhisperSpeech), [Recordings](https://github.com/collabora/WhisperSpeech/blob/main/README.md), [Colab](https://colab.research.google.com/github/collabora/WhisperSpeech/blob/8168a30f26627fcd15076d10c85d9e33c52204cf/Inference%20example.ipynb) | |
| XTTS | [Repo](https://github.com/coqui-ai/TTS) | [🤗 Hub](https://huggingface.co/coqui/XTTS-v2) | [CPML](https://coqui.ai/cpml) | [Yes](https://docs.coqui.ai/en/latest/models/xtts.html#training) | Multilingual | [Paper](https://arxiv.org/abs/2406.04904) | [🤗 Space](https://huggingface.co/spaces/coqui/xtts) | Non Commercial |
| xVASynth | [Repo](https://github.com/DanRuta/xVA-Synth) | [🤗 Hub](https://huggingface.co/Pendrokar/xvapitch_nvidia) | [GPL-3.0](https://github.com/DanRuta/xVA-Synth/blob/master/LICENSE.md) | [Yes](https://github.com/DanRuta/xva-trainer) | Multilingual | [Paper](https://arxiv.org/abs/2009.14153) | [🤗 Space](https://huggingface.co/spaces/Pendrokar/xVASynth) | Copyrighted materials used for training. |

### Capability specifics


| Name | Processor ⚡ | Phonetic alphabet 🔤 | Insta-clone 👥 | Emotional control 🎭 | Prompting 📖 | Speech control 🎚 | Streaming support 🌊 | S2S support 🦜 | Longform synthesis |
|---|---|---|---|---|---|---|---|---|---|
| Amphion | CUDA | | 👥 | 🎭👥 | ❌ | | | | |
| Bark | CUDA | | ❌ | 🎭 tags | ❌ | | | | |
| EmotiVoice | | | | | | | | | |
| Glow-TTS | | | | | | | | | |
| GPT-SoVITS | | | | | | | | | |
| HierSpeech++ | | ❌ | 👥 | 🎭👥 | ❌ | speed / stability 🎚 | | 🦜 | |
| IMS-Toucan | CUDA | ❌ | ❌ | ❌ | ❌ | | | | |
| MahaTTS | | | | | | | | | |
| Matcha-TTS | | IPA | ❌ | ❌ | ❌ | speed / stability 🎚 | | | |
| MetaVoice-1B | CUDA | | 👥 | 🎭👥 | ❌ | stability / similarity 🎚 | | | Yes |
| Neural-HMM TTS | | | | | | | | | |
| OpenVoice | CUDA | ❌ | 👥 | 6-type 🎭 😡😃😭😯🤫😊 | ❌ | | | | |
| OverFlow TTS | | | | | | | | | |
| pflowTTS | | | | | | | | | |
| Piper | | | | | | | | | |
| Pheme | CUDA | ❌ | 👥 | 🎭👥 | ❌ | stability 🎚 | | | |
| RAD-TTS | | | | | | | | | |
| Silero | | | | | | | | | |
| StyleTTS 2 | CPU / CUDA | IPA | 👥 | 🎭👥 | ❌ | | 🌊 | | Yes |
| Tacotron 2 | | | | | | | | | |
| TorToiSe TTS | | ❌ | ❌ | ❌ | 📖 | | 🌊 | | |
| TTTS | CPU/CUDA | ❌ | 👥 | | | | | | |
| VALL-E | | | | | | | | | |
| VITS/ MMS-TTS | CUDA | ❌ | ❌ | ❌ | ❌ | speed 🎚 | | | |
| WhisperSpeech | CUDA | ❌ | 👥 | 🎭👥 | ❌ | speed 🎚 | | | |
| XTTS | CUDA | ❌ | 👥 | 🎭👥 | ❌ | speed / stability 🎚 | 🌊 | ❌ | |
| xVASynth | CPU / CUDA | ARPAbet+ | ❌ | 4-type 🎭 😡😃😭😯 per-phoneme | ❌ | speed / pitch / energy / 🎭 🎚 per-phoneme | ❌ | 🦜 | |

* Processor - CPU/CUDA/ROCm (single/multi used for inference; the real-time factor should be below 2.0 to qualify for CPU, though some leeway can be given if the model supports audio streaming)
* Phonetic alphabet - None/[IPA](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet)/[ARPAbet](https://en.wikipedia.org/wiki/ARPABET) (phonetic transcription that lets you control the pronunciation of specific words during inference)
* Insta-clone - Yes/No (zero-shot model for quick voice cloning)
* Emotional control - Yes🎭/Strict (Strict means the model cannot move in between states; 🎭👥 marks an insta-clone switch)
* Prompting - Yes/No (a side effect of narrator-based datasets and a way to affect the emotional state; see the [ElevenLabs docs](https://elevenlabs.io/docs/speech-synthesis/prompting#emotion))
* Streaming support - Yes/No (whether audio can be played back while it is still being generated)
* Speech control - speed/pitch/ (ability to change the pitch, duration, energy and/or emotion of generated speech)
* Speech-To-Speech support - Yes/No (streaming support implies real-time S2S; S2T=>T2S does not count)
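
The Processor criterion above depends on the real-time factor (RTF). Here is a minimal sketch of how it could be checked, assuming the common definition of RTF as synthesis time divided by the duration of the generated audio. The helper names and the `synthesize(text)` callable are hypothetical and not part of any listed project.

```python
import time
from typing import Callable, Sequence

CPU_RTF_LIMIT = 2.0  # threshold quoted in the Processor note; the helpers below are illustrative


def real_time_factor(elapsed_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("need a positive sample count and sample rate")
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds


def qualifies_for_cpu(rtf: float, limit: float = CPU_RTF_LIMIT) -> bool:
    """True when the measured RTF is strictly below the limit."""
    return rtf < limit


def measure_rtf(synthesize: Callable[[str], Sequence[float]], text: str, sample_rate: int) -> float:
    """Time one call to a hypothetical synthesize(text) -> samples function."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


# Example: 3 s of compute for 48,000 samples at 24 kHz (2 s of audio).
rtf = real_time_factor(3.0, 48_000, 24_000)
print(rtf, qualifies_for_cpu(rtf))
```

A single measurement is noisy, so averaging over several sentences of different lengths would probably give a fairer number when filling in the Processor column.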

## How can you help?

Help make this list more complete. Create demos on the Hugging Face Hub and link them here :)
Got any questions? Drop me a DM on Twitter [@reach_vb](https://twitter.com/reach_vb).