# **🐀 Nix-TTS**

### **Lightweight and End-to-end Text-to-Speech via Module-wise Distillation**

#### Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti

This is the repository for our paper, **🐀 Nix-TTS** (accepted to IEEE SLT 2022). We release the pretrained models, an interactive demo, and audio samples below.

[[📄 Paper Link](Coming Soon!)] [[🤗 Interactive Demo](https://huggingface.co/spaces/rendchevi/nix-tts)] [[📢 Audio Samples](https://anon1178.github.io/Nix-SLT-Demo/)]

**Abstract** Several solutions for lightweight TTS have shown promising results. Still, they either rely on a hand-crafted design that reaches a non-optimal size or use neural architecture search but often suffer from training costs. We present Nix-TTS, a lightweight TTS achieved via knowledge distillation from a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model. Specifically, we offer module-wise distillation, enabling flexible and independent distillation of the encoder and decoder modules. The resulting Nix-TTS inherits the advantageous properties of being non-autoregressive and end-to-end from the teacher, yet is significantly smaller in size, with only 5.23M parameters or up to an 89.34% reduction of the teacher model; it also achieves over 3.04× and 8.36× inference speedup on an Intel-i7 CPU and a Raspberry Pi 3B, respectively, while still retaining fair voice naturalness and intelligibility compared to the teacher model.
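As a rough illustration of the module-wise idea (a generic sketch, **not** the code used in the paper), each lightweight student module can be trained independently to match the output of the corresponding frozen teacher module; `teacher_module`, `student_module`, and the L1 matching loss below are illustrative assumptions:

```py
# Generic sketch of module-wise knowledge distillation (NOT the authors' implementation).
# The same step applies separately to the encoder and the decoder, which is what makes
# the distillation flexible and independent per module.
import torch
import torch.nn.functional as F

def distill_module_step(teacher_module, student_module, batch, optimizer):
    # Freeze the teacher and use its output as the regression target.
    teacher_module.eval()
    with torch.no_grad():
        target = teacher_module(batch)
    # The lightweight student tries to reproduce the teacher module's output.
    prediction = student_module(batch)
    loss = F.l1_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```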

## **Getting Started with Nix-TTS**
**Clone the `nix-tts` repository and move to its directory**
```bash
git clone https://github.com/rendchevi/nix-tts.git
cd nix-tts
```

**Install the dependencies**
- Install the Python dependencies. We recommend `python >= 3.8`.
```bash
pip install -r requirements.txt
```
- Install espeak on your device (needed for text tokenization).
```bash
sudo apt-get install espeak
```
Or follow the [official instructions](https://github.com/bootphon/phonemizer#dependencies) if that doesn't work; a quick sanity check is sketched below.
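To verify that espeak is visible from Python, a quick check through the `phonemizer` package (whose install guide is linked above) should print a phoneme string. Treating `phonemizer` with the `espeak` backend as the tokenization path is an assumption here, not something confirmed by this repository's code:

```py
# Sanity check (an assumption, not part of the Nix-TTS API): if espeak is installed
# correctly, phonemizer's espeak backend should return a phoneme string.
from phonemizer import phonemize

print(phonemize("Born to multiply, born to gaze into night skies.",
                language = "en-us", backend = "espeak"))
# An error here usually means espeak is missing or not on the PATH.
```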

**Download your chosen pre-trained model [here](https://drive.google.com/drive/folders/1GbFOnJsgKHCAXySm2sTluRRikc4TAWxJ?usp=sharing)**.

| Model | Num. of Params | Faster than real-time* (Intel-i7 CPU) | Faster than real-time* (Raspberry Pi 3B) |
| ---------- | -------------- | ----| ----|
| Nix-TTS (ONNX) | 5.23 M | 11.9x | 0.50x |
| Nix-TTS w/ Stochastic Duration (ONNX) | 6.03 M | 10.8x | 0.50x |

**\*** "Faster than real-time" is computed as the inverse of the Real-Time Factor (RTF). The complete speedup table for all models is given in the paper.
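For concreteness, here is how those figures relate to RTF, with made-up numbers (not measurements from the paper):

```py
# Toy illustration of the metric above with hypothetical numbers (not paper results).
# RTF = synthesis time / duration of the generated audio; the table reports 1 / RTF.
audio_duration_s = 5.0     # seconds of speech produced (hypothetical)
synthesis_time_s = 0.42    # seconds taken to synthesize it (hypothetical)

rtf = synthesis_time_s / audio_duration_s
print(f"RTF = {rtf:.3f} -> {1 / rtf:.1f}x faster than real-time")
```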

**And running Nix-TTS is as easy as:**
```py
from nix.models.TTS import NixTTSInference
from IPython.display import Audio

# Initiate Nix-TTS, pointing model_dir to the pre-trained model folder downloaded above
nix = NixTTSInference(model_dir = "path/to/downloaded/model")
# Tokenize input text
c, c_length, phoneme = nix.tokenize("Born to multiply, born to gaze into night skies.")
# Convert text to raw speech
xw = nix.vocalize(c, c_length)

# Listen to the generated speech
Audio(xw[0,0], rate = 22050)
```
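
The `Audio` call above only works inside a notebook. To write the result to a WAV file instead, something along these lines should work, assuming `xw[0, 0]` is a 1-D float waveform at 22,050 Hz as that call suggests (a sketch, not part of the Nix-TTS API):

```py
# Save the generated speech to disk (a sketch; assumes a 22,050 Hz float waveform).
import numpy as np
from scipy.io import wavfile

wav = np.asarray(xw[0, 0], dtype = np.float32)
wavfile.write("nix_tts_sample.wav", 22050, wav)
```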

## **Acknowledgement**
- This research is fully and exclusively funded by [Kata.ai](https://kata.ai), where the authors work as part of the [Kata.ai Research Team](https://kata.ai/research).
- Some of the complex parts of our model, as mentioned in the paper, are adapted from the original implementation of [VITS](https://github.com/jaywalnut310/vits) and [Comprehensive-Transformer-TTS](https://github.com/keonlee9420/Comprehensive-Transformer-TTS).