https://github.com/egorsmkv/tts_uk
High-fidelity speech synthesis for Ukrainian using modern neural networks.
- Host: GitHub
- URL: https://github.com/egorsmkv/tts_uk
- Owner: egorsmkv
- License: mit
- Created: 2025-03-02T15:51:24.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-04-28T09:10:37.000Z (7 months ago)
- Last Synced: 2025-04-28T10:29:22.791Z (7 months ago)
- Topics: audio, sound, speech-uk, synthesis, text-to-speech, tts, ukrainian, vocos, wav, wave
- Language: Jupyter Notebook
- Homepage: https://huggingface.co/spaces/Yehor/radtts-uk-vocos-demo
- Size: 1.76 MB
- Stars: 8
- Watchers: 1
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- Citation: CITATION.cff
# Text-to-Speech for Ukrainian
[PyPI package](https://pypi.org/project/tts_uk/)
[License: MIT](https://opensource.org/licenses/MIT)
[Downloads](https://pepy.tech/projects/tts_uk)
[DOI](https://doi.org/10.5281/zenodo.14966501)
[FOSSA status](https://app.fossa.com/projects/git%2Bgithub.com%2Fegorsmkv%2Ftts_uk?ref=badge_small)
High-fidelity speech synthesis for Ukrainian using modern neural networks.
## Statuses
[CI](https://github.com/egorsmkv/tts_uk/actions/workflows/ci.yml)
[Dependabot updates](https://github.com/egorsmkv/tts_uk/actions/workflows/dependabot/dependabot-updates)
[Snyk Python](https://github.com/egorsmkv/tts_uk/actions/workflows/snyk-python.yml)
## Demo
Try the demo in our [Hugging Face Space](https://huggingface.co/spaces/Yehor/radtts-uk-vocos-demo), run it in [Google Colab](https://colab.research.google.com/drive/1sdCPnZJRNAf12PhPut4gu6T_o6lYaUdo?usp=sharing), or [listen to samples here](https://huggingface.co/spaces/speech-uk/listen-tts-voices).
## Features
- Multi-speaker model: 2 **female** (Tetiana, Lada) + 1 **male** (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the [RAD-TTS++](https://github.com/egorsmkv/radtts-uk) acoustic model;
- Fast vocoding using [Vocos](https://github.com/gemelo-ai/vocos);
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux and on **Windows**/**WSL**;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.
## Installation
```shell
# Install from PyPI
pip install tts-uk
# OR, for the latest development version:
pip install git+https://github.com/egorsmkv/tts_uk
# OR, use git and local setup
git clone https://github.com/egorsmkv/tts_uk
cd tts_uk
uv sync # uv will handle the virtual environment
```
To install uv, see the [installation section](https://github.com/astral-sh/uv?tab=readme-ov-file#installation) of its README.
You can also [download the repository](https://github.com/egorsmkv/tts_uk/archive/refs/heads/main.zip) as a ZIP archive.
## Getting started
Code example:
```python
import torchaudio

from tts_uk.inference import synthesis

sampling_rate = 44_100

# Perform the synthesis. The `synthesis` function returns:
# - mels: Mel spectrograms of the generated audio.
# - wave: The waveform synthesized by the vocoder, as a PyTorch tensor.
# - stats: A dictionary of synthesis statistics (processing time, duration, speech rate, etc.).
mels, wave, stats = synthesis(
    text="Ви можете протестувати синтез мовлення українською мовою. Просто введіть текст, який ви хочете прослухати.",
    voice="tetiana",  # tetiana, mykyta, lada
    n_takes=1,
    use_latest_take=False,
    token_dur_scaling=1,
    f0_mean=0,
    f0_std=0,
    energy_mean=0,
    energy_std=0,
    sigma_decoder=0.8,
    sigma_token_duration=0.666,
    sigma_f0=1,
    sigma_energy=1,
)

print(stats)

# Save the generated audio to a WAV file.
torchaudio.save("audio.wav", wave.cpu(), sampling_rate, encoding="PCM_S")
```
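The keyword arguments in the example above can be collected into a reusable set of defaults, so that only the values you want to change are spelled out per call. The following is a minimal sketch and not part of `tts_uk`: `DEFAULT_PARAMS` and `make_params` are hypothetical names, the default values are copied from the example above, and the override value `1.2` is an arbitrary illustration (its audible effect is not documented here).

```python
# Hypothetical helper, not part of tts_uk: defaults copied from the README example.
DEFAULT_PARAMS = {
    "n_takes": 1,
    "use_latest_take": False,
    "token_dur_scaling": 1,
    "f0_mean": 0,
    "f0_std": 0,
    "energy_mean": 0,
    "energy_std": 0,
    "sigma_decoder": 0.8,
    "sigma_token_duration": 0.666,
    "sigma_f0": 1,
    "sigma_energy": 1,
}


def make_params(**overrides):
    """Return a copy of DEFAULT_PARAMS with the given overrides applied.

    Unknown names are rejected so that a typo does not silently pass through.
    """
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise ValueError(f"Unknown synthesis parameters: {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}


# Build a parameter set that changes only the duration scaling.
params = make_params(token_dur_scaling=1.2)
print(params["token_dur_scaling"])

# Assumed usage with the function from the example above:
# mels, wave, stats = synthesis(text="...", voice="lada", **params)
```

Keeping the defaults in one place makes it easier to compare takes that differ in a single parameter.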
You can also try these Google Colab notebooks:
- [CPU inference](https://colab.research.google.com/drive/1dsQiVhTaNw5lRfUiCZeECMuEbtEEYqbZ?usp=sharing)
- [GPU inference](https://colab.research.google.com/drive/1sdCPnZJRNAf12PhPut4gu6T_o6lYaUdo?usp=sharing) on a T4 GPU (synthesizes a long document)
Or run synthesis in a terminal:
```shell
uv run example.py
```
If you need to synthesize long texts such as articles, we recommend splitting them into sentences first, for example with [wtpsplit](https://github.com/segment-any-text/wtpsplit).
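To illustrate the general idea of preparing a long text for synthesis, the sketch below splits text into sentences and groups them into chunks of bounded length. It uses a naive regular expression as a stand-in for wtpsplit (it does not use wtpsplit's API), and `split_sentences`, `group_into_chunks`, and the `max_chars` default of 300 are assumptions made for this example, not limits documented by `tts_uk`.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!', '?' or '…' followed by whitespace.

    A stand-in for a real segmenter such as wtpsplit; it will mis-split
    abbreviations and similar cases.
    """
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [part for part in parts if part]


def group_into_chunks(sentences, max_chars=300):
    """Greedily join consecutive sentences while a chunk stays within max_chars.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


article = "Перше речення. Друге речення! Третє речення?"
for chunk in group_into_chunks(split_sentences(article), max_chars=30):
    print(chunk)
```

Each chunk could then be passed as the `text` argument of `synthesis`, with the resulting waveforms joined afterwards. How best to join them depends on the shape of the returned tensor, which this sketch does not assume.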
## Get help and support
If you have questions or run into problems, please open an issue in [the Issues section](https://github.com/egorsmkv/tts_uk/issues).
## License
The code is released under the MIT license.
## Model authors
### Acoustic
- [Yehor Smoliakov](https://github.com/egorsmkv), [HF profile](https://huggingface.co/Yehor)
### Vocoder
- [Serhiy Stetskovych](https://github.com/patriotyk), [HF profile](https://huggingface.co/patriotyk)
## Community
- Discord: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk
Also, follow [our Speech-UK initiative](https://huggingface.co/speech-uk) on Hugging Face!
## Acknowledgements
- [RAD-TTS by NVIDIA](https://github.com/NVIDIA/radtts)
- [Vocos fork by langtech-bsc](https://github.com/langtech-bsc/vocos/tree/matcha)