https://github.com/zefir1990/tts2mp3

TTS to mp3

Host: GitHub
URL: https://github.com/zefir1990/tts2mp3
Owner: zefir1990
License: mit
Created: 2026-03-19T04:48:04.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-19T15:24:37.000Z (4 months ago)
Last Synced: 2026-03-20T07:47:48.801Z (4 months ago)
Topics: tts
Language: Python
Homepage: https://demensdeum.com
Size: 6.84 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# tts2mp3

`tts2mp3` is a Python script that converts text to high-quality speech and saves it as an MP3 file using the **Coqui-TTS** engine. It supports multiple languages (including Russian and English) and voice cloning via XTTS-v2.

## Features

- **High-Quality TTS**: Uses Coqui-TTS's advanced models.
- **Multilingual**: Supports Russian, English, and 15+ other languages.
- **Voice Cloning**: Clone any voice using a 6-second reference audio clip with XTTS-v2.
- **MP3 Output**: Automatically converts synthesized audio to MP3 format.
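The features above amount to a two-stage pipeline: synthesize speech to a WAV file, then convert that file to MP3. The following is a minimal sketch of that flow, not the script's actual code. It assumes the Coqui-TTS Python API (`TTS.api.TTS` and its `tts_to_file` method) and `pydub`'s `AudioSegment`; the function names `build_synthesis_kwargs` and `synthesize_to_mp3` are hypothetical.

```python
import os
import tempfile


def build_synthesis_kwargs(text, language=None, speaker_wav=None, speaker=None):
    """Collect only the optional arguments that were actually provided.

    Single-language models reject a `language` argument, so unset options
    are left out instead of being passed as None.
    """
    kwargs = {"text": text}
    if language:
        kwargs["language"] = language
    if speaker_wav:
        kwargs["speaker_wav"] = speaker_wav
    if speaker:
        kwargs["speaker"] = speaker
    return kwargs


def synthesize_to_mp3(text, output_path, model_name, **options):
    """Synthesize speech to a temporary WAV file, then convert it to MP3."""
    # Imported lazily so the helper above works without the heavy dependencies.
    from TTS.api import TTS         # Coqui-TTS
    from pydub import AudioSegment  # needs ffmpeg on PATH

    tts = TTS(model_name=model_name)
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        tts.tts_to_file(file_path=wav_path, **build_synthesis_kwargs(text, **options))
        AudioSegment.from_wav(wav_path).export(output_path, format="mp3")
    finally:
        os.remove(wav_path)  # clean up the intermediate WAV
```

Writing to a temporary WAV first keeps the synthesis step and the FFmpeg-backed conversion step independent, so a failure in either one is easy to locate.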

## Prerequisites

Before running the script, you'll need:

1. **Python >= 3.9, < 3.12** (Important: `coqui-tts` has compatibility issues with Python 3.12+)
2. **FFmpeg**: Required by `pydub` for MP3 conversion.
   - [Download FFmpeg](https://ffmpeg.org/download.html)
   - Ensure `ffmpeg` is in your system PATH.
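
Both prerequisites can be checked before installing anything. This is a small sketch using only the standard library; the helper names are hypothetical and are not part of `tts2mp3.py`.

```python
import shutil
import sys


def python_version_supported(version_info=sys.version_info):
    """Return True for Python >= 3.9 and < 3.12, the range listed above."""
    return (3, 9) <= tuple(version_info[:2]) < (3, 12)


def ffmpeg_on_path():
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None
```

Calling `python_version_supported()` with no arguments checks the running interpreter.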

## Installation

1. Clone the repository or download the script.
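2. Create and activate a virtual environment (recommended). The activation command below is for Windows PowerShell; on Linux or macOS, run `source tts-env/bin/activate` instead:
   ```powershell
   python -m venv tts-env
   .\tts-env\Scripts\activate
   ```
3. Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```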

## Usage

### 🚀 Basic Usage (English)
```bash
python tts2mp3.py --text "Hello world" --output hello.mp3
```

### 🇷🇺 Russian (Standard Voice)
Uses the VITS model, which does **not** require a reference voice.
```bash
python tts2mp3.py --text "Привет, это тест." --model "tts_models/ru/multi-dataset/vits" --language "ru" --output privet.mp3
```

### 👤 High-Quality Voice Cloning (XTTS-v2)
Requires a 6-second reference WAV file of the target voice.
```bash
python tts2mp3.py --text "Привет, я говорю вашим голосом." --model "tts_models/multilingual/multi-dataset/xtts_v2" --language "ru" --speaker_wav "reference.wav" --output clone.mp3
```
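
Before running the cloning command, it can help to confirm how long the reference clip is. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only. The helper names are hypothetical, and treating 6 seconds as a hard minimum is an assumption based on the requirement stated above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(seconds, minimum=6.0):
    """Assumed threshold: the README asks for a 6-second reference clip."""
    return seconds >= minimum
```

For example, `is_long_enough(wav_duration_seconds("reference.wav"))` reports whether the clip meets the assumed threshold.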

## All Arguments

- `--text`: Direct text to convert.
- `--file`: Path to a text file to convert.
- `--output`: Output MP3 path (default: `output.mp3`).
- `--model`: Coqui-TTS model name (default: `tts_models/en/ljspeech/glow-tts`).
- `--language`: Language code (e.g., `en`, `ru`) for multilingual models.
- `--speaker_wav`: Reference WAV for cloning (XTTS).
- `--speaker`: Speaker name for multi-speaker models.
- `--gpu`: Use GPU for faster synthesis if available.
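
As an illustration, the arguments listed above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented flags and defaults; the real parser in `tts2mp3.py` may differ, and treating `--gpu` as a boolean switch is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line flags."""
    parser = argparse.ArgumentParser(
        description="Convert text to speech and save it as an MP3 file."
    )
    parser.add_argument("--text", help="Direct text to convert.")
    parser.add_argument("--file", help="Path to a text file to convert.")
    parser.add_argument("--output", default="output.mp3", help="Output MP3 path.")
    parser.add_argument(
        "--model",
        default="tts_models/en/ljspeech/glow-tts",
        help="Coqui-TTS model name.",
    )
    parser.add_argument("--language", help="Language code for multilingual models.")
    parser.add_argument("--speaker_wav", help="Reference WAV for cloning (XTTS).")
    parser.add_argument("--speaker", help="Speaker name for multi-speaker models.")
    # Assumption: --gpu takes no value and simply enables GPU synthesis.
    parser.add_argument("--gpu", action="store_true", help="Use GPU if available.")
    return parser
```

With this sketch, `build_parser().parse_args(["--text", "Hello world"])` yields `output="output.mp3"` and the default English model.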

## Troubleshooting

### 1. `ImportError: cannot import name 'BeamSearchScorer'`
This is caused by incompatible `transformers` versions. Ensure you are using `transformers==4.33.0` as specified in `requirements.txt`.
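
To see which version is actually installed in the active environment, the standard library's `importlib.metadata` can be queried. The helper names in this sketch are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """Return True only if the package is installed at exactly the pinned version."""
    return installed_version(package) == pinned
```

For this error, `matches_pin("transformers", "4.33.0")` should return `True` once the pinned requirement is installed.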

### 2. `WeightsUnpickler` Error (PyTorch 2.6+)
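PyTorch 2.6 changed `torch.load` to default to `weights_only=True`, which rejects the pickled objects stored in some Coqui-TTS checkpoints. The script includes a monkey-patch that works around this security-related change. If the error persists, try re-running the script.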
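
A monkey-patch of this kind can be sketched as a wrapper that makes `weights_only` default to `False` unless the caller sets it. This is an illustration of the general technique, not the script's actual patch, and the function name is hypothetical.

```python
import functools


def default_weights_only_false(load_fn):
    """Wrap a torch.load-style callable so weights_only defaults to False."""

    @functools.wraps(load_fn)
    def patched(*args, **kwargs):
        # Respect an explicit choice by the caller; only change the default.
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)

    return patched


# Applying it requires torch and should run before any model is loaded:
#   import torch
#   torch.load = default_weights_only_false(torch.load)
```

Loading with `weights_only=False` unpickles arbitrary objects, so this workaround is only appropriate for checkpoints from a trusted source.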

### 3. "Kernel size can't be greater than actual input size"
- Ensure you are using the correct model for the language (e.g., don't use the English model for Russian text).
- Avoid very short text strings; adding a period or extra word can help.
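
The second tip can be applied automatically by normalizing the input before synthesis. The helper below is a hypothetical sketch that trims whitespace and appends a period when the text has no terminal punctuation; the set of characters treated as terminal is an assumption.

```python
def normalize_short_text(text, terminal="."):
    """Strip whitespace and make sure the text ends with punctuation."""
    stripped = text.strip()
    if stripped and stripped[-1] not in ".!?…":
        stripped += terminal
    return stripped
```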

---
*Created as part of an agentic coding task.*
