XTTS fine-tuning via CLI
https://github.com/veralvx/xtts-finetune

Host: GitHub
URL: https://github.com/veralvx/xtts-finetune
Owner: veralvx
License: mpl-2.0
Created: 2025-10-13T03:11:38.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-10-16T02:45:51.000Z (8 months ago)
Last Synced: 2025-10-24T05:55:21.814Z (8 months ago)
Topics: ai, ai-training, audio, audio-processing, coqui, coqui-tts, docker, dockerfile, fine-tuning, finetuning, python, python3, text-to-speech, training, tts, tts-model, uv, xtts, xtts-v2, xttsv2
Language: Python
Homepage:
Size: 253 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

README

# XTTS-FINETUNE 🚂

First, clone this repository and build the image:

```console
git clone https://github.com/veralvx/xtts-finetune xtts-finetune \
&& cd xtts-finetune \
&& podman build -f Dockerfile -t xtts-finetune .
```

Start the container:

```console
podman run -it --rm --gpus=all -v ./dataset:/xtts/dataset -v ./run:/xtts/run xtts-finetune
```

If you want to use CPU only, omit `--gpus=all`.

Before fine-tuning, your directory layout should look like this:

```console
.
├── .venv
├── convert_audio.py
├── dataset
│   ├── metadata.csv
│   └── wavs
│       ├── 01.wav
│       ├── 02.wav
│       ├── reference.wav
│       └── ...
├── Dockerfile
├── main.py
├── pyproject.toml
├── requirements.txt
├── finetune.py
├── transcribe.py
├── uv.lock
└── validate_audio.py
```

Note:
- the `.wav` clips go under `dataset/wavs`, including one named `reference.wav` (about 5 s long);
- `metadata.csv` goes directly under `dataset`.
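The layout above can be checked before starting a run. The following is a minimal sketch using only the Python standard library and is not part of this repository: `check_layout` and `wav_duration` are hypothetical helpers, and the 3 to 8 s window for `reference.wav` is an assumed tolerance around the suggested ~5 s. It also assumes uncompressed PCM WAV files, which is what Python's `wave` module can read.

```python
import wave
from pathlib import Path

# Assumed tolerance around the suggested ~5 s reference clip (not from the repo).
REF_MIN_S, REF_MAX_S = 3.0, 8.0


def wav_duration(path: Path) -> float:
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_layout(dataset: Path) -> list[str]:
    """Return a list of layout problems under `dataset` (empty if none found)."""
    problems = []
    if not (dataset / "metadata.csv").is_file():
        problems.append("missing metadata.csv")
    wavs = dataset / "wavs"
    if not wavs.is_dir():
        problems.append("missing wavs/ directory")
        return problems
    ref = wavs / "reference.wav"
    if not ref.is_file():
        problems.append("missing wavs/reference.wav")
        return problems
    try:
        seconds = wav_duration(ref)
    except (wave.Error, EOFError):
        problems.append("wavs/reference.wav is not a PCM WAV file")
        return problems
    if not REF_MIN_S <= seconds <= REF_MAX_S:
        problems.append(f"wavs/reference.wav lasts {seconds:.1f} s, expected about 5 s")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: run from the directory that contains `dataset/`.
    issues = check_layout(Path("dataset"))
    print("\n".join(issues) if issues else "layout looks OK")
```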

Audio files must be mono and sampled at 22050 Hz. Check the clips with `--validate`, and convert them with `--convert` if needed:

```console
uv run main.py --validate dataset/wavs
```

```console
uv run main.py --convert dataset/wavs
```

Or, using `ffmpeg`:

```console
ffmpeg -i input.wav -ac 1 -ar 22050 output.wav
```
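For checking clips outside the container, the mono/22050 Hz requirement can also be tested with Python's standard `wave` module, which reads the channel count and sample rate from each file's header. This is a minimal sketch, not the repository's own validator: `find_nonconforming` and `ffmpeg_command` are hypothetical helpers, it assumes uncompressed PCM WAV input, and the `_22k` output suffix is an arbitrary choice.

```python
import shlex
import sys
import wave
from pathlib import Path

TARGET_RATE = 22050  # required sample rate (Hz)
TARGET_CHANNELS = 1  # mono


def find_nonconforming(wav_dir: Path) -> dict[Path, str]:
    """Map each WAV in wav_dir that is not mono/22050 Hz to a reason."""
    bad = {}
    for path in sorted(wav_dir.glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wf:
                channels, rate = wf.getnchannels(), wf.getframerate()
        except (wave.Error, EOFError) as exc:
            bad[path] = f"unreadable by wave module: {exc}"
            continue
        if channels != TARGET_CHANNELS or rate != TARGET_RATE:
            bad[path] = f"{channels} channel(s), {rate} Hz"
    return bad


def ffmpeg_command(src: Path) -> list[str]:
    """Build the README's ffmpeg call, writing a *_22k.wav copy next to src."""
    dst = src.with_name(f"{src.stem}_22k.wav")
    return ["ffmpeg", "-i", str(src), "-ac", str(TARGET_CHANNELS),
            "-ar", str(TARGET_RATE), str(dst)]


if __name__ == "__main__":
    wav_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "dataset/wavs")
    for path, reason in find_nonconforming(wav_dir).items():
        print(f"{path}: {reason}")
        print("  fix with:", shlex.join(ffmpeg_command(path)))
```

`ffmpeg` cannot write over its own input file, so the sketch only prints commands that produce side-by-side copies; those copies still have to replace the originals before training.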

To generate `metadata.csv` by transcribing the clips:

```console
uv run main.py --transcribe ./dataset/wavs --lang en --model medium --device cuda
```

The transcription is written under `dataset/wavs`; move it to `dataset/metadata.csv` before training.
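A minimal sketch of that move, followed by a check that every row points at an existing clip. It rests on assumptions not verified against the repository: that the transcription output is named `metadata.csv`, and that each row is pipe-separated with the audio file name (including `.wav`) in the first field. Rows whose first field does not end in `.wav`, such as a header, are skipped. `OUTPUT_NAME`, `DELIMITER`, `move_metadata`, and `missing_clips` are hypothetical names; adjust them to whatever the transcription step actually produces.

```python
import csv
import shutil
from pathlib import Path

OUTPUT_NAME = "metadata.csv"  # assumed name of the transcription output in wavs/
DELIMITER = "|"               # assumed field separator


def move_metadata(dataset: Path) -> Path:
    """Move <dataset>/wavs/OUTPUT_NAME to <dataset>/metadata.csv; return the new path."""
    src = dataset / "wavs" / OUTPUT_NAME
    dst = dataset / "metadata.csv"
    shutil.move(str(src), str(dst))
    return dst


def missing_clips(dataset: Path) -> list[str]:
    """Return clip names referenced in metadata.csv that do not exist in wavs/."""
    wavs = dataset / "wavs"
    missing = []
    with open(dataset / "metadata.csv", newline="", encoding="utf-8") as fh:
        # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
        for row in csv.reader(fh, delimiter=DELIMITER, quoting=csv.QUOTE_NONE):
            # Skip blank lines and any header row (first field is not a .wav name).
            if not row or not row[0].lower().endswith(".wav"):
                continue
            # Accept both "01.wav" and "wavs/01.wav" style references.
            name = Path(row[0]).name
            if not (wavs / name).is_file():
                missing.append(name)
    return missing


if __name__ == "__main__":
    ds = Path("dataset")
    if (ds / "wavs" / OUTPUT_NAME).is_file():
        move_metadata(ds)
    if (ds / "metadata.csv").is_file():
        print("missing clips:", missing_clips(ds) or "none")
```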

Finally, start fine-tuning:

```console
uv run main.py --lang en
```
