
## 🐸 Coqui XTTS
This is a fork of [Coqui-TTS](https://github.com/coqui-ai/TTS) that keeps only the XTTS model.


**🐸 XTTS is a library for advanced Text-to-Speech generation.**

🚀 Pretrained model in 17 languages (*Vietnamese newly added*).

______________________________________________________________________

## Installation
🐸XTTS is tested on Ubuntu 18.04 through 24.04 with **Python >= 3.9, < 3.12**.

Clone 🐸XTTS and install it locally.

```bash
git clone https://github.com/quangvu3/coqui-xtts
cd coqui-xtts
pip install -e .
```
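A quick import check confirms the editable install resolved correctly. This is a minimal sketch; it only assumes the fork keeps the upstream `TTS` package name used in the examples below:

```python
# sanity-check the install: these are the classes used in the examples below
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

print(XttsConfig, Xtts)  # no ImportError means the package is on the path
```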

## Synthesizing speech with 🐸XTTS

### 🐍 Python

#### Running the multilingual XTTS model

Synthesize speech with a built-in speaker's voice:

```python
import os
import torch
import torchaudio
from huggingface_hub import snapshot_download
from TTS.tts.models.xtts import Xtts
from TTS.tts.configs.xtts_config import XttsConfig

# download model files from the Hugging Face Hub
checkpoint_dir = "/path/to/local/checkpoint_dir/"
os.makedirs(checkpoint_dir, exist_ok=True)
repo_id = "jimmyvu/xtts"
snapshot_download(
    repo_id=repo_id,
    local_dir=checkpoint_dir,
    allow_patterns=["*.safetensors", "*.json"],
)

# load config and model weights
config = XttsConfig()
config.load_json(os.path.join(checkpoint_dir, "config.json"))
xtts_model = Xtts.init_from_config(config)
xtts_model.load_safetensors_checkpoint(config, checkpoint_dir=checkpoint_dir)

text = "Good morning everyone. I'm an AI model. I can read text and generate speech with a given voice."
language = "en"

# synthesize with speaker id
out = xtts_model.synthesize(
    text=text,
    config=xtts_model.config,
    speaker_wav=None,
    language=language,
    speaker_id="Ana Florence",
)

# save output to wav file
out["wav"] = torch.tensor(out["wav"]).unsqueeze(0)
torchaudio.save("speech.wav", out["wav"], 24000)
```
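The built-in speaker ids usable as `speaker_id` can be inspected through the model's speaker manager. A minimal sketch, assuming this fork keeps Coqui TTS's `speaker_manager` attribute and its `speakers` dict:

```python
# print the built-in speaker ids shipped with the checkpoint
# (assumes the fork exposes Coqui TTS's speaker_manager with a `speakers` dict)
if getattr(xtts_model, "speaker_manager", None) is not None:
    print(sorted(xtts_model.speaker_manager.speakers.keys()))
```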

Or use a reference speaker for voice cloning:
```python
# reference speaker setup
speaker_audio_file = "/path/to/sample/audio/sample.wav"
gpt_cond_latent, speaker_embedding = xtts_model.get_conditioning_latents(
    audio_path=speaker_audio_file,
    gpt_cond_len=xtts_model.config.gpt_cond_len,
    max_ref_length=xtts_model.config.max_ref_len,
    sound_norm_refs=xtts_model.config.sound_norm_refs,
)

# inference
out = xtts_model.inference(
    text=text,
    language=language,
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    enable_text_splitting=True,
)

# save output to wav file
out["wav"] = torch.tensor(out["wav"]).unsqueeze(0)
torchaudio.save("speech.wav", out["wav"], 24000)
```
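Since Vietnamese is the headline addition of this fork, the same cloning setup can target it by switching the language code to `vi`. A minimal sketch reusing the latents computed above; the sample sentence is illustrative only:

```python
# synthesize Vietnamese with the same reference-speaker latents
text_vi = "Xin chào, tôi là một mô hình chuyển văn bản thành giọng nói."
out = xtts_model.inference(
    text=text_vi,
    language="vi",  # Vietnamese, newly added in this fork
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    enable_text_splitting=True,
)
torchaudio.save("speech_vi.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)
```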

## Directory Structure
```
|- utils/           (common utilities)
|- TTS/
    |- tts/         (text-to-speech models)
        |- layers/  (model layer definitions)
        |- models/  (model definitions)
        |- utils/   (model-specific utilities)
```