Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX
- Host: GitHub
- URL: https://github.com/josefalbers/e2tts-mlx
- Owner: JosefAlbers
- License: apache-2.0
- Created: 2024-10-06T04:06:15.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-15T12:21:33.000Z (about 1 year ago)
- Last Synced: 2025-07-01T11:07:58.956Z (4 months ago)
- Topics: e2-tts, flow-matching, mlx, multistream, text-to-speech, transformer, tts
- Language: Python
- Homepage:
- Size: 2.87 MB
- Stars: 27
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# e2tts-mlx: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS in MLX
A lightweight implementation of the [Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS](https://arxiv.org/abs/2406.18009) model in MLX, with minimal dependencies and efficient computation on Apple Silicon.
## Quick Start
### Install
```zsh
# Quick install (note: PyPI version may not always be up to date)
pip install e2tts-mlx

# For the latest version, you can install directly from the repository:
# git clone https://github.com/JosefAlbers/e2tts-mlx.git
# cd e2tts-mlx
# pip install -e .
```

### Usage
To use a pre-trained model for text-to-speech:
```zsh
e2tts 'We must achieve our own salvation.'
```

This will write `tts_0.wav` to the current directory, which you can then play.
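On macOS (the platform MLX targets), one way to play the generated file is with the built-in `afplay` utility:

```zsh
# Play the generated audio (afplay ships with macOS; any audio player works)
afplay tts_0.wav
```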
https://github.com/user-attachments/assets/89265113-0785-42f1-b867-c38c886acbef
To train a new model with default settings:
```zsh
e2tts
```
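For intuition about what training optimizes: per the paper, E2 TTS is trained with a conditional flow-matching objective, regressing the velocity along a straight path from noise to the target mel spectrogram. Below is a minimal MLX sketch of that loss; the `model(xt, t, text_cond)` call signature is a placeholder for illustration, not this package's actual API:

```python
import mlx.core as mx

def flow_matching_loss(model, x1, text_cond):
    """Conditional flow-matching loss sketch: regress the target velocity
    (x1 - x0) along the straight path x_t = (1 - t) * x0 + t * x1."""
    b = x1.shape[0]
    t = mx.random.uniform(shape=(b, 1, 1))       # one timestep per sample
    x0 = mx.random.normal(x1.shape)              # Gaussian prior sample
    xt = (1 - t) * x0 + t * x1                   # point on the path at time t
    v_pred = model(xt, t, text_cond)             # hypothetical model signature
    return mx.mean((v_pred - (x1 - x0)) ** 2)    # MSE to the target velocity
```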
To train with custom options:
```zsh
e2tts --batch_size=16 --n_epoch=100 --lr=1e-4 --depth=8 --n_ode=32
```

Available training options:
- `--batch_size`: Set the batch size (default: 32)
- `--n_epoch`: Set the number of epochs (default: 10)
- `--lr`: Set the learning rate (default: 2e-4)
- `--depth`: Set the model depth (default: 8)
- `--n_ode`: Set the number of ODE solver steps used for sampling (default: 1; see the sketch after this list)
- `--more_ds`: Set an additional dataset for [two-set training](https://arxiv.org/pdf/2410.07041) (default: 'JosefAlbers/lj-speech')
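Since `--n_ode` controls the number of solver steps at inference time, here is a minimal sketch of what such a loop typically looks like: Euler integration of the learned velocity field from noise (t = 0) to speech (t = 1). As with the loss sketch above, the `model(x, t, text_cond)` signature is a placeholder, not this package's actual API:

```python
import mlx.core as mx

def sample(model, text_cond, shape, n_ode=1):
    """Euler-integrate the learned ODE from Gaussian noise to a mel
    spectrogram, which a vocoder (e.g. Vocos) then turns into audio."""
    x = mx.random.normal(shape)                  # start from noise at t = 0
    dt = 1.0 / n_ode
    for i in range(n_ode):
        t = mx.array(i * dt)
        x = x + dt * model(x, t, text_cond)      # one Euler step along the field
    return x
```

With `n_ode=1` the sampler takes a single Euler step, the fastest but coarsest setting; larger values trade speed for fidelity.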
## Acknowledgements

Special thanks to [lucidrains](https://github.com/lucidrains/e2-tts-pytorch), whose fantastic code inspired this project, and to [lucasnewman](https://github.com/lucasnewman/vocos-mlx), whose Vocos implementation made this possible.
## License
Apache License 2.0