https://github.com/lucasnewman/f5-tts-swift

Implementation of F5-TTS in Swift using MLX

Host: GitHub
URL: https://github.com/lucasnewman/f5-tts-swift
Owner: lucasnewman
License: mit
Created: 2024-10-19T17:40:14.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-12-11T23:47:58.000Z (7 months ago)
Last Synced: 2025-03-24T11:43:40.762Z (3 months ago)
Topics: diffusion-transformer, flow-matching, mlx, mlx-swift, swift, text-to-speech, tts
Language: Swift
Homepage:
Size: 245 KB
Stars: 59
Watchers: 6
Forks: 10
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

# F5 TTS for Swift

Implementation of [F5-TTS](https://arxiv.org/abs/2410.06885) in Swift, using the [MLX Swift](https://github.com/ml-explore/mlx-swift) framework.

You can listen to a [sample here](https://s3.amazonaws.com/lucasnewman.datasets/f5tts/sample.wav) that was generated in ~11 seconds on an M3 Max MacBook Pro.

See the [Python repository](https://github.com/lucasnewman/f5-tts-mlx) for additional details on the model architecture.

This repository is based on the original [PyTorch implementation](https://github.com/SWivid/F5-TTS).

## Installation

The `F5TTS` Swift package can be built and run from Xcode or SwiftPM.

A pretrained model is available [on Hugging Face](https://hf.co/lucasnewman/f5-tts-mlx).

## Usage

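```swift
import F5TTS

// Load the pretrained model weights from the Hugging Face repository.
let f5tts = try await F5TTS.fromPretrained(repoId: "lucasnewman/f5-tts-mlx")

// Synthesize speech for the given text.
let generatedAudio = try await f5tts.generate(text: "The quick brown fox jumped over the lazy dog.")
```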

The result is an `MLXArray` containing audio samples at 24 kHz.
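To listen to the output you need to write those samples to a file. The sketch below encodes mono `Float` samples as a 16-bit PCM WAV file using only Foundation. `makeWAV` is a hypothetical helper, not part of the `F5TTS` package, and it assumes the samples are finite values in the range [-1, 1].

```swift
import Foundation

/// Hypothetical helper (not part of F5TTS): encodes mono Float samples in [-1, 1]
/// as a 16-bit PCM WAV file held in memory.
func makeWAV(samples: [Float], sampleRate: Int = 24_000) -> Data {
    let channels = 1
    let bitsPerSample = 16
    let bytesPerSample = bitsPerSample / 8
    let dataSize = samples.count * bytesPerSample * channels

    var data = Data()
    // Appends an integer in little-endian byte order, as the WAV format requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }

    // RIFF header.
    data.append(contentsOf: Array("RIFF".utf8))
    append(UInt32(36 + dataSize))              // bytes remaining after this field
    data.append(contentsOf: Array("WAVE".utf8))

    // "fmt " chunk describing linear PCM audio.
    data.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))                         // fmt chunk size for PCM
    append(UInt16(1))                          // audio format 1 = linear PCM
    append(UInt16(channels))
    append(UInt32(sampleRate))
    append(UInt32(sampleRate * channels * bytesPerSample)) // byte rate
    append(UInt16(channels * bytesPerSample))  // block align
    append(UInt16(bitsPerSample))

    // "data" chunk with the samples themselves.
    data.append(contentsOf: Array("data".utf8))
    append(UInt32(dataSize))
    for sample in samples {
        // Clamp to [-1, 1] and scale to the Int16 range.
        let clamped = max(-1, min(1, sample))
        append(Int16(clamped * Float(Int16.max)))
    }
    return data
}
```

Assuming the generated array is one-dimensional and float-valued, MLX Swift's `asArray(Float.self)` should give you the plain samples, e.g. `try makeWAV(samples: generatedAudio.asArray(Float.self)).write(to: outputURL)`, where `outputURL` is a file URL of your choosing.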

If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file about 5-10 seconds long:

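```swift
// Generate speech that follows the voice of your own reference recording.
let generatedAudio = try await f5tts.generate(
    text: "The quick brown fox jumped over the lazy dog.",
    referenceAudioURL: ..., // URL of a mono, 24 kHz WAV file
    referenceAudioText: "This is the caption for the reference audio." // transcript of the reference audio
)
```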
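Before passing a file as `referenceAudioURL`, it can help to confirm that it meets the requirements above. The following is a minimal sketch that walks the RIFF chunks of a PCM WAV file and reads its channel count, sample rate, and duration. `WAVInfo`, `readWAVInfo`, and `isSuitableReference` are hypothetical names; the parser only handles plain PCM WAV files and trusts the sizes stored in the header.

```swift
import Foundation

/// Basic properties read from a PCM WAV header (hypothetical type).
struct WAVInfo {
    let channels: Int
    let sampleRate: Int
    let bitsPerSample: Int
    let durationSeconds: Double
}

/// Hypothetical helper: walks the RIFF chunks to find "fmt " and "data".
/// Returns nil for anything that is not a simple PCM WAV file.
func readWAVInfo(_ data: Data) -> WAVInfo? {
    let bytes = [UInt8](data)
    guard bytes.count >= 12,
          bytes[0..<4].elementsEqual("RIFF".utf8),
          bytes[8..<12].elementsEqual("WAVE".utf8) else { return nil }

    // Reads a little-endian unsigned integer of `count` bytes at `offset`.
    func readLE(_ offset: Int, _ count: Int) -> Int {
        var value = 0
        for i in 0..<count { value |= Int(bytes[offset + i]) << (8 * i) }
        return value
    }

    var channels = 0, sampleRate = 0, bitsPerSample = 0, dataSize: Int?
    var offset = 12
    while offset + 8 <= bytes.count {
        let chunkSize = readLE(offset + 4, 4)
        let body = offset + 8
        if bytes[offset..<offset + 4].elementsEqual("fmt ".utf8), body + 16 <= bytes.count {
            channels = readLE(body + 2, 2)
            sampleRate = readLE(body + 4, 4)
            bitsPerSample = readLE(body + 14, 2)
        } else if bytes[offset..<offset + 4].elementsEqual("data".utf8) {
            dataSize = chunkSize
            break
        }
        // Chunks are padded to an even number of bytes.
        offset = body + chunkSize + chunkSize % 2
    }

    guard let size = dataSize, sampleRate > 0 else { return nil }
    let bytesPerFrame = channels * bitsPerSample / 8
    guard bytesPerFrame > 0 else { return nil }
    let duration = Double(size / bytesPerFrame) / Double(sampleRate)
    return WAVInfo(channels: channels, sampleRate: sampleRate,
                   bitsPerSample: bitsPerSample, durationSeconds: duration)
}

/// Checks the guidance above: mono, 24 kHz, roughly 5-10 seconds.
func isSuitableReference(_ info: WAVInfo) -> Bool {
    info.channels == 1 && info.sampleRate == 24_000
        && (5.0...10.0).contains(info.durationSeconds)
}
```

You could call it as `readWAVInfo(try Data(contentsOf: url))`; files that fail the check can be converted with the ffmpeg command below.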

You can convert an audio file to the correct format with ffmpeg like this:

```bash
ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
```
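For intuition about two of these options, `-ac 1` collapses the channels into one and `-t 10` keeps at most the first 10 seconds. The sketch below does the equivalent on raw interleaved `Float` samples. `downmixAndTrim` is a hypothetical helper; it approximates the downmix with a plain average of the channels (ffmpeg's own downmix matrix may differ) and does not attempt resampling (`-ar 24000`) or 16-bit conversion (`-sample_fmt s16`).

```swift
/// Hypothetical helper mirroring `-ac 1` (downmix to mono, here a plain channel
/// average) and `-t 10` (keep at most `maxSeconds` of audio) on interleaved samples.
func downmixAndTrim(interleaved: [Float], channels: Int, sampleRate: Int,
                    maxSeconds: Double = 10) -> [Float] {
    precondition(channels > 0 && sampleRate > 0)
    let frameCount = interleaved.count / channels
    let maxFrames = min(frameCount, Int(maxSeconds * Double(sampleRate)))
    var mono = [Float](repeating: 0, count: maxFrames)
    for frame in 0..<maxFrames {
        var sum: Float = 0
        for ch in 0..<channels { sum += interleaved[frame * channels + ch] }
        mono[frame] = sum / Float(channels) // average across channels
    }
    return mono
}
```

For production use the ffmpeg command is the simpler choice, since it also handles decoding, resampling, and sample-format conversion.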

## Appreciation

[Yushen Chen](https://github.com/SWivid) for the original PyTorch implementation of F5-TTS and the pretrained model.

[Phil Wang](https://github.com/lucidrains) for the E2 TTS implementation that this model is based on.

## Citations

```bibtex
@article{chen-etal-2024-f5tts,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      journal={arXiv preprint arXiv:2410.06885},
      year={2024},
}
```

```bibtex
@inproceedings{Eskimez2024E2TE,
    title   = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
    author  = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:270738197}
}
```

## License

The code in this repository is released under the MIT license as found in the [LICENSE](LICENSE) file.
