
https://github.com/lucasnewman/vocos-mlx

Implementation of 'Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis', in MLX



Host: GitHub
URL: https://github.com/lucasnewman/vocos-mlx
Owner: lucasnewman
License: mit
Created: 2024-09-27T19:51:06.000Z (4 months ago)
Default Branch: main
Last Pushed: 2024-10-30T22:14:39.000Z (3 months ago)
Last Synced: 2024-10-30T22:17:54.557Z (3 months ago)
Topics: mlx, text-to-speech, tts
Language: Python
Homepage:
Size: 54.7 KB
Stars: 12
Watchers: 4
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# Vocos — MLX

An implementation of [Vocos](https://github.com/gemelo-ai/vocos) using the [MLX](https://github.com/ml-explore/mlx) framework. Vocos reconstructs high-quality audio from mel spectrograms or EnCodec tokens.

### Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)

## Installation

To use Vocos in inference mode, install it using:

```bash
pip install vocos-mlx
```
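To confirm that the package can be found after installation, a minimal sketch along these lines may help. It assumes the import name is `vocos_mlx`, as used in this README's usage examples; the helper name `is_installed` is hypothetical and not part of the package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "vocos_mlx" is the import name used in the usage examples of this README.
    if is_installed("vocos_mlx"):
        print("vocos_mlx is available")
    else:
        print("vocos_mlx not found; try: pip install vocos-mlx")
```

Checking with `find_spec` avoids actually importing the package, so the check itself does not load MLX.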

## Usage

### Mel Spectrogram

```python
from vocos_mlx import Vocos, load_audio, log_mel_spectrogram

# Load the pretrained mel-spectrogram model
vocos = Vocos.from_pretrained("lucasnewman/vocos-mel-24khz")

# Reconstruct: load the audio at 24 kHz and pass it through the model
audio = load_audio("audio.wav", 24_000)
reconstructed_audio = vocos(audio)

# Decode from a 100-band log-mel spectrogram
mel_spec = log_mel_spectrogram(audio, n_mels=100)
decoded_audio = vocos.decode(mel_spec)
```
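To get a feel for how much data the mel spectrogram carries, the sketch below estimates the number of spectrogram frames for a clip of a given length. It is only an estimate under stated assumptions: a hop length of 256 samples and an FFT size of 1024, which are the values used by the original Vocos `mel-24khz` configuration and may not match the defaults in `vocos_mlx`. The function `expected_mel_frames` is hypothetical and not part of the package.

```python
def expected_mel_frames(
    n_samples: int,
    hop_length: int = 256,  # assumption: taken from the original Vocos config
    n_fft: int = 1024,      # assumption: taken from the original Vocos config
    center: bool = True,
) -> int:
    """Estimate the STFT frame count for a signal of `n_samples` samples.

    With centered (padded) framing, one frame is produced per hop plus one.
    Without padding, only windows that fit entirely inside the signal count.
    """
    if center:
        return 1 + n_samples // hop_length
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop_length


if __name__ == "__main__":
    one_second = 24_000  # samples in one second at 24 kHz
    print(expected_mel_frames(one_second))                # 94
    print(expected_mel_frames(one_second, center=False))  # 90
```

Under these assumptions each frame advances by 256 / 24000 seconds, roughly 10.7 ms, so one second of audio maps to about 94 frames of 100 mel bands. The exact count depends on how the library pads the signal.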

### EnCodec

```python
from vocos_mlx import Vocos, load_audio

# Load the pretrained EnCodec-token model
vocos = Vocos.from_pretrained("lucasnewman/vocos-encodec-24khz")

# Reconstruct: load the audio at 24 kHz and pass it through the model
audio = load_audio("audio.wav", 24_000)
reconstructed_audio = vocos(audio, bandwidth_id=3)

# Decode from EnCodec codes extracted at the same bandwidth
codes = vocos.get_encodec_codes(audio, bandwidth_id=3)
decoded_audio = vocos.decode_from_codes(codes, bandwidth_id=3)
```
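The README does not say what `bandwidth_id` selects. In the original Vocos project it is an index into the EnCodec bandwidths 1.5, 3.0, 6.0, and 12.0 kbps, and the sketch below assumes this port follows the same convention. It also relies on the published EnCodec 24 kHz parameters (75 code frames per second, 1024-entry codebooks, so 10 bits per code) to derive the number of codebooks in use. The names `ENCODEC_BANDWIDTHS_KBPS` and `describe_bandwidth` are hypothetical and not part of `vocos_mlx`.

```python
# Assumption: same bandwidth table as the original Vocos EnCodec model.
ENCODEC_BANDWIDTHS_KBPS = (1.5, 3.0, 6.0, 12.0)
FRAME_RATE_HZ = 75   # EnCodec 24 kHz produces 75 code frames per second
BITS_PER_CODE = 10   # each codebook has 1024 entries, i.e. 10 bits per code


def describe_bandwidth(bandwidth_id: int) -> tuple[float, int]:
    """Map a bandwidth index to (bitrate in kbps, number of codebooks)."""
    if not 0 <= bandwidth_id < len(ENCODEC_BANDWIDTHS_KBPS):
        raise ValueError(f"bandwidth_id must be in 0..{len(ENCODEC_BANDWIDTHS_KBPS) - 1}")
    kbps = ENCODEC_BANDWIDTHS_KBPS[bandwidth_id]
    # Each codebook contributes FRAME_RATE_HZ * BITS_PER_CODE bits per second.
    n_codebooks = round(kbps * 1000 / (FRAME_RATE_HZ * BITS_PER_CODE))
    return kbps, n_codebooks


if __name__ == "__main__":
    for bandwidth_id in range(len(ENCODEC_BANDWIDTHS_KBPS)):
        kbps, n_codebooks = describe_bandwidth(bandwidth_id)
        print(f"bandwidth_id={bandwidth_id}: {kbps} kbps, {n_codebooks} codebooks")
```

If the assumption holds, `bandwidth_id = 3` in the examples above corresponds to 12 kbps, which uses 16 codebooks per frame.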

## Appreciation

Thanks to [Awni Hannun](https://github.com/awni) for the reference [EnCodec](https://github.com/ml-explore/mlx-examples/tree/main/encodec) implementation for MLX.

## Citations

```
@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}
```

## License

The code in this repository is released under the MIT license as found in the [LICENSE](LICENSE) file.