https://github.com/lucasnewman/vocos-mlx
Implementation of 'Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis', in MLX
mlx text-to-speech tts
- Host: GitHub
- URL: https://github.com/lucasnewman/vocos-mlx
- Owner: lucasnewman
- License: mit
- Created: 2024-09-27T19:51:06.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-10-30T22:14:39.000Z (3 months ago)
- Last Synced: 2024-10-30T22:17:54.557Z (3 months ago)
- Topics: mlx, text-to-speech, tts
- Language: Python
- Homepage:
- Size: 54.7 KB
- Stars: 12
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Vocos — MLX
Implementation of [Vocos](https://github.com/gemelo-ai/vocos) with the [MLX](https://github.com/ml-explore/mlx) framework. Vocos allows for high-quality reconstruction of audio from mel spectrograms or EnCodec tokens.
### Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)

## Installation
To use Vocos in inference mode, install it using:
```bash
pip install vocos-mlx
```

## Usage
### Mel Spectrogram
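When decoding from a mel spectrogram, it can help to know roughly how many frames a clip produces. The sketch below is plain arithmetic and not part of `vocos_mlx`. It assumes a hop length of 256 samples and centered framing, as in the original Vocos mel-24khz configuration; the defaults in this MLX port may differ, and the helper names `num_mel_frames` and `frames_per_second` are hypothetical.

```python
# Assumed values, taken from the original Vocos mel-24khz configuration.
# They are NOT confirmed by this README; check the vocos-mlx source.
SAMPLE_RATE = 24_000
HOP_LENGTH = 256


def num_mel_frames(num_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Frame count for a centered STFT: one frame per full hop, plus one."""
    return 1 + num_samples // hop_length


def frames_per_second(sample_rate: int = SAMPLE_RATE,
                      hop_length: int = HOP_LENGTH) -> float:
    """Approximate number of spectrogram frames per second of audio."""
    return sample_rate / hop_length


# One second of audio at the sample rate used in this README.
print(num_mel_frames(SAMPLE_RATE), frames_per_second())
```

Under these assumptions, one second of 24 kHz audio yields 94 frames, each with the 100 mel bins requested by `n_mels = 100`.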
```python
from vocos_mlx import Vocos, load_audio, log_mel_spectrogram

vocos = Vocos.from_pretrained("lucasnewman/vocos-mel-24khz")

# reconstruct
audio = load_audio("audio.wav", 24_000)
reconstructed_audio = vocos(audio)

# decode from mel spec
mel_spec = log_mel_spectrogram(audio, n_mels = 100)
decoded_audio = vocos.decode(mel_spec)
```

### EnCodec
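The `bandwidth_id` argument used in this section selects an EnCodec target bandwidth. This README does not state the mapping, so the sketch below assumes the one used by the original Vocos EnCodec model, where the ID indexes the list `[1.5, 3.0, 6.0, 12.0]` kbps. It combines that with the EnCodec 24 kHz frame rate (75 frames per second) and codebook size (1024 entries, or 10 bits per code) to estimate how many codebooks each bandwidth uses. `codebooks_for` is a hypothetical helper, not part of `vocos_mlx`.

```python
# Assumption: bandwidth_id indexes this list, as in the original Vocos
# EnCodec model. Verify against the vocos-mlx source before relying on it.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]

FRAME_RATE_HZ = 75   # EnCodec 24 kHz emits 75 code frames per second
BITS_PER_CODE = 10   # each codebook holds 1024 entries (2**10)


def codebooks_for(bandwidth_id: int) -> int:
    """Number of residual codebooks implied by a bandwidth ID."""
    kbps = BANDWIDTHS_KBPS[bandwidth_id]
    bits_per_frame = kbps * 1000 / FRAME_RATE_HZ
    return round(bits_per_frame / BITS_PER_CODE)


for bandwidth_id, kbps in enumerate(BANDWIDTHS_KBPS):
    print(bandwidth_id, kbps, codebooks_for(bandwidth_id))
```

If this mapping holds, `bandwidth_id = 3` corresponds to 12 kbps, or 16 codebooks per frame.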
```python
from vocos_mlx import Vocos, load_audio

vocos = Vocos.from_pretrained("lucasnewman/vocos-encodec-24khz")

# reconstruct
audio = load_audio("audio.wav", 24_000)
reconstructed_audio = vocos(audio, bandwidth_id = 3)

# decode with encodec codes
codes = vocos.get_encodec_codes(audio, bandwidth_id = 3)
decoded_audio = vocos.decode_from_codes(codes, bandwidth_id = 3)
```

## Appreciation
Thanks to [Awni Hannun](https://github.com/awni) for the reference [EnCodec](https://github.com/ml-explore/mlx-examples/tree/main/encodec) implementation for MLX.
## Citations
```
@article{siuzdak2023vocos,
title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
author={Siuzdak, Hubert},
journal={arXiv preprint arXiv:2306.00814},
year={2023}
}
```

## License
The code in this repository is released under the MIT license as found in the
[LICENSE](LICENSE) file.