https://github.com/lucasnewman/descript-mlx
Implementation of the Descript Audio Codec in MLX
- Host: GitHub
- URL: https://github.com/lucasnewman/descript-mlx
- Owner: lucasnewman
- License: mit
- Created: 2024-10-28T17:41:06.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-10-28T17:47:39.000Z (about 2 months ago)
- Last Synced: 2024-10-28T18:59:04.915Z (about 2 months ago)
- Topics: mlx, neural-audio-codec, text-to-speech, tts
- Language: Python
- Homepage:
- Size: 707 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Descript Audio Codec — MLX
An implementation of the [Descript Audio Codec](https://arxiv.org/abs/2306.06546) using the [MLX](https://github.com/ml-explore/mlx) framework.
The Descript Audio Codec compresses 44.1 kHz audio into discrete codes at 8 kbps and produces high-quality reconstructions at roughly a 90:1 compression ratio relative to the raw audio.
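As a rough sanity check on the compression figure, the sketch below compares the bitrate of uncompressed PCM with the codec's 8 kbps. It assumes a 44.1 kHz sample rate and 16-bit mono PCM as the raw format; neither the bit depth nor the channel count is stated here, so treat both as illustrative assumptions.

```python
# Back-of-the-envelope check of the compression ratio.
# Assumptions (for illustration only): 44.1 kHz, 16-bit, mono PCM input.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16       # assumed bit depth
CHANNELS = 1               # assumed channel count
CODEC_BITRATE_BPS = 8_000  # 8 kbps, as quoted for the codec


def raw_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


raw_bps = raw_bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
ratio = raw_bps / CODEC_BITRATE_BPS

print(f"raw PCM: {raw_bps / 1000:.1f} kbps, ratio: {ratio:.1f}:1")
# prints "raw PCM: 705.6 kbps, ratio: 88.2:1"
```

Under these assumptions the ratio comes out near 88:1, consistent with the approximate 90:1 figure.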
This repository is based on the original PyTorch implementation available [here](https://github.com/descriptinc/descript-audio-codec).
## Installation
```bash
pip install descript-mlx
```

## Usage
You can load a pretrained model from Python like this:
```python
import mlx.core as mx
from descript_mlx import DAC

dac = DAC.from_pretrained("44khz")  # or "24khz" / "16khz"
audio = mx.array(...)

# encode into latents and codes
z, codes, latents, commitment_loss, codebook_loss = dac.encode(audio)

# reconstruct from latents/codes to audio
reconstructed_audio = dac.decode(z)

# compress audio to a DAC file
dac_file = dac.compress(audio)
dac_file.save("/path/to/file.dac")

# decompress audio from a DAC file
reconstructed_audio = dac.decompress("/path/to/file.dac")
```

## Citations
```bibtex
@misc{kumar2023highfidelityaudiocompressionimproved,
  title={High-Fidelity Audio Compression with Improved RVQGAN},
  author={Rithesh Kumar and Prem Seetharaman and Alejandro Luebs and Ishaan Kumar and Kundan Kumar},
  year={2023},
  eprint={2306.06546},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2306.06546},
}
```

## License
The code in this repository is released under the MIT license as found in the
[LICENSE](LICENSE) file.