https://github.com/yrom/mlx-bigvgan
MLX implementation of https://github.com/NVIDIA/BigVGAN
https://github.com/yrom/mlx-bigvgan
gan mlx python3 vocoder
Last synced: 4 months ago
JSON representation
MLX implementation of https://github.com/NVIDIA/BigVGAN
- Host: GitHub
- URL: https://github.com/yrom/mlx-bigvgan
- Owner: yrom
- License: mit
- Created: 2025-04-28T12:06:40.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-14T04:16:38.000Z (about 1 year ago)
- Last Synced: 2025-10-29T16:31:01.751Z (8 months ago)
- Topics: gan, mlx, python3, vocoder
- Language: Jupyter Notebook
- Homepage:
- Size: 3.52 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MLX BigVGAN
An MLX-adapted implementation of [BigVGAN](https://github.com/NVIDIA/BigVGAN).
## Features
- **BigVGAN Integration**: Fully integrates the original BigVGAN model with MLX for enhanced compatibility and performance.
- **Flexible Conversion**: Includes tools to convert the original BigVGAN PyTorch weights to MLX format.
- **Customizable Configurations**: Supports various configurations for kernel sizes, dilation rates, and activation functions (e.g., `snake`, `snakebeta`).
- **Pretrained Models**: Easily load pretrained BigVGAN models from the Hugging Face Hub.
## Installation
```bash
pip install mlx-bigvgan
```
## Usage
### 1. Load Pretrained Model
```python
from mlx_bigvgan import BigVGAN
model = BigVGAN.from_pretrained("wyrom/mlx-bigvgan_v2_24khz_100band_256x")
model.eval()
mx.eval(model.parameters())
```
### 2. Generate Audio
```python
import numpy as np
import mlx.core as mx
from mlx_bigvgan import log_mel_spectrogram, load_audio
# Load audio file
audio = load_audio("path/to/audio.wav")
h = model.config
# Compute log-mel spectrogram
mel_spec = log_mel_spectrogram(audio,
n_fft=h.n_fft,
n_mels=h.num_mels,
sample_rate=h.sampling_rate,
hop_length=h.hop_size,
fmin=h.fmin,
fmax=h.fmax,
padding=(h.n_fft - h.hop_size) // 2,
mel_norm="slaney",
mel_scale="slaney",
power=1.0,
)
# reshape to [B(1), T, C_mels]
mel_spec = mx.expand_dims(mel_spec, 0)
# Generate waveform
waveform = model(mel_spec) # [B(1), T, 1]
# Reshape to [T, 1]
waveform_float = waveform.squeeze(0)
# Convert to int16
waveform_int16 = mx.clip(waveform_float * 32767, -32768, 32767).astype(mx.int16)
# save to wav
import soundfile as sf
sf.write("output.wav", waveform_int16, h.sampling_rate, "PCM_16")
```
### 3. Convert Original BigVGAN Weights to MLX Format
You can convert the original BigVGAN weights to MLX format using the provided script.
`repo_id` is the Hugging Face model ID of the original BigVGAN model you want to convert.
See [nvidia/BigVGAN](https://huggingface.co/collections/nvidia/bigvgan-66959df3d97fd7d98d97dc9a) for move pretrained models.
```bash
python -m mlx_bigvgan.convert --repo_id nvidia/bigvgan_v2_xxx --output_dir mlx_models
```
## References
- [BigVGAN](https://github.com/NVIDIA/BigVGAN)
- [MLX](https://github.com/ml-explore/mlx)
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.