https://github.com/sony/bigvsan_eval
Evaluation tool used in the BigVSAN paper
- Host: GitHub
- URL: https://github.com/sony/bigvsan_eval
- Owner: sony
- License: mit
- Created: 2023-09-01T00:43:42.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-22T09:29:50.000Z (over 1 year ago)
- Last Synced: 2025-03-29T08:04:52.703Z (7 months ago)
- Topics: pytorch
- Language: Python
- Homepage:
- Size: 3.91 KB
- Stars: 11
- Watchers: 24
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Vocoder Evaluation
This repository contains the evaluation tool used in **"BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network"** (*[arXiv 2309.02836](https://arxiv.org/abs/2309.02836)*).
Please cite [[1](#citation)] in your work when using this code in your experiments.

## Quick Start
First, prepare an environment:
```shell
pip install -r requirements.txt
```

Then, perform an evaluation:
```shell
python evaluate.py ...
```
`gt_dir n` is a directory that contains ground-truth audio files, and `synth_dir n` is a directory that contains synthesized audio files. Each file in `synth_dir n` must have a corresponding file with the same name in `gt_dir n`, and each corresponding pair must be time-aligned in advance.

`evaluate.py` outputs the calculated metrics for each `gt_dir n`-`synth_dir n` pair, along with their macro averages across all pairs. An evaluation takes some time to complete.
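The pairing requirement and the macro averaging described above can be checked or reproduced outside the tool. The following is a minimal sketch, not part of `evaluate.py`: the helper names `find_unpaired` and `macro_average` are hypothetical, and it assumes that "macro average" means an unweighted mean of the per-pair metric values, so every directory pair counts equally regardless of how many files it holds.

```python
from pathlib import Path
from statistics import mean


def find_unpaired(gt_dir, synth_dir):
    """Return sorted names of files in synth_dir that have no same-named file in gt_dir.

    Hypothetical helper for a pre-flight check; it compares file names only
    and does not verify that the audio is time-aligned.
    """
    gt_names = {p.name for p in Path(gt_dir).iterdir() if p.is_file()}
    synth_names = {p.name for p in Path(synth_dir).iterdir() if p.is_file()}
    return sorted(synth_names - gt_names)


def macro_average(per_pair_metrics):
    """Average each metric over directory pairs, weighting every pair equally.

    per_pair_metrics is a list of dicts, one per directory pair, that all
    share the same metric names as keys (an assumed data layout).
    """
    metric_names = per_pair_metrics[0].keys()
    return {name: mean(m[name] for m in per_pair_metrics) for name in metric_names}
```

Calling `find_unpaired` on each directory pair before starting an evaluation reports missing ground-truth files early; an empty list means every synthesized file has a same-named counterpart.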
## Supported evaluation metrics
This toolbox supports the following metrics:

- M-STFT: Multi-resolution short-time Fourier transform
- PESQ: Perceptual evaluation of speech quality
- MCD: Mel-cepstral distortion
- Periodicity: Periodicity error
- V/UV F1: F1 score of voiced/unvoiced classification

## Citation
If you find this tool useful, please consider citing:
[1] Shibuya, T., Takida, Y., Mitsufuji, Y.,
"BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network,"
ICASSP 2024.
```bibtex
@inproceedings{shibuya2024bigvsan,
title={{BigVSAN}: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network},
author={Shibuya, Takashi and Takida, Yuhta and Mitsufuji, Yuki},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024}
}
```

## References
> https://github.com/NVIDIA/BigVGAN
> https://github.com/csteinmetz1/auraloss
> https://github.com/ludlows/PESQ
> https://github.com/ttslr/python-MCD
> https://github.com/descriptinc/cargan