https://github.com/sony/bigvsan_eval

Evaluation tool used in the BigVSAN paper

Host: GitHub
URL: https://github.com/sony/bigvsan_eval
Owner: sony
License: mit
Created: 2023-09-01T00:43:42.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-03-22T09:29:50.000Z (over 1 year ago)
Last Synced: 2025-03-29T08:04:52.703Z (7 months ago)
Topics: pytorch
Language: Python
Homepage:
Size: 3.91 KB
Stars: 11
Watchers: 24
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Vocoder Evaluation

This repository contains the evaluation tool used in **"BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network"** (*[arXiv 2309.02836](https://arxiv.org/abs/2309.02836)*).
Please cite [[1](#citation)] in your work when using this code in your experiments.

## Quick Start

First, set up an environment:
```shell
pip install -r requirements.txt
```

Then, run an evaluation:
```shell
python evaluate.py ...
```
Here, ```gt_dir n``` is a directory of ground-truth audio files, and ```synth_dir n``` is a directory of synthesized audio files. Every file in ```synth_dir n``` must have a counterpart with the same file name in ```gt_dir n```, and each such pair must be time-aligned in advance.
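
Because a missing counterpart is easy to overlook, it may help to check the pairing before starting a long run. The following is a minimal sketch, not part of this repository: `find_unpaired` is a hypothetical helper, and the `.wav` suffix is an assumption about the audio format.

```python
import tempfile
from pathlib import Path


def find_unpaired(gt_dir, synth_dir, suffix=".wav"):
    """Return names of files in synth_dir with no same-named file in gt_dir.

    Hypothetical helper; the ".wav" default is an assumption.
    """
    gt_names = {p.name for p in Path(gt_dir).glob(f"*{suffix}")}
    synth_names = {p.name for p in Path(synth_dir).glob(f"*{suffix}")}
    return sorted(synth_names - gt_names)


# Self-contained demo using empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    gt, synth = Path(root, "gt"), Path(root, "synth")
    gt.mkdir()
    synth.mkdir()
    for name in ("a.wav", "b.wav"):
        (gt / name).touch()
    for name in ("a.wav", "c.wav"):
        (synth / name).touch()
    missing = find_unpaired(gt, synth)
    print(missing)  # c.wav has no ground-truth counterpart
```

This only checks file names; it does not verify that a pair is time-aligned.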

```evaluate.py``` outputs the computed metrics for each ```gt_dir n```-```synth_dir n``` pair, followed by the macro average of each metric across all pairs. An evaluation takes some time to complete.
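
To illustrate what a macro average means here, the sketch below averages each metric over directory pairs with equal weight per pair, regardless of how many files each pair contains. It is not the repository's code: `macro_average` is a hypothetical helper and the numbers are made up.

```python
def macro_average(per_pair_metrics):
    """Average each metric over pairs, giving every pair equal weight."""
    names = per_pair_metrics[0].keys()
    return {
        name: sum(m[name] for m in per_pair_metrics) / len(per_pair_metrics)
        for name in names
    }


# Made-up values, one dict per gt_dir/synth_dir pair.
per_pair = [
    {"PESQ": 4.0, "MCD": 0.5},
    {"PESQ": 3.0, "MCD": 1.5},
]
averages = macro_average(per_pair)
print(averages)  # {'PESQ': 3.5, 'MCD': 1.0}
```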

## Supported evaluation metrics
This toolbox supports the following metrics:

- M-STFT: Multi-resolution short-time Fourier transform (STFT) distance
- PESQ: Perceptual evaluation of speech quality
- MCD: Mel-cepstral distortion
- Periodicity: Periodicity error
- V/UV F1: F1 score of voiced/unvoiced classification
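
As an illustration of the last metric, the sketch below computes an F1 score from per-frame voiced/unvoiced decisions. It assumes that "voiced" is the positive class and that both signals have already been reduced to boolean flags per frame; `vuv_f1` is a hypothetical name, and the pitch estimation that produces the flags in the actual tool is not shown.

```python
def vuv_f1(true_voiced, pred_voiced):
    """F1 score of voiced/unvoiced classification, with voiced as positive.

    Both arguments are equal-length sequences of booleans, one per frame.
    """
    pairs = list(zip(true_voiced, pred_voiced))
    tp = sum(t and p for t, p in pairs)        # voiced in both
    fp = sum((not t) and p for t, p in pairs)  # voiced only in synthesized
    fn = sum(t and (not p) for t, p in pairs)  # voiced only in ground truth
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up frame decisions: 2 true positives, 1 false positive, 1 false negative.
truth = [True, True, False, False, True]
synth = [True, False, False, True, True]
score = vuv_f1(truth, synth)
print(round(score, 3))
```

With precision and recall both equal to 2/3 in this example, the F1 score is also 2/3.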

## Citation

If you find this tool useful, please consider citing:

[1] Shibuya, T., Takida, Y., Mitsufuji, Y.,
"BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network,"
ICASSP 2024.
```bibtex
@inproceedings{shibuya2024bigvsan,
title={{BigVSAN}: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network},
author={Shibuya, Takashi and Takida, Yuhta and Mitsufuji, Yuki},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024}
}
```

## References

> https://github.com/NVIDIA/BigVGAN

> https://github.com/csteinmetz1/auraloss

> https://github.com/ludlows/PESQ

> https://github.com/ttslr/python-MCD

> https://github.com/descriptinc/cargan
