https://github.com/MoonInTheRiver/NeuralSVB
Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code
acl2022 gan singing-synthesis singing-voice singing-voice-conversion singing-voice-synthesis
- Host: GitHub
- URL: https://github.com/MoonInTheRiver/NeuralSVB
- Owner: MoonInTheRiver
- License: gpl-3.0
- Created: 2022-03-01T07:28:19.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-01-02T08:30:32.000Z (11 months ago)
- Last Synced: 2024-08-02T11:24:39.817Z (3 months ago)
- Topics: acl2022, gan, singing-synthesis, singing-voice, singing-voice-conversion, singing-voice-synthesis
- Language: Python
- Homepage:
- Size: 1.95 MB
- Stars: 416
- Watchers: 13
- Forks: 51
- Open Issues: 14
Metadata Files:
- Readme: README.md
- License: LICENSE
# Learning the Beauty in Songs: Neural Singing Voice Beautifier
---
[![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2202.13277)
[![GitHub Stars](https://img.shields.io/github/stars/MoonInTheRiver/NeuralSVB)](https://github.com/MoonInTheRiver/NeuralSVB)
![visitors](https://visitor-badge.glitch.me/badge?page_id=moonintheriver/NeuralSVB)

This repository is the official PyTorch implementation of our ACL-2022 [paper](https://arxiv.org/abs/2202.13277).
## 0. Dataset (PopBuTFy) Acquisition
### Audio samples
- You can download the dataset from [here](https://drive.google.com/file/d/1IKFp7y1WeYGrwXgJ0HC3rdPj54WoqIsU/view?usp=sharing). Please email us to register (see [apply_form](resources/apply_form.md)).
- Dataset [preview](https://github.com/MoonInTheRiver/NeuralSVB/releases/download/pre-release/PopBuTFy-preview.zip).

### Text labels
NeuralSVB does not need text as input, but the ASR model used to extract PPGs (phonetic posteriorgrams) does. We therefore also provide the [text labels](https://github.com/MoonInTheRiver/NeuralSVB/releases/download/pre-release/text_labels.zip) of PopBuTFy.

## 1. Preparation
### Environment Preparation
Most of the required packages are listed in https://github.com/NATSpeech/NATSpeech/blob/main/requirements.txt

Alternatively, you can set up the environment with the `Requirements.txt` file in the repository directory.
```sh
pip install -r Requirements.txt
```
### Data Preparation

1. Extract embeddings of vocal timbre:
```sh
CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config egs/datasets/audio/PopBuTFy/save_emb.yaml
```
2. Pack the dataset:
```sh
CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config egs/datasets/audio/PopBuTFy/para_bin.yaml
```

### Vocoder Preparation
We provide a pre-trained [HifiGAN-Singing](https://github.com/MoonInTheRiver/NeuralSVB/releases/download/pre-release/1012_hifigan_all_songs_nsf.zip) vocoder, which is designed specifically for SVS and uses the NSF mechanism.

Please unzip the pre-trained vocoder into `checkpoints` before training your acoustic model.
This singing vocoder was trained on 100+ hours of singing data (including Chinese and English songs).
### PPG Extractor Preparation
We provide a pre-trained [PPG Extractor](https://github.com/MoonInTheRiver/NeuralSVB/releases/download/pre-release/1009_pretrain_asr_english.zip).

Please unzip the pre-trained PPG extractor into `checkpoints` before training your acoustic model.
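The unzip steps above can also be scripted. The following is a minimal sketch using only the Python standard library; `extract_into` is a hypothetical helper (not part of this repository), and it assumes the two archives have already been downloaded into the current directory under the file names used in the links above. It returns the top-level entries of each archive so you can see whether the archive already carries its own folder.

```python
import zipfile
from pathlib import Path


def extract_into(zip_path, dest_dir="checkpoints"):
    """Extract zip_path into dest_dir and return its sorted top-level entry names.

    Hypothetical helper for illustration; not part of the NeuralSVB code base.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # The top-level entries show whether the archive ships its own folder.
    return sorted({name.split("/", 1)[0] for name in names})


if __name__ == "__main__":
    # Assumption: both archives were downloaded into the current directory.
    for name in ("1012_hifigan_all_songs_nsf.zip", "1009_pretrain_asr_english.zip"):
        if Path(name).exists():
            print(name, "->", extract_into(name))
        else:
            print(name, "not found, skipping")
```

If an archive turns out not to contain a top-level folder, pass a model-specific `dest_dir` so that the files end up in their own subdirectory of `checkpoints`.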
After the instructions above, the directory structure should be as follows:
```
.
|--data
    |--processed
        |--PopBuTFy (unzip PopBuTFy.zip)
            |--data
                |--directories containing wavs
    |--binary
        |--PopBuTFyENSpkEM
|--checkpoints
    |--1009_pretrain_asr_english
        |--
        |--config.yaml
    |--1012_hifigan_all_songs_nsf
        |--
        |--config.yaml
```

## 2. Training Example
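Before launching training, it can help to confirm that the layout from the Preparation section is in place. The following is a minimal sketch, not part of the repository: `missing_paths` is a hypothetical helper, and the nesting of the paths (for example `binary` sitting under `data`) is an assumption about the directory tree shown earlier.

```python
from pathlib import Path

# Paths taken from the directory tree in the Preparation section, relative to
# the repository root. The exact nesting is an assumption.
EXPECTED_PATHS = [
    "data/processed/PopBuTFy/data",
    "data/binary/PopBuTFyENSpkEM",
    "checkpoints/1009_pretrain_asr_english/config.yaml",
    "checkpoints/1012_hifigan_all_songs_nsf/config.yaml",
]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    base = Path(root)
    return [p for p in expected if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:")
        for p in missing:
            print("  " + p)
    else:
        print("Layout looks complete.")
```

Run it from the repository root; any path it reports points back to the preparation step that still needs to be done.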
```sh
CUDA_VISIBLE_DEVICES=0,1 python tasks/run.py --config egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml --exp_name exp_name --reset
```

## 3. Inference
### Inference from packed test set

```sh
CUDA_VISIBLE_DEVICES=0,1 python tasks/run.py --config egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml --exp_name exp_name --reset --infer
```
Inference results will be saved in `./checkpoints/EXP_NAME/generated_` by default.

We provide:
- the [pre-trained model](https://github.com/MoonInTheRiver/NeuralSVB/releases/download/pre-release/1030_vae_mle.zip) of NSVB (English version).

Remember to put the pre-trained models in the `checkpoints` directory.
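The training and inference commands shown in this README differ only in the trailing `--infer` flag. The sketch below makes that explicit by building both argument lists from one function. `build_command` and `build_env` are hypothetical helpers, not part of the repository, and only the flags that appear in this README are used. To run inference with the provided checkpoint, `exp_name` presumably has to match the folder name under `checkpoints`; that is an assumption, not something this README states.

```python
import os

CONFIG = "egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml"


def build_command(exp_name, infer=False, config=CONFIG):
    """Return the argument list for tasks/run.py, mirroring the README commands."""
    cmd = ["python", "tasks/run.py", "--config", config,
           "--exp_name", exp_name, "--reset"]
    if infer:
        # Inference is the training command plus this one flag.
        cmd.append("--infer")
    return cmd


def build_env(gpus="0,1"):
    """Copy the current environment and select the GPUs to expose."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpus
    return env


if __name__ == "__main__":
    print(" ".join(build_command("exp_name")))
    print(" ".join(build_command("exp_name", infer=True)))
    # To launch for real (from the repository root), something like:
    #   import subprocess
    #   subprocess.run(build_command("exp_name"), env=build_env(), check=True)
```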
### Inference from raw inputs
WIP.

## Limitations
See Appendix D "Limitations and Solutions" in our [paper](https://aclanthology.org/2022.acl-long.549.pdf).

## Citation
If this repository helps your research, please cite:

@inproceedings{liu-etal-2022-learning-beauty,
title = "Learning the Beauty in Songs: Neural Singing Voice Beautifier",
author = "Liu, Jinglin and
Li, Chengxi and
Ren, Yi and
Zhu, Zhiying and
Zhao, Zhou",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.549",
pages = "7970--7983",
}

## Issues
- Before raising an issue, please check our README and other issues for possible solutions.
- We will try to handle your problem promptly, but we cannot guarantee a satisfying solution.
- Please be friendly.

## Acknowledgements
* r9y9's [wavenet_vocoder](https://github.com/r9y9/wavenet_vocoder)
* Po-Hsun-Su's [ssim](https://github.com/Po-Hsun-Su/pytorch-ssim)
* descriptinc's [melgan](https://github.com/descriptinc/melgan-neurips)
* Official [espnet](https://github.com/espnet/espnet)
* Official [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)

The framework of this repository is based on [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger),
and is a predecessor of [NATSpeech](https://github.com/NATSpeech/NATSpeech/).