https://github.com/playvoice/vi-svs

Singing Voice Synthesis based on VITS, different from VISinger
https://github.com/playvoice/vi-svs

diffsinger opencpop singing-synthesis singing-voice-synthesis speech-synthesis svs visinger vits vits-svs

Last synced: 3 months ago
JSON representation

Singing Voice Synthesis based on VITS, different from VISinger

Host: GitHub
URL: https://github.com/playvoice/vi-svs
Owner: PlayVoice
License: apache-2.0
Created: 2022-03-14T07:15:21.000Z (about 3 years ago)
Default Branch: VISinger
Last Pushed: 2023-11-13T03:37:39.000Z (over 1 year ago)
Last Synced: 2025-01-17T08:07:40.270Z (3 months ago)
Topics: diffsinger, opencpop, singing-synthesis, singing-voice-synthesis, speech-synthesis, svs, visinger, vits, vits-svs
Language: Python
Homepage:
Size: 2.14 MB
Stars: 187
Watchers: 8
Forks: 31
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

Variational Inference with adversarial learning for end-to-end Singing Voice Synthesis

Different from VISinger, It is just VITS without MAS and DurationPredictor.

作为一个用于学习的项目，就这样了：Pitch的预测是需要改进的地方

![VISinger](https://github.com/MaxMax2016/VI-SVS/assets/16432329/c76ca716-b230-4852-b8f0-2c3041af7072)

![VI-SVS](https://github.com/MaxMax2016/VI-SVS/assets/16432329/128c0f33-4428-4b57-9cd3-b6237f53c1a4)

**Pitch and Duration will be developed as add-on!**

# 训练步骤

- 1 下载数据 segments.zip，并解压

```
segments
|-- test.txt
|-- train.txt
|-- transcriptions.txt
`-- wavs
|-- 2001000001.wav
|-- 2001000002.wav
|-- 2001000003.wav
```

- 2 转换采样率: 本项目采用32KHz
```
python util/resample.py -w segments/wavs/ -o data_svs/wavs -s 32000
```

- 3 生成数据标注
```
python util/generate_label.py --config configs/singing_base.yaml --data data_svs/ --file segments/transcriptions.txt
```

data_svs/labels.txt，内容格式：wave path|label path|score path|pitch path|slurs path

- 3 划分训练索引
```
python util/generate_label.py --file data_svs/labels.txt
```

生成 filelists/singing_train.txt 和 filelists/singing_valid.txt

- 4 启动训练
```
python svs_train.py -c configs/singing_base.yaml -n vits_svs
```

- 5 训练Pitch
```
python pit_train.py -c configs/singing_base.yaml -n pitch
```

# 推理验证

- 0 模型导出
```
python svs_export.py --config configs/singing_base.yaml --model chkpt/vits_svs/vits_svs_****.pt
```

- 1 推理验证: F0根据乐谱生成
```
python svs_infer.py --config configs/singing_base.yaml --model svs_opencpop.pt
```

- 2 完整歌曲合成（[使用release模型](https://github.com/PlayVoice/VI-SVS/releases/tag/0.0.3)）
```
python svs_song.py --config configs/singing_base.yaml --model svs_opencpop.pt
```

# 推理验证，使用Pitch预测，效果不佳

- 0 模型导出
```
python svs_export.py --config configs/singing_base.yaml --model chkpt/vits_svs/vits_svs_****.pt
```

```
python pit_export.py --config configs/singing_base.yaml --model chkpt/pitch/pitch_****.pt
```

- 1 推理验证
```
python svs_infer_pitch.py --config configs/singing_base.yaml --model svs_opencpop.pt --pitch pit_opencpop.pt
```

- 2 完整歌曲合成（[使用release模型](https://github.com/PlayVoice/VI-SVS/releases/tag/0.0.3)）
```
python svs_song_pitch.py --config configs/singing_base.yaml --model svs_opencpop.pt --pitch pit_opencpop.pt
```

# 数据

https://wenet.org.cn/opencpop/

# 歌声合成参考

https://github.com/SJTMusicTeam/Muskits

https://github.com/MoonInTheRiver/DiffSinger

[VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis](https://arxiv.org/abs/2110.08813)

# 模型设计参考

https://github.com/NVIDIA/BigVGAN

https://github.com/jaywalnut310/vits

https://github.com/mindslab-ai/univnet

https://github.com/PlayVoice/so-vits-svc-5.0

https://github.com/shivammehta25/Matcha-TTS

[RoFormer: Enhanced Transformer with rotary position embedding](https://arxiv.org/abs/2104.09864)

# Diffusion Pitch

https://github.com/thuhcsi/DiffVar

https://github.com/hayeong0/Diff-HierVC

https://github.com/tonnetonne814/SiFi-VITS2-44100-Ja

[Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech](https://arxiv.org/abs/2105.06337)

# Diffusion Pitch of Diff-HierVC
![DiffPitch](https://github.com/PlayVoice/VI-SVS/assets/16432329/055d75a4-7009-46c1-8603-65254cec47dd)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/playvoice/vi-svs

Awesome Lists containing this project

README

Variational Inference with adversarial learning for end-to-end Singing Voice Synthesis