Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sophiefy/Sovits

An unofficial implementation of the combination of Soft-VC and VITS
- Host: GitHub
- URL: https://github.com/sophiefy/Sovits
- Owner: sophiefy
- License: MIT
- Created: 2022-08-29T16:31:13.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-11-13T11:48:32.000Z (about 2 years ago)
- Last Synced: 2024-05-19T02:06:19.406Z (7 months ago)
- Language: Jupyter Notebook
- Size: 31.6 MB
- Stars: 458
- Watchers: 6
- Forks: 53
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Stella VC Based on Soft-VC and VITS
## **This project is closed...**
## Contents
- [Update](#Update)
- [Introduction](#Introduction)
- [Models](#Models)
- [A Certain Magical Index](#Index)
- [Shiki Natsume](#Natsume)
- [Shiki Natsume 2.0](#Natsume2)
- [How to use](#Usage)
- [TODO](#TODO)
- [Contact](#Contact)
- [Acknowledgement](#Ack)
- [References](#References)

## Update
- Sovits 2.0 inference demo is available!
## Introduction
Inspired by [Rcell](https://space.bilibili.com/343303724/?spm_id_from=333.999.0.0), I replaced the word embedding of `TextEncoder` in VITS with the output of the `ContentEncoder` used in [Soft-VC](https://github.com/bshall/soft-vc) to achieve any-to-one voice conversion with non-parallel data. Of course, any-to-many voice conversion is also doable!
For better voice quality in Sovits2, I utilize the F0 model used in [StarGANv2-VC](https://github.com/yl4579/StarGANv2-VC) to extract the fundamental frequency (F0) of the input audio and feed it to the vocoder of VITS.
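Conceptually the change is small: the embedding lookup at the front of the VITS `TextEncoder` is replaced by a projection of the continuous soft units. A minimal sketch of that idea (class and dimension names are illustrative, not this repository's actual code):

```python
import torch
import torch.nn as nn

class UnitEncoder(nn.Module):
    """Stand-in for the embedding front end of the VITS TextEncoder."""

    def __init__(self, unit_dim: int = 256, hidden_dim: int = 192):
        super().__init__()
        # A linear projection replaces nn.Embedding(n_vocab, hidden_dim):
        # soft units are already continuous vectors, so no lookup is needed.
        self.proj = nn.Linear(unit_dim, hidden_dim)

    def forward(self, units: torch.Tensor) -> torch.Tensor:
        # units: (batch, frames, unit_dim) -> (batch, frames, hidden_dim)
        return self.proj(units)
```

Because the soft units carry content but little speaker identity, training the decoder on one target speaker yields any-to-one conversion, while a multi-speaker VITS with speaker embeddings gives any-to-many.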
## Models
### A Certain Magical Index
![index](assets/cover5.png)
- Description
|Speaker|ID|
|-|-|
|一方通行 (Accelerator)|0|
|上条当麻 (Kamijou Touma)|1|
|御坂美琴 (Misaka Mikoto)|2|
|白井黑子 (Shirai Kuroko)|3|

- Model: [Google drive](https://drive.google.com/file/d/1QfLYyqCEKlqC6fLYccISoIRxeqKEUtLs/view?usp=sharing)
- Config: in this repository
- Demo
- Colab: [Sovits (魔法禁书目录)](https://colab.research.google.com/drive/1OjfH2zpRkLFRp92aU6jAGhqZNopfZMjC?usp=sharing)
- BILIBILI: [基于Sovits的4人声音转换模型](https://www.bilibili.com/video/BV1zY4y1T71W?share_source=copy_web&vd_source=630b87174c967a898cae3765fba3bfa8) (a four-speaker voice conversion model based on Sovits)

### Shiki Natsume
![natsume](assets/cover2.png)
- Description
Single speaker model of Shiki Natsume.
- Model: [Google drive](https://drive.google.com/file/d/1eco4a1KTQt6YHv6Nza9XesF3Ao6JktBL/view?usp=sharing)
- Config: in this repository
- Demo
- Colab: [Sovits (四季夏目)](https://colab.research.google.com/drive/190IbYEorG1wnw-QbUPH9SD6JytLF0KRv?usp=sharing)
- BILIBILI: [枣子姐变声器](https://www.bilibili.com/video/BV13e411u7f1?share_source=copy_web&vd_source=630b87174c967a898cae3765fba3bfa8) (a Natsume voice changer)
### Shiki Natsume 2.0
![natsume](assets/cover6.png)
- Description
Single speaker model of Shiki Natsume, trained with F0 feature.
- Model: [Google drive](https://drive.google.com/file/d/1-0s7NBk49MMJzF-aBaqfuclVgF4yJzXa/view?usp=sharing)
- Config: in this repository
- Demo
- Colab: [Sovits2 (四季夏目)](https://colab.research.google.com/drive/11GC7uAgPya2UIb5jfIuwnIqUN2qPF37w?usp=sharing)

## How to use
### Train
#### Prepare dataset
Audio files should be in `wav` format, mono, with a sampling rate of 22050 Hz.
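If your recordings are stereo or at a different rate, here is a minimal conversion sketch (not part of this repo; assumes `librosa` and `soundfile` are installed):

```python
from pathlib import Path

import librosa
import soundfile as sf

def to_mono_22050(src_dir: str, dst_dir: str) -> None:
    """Convert every wav in src_dir to mono 22050 Hz and write it to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for wav_path in Path(src_dir).glob("*.wav"):
        # librosa.load downmixes to mono and resamples in one call
        audio, _ = librosa.load(wav_path, sr=22050, mono=True)
        sf.write(dst / wav_path.name, audio, 22050)
```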
Your dataset should be like:
```
└───wavs
├───dev
│ ├───LJ001-0001.wav
│ ├───...
│ └───LJ050-0278.wav
└───train
├───LJ002-0332.wav
├───...
└───LJ047-0007.wav
```

#### Extract speech units
Utilize the content encoder to extract speech units in the audio.
For more information, refer to [this repo](https://github.com/bshall/acoustic-model).
```
cd hubert
python3 encode.py soft path/to/wavs/directory path/to/soft/directory --extension .wav
```
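The encoder writes one unit file per wav (`.npy` arrays in the upstream bshall implementation; verify on your setup). A quick sanity check:

```python
import numpy as np

# File name taken from the dataset tree above; the (frames, 256) shape is
# an assumption based on the upstream content encoder's output dimension.
units = np.load("path/to/soft/directory/LJ001-0001.npy")
print(units.shape)  # e.g. (frames, 256)
```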
Then you need to generate filelists for both your training and validation files. It's recommended that you prepare your filelists beforehand! Your filelists should look like the following (a small generation sketch comes after the formats):
Single speaker:
```
path/to/wav|path/to/unit
...
```

Multi-speaker:
```
path/to/wav|id|path/to/unit
...
```
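A hypothetical helper (not part of this repo) that writes the single-speaker format, assuming the encoder saved one `.npy` unit file per wav with a matching stem:

```python
from pathlib import Path

def write_filelist(wav_dir: str, unit_dir: str, out_path: str) -> None:
    """Write `path/to/wav|path/to/unit` lines pairing wavs with unit files."""
    with open(out_path, "w", encoding="utf-8") as f:
        for wav in sorted(Path(wav_dir).glob("*.wav")):
            unit = Path(unit_dir) / (wav.stem + ".npy")  # assumed extension
            f.write(f"{wav}|{unit}\n")

write_filelist("wavs/train", "soft/train", "filelists/train.txt")
```

For the multi-speaker format, emit the speaker ID as the middle field (`{wav}|{speaker_id}|{unit}`).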
#### Train Sovits

Single speaker:
```
python train.py -c configs/config.json -m model_name
```

Multi-speaker:
```
python train_ms.py -c configs/config.json -m model_name
```

You may also refer to [train.ipynb](train.ipynb).
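Whichever script you run, the config passed with `-c` must point at the filelists you generated. An illustrative excerpt (field names follow the upstream VITS config layout; treat `configs/config.json` in this repository as authoritative):

```json
{
  "data": {
    "training_files": "filelists/train.txt",
    "validation_files": "filelists/val.txt",
    "sampling_rate": 22050
  }
}
```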
### Inference
Please refer to [inference.ipynb](inference.ipynb).
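For orientation, the first step the notebook performs is turning source audio into soft units via the upstream Soft-VC content encoder. A sketch (the `torch.hub` entry point is documented in bshall/hubert; how the units are fed to the trained Sovits checkpoint is specific to the notebook):

```python
import torch
import torchaudio

# Pretrained soft content encoder published by the Soft-VC authors.
hubert = torch.hub.load("bshall/hubert:main", "hubert_soft")

source, sr = torchaudio.load("source.wav")
source = source.mean(dim=0, keepdim=True)                   # force mono
source = torchaudio.functional.resample(source, sr, 16000)  # encoder expects 16 kHz
with torch.inference_mode():
    units = hubert.units(source.unsqueeze(0))               # (1, frames, 256)

# Feeding `units` (plus the F0 feature for Sovits2) to the trained model
# is what inference.ipynb does; the exact call depends on the checkpoint.
```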
## TODO
- [x] Add F0 model
- [ ] Add F0 loss

## Contact
QQ: 2235306122
BILIBILI: [Francis-Komizu](https://space.bilibili.com/636704927)
## Acknowledgement
Special thanks to [Rcell](https://space.bilibili.com/343303724/?spm_id_from=333.999.0.0) for giving me both inspiration and advice!
## References
[基于VITS和SoftVC实现任意对一VoiceConversion](https://www.bilibili.com/video/BV1S14y1x78X?share_source=copy_web&vd_source=630b87174c967a898cae3765fba3bfa8) (any-to-one voice conversion based on VITS and SoftVC)
[Soft-VC](https://github.com/bshall/soft-vc)
[vits](https://github.com/jaywalnut310/vits)
[StarGANv2-VC](https://github.com/yl4579/StarGANv2-VC)