https://github.com/OlaWod/FreeVC

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
pytorch speech voice-conversion

Host: GitHub
URL: https://github.com/OlaWod/FreeVC
Owner: OlaWod
License: mit
Created: 2022-10-27T05:39:20.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2025-01-19T07:48:20.000Z (10 months ago)
Last Synced: 2025-01-19T08:28:10.302Z (10 months ago)
Topics: pytorch, speech, voice-conversion
Language: Python
Homepage:
Size: 14.9 MB
Stars: 617
Watchers: 19
Forks: 112
Open Issues: 46
Metadata Files:
- Readme: README.md
- License: LICENSE

# FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

[![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2210.15418)

[![githubio](https://img.shields.io/static/v1?message=Audio%20Samples&logo=Github&labelColor=grey&color=blue&logoColor=white&label=%20&style=flat)](https://olawod.github.io/FreeVC-demo/)

![GitHub Repo stars](https://img.shields.io/github/stars/OlaWod/FreeVC)

![GitHub](https://img.shields.io/github/license/OlaWod/FreeVC)

In this [paper](https://arxiv.org/abs/2210.15418), we adopt the end-to-end framework of [VITS](https://arxiv.org/abs/2106.06103) for high-quality waveform reconstruction and propose strategies for extracting clean content information without text annotation. We disentangle content information by imposing an information bottleneck on [WavLM](https://arxiv.org/abs/2110.13900) features, and propose **spectrogram-resize** (SR) based data augmentation to improve the purity of the extracted content information.

[🤗 Play online at HuggingFace Spaces](https://huggingface.co/spaces/OlaWod/FreeVC).

Visit our [demo page](https://olawod.github.io/FreeVC-demo) for audio samples.

We also provide the [pretrained models](https://1drv.ms/u/s!AnvukVnlQ3ZTx1rjrOZ2abCwuBAh?e=UlhRR5).
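To make the spectrogram-resize idea concrete, here is a minimal sketch, not the code in `preprocess_sr.py`. It assumes the spectrogram is a NumPy array of shape `(n_mels, n_frames)` with row 0 as the lowest frequency bin. The function name `spectrogram_resize`, the `ratio` parameter, and the padding strategy (repeating the top row) are illustrative assumptions; the actual pipeline may pad differently and resynthesizes audio with a vocoder afterwards.

```python
import numpy as np


def spectrogram_resize(mel: np.ndarray, ratio: float) -> np.ndarray:
    """Stretch or squeeze a (n_mels, n_frames) spectrogram along frequency,
    then restore the original number of bins by cutting or padding the top."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    n_mels, n_frames = mel.shape
    new_bins = max(1, int(round(n_mels * ratio)))

    # Linearly interpolate every frame along the frequency axis.
    src = np.arange(n_mels)
    dst = np.linspace(0, n_mels - 1, new_bins)
    resized = np.stack(
        [np.interp(dst, src, mel[:, t]) for t in range(n_frames)], axis=1
    )

    if new_bins >= n_mels:
        # Stretched: drop the surplus bins at the top (highest frequencies).
        return resized[:n_mels]

    # Squeezed: pad the top by repeating the highest remaining bin
    # (a simplification chosen for this sketch).
    pad = np.repeat(resized[-1:], n_mels - new_bins, axis=0)
    return np.concatenate([resized, pad], axis=0)
```

A ratio below 1 squeezes the spectrogram and a ratio above 1 stretches it, which shifts the apparent pitch and formants of the speaker while leaving the frame-level timing, and so the spoken content, in place.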

  

    

    

  

  

Figure: (a) Training; (b) Inference.

  

## Updates

- Code release. (Nov 27, 2022)

- Online demo at HuggingFace Spaces. (Dec 14, 2022)

- Supports 24kHz outputs. See [here](https://github.com/OlaWod/FreeVC/tree/main/tips-for-synthesizing-24KHz-wavs-from-16kHz-wavs/) for details. (Dec 15, 2022)

- Fixed a data loading bug. (Jan 10, 2023)

## Pre-requisites

1. Clone this repo: `git clone https://github.com/OlaWod/FreeVC.git`

2. Change into the repo directory: `cd FreeVC`

3. Install the Python requirements: `pip install -r requirements.txt`

4. Download [WavLM-Large](https://github.com/microsoft/unilm/tree/master/wavlm) and place it in the `wavlm/` directory

5. Download the [VCTK](https://datashare.ed.ac.uk/handle/10283/3443) dataset (for training only)

6. Download the [HiFi-GAN model](https://github.com/jik876/hifi-gan) and place it in the `hifigan/` directory (only needed for training with SR)

## Inference Example

Download the pretrained checkpoints and run:

```bash
# inference with FreeVC
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc.json --ptfile checkpoints/freevc.pth --txtpath convert.txt --outdir outputs/freevc

# inference with FreeVC-s
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc-s.json --ptfile checkpoints/freevc-s.pth --txtpath convert.txt --outdir outputs/freevc-s
```
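The commands above read the conversion pairs from the file passed to `--txtpath`. This README does not describe that file's layout, so the sketch below rests on an assumption: one pair per line in the form `title|source_wav|target_wav`. Check `convert.py` and the bundled `convert.txt` before relying on it. The helper name `write_convert_list` and the example paths are hypothetical.

```python
from pathlib import Path


def write_convert_list(pairs, txtpath="convert.txt"):
    """Write one 'title|source|target' line per pair (assumed file layout)."""
    lines = []
    for title, source, target in pairs:
        for field in (title, source, target):
            # The delimiter must not appear inside a field.
            if "|" in str(field):
                raise ValueError(f"field contains the delimiter '|': {field!r}")
        lines.append(f"{title}|{source}|{target}")
    Path(txtpath).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    write_convert_list([("demo1", "wavs/source.wav", "wavs/target.wav")])
```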

## Training Example

1. Preprocess

```bash
python downsample.py --in_dir

ln -s dataset/vctk-16k DUMMY

# run this if you want a different train-val-test split
python preprocess_flist.py

# run this if you want to use pretrained speaker encoder
CUDA_VISIBLE_DEVICES=0 python preprocess_spk.py

# run this if you want to train without SR-based augmentation
CUDA_VISIBLE_DEVICES=0 python preprocess_ssl.py

# run these if you want to train with SR-based augmentation
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 68 --max 72
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 73 --max 76
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 77 --max 80
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 81 --max 84
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 85 --max 88
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 89 --max 92
```

2. Train

```bash
# train freevc
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/freevc.json -m freevc

# train freevc-s
CUDA_VISIBLE_DEVICES=2 python train.py -c configs/freevc-s.json -m freevc-s
```
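The six `preprocess_sr.py` invocations in the Preprocess step split the range 68 to 92 across three GPUs, two jobs each. If your GPU count differs, a small helper can regenerate the command list. This is a sketch under two assumptions: `--min` and `--max` are inclusive bounds (the non-overlapping chunks in the example suggest so), and the jobs are independent. The function names `split_range` and `sr_commands` are hypothetical; only the `--min` and `--max` flags come from the commands in this README.

```python
def split_range(lo, hi, n_chunks):
    """Split the inclusive range [lo, hi] into n_chunks contiguous inclusive chunks."""
    total = hi - lo + 1
    if not 1 <= n_chunks <= total:
        raise ValueError("n_chunks must be between 1 and the range size")
    base, extra = divmod(total, n_chunks)
    chunks, start = [], lo
    for i in range(n_chunks):
        # The first `extra` chunks take one additional value each.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def sr_commands(lo=68, hi=92, gpus=(1, 2, 3), jobs_per_gpu=2):
    """Build one preprocess_sr.py command per chunk, assigning jobs to GPUs in order."""
    chunks = split_range(lo, hi, len(gpus) * jobs_per_gpu)
    return [
        f"CUDA_VISIBLE_DEVICES={gpus[i // jobs_per_gpu]} "
        f"python preprocess_sr.py --min {a} --max {b}"
        for i, (a, b) in enumerate(chunks)
    ]


if __name__ == "__main__":
    # With the defaults this prints the same six commands as the Preprocess step.
    print("\n".join(sr_commands()))
```

For a single GPU, `sr_commands(gpus=(0,), jobs_per_gpu=1)` yields one command covering the whole range.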

## References

- https://github.com/jaywalnut310/vits

- https://github.com/microsoft/unilm/tree/master/wavlm

- https://github.com/jik876/hifi-gan

- https://github.com/liusongxiang/ppg-vc
