https://github.com/jackaduma/cyclegan-vc3

Voice Conversion by CycleGAN (voice cloning / voice conversion): CycleGAN-VC3

Host: GitHub
URL: https://github.com/jackaduma/cyclegan-vc3
Owner: jackaduma
License: mit
Created: 2020-10-27T16:03:09.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2022-05-05T02:52:52.000Z (over 2 years ago)
Last Synced: 2023-11-07T18:24:46.114Z (about 1 year ago)
Topics: aigc, cyclegan, cyclegan-vc, cyclegan-vc2, cyclegan-vc3, gan, pytorch, pytorch-implementation, voice-cloning, voice-conversion
Language: Python
Homepage:
Size: 388 KB
Stars: 128
Watchers: 10
Forks: 24
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

# **CycleGAN-VC3-PyTorch**

[![standard-readme compliant](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/jackaduma/CycleGAN-VC2)

[![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://paypal.me/jackaduma?locale.x=zh_XC)

[**中文说明**](./README.zh-CN.md) | [**English**](./README.md)

------

This is a **PyTorch** implementation of the paper [CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion](https://arxiv.org/abs/2010.11672), a nice piece of work on **voice conversion / voice cloning**.

- [x] Dataset

  - [ ] VC

- [x] Usage

  - [x] Training

  - [x] Example 

- [ ] Demo

- [x] Reference

------

## **CycleGAN-VC3**

### [**Project Page**](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc3/index.html) 

Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, CycleGAN-VC [3] and CycleGAN-VC2 [2] have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for **mel-spectrogram conversion**, they are typically used for mel-cepstrum conversion even when comparative methods employ mel-spectrogram as a conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to **mel-spectrogram conversion**. Through initial experiments, we discovered that their direct applications compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates **time-frequency adaptive normalization (TFAN)**. Using TFAN, we can adjust the scale and bias of the converted features while reflecting the time-frequency structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on inter-gender and intra-gender non-parallel VC. A subjective evaluation of naturalness and similarity showed that for every VC pair, CycleGAN-VC3 outperforms or is competitive with the two types of CycleGAN-VC2, one of which was applied to mel-cepstrum and the other to mel-spectrogram.

![network comparison](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc3/images/comparison.png "comparison between vc2 and vc3")  _Figure 1. We developed time-frequency adaptive normalization (TFAN), which extends instance normalization [5] so that the affine parameters become element-dependent and are determined according to an entire input mel-spectrogram._
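
As a rough illustration of the TFAN idea described above, the following is a minimal PyTorch sketch; it is not the code in `tfan_module.py`. The class name `TFAN1d`, the single shared convolution, the hidden width, and the kernel size are all assumptions made for brevity. The sketch instance-normalizes the incoming features and then applies a scale and bias that are predicted, element by element, from the source mel-spectrogram.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TFAN1d(nn.Module):
    """Illustrative 1D TFAN layer (hypothetical; not the repository's implementation)."""

    def __init__(self, feat_channels, mel_channels, hidden_channels=128, kernel_size=5):
        super().__init__()
        padding = kernel_size // 2  # odd kernel size keeps the frame count unchanged
        # Plain instance normalization without its own learned affine parameters.
        self.norm = nn.InstanceNorm1d(feat_channels, affine=False)
        # Shared convolution over the source mel-spectrogram (depth and width are assumptions).
        self.shared = nn.Sequential(
            nn.Conv1d(mel_channels, hidden_channels, kernel_size, padding=padding),
            nn.ReLU(),
        )
        # Separate heads predict an element-wise scale (gamma) and bias (beta).
        self.to_gamma = nn.Conv1d(hidden_channels, feat_channels, kernel_size, padding=padding)
        self.to_beta = nn.Conv1d(hidden_channels, feat_channels, kernel_size, padding=padding)

    def forward(self, feat, mel):
        # feat: (batch, feat_channels, feat_frames); mel: (batch, mel_channels, mel_frames)
        # Resize the mel-spectrogram along time so it lines up with the features.
        mel = F.interpolate(mel, size=feat.shape[-1], mode="nearest")
        hidden = self.shared(mel)
        # Scale and bias vary per channel and per frame, following the source structure.
        return self.norm(feat) * self.to_gamma(hidden) + self.to_beta(hidden)


layer = TFAN1d(feat_channels=256, mel_channels=80)
feat = torch.randn(4, 256, 16)   # intermediate generator features
mel = torch.randn(4, 80, 64)     # source mel-spectrogram
out = layer(feat, mel)           # same shape as feat
```

Because the scale and bias come from the mel-spectrogram rather than from fixed learned vectors, two different source utterances modulate the same features differently, which is the property the figure caption describes.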

------

**This repository contains:** 

1. [TFAN module code](tfan_module.py), which implements the TFAN module.

2. [Model code](model.py), which implements the model network.

3. [Audio preprocessing script](preprocess_training.py), which creates a cache for the [training data](data).

4. [Training script](train.py), which trains the model.

------

## **Table of Contents**

- [**CycleGAN-VC3-PyTorch**](#cyclegan-vc3-pytorch)

  - [**CycleGAN-VC3**](#cyclegan-vc3)

    - [**Project Page**](#project-page)

  - [**Table of Contents**](#table-of-contents)

  - [**Requirement**](#requirement)

  - [**Usage**](#usage)

  - [**Star-History**](#star-history)

  - [**Reference**](#reference)

  - [Donation](#donation)

  - [**License**](#license)

  

------

## **Requirement** 

```bash
pip install -r requirements.txt
```

## **Usage**

------

## **Star-History**

![star-history](https://api.star-history.com/svg?repos=jackaduma/CycleGAN-VC3&type=Date "star-history")

------

## **Reference**

1. **CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion.** [Paper](https://arxiv.org/abs/2010.11672), [Project](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc3/index.html)

2. CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion. [Paper](https://arxiv.org/abs/1904.04631), [Project](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc2/index.html)

3. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. [Paper](https://arxiv.org/abs/1711.11293), [Project](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc/)

4. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. [Paper](https://arxiv.org/abs/1703.10593), [Project](https://junyanz.github.io/CycleGAN/), [Code](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix)

5. Image-to-Image Translation with Conditional Adversarial Networks. [Paper](https://arxiv.org/abs/1611.07004), [Project](https://phillipi.github.io/pix2pix/), [Code](https://github.com/phillipi/pix2pix)

------

## Donation

If this project helps you save development time, you can buy me a cup of coffee :)

AliPay (支付宝)

WeChat Pay (微信)

[![paypal](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://paypal.me/jackaduma?locale.x=zh_XC)

------

## **License**

[MIT](LICENSE) © Kun