{"id":19589704,"url":"https://github.com/jackaduma/cyclegan-vc2","last_synced_at":"2025-04-05T00:07:51.635Z","repository":{"id":43573643,"uuid":"263610365","full_name":"jackaduma/CycleGAN-VC2","owner":"jackaduma","description":"Voice Conversion by CycleGAN (语音克隆/语音转换): CycleGAN-VC2","archived":false,"fork":false,"pushed_at":"2023-06-10T12:00:48.000Z","size":89149,"stargazers_count":555,"open_issues_count":16,"forks_count":108,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-03-28T23:06:35.328Z","etag":null,"topics":["aigc","cyclegan","cyclegan-vc","cyclegan-vc2","deep-learning","deeplearning","gan","pix2pix","pytorch-implementation","speech-synthesis","voice-cloning","voice-conversion"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jackaduma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-05-13T11:29:35.000Z","updated_at":"2025-03-28T04:46:14.000Z","dependencies_parsed_at":"2024-01-16T22:35:36.034Z","dependency_job_id":null,"html_url":"https://github.com/jackaduma/CycleGAN-VC2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FCycleGAN-VC2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FCycleGAN-VC2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FCycleGAN-VC2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FCycleGAN-VC2/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jackaduma","download_url":"https://codeload.github.com/jackaduma/CycleGAN-VC2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247266563,"owners_count":20910836,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aigc","cyclegan","cyclegan-vc","cyclegan-vc2","deep-learning","deeplearning","gan","pix2pix","pytorch-implementation","speech-synthesis","voice-cloning","voice-conversion"],"created_at":"2024-11-11T08:20:20.393Z","updated_at":"2025-04-05T00:07:51.600Z","avatar_url":"https://github.com/jackaduma.png","language":"Python","funding_links":["https://paypal.me/jackaduma?locale.x=zh_XC"],"categories":[],"sub_categories":[],"readme":"# **CycleGAN-VC2-PyTorch**\n\n[![standard-readme compliant](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/jackaduma/CycleGAN-VC2)\n[![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://paypal.me/jackaduma?locale.x=zh_XC)\n\n[**中文说明**](./README.zh-CN.md) | [**English**](./README.md)\n\n------\n\nThis code is a **PyTorch** implementation of the paper [CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion](https://arxiv.org/abs/1904.04631), a nice piece of work on **Voice Conversion/Voice Cloning**.\n\n- [x] Dataset\n  - [ ] VC\n  - [x] Chinese Male Speakers (S0913 from [AISHELL-Speech](https://openslr.org/33/) \u0026 [GaoXiaoSong: a Chinese star](https://en.wikipedia.org/wiki/Gao_Xiaosong))\n- [x] Usage\n  - [x] Training\n  - [x] 
Example \n- [ ] Demo\n- [x] Reference\n\n------\n\n## **Update**\n\n**2020.11.17**: fixed an issue: re-implemented the second-step adversarial loss.\n\n**2020.08.27**: added the second-step adversarial loss, contributed by [Jeffery-zhang-nfls](https://github.com/Jeffery-zhang-nfls)\n\n## **CycleGAN-VC2**\n\n### [**Project Page**](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc2/index.html)\n\n\nTo advance the research on non-parallel VC, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), improved generator (2-1-2D CNN), and improved discriminator (Patch GAN).\n\n\n![network](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc2/images/network.png \"network\")\n\n------\n\n**This repository contains:** \n\n1. [model code](model_tf.py) that implements the paper.\n2. [audio preprocessing script](preprocess_training.py) that creates the cache for the [training data](data).\n3. [training script](train.py) to train the model.\n4. 
[Examples of Voice Conversion](converted_sound/) - converted results after training.\n\n------\n\n## **Table of Contents**\n\n- [**CycleGAN-VC2-PyTorch**](#cyclegan-vc2-pytorch)\n  - [**Update**](#update)\n  - [**CycleGAN-VC2**](#cyclegan-vc2)\n    - [**Project Page**](#project-page)\n  - [**Table of Contents**](#table-of-contents)\n  - [**Requirement**](#requirement)\n  - [**Usage**](#usage)\n    - [**preprocess**](#preprocess)\n    - [**train**](#train)\n  - [**Pretrained**](#pretrained)\n  - [**Demo**](#demo)\n  - [**Star-History**](#star-history)\n  - [**Reference**](#reference)\n  - [Donation](#donation)\n  - [**License**](#license)\n  \n------\n\n\n\n## **Requirement** \n\n```bash\npip install -r requirements.txt\n```\n## **Usage**\n\n### **preprocess**\n\n```bash\npython preprocess_training.py\n```\nis short for\n\n```bash\npython preprocess_training.py --train_A_dir ./data/S0913/ --train_B_dir ./data/gaoxiaosong/ --cache_folder ./cache/\n```\n\n\n### **train** \n```bash\npython train.py\n```\n\nis short for\n\n```bash\npython train.py --logf0s_normalization ./cache/logf0s_normalization.npz --mcep_normalization ./cache/mcep_normalization.npz --coded_sps_A_norm ./cache/coded_sps_A_norm.pickle --coded_sps_B_norm ./cache/coded_sps_B_norm.pickle --model_checkpoint ./model_checkpoint/ --resume_training_at ./model_checkpoint/_CycleGAN_CheckPoint --validation_A_dir ./data/S0913/ --output_A_dir ./converted_sound/S0913 --validation_B_dir ./data/gaoxiaosong/ --output_B_dir ./converted_sound/gaoxiaosong/\n```\n\n------\n\n## **Pretrained**\n\nA pretrained model that converts between S0913 and GaoXiaoSong.\n\nDownload it from [Google Drive](https://drive.google.com/file/d/1iamizL98NWIPw4pw0nF-7b6eoBJrxEfj/view?usp=sharing) \u003c735MB\u003e\n\n------\n\n## **Demo**\n\nSamples:\n\n\n**reference speaker A:** [S0913(./data/S0913/BAC009S0913W0351.wav)](https://drive.google.com/file/d/14zU1mI8QtoBwb8cHkNdZiPmXI6Mj6pVW/view?usp=sharing)\n\n**reference speaker B:** 
[GaoXiaoSong(./data/gaoxiaosong/gaoxiaosong_1.wav)](https://drive.google.com/file/d/1s0ip6JwnWmYoWFcEQBwVIIdHJSqPThR3/view?usp=sharing)\n\n\n\n**speaker A's speech changes to speaker B's voice:** [Converted from S0913 to GaoXiaoSong (./converted_sound/S0913/BAC009S0913W0351.wav)](https://drive.google.com/file/d/1S4vSNGM-T0RTo_aclxRgIPkUJ7NEqmjU/view?usp=sharing)\n\n------\n## **Star-History**\n\n![star-history](https://api.star-history.com/svg?repos=jackaduma/CycleGAN-VC2\u0026type=Date \"star-history\")\n\n------\n\n## **Reference**\n1. **CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion**. [Paper](https://arxiv.org/abs/1904.04631), [Project](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc2/index.html)\n2. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. [Paper](https://arxiv.org/abs/1711.11293), [Project](http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc/)\n3. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. [Paper](https://arxiv.org/abs/1703.10593), [Project](https://junyanz.github.io/CycleGAN/), [Code](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix)\n4. Image-to-Image Translation with Conditional Adversarial Nets. 
[Paper](https://arxiv.org/abs/1611.07004), [Project](https://phillipi.github.io/pix2pix/), [Code](https://github.com/phillipi/pix2pix)\n\n------\n\n## Donation\nIf this project helps you save development time, you can buy me a cup of coffee :) \n\nAliPay(支付宝)\n\u003cdiv align=\"center\"\u003e\n\t\u003cimg src=\"./misc/ali_pay.png\" alt=\"ali_pay\" width=\"400\" /\u003e\n\u003c/div\u003e\n\nWechatPay(微信)\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"./misc/wechat_pay.png\" alt=\"wechat_pay\" width=\"400\" /\u003e\n\u003c/div\u003e\n\n[![paypal](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://paypal.me/jackaduma?locale.x=zh_XC)\n\n------\n\n## **License**\n\n[MIT](LICENSE) © Kun\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackaduma%2Fcyclegan-vc2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjackaduma%2Fcyclegan-vc2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackaduma%2Fcyclegan-vc2/lists"}