{"id":17864717,"url":"https://github.com/k2kobayashi/crank","last_synced_at":"2025-04-06T22:12:17.184Z","repository":{"id":47318806,"uuid":"268407220","full_name":"k2kobayashi/crank","owner":"k2kobayashi","description":"A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder","archived":false,"fork":false,"pushed_at":"2024-07-25T11:09:32.000Z","size":13030,"stargazers_count":171,"open_issues_count":12,"forks_count":31,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-30T19:11:30.475Z","etag":null,"topics":["adversarial-learning","cyclic-constraints","speech-synthesis","vocoder","voice-conversion","vqvae"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/k2kobayashi.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-01T02:32:42.000Z","updated_at":"2025-03-05T15:21:31.000Z","dependencies_parsed_at":"2024-10-28T09:17:15.130Z","dependency_job_id":"0850d71f-de71-4afe-97a2-f5aef1b697e5","html_url":"https://github.com/k2kobayashi/crank","commit_stats":{"total_commits":273,"total_committers":2,"mean_commits":136.5,"dds":0.09523809523809523,"last_synced_commit":"80f859cca3dfbfd75cebf2c3f58fe3c1dbe2b7ba"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k2kobayashi%2Fcrank","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k2kobayashi%2Fcrank/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/k2kobayashi%2Fcrank/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k2kobayashi%2Fcrank/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/k2kobayashi","download_url":"https://codeload.github.com/k2kobayashi/crank/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247557770,"owners_count":20958047,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adversarial-learning","cyclic-constraints","speech-synthesis","vocoder","voice-conversion","vqvae"],"created_at":"2024-10-28T09:15:09.215Z","updated_at":"2025-04-06T22:12:17.150Z","avatar_url":"https://github.com/k2kobayashi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![test](https://github.com/k2kobayashi/crank/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/k2kobayashi/crank/actions/workflows/ci.yml)\n[![PyPI version](https://badge.fury.io/py/crank-vc.svg)](https://badge.fury.io/py/crank-vc)\n\n# crank\n\nNon-parallel voice conversion based on vector-quantized variational autoencoder with adversarial learning\n\n## Setup\n\n- Install Python dependency\n\n```sh\n$ git clone https://github.com/k2kobayashi/crank.git\n$ cd crank/tools\n$ make\n```\n\n- install dependency for mosnet\n\n```sh\n$ sudo apt install ffmpeg   # mosnet dependency\n```\n\n## Recipes\n- English\n    - VCC2020\n    - VCC2018 (Thanks to [@unilight](https://github.com/unilight))\n- Japanese\n    - jsv_ver1\n\n### Conversion 
samples\nSeveral converted audio samples for the VCC 2018 dataset are available at this [URL](https://k2kobayashi.github.io/crankSamples/).\n- [vcc2020v1](https://drive.google.com/file/d/1uInvCwggpBYmpplYxuIOidvJkPmav8kE/view?usp=sharing)\n- [vcc2018v1](https://drive.google.com/file/d/1-Z_Y9pahPQcKR0rqdhu4elI6Hz686qX6/view?usp=sharing)\n\n## Run VCC2020 recipe\n\ncrank provides a recipe for Voice Conversion Challenge 2020.\nIn a crank recipe, there are 8 stages (stage 0 to stage 7) to implement non-parallel voice conversion.\n\n- stage 0\n    - download dataset\n- stage 1\n    - initialization\n        - generate scp files and figures used to determine speaker-dependent parameters\n- stage 2\n    - feature extraction\n        - extract mlfb and mcep features\n- stage 3\n    - training\n- stage 4\n    - reconstruction\n        - generate reconstructed features for fine-tuning of neural vocoder\n- stage 5\n    - evaluation\n        - convert evaluation waveform\n- stage 6\n    - synthesis\n        - synthesize waveform with pre-trained [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)\n        - synthesize waveform with GriffinLim\n- stage 7\n    - objective evaluation\n        - mel-cepstrum distortion\n        - mosnet\n\n### Put dataset to downloads\n\nNote that the dataset is released only to the challenge participants (2020/05/26).\n```\n$ cd egs/vaevc/vcc2020v1\n$ mkdir downloads \u0026\u0026 cd downloads\n$ mv \u003cpath_to_zip\u003e/vcc2020_{training,evaluation}.zip .\n$ unzip vcc2020_training.zip\n$ unzip vcc2020_evaluation.zip\n```\n\n### Run feature extraction and model training\n\nBecause the challenge defines its own training and evaluation sets, the configuration files are already provided.\nSo, you only need to run from stage 2.\n\n```sh\n$ ./run.sh --n_jobs 10 --stage 2 --stop_stage 5\n```\n\nwhere ```n_jobs``` indicates the number of CPU cores used in the training.\n\n\n## Configuration\nConfigurations are defined in ```conf/mlfb_vqvae.yml```.\nThe following explains the 
representative parameters.\n\n- feature\n\nWhen you create your own recipe, be careful when setting the parameters for feature extraction such as ```fs```, ```fftl```, ```hop_size```, ```framems```, ```shiftms```, and ```mcep_alpha```. These parameters depend on the sampling frequency.\n\n- feat_type\n\nYou can set ```feat_type``` to either ```mlfb``` or ```mcep```.\nIf you choose ```mlfb```, the converted waveforms are generated by either the GriffinLim vocoder or the ParallelWaveGAN vocoder.\nIf you choose ```mcep```, the converted waveforms are generated by the WORLD vocoder (i.e., excitation generation and MLSA filtering).\n\n- trainer_type\n\nWe support training with ```vqvae```, ```lsgan```, ```cyclegan```, and ```stargan``` using the same generator network.\n  - ```vqvae```: default vqvae setting\n  - ```lsgan```: vqvae with adversarial learning\n  - ```cyclegan```: vqvae with adversarial learning and cyclic constraints\n  - ```stargan```: vqvae with adversarial learning similar to cyclegan\n\n## Create your recipe\n\n### Copy recipe template\n\nCopy the template directory to start creating your recipe.\n\n```sh\n$ cp -r egs/vaevc/template egs/vaevc/\u003cnew_recipe\u003e\n$ cd egs/vaevc/\u003cnew_recipe\u003e\n```\n\n### Put .wav files\n\nYou need to put the wav files in the appropriate directory.\nYou can either modify ```download.sh``` or place the wav files manually.\nIn either case, the wav files should be placed in per-speaker directories as follows:\n```\u003cnew_recipe\u003e/downloads/wav/{spkr1, spkr2, ..., spkr3}/*.wav```.\n\nIf you modify ```download.sh```,\n\n```sh\n$ vim local/download.sh\n```\n\nIf you put wav files,\n\n```sh\n$ mkdir downloads\n$ mv \u003cpath_to_your_wav_directory\u003e downloads/wav\n$ touch downloads/.done\n```\n\n### Run initialization\n\nThe initialization process generates kaldi-like scp files.\n\n```sh\n$ ./run.sh --stage 0 --stop_stage 1\n```\n\nThen modify the speaker-dependent parameters in ```conf/spkr.yml``` using the generated figures.\nPages 20 to 22 of the 
[slide](https://www.slideshare.net/NU_I_TODALAB/hands-on-voice-conversion) explain how to set these parameters.\n\n\n### Run feature extraction, training, reconstruction, and evaluation\n\nAfter preparing the configuration, run the recipe.\n\n```sh\n$ ./run.sh --stage 2 --stop_stage 7\n```\n\n## Citation\n\nPlease cite this paper when you use crank.\n\n```\nK. Kobayashi, W-C. Huang, Y-C. Wu, P.L. Tobing, T. Hayashi, T. Toda,\n\"crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder\",\nProc. ICASSP, 2021. (accepted)\n```\n\n## Acknowledgements\n\nThanks to [@kan-bayashi](https://github.com/kan-bayashi) for many contributions and much encouragement.\n\n## Who we are\n\n- Kazuhiro Kobayashi [@k2kobayashi](https://github.com/k2kobayashi) [maintainer, design and development]\n\n- Wen-Chin Huang [@unilight](https://github.com/unilight) [maintainer, design and development]\n\n- [Tomoki Toda](https://sites.google.com/site/tomokitoda/) [advisor]\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk2kobayashi%2Fcrank","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fk2kobayashi%2Fcrank","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk2kobayashi%2Fcrank/lists"}