# FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2210.15418)
[![githubio](https://img.shields.io/static/v1?message=Audio%20Samples&logo=Github&labelColor=grey&color=blue&logoColor=white&label=%20&style=flat)](https://olawod.github.io/FreeVC-demo/)
![GitHub Repo stars](https://img.shields.io/github/stars/OlaWod/FreeVC)
![GitHub](https://img.shields.io/github/license/OlaWod/FreeVC)

In this [paper](https://arxiv.org/abs/2210.15418), we adopt the end-to-end framework of [VITS](https://arxiv.org/abs/2106.06103) for high-quality waveform reconstruction, and propose strategies for extracting clean content information without text annotation.
We disentangle content information by imposing an information bottleneck on [WavLM](https://arxiv.org/abs/2110.13900) features, and propose **spectrogram-resize (SR)** based data augmentation to improve the purity of the extracted content information.

[🤗 Play online at HuggingFace Spaces](https://huggingface.co/spaces/OlaWod/FreeVC).

Visit our [demo page](https://olawod.github.io/FreeVC-demo) for audio samples.

We also provide the [pretrained models](https://1drv.ms/u/s!AnvukVnlQ3ZTx1rjrOZ2abCwuBAh?e=UlhRR5).

<table style="width:100%">
  <tr>
    <td><img src="./resources/train.png" alt="training" height="200"></td>
    <td><img src="./resources/infer.png" alt="inference" height="200"></td>
  </tr>
  <tr>
    <th>(a) Training</th>
    <th>(b) Inference</th>
  </tr>
</table>

## Updates

- Code release. (Nov 27, 2022)
- Online demo at HuggingFace Spaces. (Dec 14, 2022)
- Support for 24 kHz outputs. See [here](https://github.com/OlaWod/FreeVC/tree/main/tips-for-synthesizing-24KHz-wavs-from-16kHz-wavs/) for details. (Dec 15, 2022)
- Fixed a data-loading bug. (Jan 10, 2023)

## Pre-requisites

1. Clone this repo: `git clone https://github.com/OlaWod/FreeVC.git`

2. Enter the repo: `cd FreeVC`

3. Install Python requirements: `pip install -r requirements.txt`

4. Download [WavLM-Large](https://github.com/microsoft/unilm/tree/master/wavlm) and put it under the `wavlm/` directory

5. Download the [VCTK](https://datashare.ed.ac.uk/handle/10283/3443) dataset (for training only)

6. Download the [HiFi-GAN model](https://github.com/jik876/hifi-gan) and put it under the `hifigan/` directory (for training with SR only)

## Inference Example

Download the pretrained checkpoints and run:

```sh
# inference with FreeVC
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc.json --ptfile checkpoints/freevc.pth --txtpath convert.txt --outdir outputs/freevc

# inference with FreeVC-s
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc-s.json --ptfile checkpoints/freevc-s.pth --txtpath convert.txt --outdir outputs/freevc-s
```

## Training Example

1. Preprocess:

```sh
python downsample.py --in_dir </path/to/VCTK/wavs>
ln -s dataset/vctk-16k DUMMY

# run this if you want a different train-val-test split
python preprocess_flist.py

# run this if you want to use the pretrained speaker encoder
CUDA_VISIBLE_DEVICES=0 python preprocess_spk.py

# run this if you want to train without SR-based augmentation
CUDA_VISIBLE_DEVICES=0 python preprocess_ssl.py

# run these if you want to train with SR-based augmentation
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 68 --max 72
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 73 --max 76
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 77 --max 80
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 81 --max 84
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 85 --max 88
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 89 --max 92
```

2. Train:

```sh
# train FreeVC
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/freevc.json -m freevc

# train FreeVC-s
CUDA_VISIBLE_DEVICES=2 python train.py -c configs/freevc-s.json -m freevc-s
```

## References

- https://github.com/jaywalnut310/vits
- https://github.com/microsoft/unilm/tree/master/wavlm
- https://github.com/jik876/hifi-gan
- https://github.com/liusongxiang/ppg-vc
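To give a feel for what SR-based augmentation does, here is a minimal, self-contained sketch of the spectrogram-resize idea: a mel spectrogram is resized along the frequency axis by some ratio, then padded or cropped back to its original number of bins, which perturbs speaker-related formant information while leaving content mostly intact. This is a hypothetical illustration only, not the repository's `preprocess_sr.py` implementation; the function name `spectrogram_resize` and the interpretation of the `--min`/`--max` sweep above as resize parameters are assumptions.

```python
import numpy as np

def spectrogram_resize(mel: np.ndarray, ratio: float) -> np.ndarray:
    """Resize mel (n_bins, n_frames) along the frequency axis by `ratio`,
    then pad/crop so the output has the same shape as the input."""
    n_bins, n_frames = mel.shape
    target = max(1, int(round(n_bins * ratio)))
    # Linear interpolation along the frequency axis, one frame at a time.
    src = np.linspace(0, n_bins - 1, target)
    resized = np.stack(
        [np.interp(src, np.arange(n_bins), mel[:, t]) for t in range(n_frames)],
        axis=1,
    )
    if target < n_bins:
        # Compressed: pad the top bins with the edge value.
        pad = np.tile(resized[-1:, :], (n_bins - target, 1))
        resized = np.concatenate([resized, pad], axis=0)
    else:
        # Stretched: crop back down to the original bin count.
        resized = resized[:n_bins, :]
    return resized

# Example: an 80-bin, 100-frame spectrogram keeps its shape after resizing.
mel = np.random.rand(80, 100).astype(np.float32)
out = spectrogram_resize(mel, 0.72)
assert out.shape == mel.shape
```

In practice an augmentation like this would be applied with ratios drawn from a range (both compression and stretching), producing multiple speaker-perturbed copies of each training utterance.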