{"id":19684156,"url":"https://github.com/maxmax2016/glow-svc","last_synced_at":"2026-03-09T13:05:31.998Z","repository":{"id":184134145,"uuid":"671322182","full_name":"MaxMax2016/Glow-SVC","owner":"MaxMax2016","description":"4G GPU \u0026 10 Minutes for train","archived":false,"fork":false,"pushed_at":"2023-08-09T03:04:13.000Z","size":77,"stargazers_count":12,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-29T05:35:36.527Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MaxMax2016.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-07-27T04:06:06.000Z","updated_at":"2024-07-29T07:53:43.000Z","dependencies_parsed_at":"2023-08-09T04:29:56.549Z","dependency_job_id":null,"html_url":"https://github.com/MaxMax2016/Glow-SVC","commit_stats":null,"previous_names":["playvoice/so-vits-svc-6.0","playvoice/no-name-svc","yuchendd/glow-svc","maxmax2016/glow-svc"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/MaxMax2016/Glow-SVC","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxMax2016%2FGlow-SVC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxMax2016%2FGlow-SVC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxMax2016%2FGlow-SVC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxMax2016%2FGlow-SVC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MaxMax2016","download_url":"https://codeload
.github.com/MaxMax2016/Glow-SVC/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxMax2016%2FGlow-SVC/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30297111,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T11:12:22.024Z","status":"ssl_error","status_checked_at":"2026-03-09T11:10:54.577Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T18:16:59.376Z","updated_at":"2026-03-09T13:05:31.977Z","avatar_url":"https://github.com/MaxMax2016.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003e Max's Singing Voice Conversion, Just for Playing! \u003c/h1\u003e\nAs the name suggests, this is a personal project. [WIP]\n\u003c/div\u003e\n\n## Setup Environment\n1. Install project dependencies\n\n    ```shell\n    pip install -r requirements.txt\n    ```\n\n2. Download the Timbre Encoder: [Speaker-Encoder by @mueller91](https://drive.google.com/drive/folders/15oeBYf6Qn1edONkVLXe82MzdIi3O_9m3), and put `best_model.pth.tar` into `speaker_pretrain/`.\n\n3. Download [hubert_soft model](https://github.com/bshall/hubert/releases/tag/v0.1), and put `hubert-soft-0d54a1f4.pt` into `hubert_pretrain/`.\n\n4. 
Download the pretrained model and put it into `vits_pretrain/`.\n    ```shell\n    python svc_inference.py --config configs/base.yaml --model ./vits_pretrain/svc.pretrain.pth --spk ./configs/singers/singer0001.npy --wave test.wav\n    ```\n\n## Dataset preparation\nPut the dataset into the `data_raw` directory following the structure below.\n```\ndata_raw\n├───speaker0\n│   ├───000001.wav\n│   ├───...\n│   └───000xxx.wav\n└───speaker1\n    ├───000001.wav\n    ├───...\n    └───000xxx.wav\n```\n\n## Data preprocessing\nAfter preprocessing, you will get output with the following structure.\n```\ndata_svc/\n└── waves-16k\n│    └── speaker0\n│    │      ├── 000001.wav\n│    │      └── 000xxx.wav\n│    └── speaker1\n│           ├── 000001.wav\n│           └── 000xxx.wav\n└── waves-32k\n│    └── speaker0\n│    │      ├── 000001.wav\n│    │      └── 000xxx.wav\n│    └── speaker1\n│           ├── 000001.wav\n│           └── 000xxx.wav\n└── mel\n│    └── speaker0\n│    │      ├── 000001.mel.pt\n│    │      └── 000xxx.mel.pt\n│    └── speaker1\n│           ├── 000001.mel.pt\n│           └── 000xxx.mel.pt\n└── pitch\n│    └── speaker0\n│    │      ├── 000001.pit.npy\n│    │      └── 000xxx.pit.npy\n│    └── speaker1\n│           ├── 000001.pit.npy\n│           └── 000xxx.pit.npy\n└── hubert\n│    └── speaker0\n│    │      ├── 000001.vec.npy\n│    │      └── 000xxx.vec.npy\n│    └── speaker1\n│           ├── 000001.vec.npy\n│           └── 000xxx.vec.npy\n└── speaker\n│    └── speaker0\n│    │      ├── 000001.spk.npy\n│    │      └── 000xxx.spk.npy\n│    └── speaker1\n│           ├── 000001.spk.npy\n│           └── 000xxx.spk.npy\n└── singer\n    ├── speaker0.spk.npy\n    └── speaker1.spk.npy\n```\n\n1.  
Re-sampling\n    - Generate audio with a sampling rate of 16000Hz in `./data_svc/waves-16k` \n    ```\n    python prepare/preprocess_a.py -w ./data_raw -o ./data_svc/waves-16k -s 16000\n    ```\n    \n    - Generate audio with a sampling rate of 32000Hz in `./data_svc/waves-32k`\n    ```\n    python prepare/preprocess_a.py -w ./data_raw -o ./data_svc/waves-32k -s 32000\n    ```\n2. Use 16k audio to extract pitch\n    ```\n    python prepare/preprocess_f0.py -w data_svc/waves-16k/ -p data_svc/pitch\n    ```\n3. Use 32k audio to extract mel\n    ```\n    python prepare/preprocess_spec.py -w data_svc/waves-32k/ -s data_svc/mel\n    ``` \n4. Use 16k audio to extract hubert\n    ```\n    python prepare/preprocess_hubert.py -w data_svc/waves-16k/ -v data_svc/hubert\n    ```\n5. Use 16k audio to extract timbre code\n    ```\n    python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker\n    ```\n6. Extract the average value of the timbre code for inference\n    ```\n    python prepare/preprocess_speaker_ave.py data_svc/speaker/ data_svc/singer\n    ``` \n7. Use 32k audio to generate training index\n    ```\n    python prepare/preprocess_train.py\n    ```\n8. Training file debugging\n    ```\n    python prepare/preprocess_zzz.py\n    ```\n\n## Train\n1. Start training\n   ```\n   python svc_trainer.py -c configs/base.yaml -n svc\n   ``` \n2. Resume training\n   ```\n   python svc_trainer.py -c configs/base.yaml -n svc -p chkpt/svc/***.pth\n   ```\n3. Log visualization\n   ```\n   tensorboard --logdir logs/\n   ```\n\n## Loss\nmel_loss should be less than 0.45\n\n## Inference\n\n1. Export inference model\n   ```\n   python svc_export.py --config configs/base.yaml --checkpoint_path chkpt/svc/***.pt\n   ```\n\n2. 
Inference\n    - If there is no need to adjust `f0`, just run the following command.\n        ```\n        python svc_inference.py --config configs/base.yaml --model svc.pth --spk ./data_svc/singer/your_singer.spk.npy --wave test.wav --shift 0\n        ```\n    - If `f0` will be adjusted manually, follow these steps:\n\n        1. Use hubert to extract the content vector\n            ```\n            python hubert/inference.py -w test.wav -v test.vec.npy\n            ```\n        2. Extract the F0 parameter to CSV text format\n            ```\n            python pitch/inference.py -w test.wav -p test.csv\n            ```\n        3. Final inference\n            ```\n            python svc_inference.py --config configs/base.yaml --model svc.pth --spk ./data_svc/singer/your_singer.spk.npy --wave test.wav --vec test.vec.npy --pit test.csv --shift 0\n            ```\n\n3. Convert mel to wave\n    ```\n    python svc_inference_wave.py --mel svc_out.mel.pt --pit svc_tmp.pit.csv\n    ```\n\n4. Debug mel for wave\n   \n    ```\n    python spec/inference.py -w test.wav -m test.mel.pt\n    ```\n\n## Code sources and references\n\nhttps://github.com/facebookresearch/speech-resynthesis [paper](https://arxiv.org/abs/2104.00355)\n\nhttps://github.com/jaywalnut310/vits [paper](https://arxiv.org/abs/2106.06103)\n\nhttps://github.com/NVIDIA/BigVGAN [paper](https://arxiv.org/abs/2206.04658)\n\nhttps://github.com/mindslab-ai/univnet [paper](https://arxiv.org/abs/2106.07889)\n\nhttps://github.com/mozilla/TTS\n\nhttps://github.com/bshall/soft-vc\n\nhttps://github.com/maxrmorrison/torchcrepe\n\n[SNAC : Speaker-normalized Affine Coupling Layer in Flow-based Architecture for Zero-Shot Multi-Speaker Text-to-Speech](https://github.com/hcy71o/SNAC)\n\n[Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers](https://arxiv.org/abs/2211.00585)\n\n[AdaSpeech: Adaptive Text to Speech for Custom Voice](https://arxiv.org/pdf/2103.00993.pdf)\n\n[Cross-Speaker Prosody Transfer 
on Any Text for Expressive Speech Synthesis](https://github.com/ubisoft/ubisoft-laforge-daft-exprt)\n\n[Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings](https://arxiv.org/abs/2305.05401)\n\n[Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion](https://arxiv.org/pdf/2305.09167.pdf)\n\n[Speaker normalization (GRL) for self-supervised speech emotion recognition](https://arxiv.org/abs/2202.01252)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxmax2016%2Fglow-svc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxmax2016%2Fglow-svc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxmax2016%2Fglow-svc/lists"}