{"id":13724188,"url":"https://github.com/PlayVoice/VI-SVS","last_synced_at":"2025-05-07T17:33:38.694Z","repository":{"id":37545811,"uuid":"469620305","full_name":"PlayVoice/VI-SVS","owner":"PlayVoice","description":"Singing Voice Synthesis based on VITS, different from VISinger","archived":false,"fork":false,"pushed_at":"2023-11-13T03:37:39.000Z","size":2241,"stargazers_count":187,"open_issues_count":4,"forks_count":31,"subscribers_count":8,"default_branch":"VISinger","last_synced_at":"2024-11-13T04:35:06.865Z","etag":null,"topics":["diffsinger","opencpop","singing-synthesis","singing-voice-synthesis","speech-synthesis","svs","visinger","vits","vits-svs"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PlayVoice.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-03-14T07:15:21.000Z","updated_at":"2024-10-14T20:56:21.000Z","dependencies_parsed_at":"2023-11-09T11:48:09.240Z","dependency_job_id":null,"html_url":"https://github.com/PlayVoice/VI-SVS","commit_stats":null,"previous_names":["playvoice/x-sing","yuchendd/visinger-chinese","maxmax2016/visinger-chinese","maxmax2016/vi-svs","playvoice/vi-svs"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlayVoice%2FVI-SVS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlayVoice%2FVI-SVS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlayVoice%2FVI-SVS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlayVoice%2FVI-SVS
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PlayVoice","download_url":"https://codeload.github.com/PlayVoice/VI-SVS/tar.gz/refs/heads/VISinger","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224628440,"owners_count":17343339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffsinger","opencpop","singing-synthesis","singing-voice-synthesis","speech-synthesis","svs","visinger","vits","vits-svs"],"created_at":"2024-08-03T01:01:51.721Z","updated_at":"2025-05-07T17:33:38.681Z","avatar_url":"https://github.com/PlayVoice.png","language":"Python","funding_links":[],"categories":["\u003cspan id=\"voice\"\u003eSinging Voice\u003c/span\u003e","语音合成"],"sub_categories":["\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","网络服务_其他"],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003e Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis \u003c/h1\u003e\n\nUnlike VISinger, it is plain VITS without MAS (monotonic alignment search) and the DurationPredictor. 
\n\nAs a learning project, this is as far as it goes: pitch prediction is the part that still needs improvement.\n\n![VISinger](https://github.com/MaxMax2016/VI-SVS/assets/16432329/c76ca716-b230-4852-b8f0-2c3041af7072)\n\n![VI-SVS](https://github.com/MaxMax2016/VI-SVS/assets/16432329/128c0f33-4428-4b57-9cd3-b6237f53c1a4)\n\n\u003c/div\u003e\n\n**Pitch and duration will be developed as add-ons!**\n\n# Training\n\n- 1 Download the data archive segments.zip and extract it\n\n```\nsegments\n|-- test.txt\n|-- train.txt\n|-- transcriptions.txt\n`-- wavs\n    |-- 2001000001.wav\n    |-- 2001000002.wav\n    |-- 2001000003.wav\n```\n\n- 2 Resample the audio: this project uses 32 kHz\n```\npython util/resample.py -w segments/wavs/ -o data_svs/wavs -s 32000\n```\n\n- 3 Generate the data labels\n```\npython util/generate_label.py --config configs/singing_base.yaml --data data_svs/ --file segments/transcriptions.txt\n```\n\nThis produces data_svs/labels.txt in the format: `wave path|label path|score path|pitch path|slurs path`\n\n- 4 Split the training and validation lists\n```\npython util/generate_label.py --file data_svs/labels.txt\n```\n\nThis generates filelists/singing_train.txt and filelists/singing_valid.txt\n\n- 5 Start training\n```\npython svs_train.py -c configs/singing_base.yaml -n vits_svs\n```\n\n- 6 Train the pitch model\n```\npython pit_train.py -c configs/singing_base.yaml -n pitch\n```\n\n# Inference\n\n- 0 Export the model\n```\npython svs_export.py --config configs/singing_base.yaml --model chkpt/vits_svs/vits_svs_****.pt\n```\n\n- 1 Inference: F0 is generated from the musical score\n```\npython svs_infer.py --config configs/singing_base.yaml --model svs_opencpop.pt\n```\n\n- 2 Full-song synthesis ([using the release model](https://github.com/PlayVoice/VI-SVS/releases/tag/0.0.3))\n```\npython svs_song.py --config configs/singing_base.yaml --model svs_opencpop.pt\n```\n\n# Inference with pitch prediction (results are poor)\n\n- 0 Export the models\n```\npython svs_export.py --config configs/singing_base.yaml --model chkpt/vits_svs/vits_svs_****.pt\n```\n\n```\npython pit_export.py --config configs/singing_base.yaml --model chkpt/pitch/pitch_****.pt\n```\n\n- 1 Inference\n```\npython svs_infer_pitch.py --config configs/singing_base.yaml --model svs_opencpop.pt --pitch pit_opencpop.pt\n```\n\n- 2 
Full-song synthesis ([using the release model](https://github.com/PlayVoice/VI-SVS/releases/tag/0.0.3))\n```\npython svs_song_pitch.py --config configs/singing_base.yaml --model svs_opencpop.pt --pitch pit_opencpop.pt\n```\n\n# Data\n\nhttps://wenet.org.cn/opencpop/\n\n# Singing voice synthesis references\n\nhttps://github.com/SJTMusicTeam/Muskits\n\nhttps://github.com/MoonInTheRiver/DiffSinger\n\n[VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis](https://arxiv.org/abs/2110.08813)\n\n# Model design references\n\nhttps://github.com/NVIDIA/BigVGAN\n\nhttps://github.com/jaywalnut310/vits\n\nhttps://github.com/mindslab-ai/univnet\n\nhttps://github.com/PlayVoice/so-vits-svc-5.0\n\nhttps://github.com/shivammehta25/Matcha-TTS\n\n[RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864)\n\n# Diffusion Pitch\n\nhttps://github.com/thuhcsi/DiffVar\n\nhttps://github.com/hayeong0/Diff-HierVC\n\nhttps://github.com/tonnetonne814/SiFi-VITS2-44100-Ja\n\n[Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech](https://arxiv.org/abs/2105.06337)\n\n# Diffusion Pitch of Diff-HierVC\n![DiffPitch](https://github.com/PlayVoice/VI-SVS/assets/16432329/055d75a4-7009-46c1-8603-65254cec47dd)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPlayVoice%2FVI-SVS","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPlayVoice%2FVI-SVS","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPlayVoice%2FVI-SVS/lists"}