{"id":13545334,"url":"https://github.com/openvpi/DiffSinger","last_synced_at":"2025-04-02T15:31:04.071Z","repository":{"id":52377583,"uuid":"520929785","full_name":"openvpi/DiffSinger","owner":"openvpi","description":"An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism","archived":false,"fork":true,"pushed_at":"2025-03-29T17:12:58.000Z","size":69128,"stargazers_count":2818,"open_issues_count":15,"forks_count":293,"subscribers_count":36,"default_branch":"main","last_synced_at":"2025-03-29T18:23:24.068Z","etag":null,"topics":["acoustic-model","diffusion","diffussion-model","melody-frontend","midi","pitch-prediction","rectified-flow","singing-voice","singing-voice-synthesis","svs"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"MoonInTheRiver/DiffSinger","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openvpi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-08-03T15:11:21.000Z","updated_at":"2025-03-29T17:13:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"5e1ebbcb-54b5-452b-8cca-2b31d7a244bc","html_url":"https://github.com/openvpi/DiffSinger","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openvpi%2FDiffSinger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openvpi%2FDiffSinger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openvpi%2FDiffSinger/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openvpi%2FDiffSinger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openvpi","download_url":"https://codeload.github.com/openvpi/DiffSinger/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246841627,"owners_count":20842623,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acoustic-model","diffusion","diffussion-model","melody-frontend","midi","pitch-prediction","rectified-flow","singing-voice","singing-voice-synthesis","svs"],"created_at":"2024-08-01T11:01:01.078Z","updated_at":"2025-04-02T15:31:04.065Z","avatar_url":"https://github.com/openvpi.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# DiffSinger (OpenVPI maintained version)\n\n[![arXiv](https://img.shields.io/badge/arXiv-Paper-\u003cCOLOR\u003e.svg)](https://arxiv.org/abs/2105.02446)\n[![downloads](https://img.shields.io/github/downloads/openvpi/DiffSinger/total.svg)](https://github.com/openvpi/DiffSinger/releases)\n[![Bilibili](https://img.shields.io/badge/Bilibili-Demo-blue)](https://www.bilibili.com/video/BV1be411N7JA/)\n[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/openvpi/DiffSinger/blob/main/LICENSE)\n\nThis is a refactored and enhanced version of _DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism_ based on the original [paper](https://arxiv.org/abs/2105.02446) and [implementation](https://github.com/MoonInTheRiver/DiffSinger), which 
provides:\n\n- Cleaner code structure: useless and redundant files are removed, and the remaining files are re-organized.\n- Better sound quality: the sampling rate of synthesized audio is raised to 44.1 kHz from the original 24 kHz.\n- Higher fidelity: improved acoustic models and diffusion sampling acceleration algorithms are integrated.\n- More controllability: variance models and parameters are introduced for prediction and control of pitch, energy, breathiness, etc.\n- Production compatibility: functionalities are designed to match the requirements of production deployment and the SVS communities.\n\n|                                       Overview                                        |                                    Variance Model                                     |                                    Acoustic Model                                     |\n|:-------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------:|\n| \u003cimg src=\"docs/resources/arch-overview.jpg\" alt=\"arch-overview\" style=\"zoom: 60%;\" /\u003e | \u003cimg src=\"docs/resources/arch-variance.jpg\" alt=\"arch-variance\" style=\"zoom: 50%;\" /\u003e | \u003cimg src=\"docs/resources/arch-acoustic.jpg\" alt=\"arch-acoustic\" style=\"zoom: 60%;\" /\u003e |\n\n## User Guidance\n\n\u003e 中文教程 / Chinese Tutorials: [Text](https://openvpi-docs.feishu.cn/wiki/KmBFwoYDEixrS4kHcTAcajPinPe), [Video](https://space.bilibili.com/179281251/channel/collectiondetail?sid=1747910)\n\n- **Installation \u0026 basic usage**: See [Getting Started](docs/GettingStarted.md)\n- **Dataset creation pipelines \u0026 tools**: See [MakeDiffSinger](https://github.com/openvpi/MakeDiffSinger)\n- **Best practices \u0026 tutorials**: See [Best Practices](docs/BestPractices.md)\n- **Editing configurations**: See 
[Configuration Schemas](docs/ConfigurationSchemas.md)\n- **Deployment \u0026 production**: [OpenUTAU for DiffSinger](https://github.com/xunmengshe/OpenUtau), [DiffScope (under development)](https://github.com/openvpi/diffscope)\n- **Communication groups**: [QQ Group](http://qm.qq.com/cgi-bin/qm/qr?_wv=1027\u0026k=fibG_dxuPW5maUJwe9_ya5-zFcIwaoOR\u0026authKey=ZgLCG5EqQVUGCID1nfKei8tCnlQHAmD9koxebFXv5WfUchhLwWxb52o1pimNai5A\u0026noverify=0\u0026group_code=907879266) (907879266), [Discord server](https://discord.gg/wwbu2JUMjj)\n\n## Progress \u0026 Roadmap\n\n- **Progress since we forked into this repository**: See [Releases](https://github.com/openvpi/DiffSinger/releases)\n- **Roadmap for future releases**: See [Project Board](https://github.com/orgs/openvpi/projects/1)\n- **Thoughts, proposals \u0026 ideas**: See [Discussions](https://github.com/openvpi/DiffSinger/discussions)\n\n## Architecture \u0026 Algorithms\n\nTBD\n\n## Development Resources\n\nTBD\n\n## References\n\n### Original Paper \u0026 Implementation\n\n- Paper: [DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism](https://arxiv.org/abs/2105.02446)\n- Implementation: [MoonInTheRiver/DiffSinger](https://github.com/MoonInTheRiver/DiffSinger)\n\n### Generative Models \u0026 Algorithms\n\n- Denoising Diffusion Probabilistic Models (DDPM): [paper](https://arxiv.org/abs/2006.11239), [implementation](https://github.com/hojonathanho/diffusion)\n  - [DDIM](https://arxiv.org/abs/2010.02502) for diffusion sampling acceleration\n  - [PNDM](https://arxiv.org/abs/2202.09778) for diffusion sampling acceleration\n  - [DPM-Solver++](https://github.com/LuChengTHU/dpm-solver) for diffusion sampling acceleration\n  - [UniPC](https://github.com/wl-zhao/UniPC) for diffusion sampling acceleration\n- Rectified Flow (RF): [paper](https://arxiv.org/abs/2209.03003), [implementation](https://github.com/gnobitab/RectifiedFlow)\n\n### Dependencies \u0026 Submodules\n\n- 
[RoPE](https://github.com/lucidrains/rotary-embedding-torch) for transformer encoder\n- [HiFi-GAN](https://github.com/jik876/hifi-gan) and [NSF](https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/01-nsf) for waveform reconstruction\n- [pc-ddsp](https://github.com/yxlllc/pc-ddsp) for waveform reconstruction\n- [RMVPE](https://github.com/Dream-High/RMVPE) and yxlllc's [fork](https://github.com/yxlllc/RMVPE) for pitch extraction\n- [Vocal Remover](https://github.com/tsurumeso/vocal-remover) and yxlllc's [fork](https://github.com/yxlllc/vocal-remover) for harmonic-noise separation\n\n## Disclaimer\n\nAny organization or individual is prohibited from using any functionalities included in this repository to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.\n\n## License\n\nThis forked DiffSinger repository is licensed under the [Apache 2.0 License](LICENSE).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenvpi%2FDiffSinger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenvpi%2FDiffSinger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenvpi%2FDiffSinger/lists"}