# DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2105.02446)
[![GitHub Stars](https://img.shields.io/github/stars/MoonInTheRiver/DiffSinger?style=social)](https://github.com/MoonInTheRiver/DiffSinger)
[![downloads](https://img.shields.io/github/downloads/MoonInTheRiver/DiffSinger/total.svg)](https://github.com/MoonInTheRiver/DiffSinger/releases)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue?label=TTSDemo)](https://huggingface.co/spaces/NATSpeech/DiffSpeech)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue?label=SVSDemo)](https://huggingface.co/spaces/Silentlin/DiffSinger)

This repository is the official PyTorch implementation of our AAAI-2022 [paper](https://arxiv.org/abs/2105.02446), in which we propose DiffSinger (for singing voice synthesis) and DiffSpeech (for text-to-speech).

:tada: :tada: :tada: **Updates**:
 - Sep. 11, 2022: :electric_plug: [DiffSinger-PN](docs/README-SVS-opencpop-pndm.md). Added the plug-in [PNDM](https://arxiv.org/abs/2202.09778) (ICLR 2022, from our laboratory) to accelerate DiffSinger without retraining.
 - Jul. 27, 2022: Updated the documents for [SVS](docs/README-SVS.md). Added easy inference guides [A](docs/README-SVS-opencpop-cascade.md#4-inference-from-raw-inputs) & [B](docs/README-SVS-opencpop-e2e.md#4-inference-from-raw-inputs); added an interactive SVS demo on [HuggingFace🤗 SVS](https://huggingface.co/spaces/Silentlin/DiffSinger).
 - Mar. 2, 2022: Released the MIDI-B version.
 - Mar. 1, 2022: Released [NeuralSVB](https://github.com/MoonInTheRiver/NeuralSVB) for singing voice beautifying.
 - Feb. 13, 2022: Released [NATSpeech](https://github.com/NATSpeech/NATSpeech), an improved code framework that contains the implementations of DiffSpeech and of [PortaSpeech](https://openreview.net/forum?id=xmJsuh8xlq), our NeurIPS-2021 work.
 - Jan. 29, 2022: Supported the MIDI-A version of SVS.
 - Jan. 13, 2022: Supported SVS; released the PopCS dataset.
 - Dec. 19, 2021: Supported TTS. [HuggingFace🤗 TTS](https://huggingface.co/spaces/NATSpeech/DiffSpeech)

:rocket: **News**:
 - Feb. 24, 2022: Our new work, NeuralSVB, was accepted by ACL-2022 [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2202.13277). [Demo Page](https://neuralsvb.github.io).
 - Dec. 1, 2021: DiffSinger was accepted by AAAI-2022.
 - Sep. 29, 2021: Our recent work `PortaSpeech: Portable and High-Quality Generative Text-to-Speech` was accepted by NeurIPS-2021 [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2109.15166).
 - May 6, 2021: We submitted DiffSinger to arXiv [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2105.02446).

## Environments
1. If you use an Anaconda environment:
    ```sh
    conda create -n your_env_name python=3.8
    conda activate your_env_name
    # GPU 2080Ti, CUDA 10.2:
    pip install -r requirements_2080.txt
    # or, GPU 3090, CUDA 11.4:
    pip install -r requirements_3090.txt
    ```

2. Or, if you use a plain Python virtual environment:
    ```sh
    # Install Python 3.8 first.
    python -m venv venv
    source venv/bin/activate
    # Install requirements.
    pip install -U pip
    pip install Cython numpy==1.19.1
    pip install torch==1.9.0
    pip install -r requirements.txt
    ```

## Documents
- [Run DiffSpeech (TTS version)](docs/README-TTS.md).
- [Run DiffSinger (SVS version)](docs/README-SVS.md).

## Overview
| Mel Pipeline | Dataset | Pitch Input | F0 Prediction | Acceleration Method | Vocoder |
| --- | --- | --- | --- | --- | --- |
| [DiffSpeech (Text->F0, Text+F0->Mel, Mel->Wav)](docs/README-TTS.md) | [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) | None | Explicit | Shallow Diffusion | HiFi-GAN |
| [DiffSinger (Lyric+F0->Mel, Mel->Wav)](docs/README-SVS-popcs.md) | [PopCS](https://github.com/MoonInTheRiver/DiffSinger) | Ground-Truth F0 | None | Shallow Diffusion | NSF-HiFiGAN |
| [DiffSinger (Lyric+MIDI->F0, Lyric+F0->Mel, Mel->Wav)](docs/README-SVS-opencpop-cascade.md) | [OpenCpop](https://wenet.org.cn/opencpop/) | MIDI | Explicit | Shallow Diffusion | NSF-HiFiGAN |
| [FFT-Singer (Lyric+MIDI->F0, Lyric+F0->Mel, Mel->Wav)](docs/README-SVS-opencpop-cascade.md) | [OpenCpop](https://wenet.org.cn/opencpop/) | MIDI | Explicit | Invalid | NSF-HiFiGAN |
| [DiffSinger (Lyric+MIDI->Mel, Mel->Wav)](docs/README-SVS-opencpop-e2e.md) | [OpenCpop](https://wenet.org.cn/opencpop/) | MIDI | Implicit | None | Pitch-Extractor + NSF-HiFiGAN |
| [DiffSinger+PNDM (Lyric+MIDI->Mel, Mel->Wav)](docs/README-SVS-opencpop-pndm.md) | [OpenCpop](https://wenet.org.cn/opencpop/) | MIDI | Implicit | PLMS | Pitch-Extractor + NSF-HiFiGAN |
| [DiffSpeech+PNDM (Text->Mel, Mel->Wav)](docs/README-TTS-pndm.md) | [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) | None | Implicit | PLMS | HiFi-GAN |

## Tensorboard
```sh
tensorboard --logdir_spec exp_name
```
<table style="width:100%">
  <tr>
    <td><img src="resources/tfb.png" alt="Tensorboard" height="250"></td>
  </tr>
</table>

## Citation
    @article{liu2021diffsinger,
      title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
      author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
      journal={arXiv preprint arXiv:2105.02446},
      volume={2},
      year={2021}
    }

## Acknowledgements
* lucidrains' [denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch)
* Official [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
* kan-bayashi's [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)
* jik876's [HiFi-GAN](https://github.com/jik876/hifi-gan)
* Official [ESPnet](https://github.com/espnet/espnet)
* lmnt-com's [DiffWave](https://github.com/lmnt-com/diffwave)
* keonlee9420's [implementation](https://github.com/keonlee9420/DiffSinger)

Special thanks to:

* Team OpenVPI's maintenance: [DiffSinger](https://github.com/openvpi/DiffSinger).
* Everyone who re-creates and shares.
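As background on the "shallow diffusion mechanism" in the title: instead of running the reverse (denoising) process all the way from pure Gaussian noise at step `T`, DiffSinger starts it from a shallow step `k`, obtained by lightly noising the mel spectrogram predicted by a simple auxiliary decoder, so far fewer denoising steps are needed. The snippet below is a minimal, illustrative NumPy sketch of that idea, not the repository's code; the schedule values, the step `k = 20`, and the toy mel shape are arbitrary assumptions for demonstration.

```python
import numpy as np

def make_beta_schedule(num_steps=100, beta_start=1e-4, beta_end=0.06):
    """Linear noise schedule, DDPM-style (values are illustrative)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alphas_cumprod, rng):
    """Draw x_t ~ q(x_t | x_0): scale x0 and add Gaussian noise at step t."""
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

betas = make_beta_schedule()
alphas_cumprod = np.cumprod(1.0 - betas)  # strictly decreasing in (0, 1)

# Shallow diffusion: x_k is the auxiliary decoder's mel output pushed a
# short way (k = 20 of 100 steps) into the diffusion forward process; the
# learned reverse process then only has to denoise from step k back to 0.
rng = np.random.default_rng(0)
coarse_mel = np.zeros((80, 200))  # stand-in for an 80-bin predicted mel
x_k = q_sample(coarse_mel, t=20, alphas_cumprod=alphas_cumprod, rng=rng)
```

Because the reverse trajectory covers only steps `k..0` rather than `T..0`, sampling cost drops roughly by a factor of `T / k` under these assumptions, which is the speed-up the paper's "shallow" start provides.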