{"id":13935574,"url":"https://github.com/NATSpeech/NATSpeech","last_synced_at":"2025-07-19T20:33:12.082Z","repository":{"id":37722757,"uuid":"458701673","full_name":"NATSpeech/NATSpeech","owner":"NATSpeech","description":"A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)","archived":false,"fork":false,"pushed_at":"2023-04-02T00:55:24.000Z","size":177,"stargazers_count":962,"open_issues_count":20,"forks_count":99,"subscribers_count":20,"default_branch":"main","last_synced_at":"2024-08-08T23:21:12.181Z","etag":null,"topics":["diffsinger","diffspeech","huggingface","portaspeech","pytorch","speech","speech-synthesis","tts"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NATSpeech.png","metadata":{"files":{"readme":"README-zh.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-02-13T03:44:45.000Z","updated_at":"2024-08-05T11:36:33.000Z","dependencies_parsed_at":"2022-07-06T11:54:55.730Z","dependency_job_id":null,"html_url":"https://github.com/NATSpeech/NATSpeech","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NATSpeech%2FNATSpeech","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NATSpeech%2FNATSpeech/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NATSpeech%2FNATSpeech/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NATSpeech%2FNATSpeech/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/NATSpeech","download_url":"https://codeload.github.com/NATSpeech/NATSpeech/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226677114,"owners_count":17666008,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffsinger","diffspeech","huggingface","portaspeech","pytorch","speech","speech-synthesis","tts"],"created_at":"2024-08-07T23:01:53.849Z","updated_at":"2024-11-27T03:30:49.329Z","avatar_url":"https://github.com/NATSpeech.png","language":"Python","funding_links":[],"categories":["Python","语音合成"],"sub_categories":["网络服务_其他"],"readme":"\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"assets/logo.png\" width=\"200\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\n\u003ch2 align=\"center\"\u003e\n\u003cp\u003e NATSpeech: A Non-Autoregressive Text-to-Speech Framework\u003c/p\u003e\n\u003c/h2\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![](https://img.shields.io/github/stars/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech)\n[![](https://img.shields.io/github/forks/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech)\n[![](https://img.shields.io/github/license/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech/blob/main/LICENSE)\n[![](https://img.shields.io/github/downloads/NATSpeech/NATSpeech/total?label=pretrained+model+downloads)](https://github.com/NATSpeech/NATSpeech/releases/tag/pretrained_models) | [English README](./README.md)\n\n\u003c/div\u003e\n\n\nThis repository contains the official PyTorch implementations of the following works:\n\n- [PortaSpeech: Portable and High-Quality Generative 
Text-to-Speech](https://proceedings.neurips.cc/paper/2021/file/748d6b6ed8e13f857ceaa6cfbdca14b8-Paper.pdf) (NeurIPS 2021)  \n[Demo page](https://portaspeech.github.io/) | [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/PortaSpeech)\n- [DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism](https://arxiv.org/abs/2105.02446) (DiffSpeech) (AAAI 2022)  \n[Demo page](https://diffsinger.github.io/) | [Project page](https://github.com/MoonInTheRiver/DiffSinger) | [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/DiffSpeech)\n\n## Key Features\nThis framework implements the following features:\n\n- A data-processing pipeline for non-autoregressive TTS based on [Montreal Forced Aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner);\n- An easy-to-use and extensible training and testing framework;\n- A simple but effective random-access dataset class.\n\n## Install Dependencies\n\n```bash\n## Tested on Linux/Ubuntu 18.04\n## Install Python 3.6+ first (Anaconda recommended)\n\nexport PYTHONPATH=.\n# Create a virtual environment (recommended).\npython -m venv venv\nsource venv/bin/activate\n# Install dependencies\npip install -U pip\npip install Cython numpy==1.19.1\npip install torch==1.9.0 # torch \u003e= 1.9.0 recommended\npip install -r requirements.txt\nsudo apt install -y sox libsox-fmt-mp3\nbash mfa_usr/install_mfa.sh # Install the forced-alignment tool\n```\n\n## Documentation\n\n- [About this framework](./docs/zh/framework.md)\n- [Run PortaSpeech](./docs/portaspeech.md)\n- [Run DiffSpeech](./docs/diffspeech.md)\n\n## Citation\n\nIf this repository is useful for your research or work, please cite the following papers:\n\n- PortaSpeech\n\n```bib\n@article{ren2021portaspeech,\n  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},\n  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},\n  journal={Advances in Neural Information Processing Systems},\n  volume={34},\n  year={2021}\n}\n```\n\n- DiffSpeech\n\n```bib\n@article{liu2021diffsinger,\n  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},\n  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},\n  journal={arXiv preprint arXiv:2105.02446},\n  volume={2},\n  year={2021}\n }\n```\n\n## Acknowledgements\n\nOur code is inspired by the following projects and repositories:\n\n- [PyTorch 
Lightning](https://github.com/PyTorchLightning/pytorch-lightning)\n- [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)\n- [Hifi-GAN](https://github.com/jik876/hifi-gan)\n- [espnet](https://github.com/espnet/espnet)\n- [Glow-TTS](https://github.com/jaywalnut310/glow-tts)\n- [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNATSpeech%2FNATSpeech","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNATSpeech%2FNATSpeech","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNATSpeech%2FNATSpeech/lists"}