https://github.com/NATSpeech/NATSpeech

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
https://github.com/NATSpeech/NATSpeech

diffsinger diffspeech huggingface portaspeech pytorch speech speech-synthesis tts

Last synced: 3 months ago
JSON representation

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

Host: GitHub
URL: https://github.com/NATSpeech/NATSpeech
Owner: NATSpeech
License: mit
Created: 2022-02-13T03:44:45.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-04-02T00:55:24.000Z (over 2 years ago)
Last Synced: 2024-08-08T23:21:12.181Z (about 1 year ago)
Topics: diffsinger, diffspeech, huggingface, portaspeech, pytorch, speech, speech-synthesis, tts
Language: Python
Homepage:
Size: 173 KB
Stars: 962
Watchers: 20
Forks: 99
Open Issues: 20
Metadata Files:
- Readme: README-zh.md
- License: LICENSE

Awesome Lists containing this project

StarryDivineSky - NATSpeech/NATSpeech - TTS）框架，包括 PortaSpeech （NeurIPS 2021）和 DiffSpeech （AAAI 2022）的官方 PyTorch 实现。PortaSpeech：便携式和高质量的生成文本到语音转换（NeurIPS 2021）。DiffSinger：通过浅扩散机制合成歌唱声音（DiffSpeech）（AAAI 2022）。 (语音合成 / 网络服务_其他)

README

          


    


    

    






 NATSpeech: A Non-Autoregressive Text-to-Speech Framework




[![](https://img.shields.io/github/stars/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech)

[![](https://img.shields.io/github/forks/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech)

[![](https://img.shields.io/github/license/NATSpeech/NATSpeech)](https://github.com/NATSpeech/NATSpeech/blob/main/LICENSE)

[![](https://img.shields.io/github/downloads/NATSpeech/NATSpeech/total?label=pretrained+model+downloads)](https://github.com/NATSpeech/NATSpeech/releases/tag/pretrained_models) | [English README](./README.md)



本仓库包含了以下工作的官方PyTorch实现：

- [PortaSpeech: Portable and High-Quality Generative Text-to-Speech](https://proceedings.neurips.cc/paper/2021/file/748d6b6ed8e13f857ceaa6cfbdca14b8-Paper.pdf) (NeurIPS 2021)  

[Demo页面](https://portaspeech.github.io/) | [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/PortaSpeech)

- [DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism](https://arxiv.org/abs/2105.02446) (DiffSpeech) (AAAI 2022)  

[Demo页面](https://diffsinger.github.io/) | [项目主页](https://github.com/MoonInTheRiver/DiffSinger) | [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/DiffSpeech)

## 主要特点 

我们在本框架中实现了以下特点：

- 基于[Montreal Forced Aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner)的非自回归语音合成数据处理流程；

- 便于使用和可扩展的训练和测试框架；

- 简单但有效的随机访问数据集类的实现。

## 安装依赖

```bash

## 在 Linux/Ubuntu 18.04 上通过测试 

## 首先需要安装 Python 3.6+ (推荐使用Anaconda)

export PYTHONPATH=.

# 创建虚拟环境 (推荐).

python -m venv venv

source venv/bin/activate

# 安装依赖

pip install -U pip

pip install Cython numpy==1.19.1

pip install torch==1.9.0 # 推荐 torch >= 1.9.0

pip install -r requirements.txt

sudo apt install -y sox libsox-fmt-mp3

bash mfa_usr/install_mfa.sh # 安装强制对齐工具

```

## 文档

- [关于本框架](./docs/zh/framework.md)

- [运行PortaSpeech](./docs/portaspeech.md)

- [运行DiffSpeech](./docs/diffspeech.md)

## 引用

如果本REPO对你的研究和工作有用，请引用以下论文：

- PortaSpeech

```bib

@article{ren2021portaspeech,

  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},

  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},

  journal={Advances in Neural Information Processing Systems},

  volume={34},

  year={2021}

}

```

- DiffSpeech

```bib

@article{liu2021diffsinger,

  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},

  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},

  journal={arXiv preprint arXiv:2105.02446},

  volume={2},

  year={2021}

 }

```

## 致谢

我们的代码受以下代码和仓库启发：

- [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)

- [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)

- [Hifi-GAN](https://github.com/jik876/hifi-gan)

- [espnet](https://github.com/espnet/espnet)

- [Glow-TTS](https://github.com/jaywalnut310/glow-tts)

- [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/NATSpeech/NATSpeech

Awesome Lists containing this project

README