https://github.com/billwuhao/ComfyUI_StepAudioTTS


Host: GitHub
URL: https://github.com/billwuhao/ComfyUI_StepAudioTTS
Owner: billwuhao
License: apache-2.0
Created: 2025-02-20T20:25:01.000Z (4 months ago)
Default Branch: master
Last Pushed: 2025-02-20T21:37:33.000Z (4 months ago)
Last Synced: 2025-02-20T22:25:13.671Z (4 months ago)
Language: Python
Size: 1.57 MB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README-en.md
- License: LICENSE

Awesome Lists containing this project

awesome-comfyui - **ComfyUI_StepAudioTTS** - Audio-TTS in ComfyUI. Can speak, rap, sing, or clone voice. (All Workflows Sorted by GitHub Stars)

README

[中文](README.md) | English

# A Text To Speech node using Step-Audio-TTS in ComfyUI. Can speak, rap, sing, or clone voice.

![](https://github.com/billwuhao/ComfyUI_StepAudioTTS/blob/master/assets/2025-02-21_05-34-25.png)


## Model Download

Download the models below to the `ComfyUI\models\TTS` folder.

### Huggingface

| Models | Links |
|-------|-------|
| Step-Audio-Tokenizer | [🤗huggingface](https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer) |
| Step-Audio-TTS-3B | [🤗huggingface](https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B) |
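The two Hugging Face repositories can also be fetched from a script. The following is a minimal sketch, not part of this project: it assumes the `huggingface_hub` package is installed, and the helper names `download_targets` and `download_models` are hypothetical. The repository IDs are taken from the links in the table.

```python
from pathlib import Path

# Repository IDs taken from the Hugging Face links in the table.
REPO_IDS = ("stepfun-ai/Step-Audio-Tokenizer", "stepfun-ai/Step-Audio-TTS-3B")


def download_targets(models_root):
    """Map each repo ID to the local folder it should be downloaded into."""
    root = Path(models_root)
    # The folder name is the part of the repo ID after the slash.
    return {repo_id: root / repo_id.split("/")[1] for repo_id in REPO_IDS}


def download_models(models_root):
    """Download both repositories (assumes `pip install huggingface_hub`)."""
    from huggingface_hub import snapshot_download  # imported lazily

    for repo_id, target in download_targets(models_root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(target))


# Example call (path is an assumption; point it at your own ComfyUI install):
# download_models(r"ComfyUI\models\TTS")
```

Keeping the path planning in its own function makes it easy to check where files will land before starting a large download.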

### Modelscope

| Models | Links |
|-------|-------|
| Step-Audio-Tokenizer | [modelscope](https://modelscope.cn/models/stepfun-ai/Step-Audio-Tokenizer) |
| Step-Audio-TTS-3B | [modelscope](https://modelscope.cn/models/stepfun-ai/Step-Audio-TTS-3B) |

After downloading, the `ComfyUI\models\TTS` folder should have the following structure:

```
ComfyUI\models\TTS
├── Step-Audio-Tokenizer
├── Step-Audio-TTS-3B
```
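To confirm this layout before launching ComfyUI, a small check like the one below may help. It is only a sketch: the helper name `missing_model_dirs` and its `models_root` argument are hypothetical and not part of this project, and it verifies only that the two folders exist, not that their contents are complete.

```python
from pathlib import Path

# Folder names taken from the structure shown above.
REQUIRED_MODEL_DIRS = ("Step-Audio-Tokenizer", "Step-Audio-TTS-3B")


def missing_model_dirs(models_root):
    """Return the required model folders that are absent under models_root.

    models_root is assumed to point at ComfyUI's models/TTS folder.
    """
    root = Path(models_root)
    return [name for name in REQUIRED_MODEL_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed location; adjust to where ComfyUI is installed.
    missing = missing_model_dirs(Path("ComfyUI") / "models" / "TTS")
    if missing:
        print("Missing model folders:", ", ".join(missing))
    else:
        print("Both model folders are present.")
```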

### Contributions of more voices are welcome

Name each audio file `{Speaker}_prompt.WAV`, for example `明文_prompt.WAV`. I will add contributed voices to the code, so they can be used directly without cloning.

The currently supported voices are in the `Step-Audio-speakers` folder. Pull requests adding more voices are welcome.
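Before submitting a voice, the file name can be checked against the naming convention. This sketch follows the `明文_prompt.WAV` example; the helper name `speaker_from_filename` is hypothetical, and accepting the extension in any letter case is an assumption rather than something the project states.

```python
from pathlib import Path

PROMPT_SUFFIX = "_prompt"


def speaker_from_filename(filename):
    """Return the speaker name if the file follows `{Speaker}_prompt.WAV`, else None."""
    path = Path(filename)
    # Assumption: the extension is matched case-insensitively (.WAV or .wav).
    if path.suffix.lower() != ".wav":
        return None
    stem = path.stem
    if not stem.endswith(PROMPT_SUFFIX):
        return None
    speaker = stem[: -len(PROMPT_SUFFIX)]
    # An empty speaker name (a file called just "_prompt.WAV") is rejected.
    return speaker or None
```

A call such as `speaker_from_filename("明文_prompt.WAV")` returns the speaker name, while a file that does not match the pattern returns `None`.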

## Supports Chinese, English, Korean, Japanese, Sichuanese, Cantonese, etc.

## Acknowledgements

Part of the code for this project comes from:

* [Step-Audio](https://github.com/stepfun-ai/Step-Audio)

* [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)

* [transformers](https://github.com/huggingface/transformers)

* [FunASR](https://github.com/modelscope/FunASR)

Thank you to all the open-source projects for their contributions to this project!
