https://github.com/nobrainX2/comfyUI-customDia

ComfyUI Dia text to speech

Host: GitHub
URL: https://github.com/nobrainX2/comfyUI-customDia
Owner: nobrainX2
License: apache-2.0
Created: 2025-04-24T23:58:53.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2025-04-25T03:10:07.000Z (about 2 months ago)
Last Synced: 2025-04-25T04:20:54.645Z (about 2 months ago)
Language: Python
Size: 380 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

- awesome-comfyui: **ComfyUI Custom Dia** (All Workflows Sorted by GitHub Stars)

README

# ComfyUI Custom Dia

This is a ComfyUI integration of the [Dia TTS model](https://github.com/nari-labs/dia/).
Many thanks to **nari-labs** for their fantastic work.

## Installation

Download the `.pth` and `.json` files from [Hugging Face](https://huggingface.co/nari-labs/Dia-1.6B/tree/main).
Store them in any subfolder under `/models/`. The path is not hardcoded; you can set it manually on the node.
(Default path: `/models/Dia/dia-v0_1.pth`)
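
As a quick sanity check before loading the node, the sketch below looks for a checkpoint and a config file in a model folder. The helper name `find_dia_files` is hypothetical and is not part of this repository; it only assumes that the two downloaded files end in `.pth` and `.json`, as described above.

```python
from pathlib import Path


def find_dia_files(model_dir):
    """Return (checkpoint, config) paths found in model_dir.

    Hypothetical helper: each entry is None when no matching file exists.
    """
    folder = Path(model_dir)
    checkpoints = sorted(folder.glob("*.pth"))
    configs = sorted(folder.glob("*.json"))
    return (
        checkpoints[0] if checkpoints else None,
        configs[0] if configs else None,
    )
```

For example, `find_dia_files("models/Dia")` returns `(None, None)` when the folder holds neither file, which makes a missing download easy to spot.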

## Modifications from the Original Repository

The original Dia API has been **slightly modified** to support **multi-channel audio inputs**.
This lets the node accept stereo files or audio tensors provided directly by other ComfyUI nodes.

An extra node has been added to retime the output audio. See the example for usage.

Please note that the pitch preservation option requires the **librosa** package. It is not listed in `requirements.txt` because it is optional.
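
Because librosa is optional, it can help to check whether it is importable before enabling pitch preservation. The following is a minimal sketch using only the standard library. The function name `pitch_preservation_available` is an assumption for illustration and does not reflect how the node itself performs the check.

```python
import importlib.util


def pitch_preservation_available():
    """Return True when the optional librosa package can be imported."""
    return importlib.util.find_spec("librosa") is not None


if not pitch_preservation_available():
    # Assumption: librosa is installed from PyPI under the name "librosa".
    print("librosa not found: run `pip install librosa` to use pitch preservation.")
```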

## Usage

This is an **output node**, meaning it can be used standalone and queued without connections.
In that case, you may want to enable `save_audio_file` to automatically save the result into ComfyUI’s output folder.

To use it in a pipeline, just connect the `audio` output to any compatible node.

### Speech Prompt

- Use the `text` field to define your dialogue, e.g.:
```
[S1] Hello.
[S2] Hi there! (laughs)
```

- Use `[S1]`, `[S2]`, etc. to switch speakers.
- Insert nonverbal tags (e.g. `(laughs)`, `(sighs)`) to enrich the audio.
- A list of available tags is provided in the third (inactive) text field.
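
If the dialogue is generated by a script, the prompt format above can be assembled programmatically. The sketch below is illustrative only. `build_speech_prompt` is a hypothetical helper, not part of the node, and it simply joins speaker-tagged lines into the text you would paste into the `text` field.

```python
def build_speech_prompt(turns):
    """Join (speaker_number, line) pairs into a tagged prompt, one turn per line."""
    lines = []
    for speaker, line in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        lines.append(f"[S{speaker}] {line.strip()}")
    return "\n".join(lines)


prompt = build_speech_prompt([(1, "Hello."), (2, "Hi there! (laughs)")])
print(prompt)
```

Nonverbal tags such as `(laughs)` are passed through unchanged, so they can be written directly in each line.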

![image](https://github.com/user-attachments/assets/d4a32dd7-0426-46c6-9685-2190dc7d6993)

## Voice Cloning

Connect an `audio` tensor to the node's input to enable **voice cloning**.
In this case, it is strongly recommended to provide a **transcript** of the input audio in the `input_audio_transcript` field to improve results.

![image](https://github.com/user-attachments/assets/9bac4077-9a71-4ee1-a279-0773bb51a75a)

## Troubleshooting and side effects
As stated in the `requirements.txt` file, you need to install two Python packages: **descript-audio-codec** and **soundfile**.

Under certain circumstances, installing **descript-audio-codec** can automatically downgrade **protobuf** to 3.19.6, which can make other nodes crash on startup. If that happens, upgrade protobuf by opening the ComfyUI terminal and running:
```
pip install protobuf --upgrade
```
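
To confirm whether the downgrade happened, you can inspect the installed version from Python. This is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_major_version` is hypothetical, and treating any major version below 4 as "downgraded" is an assumption based on the 3.19.6 version mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(package):
    """Return the major version number of an installed package, or None if absent."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


major = installed_major_version("protobuf")
# Assumption: a 3.x release indicates the downgrade described above.
if major is not None and major < 4:
    print("protobuf looks downgraded; consider `pip install protobuf --upgrade`.")
```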
