https://github.com/devidw/dswav
Tooling to build datasets for audio model training
- Host: GitHub
- URL: https://github.com/devidw/dswav
- Owner: devidw
- License: unlicense
- Created: 2023-11-23T20:36:07.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-30T18:18:50.000Z (about 2 years ago)
- Last Synced: 2025-04-22T10:43:36.349Z (12 months ago)
- Language: Python
- Homepage:
- Size: 516 KB
- Stars: 16
- Watchers: 2
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# dswav
Tool to build datasets for audio model training
Includes a series of helpers for dataset work, such as:
- transcribing an audio source into a dataset of paired text & audio segments
- combining different data sources
- bulk lengthening of audio samples
- bulk conversion of mp3s to wav at a given sample rate
- building metadata files that can be used for training
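The bulk mp3-to-wav conversion step can be sketched as a thin wrapper around ffmpeg. The 24 kHz default, mono output, and file layout below are illustrative assumptions, not dswav's actual defaults:

```python
import subprocess
from pathlib import Path

def to_wav_cmd(src: Path, out_dir: Path, sample_rate: int = 24_000) -> list[str]:
    # Build an ffmpeg command converting one mp3 to a mono wav at the target rate.
    dst = out_dir / (src.stem + ".wav")
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), "-ac", "1", str(dst)]

def convert_all(src_dir: Path, out_dir: Path, sample_rate: int = 24_000) -> None:
    # Convert every mp3 in src_dir, writing <name>.wav files into out_dir.
    out_dir.mkdir(parents=True, exist_ok=True)
    for mp3 in sorted(src_dir.glob("*.mp3")):
        subprocess.run(to_wav_cmd(mp3, out_dir, sample_rate), check=True)
```

`-ar` sets the output sample rate and `-ac 1` downmixes to mono, which most single-speaker TTS pipelines expect.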
Mostly focused on tooling for [StyleTTS2](https://github.com/yl4579/StyleTTS2) datasets, but it can also be used with other models and libraries such as [coqui](https://github.com/coqui-ai/TTS).
## Usage
```bash
docker run \
-p 7860:7860 \
-v ./projects:/app/projects \
ghcr.io/devidw/dswav:main
```
## TTS, LJSpeech
https://tts.readthedocs.io/en/latest/formatting_your_dataset.html
Supports output in the LJSpeech dataset format (`metadata.csv`, `wavs/`), which can be used with the `TTS` Python package to train models such as XTTS v2.
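A minimal writer for that format might look like the sketch below. LJSpeech's `metadata.csv` uses three pipe-separated columns (id, raw transcript, normalized transcript); the transcript is simply duplicated here because no separate normalization step is shown:

```python
from pathlib import Path

def write_ljspeech_metadata(samples: list[dict], out_dir: Path) -> Path:
    # Each metadata.csv line: <id>|<raw transcript>|<normalized transcript>.
    # The transcript is duplicated since no normalization is applied here.
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "metadata.csv"
    lines = [f"{s['id']}|{s['content']}|{s['content']}" for s in samples]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```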
## StyleTTS2
https://github.com/yl4579/StyleTTS2
Also supports the StyleTTS2 output format:
- `train_list.txt` (99 % of samples)
- `val_list.txt` (1 % of samples)
- `wavs/`
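The 99/1 split could be produced along these lines. The per-line format is an assumption here (`<file>.wav|<transcript>|<speaker_id>`); check the StyleTTS2 documentation before relying on it:

```python
import random
from pathlib import Path

def write_styletts2_lists(samples: list[dict], out_dir: Path,
                          val_frac: float = 0.01, seed: int = 0) -> None:
    # Shuffle deterministically, then hold out ~1% of samples for validation.
    rng = random.Random(seed)
    items = samples[:]
    rng.shuffle(items)
    n_val = max(1, int(len(items) * val_frac))
    splits = {"val_list.txt": items[:n_val], "train_list.txt": items[n_val:]}
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, subset in splits.items():
        # Assumed line format: <id>.wav|<transcript>|<speaker_id>
        lines = [f"{s['id']}.wav|{s['content']}|{s.get('speaker_id', '0')}" for s in subset]
        (out_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Seeding the shuffle keeps the split reproducible across runs, so the validation set stays stable while the dataset is iterated on.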
## Data sources
To import other data sources, they must follow this structure:
- /your/path/index.json
- /your/path/wavs/[id].wav
```ts
{
id: string // unique identifier for each sample, must match the file name in `./wavs/[id].wav`
content: string // the transcript
speaker_id?: string // optional; for multi-speaker datasets, unique per speaker
}[]
```
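A small loader that enforces this structure (a hypothetical helper, not part of dswav) could be:

```python
import json
from pathlib import Path

def load_index(root: Path) -> list[dict]:
    # Read index.json and verify each sample has a transcript and a matching wav.
    samples = json.loads((root / "index.json").read_text(encoding="utf-8"))
    for s in samples:
        wav = root / "wavs" / f"{s['id']}.wav"
        if not wav.is_file():
            raise FileNotFoundError(wav)
        if not s.get("content"):
            raise ValueError(f"missing transcript for sample {s['id']}")
    return samples
```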
## Development
- requires ffmpeg, espeak, and whisper
```bash
git clone https://github.com/devidw/dswav
cd dswav
poetry install
make dev
```
## Notes
- currently splits on sentence boundaries rather than silence, which sometimes leaves artifacts at the end of samples; detecting silence instead would yield cleaner segments
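One possible silence-based splitter, sketched on raw float samples with an illustrative amplitude threshold and minimum silence length (both would need tuning, and real audio would first have to be decoded to samples):

```python
def split_on_silence(samples: list[float], threshold: float = 0.01,
                     min_silence: int = 2400) -> list[tuple[int, int]]:
    # Return (start, end) index pairs of non-silent segments. A segment is
    # closed once `min_silence` consecutive samples fall under `threshold`.
    segments, start, quiet = [], None, 0
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            quiet += 1
            if start is not None and quiet >= min_silence:
                segments.append((start, i - quiet + 1))  # trim trailing silence
                start = None
        else:
            if start is None:
                start = i
            quiet = 0
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

Because segment ends are trimmed back to the last loud sample, the trailing artifacts mentioned above would not survive into the cut examples.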