{"id":15462708,"url":"https://github.com/devidw/dswav","last_synced_at":"2025-04-22T10:43:40.697Z","repository":{"id":208898192,"uuid":"722737252","full_name":"devidw/dswav","owner":"devidw","description":"Tooling to build datasets for audio model training","archived":false,"fork":false,"pushed_at":"2024-01-30T18:18:50.000Z","size":528,"stargazers_count":16,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-22T10:43:36.349Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devidw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"devidw"}},"created_at":"2023-11-23T20:36:07.000Z","updated_at":"2024-11-14T00:46:33.000Z","dependencies_parsed_at":"2023-11-23T22:23:09.818Z","dependency_job_id":"4631f738-eec0-4b9d-a858-acce524ba76c","html_url":"https://github.com/devidw/dswav","commit_stats":null,"previous_names":["devidw/dswav"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devidw%2Fdswav","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devidw%2Fdswav/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devidw%2Fdswav/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devidw%2Fdswav/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devidw","download_url":"https://codeload.githu
b.com/devidw/dswav/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250222421,"owners_count":21394863,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T00:03:42.121Z","updated_at":"2025-04-22T10:43:40.636Z","avatar_url":"https://github.com/devidw.png","language":"Python","readme":"# dswav\n\nTool to build datasets for audio model training\n\nIncludes a series of helpers for dataset work, such as:\n\n- transcribing an audio source into a dataset of segment-level text \u0026 audio pairs\n- combining different data sources\n- bulk lengthening audio samples\n- bulk conversion of mp3s to wav at a given sample rate\n- building metadata files that can be used for training\n\nMostly focused on tooling for [StyleTTS2](https://github.com/yl4579/StyleTTS2) datasets, but can also be\nused for other kinds of models / libraries such as [coqui](https://github.com/coqui-ai/TTS)\n\n## Usage\n\n```bash\ndocker run \\\n  -p 7860:7860 \\\n  -v ./projects:/app/projects \\\n  ghcr.io/devidw/dswav:main\n```\n\n## TTS, LJSpeech\n\nhttps://tts.readthedocs.io/en/latest/formatting_your_dataset.html\n\nSupports output in the LJSpeech dataset format (`metadata.csv`, `wavs/`), which can be used with the `TTS` Python package to train models such as xtts2\n\n## StyleTTS2\n\nhttps://github.com/yl4579/StyleTTS2\n\nAlso supports the StyleTTS2 output format:\n\n- `train_list.txt` (99 %)\n- `val_list.txt` (1 %)\n- `wavs/`\n\n## Data sources\n\nIn order to import other data sources, they must follow this structure:\n\n- /your/path/index.json\n- 
/your/path/wavs/[id].wav\n\n```ts\n{\n    id: string // unique identifier for each sample, must match the file name in `./wavs/[id].wav`\n    content: string // the transcript\n    speaker_id?: string // optional, for multi-speaker datasets; unique per speaker voice\n}[]\n```\n\n## Development\n\n- Requires ffmpeg, espeak, and whisper\n\n```bash\ngit clone https://github.com/devidw/dswav\ncd dswav\n\npoetry install\n\nmake dev\n```\n\n## Notes\n\n- Splitting is currently based on sentences rather than silence, which sometimes leaves artifacts at the end of\n  samples; detecting silence instead would yield cleaner examples","funding_links":["https://github.com/sponsors/devidw"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevidw%2Fdswav","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevidw%2Fdswav","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevidw%2Fdswav/lists"}