https://github.com/veralvx/trainscribe

A command-line tool for transcribing audio files in a folder to a metadata.csv file, using OpenAI's Whisper.

Host: GitHub
URL: https://github.com/veralvx/trainscribe
Owner: veralvx
License: mit
Created: 2025-11-13T20:16:12.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-11-13T21:23:05.000Z (7 months ago)
Last Synced: 2026-04-03T08:53:53.932Z (2 months ago)
Topics: audio-processing, audio-transcribing, audio-transcription, cli, fine-tuning, ljspeech, ljspeech-format, openai-whisper, python, python3, training, transcribe, transcribe-audio-files, transcriber, transcription, whisper
Language: Python
Homepage:
Size: 3.91 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Trainscribe

Trainscribe is a command-line tool that transcribes the audio files in a specified folder using [OpenAI's Whisper](https://github.com/openai/whisper) and generates a `metadata.csv` file. The resulting metadata file is intended for training or fine-tuning text-to-speech (TTS) models, and uses one of the following formats:
- `file_id|transcribed_text`, or
- `file_id|transcribed_text|speaker`, if a speaker label is provided.

This is similar to the LJ Speech format, but lacks the additional field containing normalized transcribed text for pronunciation. In particular, `file_id|transcribed_text` can be used in projects like [piper-train](https://github.com/veralvx/piper-train), and `file_id|transcribed_text|speaker` in [xtts-finetune](https://github.com/veralvx/xtts-finetune).
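
To make the two layouts concrete, here is a minimal Python sketch that parses and re-serializes a single metadata line. The names `MetadataRow`, `parse_line`, and `format_line` are hypothetical and not part of Trainscribe, and the sketch assumes the transcribed text never contains a `|` character.

```python
from typing import NamedTuple, Optional


class MetadataRow(NamedTuple):
    """One metadata.csv line; speaker is None in the two-field layout."""
    file_id: str
    text: str
    speaker: Optional[str] = None


def parse_line(line: str) -> MetadataRow:
    """Split one line into its pipe-separated fields.

    Assumption: the transcribed text itself contains no '|' character.
    """
    fields = line.rstrip("\r\n").split("|")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields, got {len(fields)}: {line!r}")
    return MetadataRow(*fields)


def format_line(row: MetadataRow) -> str:
    """Inverse of parse_line: the speaker field is omitted when it is None."""
    fields = [row.file_id, row.text]
    if row.speaker is not None:
        fields.append(row.speaker)
    return "|".join(fields)
```

A downstream training script could use `parse_line` to decide whether a dataset carries speaker labels by checking whether `speaker` is `None`.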

## Requirements

- Python >=3.10, <3.14
- [`uv`](https://docs.astral.sh/uv/)
- `ffmpeg` (on Debian/Ubuntu, install with `sudo apt install ffmpeg`)
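
The requirements above can be checked before a long transcription run with a short preflight script. This is only a sketch: `check_requirements` is a hypothetical helper, not something Trainscribe provides, and it inspects the interpreter running the script, which may differ from the one `uv` selects for the tool.

```python
import shutil
import sys


def check_requirements(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `version` and `which` are injectable so the check can be exercised without
    touching the real environment.
    """
    if version is None:
        version = sys.version_info[:2]
    problems = []
    # Supported range taken from the requirements list: >=3.10, <3.14.
    if not ((3, 10) <= tuple(version) < (3, 14)):
        problems.append(f"Python {version[0]}.{version[1]} is outside >=3.10, <3.14")
    # Both executables must be discoverable on PATH.
    for tool in ("uv", "ffmpeg"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `check_requirements()` with no arguments inspects the current environment and returns any problems found.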

## Usage

Run the tool with:

```console
uvx trainscribe --folder /path/to/audio/folder [options]
```

The tool's help text lists the available options:

```console
Transcribe a folder of audio files to metadata.csv using Whisper.

options:
  -h, --help            show this help message and exit
  --folder, -f FOLDER   Folder with audio files
  --lang, -l LANG       Language code for transcription (e.g. 'en')
  --model, -m MODEL     Whisper model name (tiny, base, small, medium, large, turbo)
  --speaker, -s SPEAKER
                        Speaker label to add to metadata lines
  --device, -d DEVICE   Device for whisper model (cuda/cpu)
  --output, -o OUTPUT
```
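
When Trainscribe is driven from a Python script rather than a shell, the command can be assembled as an argument list. The sketch below uses only the flags shown in the help text above; `build_command` is a hypothetical helper, and `--output` is left out because the help text does not describe its value.

```python
def build_command(folder, lang=None, model=None, speaker=None, device=None):
    """Build the argv list for `uvx trainscribe`, skipping options that are None."""
    cmd = ["uvx", "trainscribe", "--folder", str(folder)]
    optional = {
        "--lang": lang,
        "--model": model,
        "--speaker": speaker,
        "--device": device,
    }
    # Dicts preserve insertion order, so flags appear in the order listed above.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with folder names that contain spaces.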

### Example
Transcribe English audio in `dataset/wavs` using the `medium` model:

```console
uvx trainscribe --folder dataset/wavs --lang en --model medium
```

This generates `dataset/wavs/metadata.csv`.
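
Before feeding the generated file to a training pipeline, it can be worth a quick sanity check. The following sketch counts rows and collects the IDs whose transcription came out empty; `summarize_metadata` is a hypothetical helper, and the sketch assumes the file is UTF-8 encoded and pipe-delimited without quoting.

```python
import csv
from pathlib import Path


def summarize_metadata(path):
    """Return (row_count, file_ids_with_empty_text) for a pipe-delimited metadata file."""
    empty = []
    count = 0
    # Assumption: the file is UTF-8 encoded and fields are never quoted.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue  # skip blank lines
            count += 1
            # A row with a missing or whitespace-only text field is flagged.
            if len(row) < 2 or not row[1].strip():
                empty.append(row[0])
    return count, empty
```

For the example above, `summarize_metadata("dataset/wavs/metadata.csv")` would return the number of transcribed files together with any IDs that may need to be re-transcribed or dropped.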
