Transcribe every audio file in a folder to a text file using speech-to-text (STT) with local inference of OpenAI's open-source Whisper model.
- Host: GitHub
- URL: https://github.com/gabrieldiem/batch-stt-whisper
- Owner: gabrieldiem
- Created: 2024-09-16T12:23:20.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2024-09-16T19:03:27.000Z (3 months ago)
- Last Synced: 2024-11-05T14:04:25.665Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 2.93 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Batch STT using Whisper from OpenAI
Transcribe every audio file in a folder to a text file using speech-to-text (STT) with local inference of OpenAI's open-source Whisper model.
## Requirements
An Nvidia graphics card with CUDA support is required. Information about the different Whisper model sizes is available [here](https://github.com/openai/whisper/blob/main/model-card.md).
The program is set up to use the `medium` model size, which has 769M parameters and can be loaded with ~5 GB of GPU VRAM. To change the model size, edit the `WHISPER_MODEL_ID` constant in the script.
> Tested with Nvidia RTX 3060 with 6GB of VRAM.
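Picking a model size comes down to matching its approximate VRAM requirement against the card's memory. As a sketch, the figures below are the approximate values from the Whisper model card, and `largest_model_for` is a hypothetical helper, not part of this script:

```python
# Approximate sizes from the Whisper model card; treat as estimates,
# actual usage varies with batch size and precision.
WHISPER_MODELS = {
    # name: (parameters in millions, approx. required VRAM in GB)
    "tiny":   (39,   1),
    "base":   (74,   1),
    "small":  (244,  2),
    "medium": (769,  5),
    "large":  (1550, 10),
}

def largest_model_for(vram_gb: float) -> str:
    """Return the largest Whisper model whose approximate VRAM need fits."""
    fitting = [(params, name)
               for name, (params, vram) in WHISPER_MODELS.items()
               if vram <= vram_gb]
    if not fitting:
        raise ValueError(f"No Whisper model fits in {vram_gb} GB of VRAM")
    return max(fitting)[1]  # most parameters among the ones that fit
```

On a 6 GB card like the RTX 3060 above, this picks `medium`, matching the script's default.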
## Run the script
Install the requirements:
```shell
pip install -r ./requirements.txt
```

Run the script:
```shell
python ./whisper_process_folder.py path/to/folder/with/audios
```

## Development
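The per-file loop such a script performs could be sketched as follows. This is a hypothetical outline, not the actual `whisper_process_folder.py`; the `transcribe` callable stands in for the real Whisper call (e.g. `lambda p: model.transcribe(str(p))["text"]` with the `openai-whisper` package), and the extension list is an assumption:

```python
from pathlib import Path
from typing import Callable

# Assumed set of audio extensions; adjust to taste.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}

def transcribe_folder(folder: str, transcribe: Callable[[Path], str]) -> list[Path]:
    """Transcribe every audio file in `folder`, writing a .txt next to each.

    `transcribe` receives an audio file path and returns its transcript text.
    Returns the list of text files written.
    """
    written = []
    for audio in sorted(Path(folder).iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip non-audio files
        text = transcribe(audio)
        out = audio.with_suffix(".txt")
        out.write_text(text, encoding="utf-8")
        written.append(out)
    return written
```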
### Compile requirements
```shell
pip-compile --output-file=requirements.txt "./requirements.in"
```