https://github.com/madh93/whisper

🎙️ My Whisper stuff
docker openai speech-recognition speech-to-text whisper whisper-cpp

Host: GitHub
URL: https://github.com/madh93/whisper
Owner: Madh93
License: mit
Created: 2024-02-23T18:31:46.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-02-27T21:18:21.000Z (11 months ago)
Last Synced: 2024-02-27T22:29:23.076Z (11 months ago)
Topics: docker, openai, speech-recognition, speech-to-text, whisper, whisper-cpp
Language: Makefile
Homepage:
Size: 421 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Whisper

[![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)](https://lbesson.mit-license.org/)

A personal Makefile with commands for converting and transcribing audio files using [whisper.cpp](https://github.com/ggerganov/whisper.cpp). It supports both Docker-based and native execution.

## Requirements

- [Make](https://www.gnu.org/software/make/)

- [Docker](https://docs.docker.com/get-docker/)

- [FFmpeg](https://www.ffmpeg.org/download.html)

## Usage

Clone the repository and initialize the required dependencies:

```shell

make setup

```

**Optionally**, to build with AMD ROCm support and use your AMD GPU*, run:

```shell

WHISPER_HIPBLAS=1 make setup

```

*If your GPU is not officially supported, remember to set the `HSA_OVERRIDE_GFX_VERSION` environment variable. More info [here](https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides).
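
As a rough illustration of the override format, the sketch below converts a ROCm target name into the `x.y.z` form that `HSA_OVERRIDE_GFX_VERSION` expects. The helper name `gfx_to_override` is hypothetical and not part of this repository. It assumes the usual naming scheme where the last two characters of the target are the minor and patch digits. Pass the closest *supported* target, not necessarily the one your card reports.

```shell
# Hypothetical helper (not part of the Makefile): turn a target name such as
# "gfx1030" into the dotted form used by HSA_OVERRIDE_GFX_VERSION.
gfx_to_override() {
  t=${1#gfx}               # strip the "gfx" prefix, e.g. "1030"
  patch=${t#"${t%?}"}      # last character
  t=${t%?}
  minor=${t#"${t%?}"}      # second-to-last character
  major=${t%?}             # everything before it
  # The minor and patch digits are treated as hexadecimal.
  printf '%s.%s.%s\n' "$major" "$((0x$minor))" "$((0x$patch))"
}

export HSA_OVERRIDE_GFX_VERSION=$(gfx_to_override gfx1030)
echo "$HSA_OVERRIDE_GFX_VERSION"   # 10.3.0
```

Which supported target is closest to a given card is something to look up in the linked documentation; the helper only handles the formatting.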

### Download models

Downloads the necessary models for transcription:

```shell

make download

```

Download a specific model (available models are listed [here](https://github.com/ggerganov/whisper.cpp/tree/master/models#available-models)):

```shell

make download model=tiny

```

By default, the download runs in Docker. To run it natively instead:

```shell

DOCKER_ENABLED=no make download model=tiny

```
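
If you switch between machines with and without Docker, a small wrapper can pick the mode for you. The snippet below is only a sketch: `have_cmd` is a hypothetical helper, and it sets `DOCKER_ENABLED=no` only when the `docker` CLI is missing, leaving the Makefile default untouched otherwise.

```shell
# Hypothetical helper: succeed when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Force native mode only when the docker CLI is absent; otherwise keep the
# Makefile default (Docker).
if ! have_cmd docker; then
  export DOCKER_ENABLED=no
fi

echo "DOCKER_ENABLED=${DOCKER_ENABLED:-<Makefile default>}"
```

Finding the CLI on `PATH` does not guarantee that the Docker daemon is running, so treat this as a convenience check rather than a full health check.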

### Convert to .wav (optional)

Converts an input audio file to WAV format (currently `whisper.cpp` runs only with 16-bit WAV files, so make sure to convert your input before running the tool):

```shell

make convert-to-wav input=audios/jfk.mp3 output=audios/jfk.wav

```
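
For reference, the whisper.cpp README suggests converting input to 16 kHz, mono, 16-bit PCM with FFmpeg. Whether `make convert-to-wav` uses exactly these flags is an assumption; the sketch below only shows that manual equivalent. The helper name `to_wav` and the `DRY_RUN` switch are hypothetical and not part of the Makefile.

```shell
# Hypothetical helper: convert an audio file to 16 kHz mono 16-bit PCM WAV.
# With DRY_RUN=1 the command is printed instead of executed.
to_wav() {
  src=$1
  dst=$2
  set -- ffmpeg -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$dst"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the command that would run for the sample file.
DRY_RUN=1 to_wav audios/jfk.mp3 audios/jfk.wav
```

Drop `DRY_RUN=1` to actually run FFmpeg.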

### Transcribe audio

Transcribes a `.wav` audio file under the `audios` directory using the specified model and language:

```shell

make transcribe model=small.en lang=en file=audios/jfk.wav

```

By default, transcription runs in Docker. To run it natively instead:

```shell

DOCKER_ENABLED=no make transcribe model=small.en lang=en file=audios/jfk.wav

```
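
To transcribe several files in one go, the `transcribe` target can be called in a loop. The following is a minimal sketch, not something the Makefile provides: `transcribe_all` and `DRY_RUN` are hypothetical names, and the defaults simply mirror the example above.

```shell
# Hypothetical helper: run `make transcribe` for every .wav file in a directory.
# Arguments: directory (default: audios), model (default: small.en),
# language (default: en). With DRY_RUN=1 the commands are only printed.
transcribe_all() {
  dir=${1:-audios}
  model=${2:-small.en}
  lang=${3:-en}
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "make transcribe model=$model lang=$lang file=$f"
    else
      make transcribe model="$model" lang="$lang" file="$f" || return 1
    fi
  done
}

# Preview the commands without running them.
DRY_RUN=1 transcribe_all audios
```

File names containing spaces may not survive being passed as a Make variable, so simple names are the safer choice here.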

To run on an AMD GPU that is not officially supported, override the LLVM target with `HSA_OVERRIDE_GFX_VERSION`. Example:

```shell

HSA_OVERRIDE_GFX_VERSION=10.3.0 DOCKER_ENABLED=no make transcribe model=small.en lang=en file=audios/jfk.wav

```

All methods generate `.srt`, `.lrc` and `.txt` transcription files.

### Convert to video

Converts the transcribed text into a video file with subtitles:

```shell

make convert-to-video input=audios/jfk.wav

```
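
The targets above can be chained into a single end-to-end run. The sketch below is one possible arrangement and uses only the `make` invocations shown in this README; the `run` and `pipeline` helpers and the `DRY_RUN` switch are hypothetical. It assumes `make download` is safe to call when the model is already present.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: download the model, convert to WAV if needed,
# transcribe, then render the subtitled video.
# Arguments: source file, model (default: small.en), language (default: en).
pipeline() {
  src=$1
  model=${2:-small.en}
  lang=${3:-en}
  run make download model="$model" || return 1
  case $src in
    *.wav)
      wav=$src   # already WAV, skip the conversion step
      ;;
    *)
      wav=${src%.*}.wav
      run make convert-to-wav input="$src" output="$wav" || return 1
      ;;
  esac
  run make transcribe model="$model" lang="$lang" file="$wav" || return 1
  run make convert-to-video input="$wav"
}

# Preview the four steps for the sample file.
DRY_RUN=1 pipeline audios/jfk.mp3
```

Prefix the call with `DOCKER_ENABLED=no` to run the same steps natively.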

## Useful Links

- [whisper.cpp](https://github.com/ggerganov/whisper.cpp)

## License

This project is licensed under the [MIT license](LICENSE).