https://github.com/madh93/whisper
🎙️ My Whisper stuff
docker openai speech-recognition speech-to-text whisper whisper-cpp
- Host: GitHub
- URL: https://github.com/madh93/whisper
- Owner: Madh93
- License: mit
- Created: 2024-02-23T18:31:46.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-26T12:59:42.000Z (almost 2 years ago)
- Last Synced: 2025-03-22T05:11:18.393Z (about 1 year ago)
- Topics: docker, openai, speech-recognition, speech-to-text, whisper, whisper-cpp
- Language: Makefile
- Size: 424 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Whisper
[MIT License](https://lbesson.mit-license.org/)
Personal Makefile that provides a set of commands to manage the transcription and conversion process of audio files using [whisper.cpp](https://github.com/ggerganov/whisper.cpp). It supports both Docker-based and native execution.
## Requirements
- [Make](https://www.gnu.org/software/make/)
- [Docker](https://docs.docker.com/get-docker/)
- [FFmpeg](https://www.ffmpeg.org/download.html)
## Usage
After cloning the repository, initialize the required dependencies from the repository root:
```shell
make setup
```
**Optionally**, to build with AMD ROCm support so that your AMD GPU* is used, run:
```shell
WHISPER_HIPBLAS=1 make setup
```
*If your GPU is not officially supported, remember to set the `HSA_OVERRIDE_GFX_VERSION` environment variable. More info [here](https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides).
### Download models
Download the models needed for transcription:
```shell
make download
```
Download a specific model (available models are listed [here](https://github.com/ggerganov/whisper.cpp/tree/master/models#available-models)):
```shell
make download model=tiny
```
By default, it uses Docker. To disable Docker:
```shell
DOCKER_ENABLED=no make download model=tiny
```
### Convert to .wav (optional)
Converts an input audio file to WAV format. `whisper.cpp` currently runs only with 16-bit WAV files, so convert any other format before transcribing:
```shell
make convert-to-wav input=audios/jfk.mp3 output=audios/jfk.wav
```
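For reference, the `whisper.cpp` documentation suggests an FFmpeg command that produces 16 kHz, mono, 16-bit PCM audio. The sketch below wraps that command in a small shell function. `to_wav` and the `DRY_RUN` variable are hypothetical names used only for illustration, and the `convert-to-wav` target may invoke FFmpeg with different options.

```shell
# Hypothetical helper, not part of this repository's Makefile.
# Builds the FFmpeg command suggested by the whisper.cpp docs:
#   -ar 16000       resample to 16 kHz
#   -ac 1           downmix to mono
#   -c:a pcm_s16le  encode as 16-bit little-endian PCM
to_wav() {
  input=$1
  # Default output: same path with the extension replaced by .wav
  output=${2:-${input%.*}.wav}
  set -- ffmpeg -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$output"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only print the command instead of running it
    echo "$@"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 to_wav audios/jfk.mp3` prints the command without executing it, which is handy for checking the derived output path first.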
### Transcribe audio
Transcribes the `.wav` audio file under the `audios` directory using the specified model and language:
```shell
make transcribe model=small.en lang=en file=audios/jfk.wav
```
Transcription runs in Docker by default. To run natively instead:
```shell
DOCKER_ENABLED=no make transcribe model=small.en lang=en file=audios/jfk.wav
```
To run on an officially unsupported AMD GPU, override the LLVM target with `HSA_OVERRIDE_GFX_VERSION`. Example:
```shell
HSA_OVERRIDE_GFX_VERSION=10.3.0 DOCKER_ENABLED=no make transcribe model=small.en lang=en file=audios/jfk.wav
```
All methods generate `.srt`, `.lrc` and `.txt` transcription files.
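To transcribe several files in one go, the `transcribe` target can be called in a loop. This is a minimal sketch, not something the Makefile provides: `transcribe_all`, `MAKE_CMD`, `MODEL` and `LANG_CODE` are hypothetical names, and it assumes the target accepts the same `model`, `lang` and `file` arguments shown above.

```shell
# Hypothetical helper, not part of this repository's Makefile.
# Calls `make transcribe` once per .wav file in a directory.
transcribe_all() {
  dir=${1:-audios}
  for wav in "$dir"/*.wav; do
    # Skip the unexpanded pattern when the directory has no .wav files
    [ -e "$wav" ] || continue
    # MAKE_CMD can be overridden (for example with `echo`) to preview the calls
    ${MAKE_CMD:-make} transcribe model="${MODEL:-small.en}" lang="${LANG_CODE:-en}" file="$wav"
  done
}
```

For example, `MAKE_CMD=echo transcribe_all audios` lists the invocations that would run, and `DOCKER_ENABLED=no transcribe_all audios` passes the native-execution setting through the environment.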
### Convert to video
Converts the transcribed text into a video file with subtitles:
```shell
make convert-to-video input=audios/jfk.wav
```
## Useful Links
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
## License
This project is licensed under the [MIT license](LICENSE).