Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/madh93/whisper
🎙️ My Whisper stuff
JSON representation
- Host: GitHub
- URL: https://github.com/madh93/whisper
- Owner: Madh93
- License: mit
- Created: 2024-02-23T18:31:46.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-02-27T21:18:21.000Z (11 months ago)
- Last Synced: 2024-02-27T22:29:23.076Z (11 months ago)
- Topics: docker, openai, speech-recognition, speech-to-text, whisper, whisper-cpp
- Language: Makefile
- Homepage:
- Size: 421 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Whisper
[![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)](https://lbesson.mit-license.org/)
Personal Makefile that provides a set of commands to manage the transcription and conversion process of audio files using [whisper.cpp](https://github.com/ggerganov/whisper.cpp). It supports both Docker-based and native execution.
## Requirements
- [Make](https://www.gnu.org/software/make/)
- [Docker](https://docs.docker.com/get-docker/)
- [FFmpeg](https://www.ffmpeg.org/download.html)

## Usage
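Before running any targets, you can quickly check that the prerequisites are on your `PATH` (a convenience snippet, not part of the Makefile):

```shell
# Check that each required tool is installed and on PATH.
for tool in make docker ffmpeg; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```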
Clone the repository and initialize the required dependencies:
```shell
make setup
```

**Optionally**, to enable AMD ROCm support for your AMD GPU*, just run:
```shell
WHISPER_HIPBLAS=1 make setup
```

*If your GPU is not officially supported, don't forget to set the `HSA_OVERRIDE_GFX_VERSION` environment variable. More info [here](https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides).
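For example, to build with ROCm support on a card that needs the override, the two variables can be combined (the `10.3.0` value below is a placeholder; pick the one matching your GPU family):

```shell
# Hypothetical example: build with ROCm (hipBLAS) support while
# overriding the reported GFX version for an unsupported GPU.
HSA_OVERRIDE_GFX_VERSION=10.3.0 WHISPER_HIPBLAS=1 make setup
```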
### Download models
Downloads the necessary models for transcription:
```shell
make download
```

To download a specific model (available models [here](https://github.com/ggerganov/whisper.cpp/tree/master/models#available-models)):
```shell
make download model=tiny
```

By default, it uses Docker. To disable Docker:
```shell
DOCKER_ENABLED=no make download model=tiny
```

### Convert to .wav (optional)
Converts an input audio file to WAV format (currently `whisper.cpp` runs only with 16-bit WAV files, so make sure to convert your input before running the tool):
```shell
make convert-to-wav input=audios/jfk.mp3 output=audios/jfk.wav
```

### Transcribe audio
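If you prefer invoking FFmpeg yourself instead of the `convert-to-wav` target, a roughly equivalent command is the following (the exact flags the Makefile uses are an assumption; 16-bit, 16 kHz mono is what `whisper.cpp` expects):

```shell
#!/bin/sh
# Resample an input file to 16-bit, 16 kHz mono WAV for whisper.cpp.
in=audios/jfk.mp3
out="${in%.*}.wav"   # derives audios/jfk.wav from audios/jfk.mp3
ffmpeg -i "$in" -ar 16000 -ac 1 -c:a pcm_s16le "$out"
```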
Transcribes the `.wav` audio file under the `audios` directory using the specified model and language:
```shell
make transcribe model=small.en lang=en file=audios/jfk.wav
```

By default, it uses Docker for transcription. To opt for native execution:
```shell
DOCKER_ENABLED=no make transcribe model=small.en lang=en file=audios/jfk.wav
```

To run on an unsupported AMD GPU, just override the LLVM target. Example:
```shell
HSA_OVERRIDE_GFX_VERSION=10.3.0 DOCKER_ENABLED=no make transcribe model=small.en lang=en file=audios/jfk.wav
```

All methods generate `.srt`, `.lrc` and `.txt` transcription files.
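The conversion and transcription steps can be chained to batch-process a whole directory (a sketch; assumes `.mp3` sources under `audios/`):

```shell
#!/bin/sh
# Convert and transcribe every .mp3 under audios/.
for f in audios/*.mp3; do
  wav="${f%.mp3}.wav"   # audios/jfk.mp3 -> audios/jfk.wav
  make convert-to-wav input="$f" output="$wav"
  make transcribe model=small.en lang=en file="$wav"
done
```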
### Convert to video
Converts the transcribed text into a video file with subtitles:
```shell
make convert-to-video input=audios/jfk.wav
```

## Useful Links
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
## License
This project is licensed under the [MIT license](LICENSE).