https://github.com/ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
- Host: GitHub
- URL: https://github.com/ahmetoner/whisper-asr-webservice
- Owner: ahmetoner
- License: MIT
- Created: 2022-09-22T14:26:49.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-02-18T01:00:07.000Z (about 2 months ago)
- Last Synced: 2025-04-03T12:51:18.071Z (12 days ago)
- Topics: asr, automatic-speech-recognition, docker, openai-whisper, speech, speech-recognition, speech-to-text
- Language: Python
- Homepage: https://ahmetoner.github.io/whisper-asr-webservice
- Size: 1.76 MB
- Stars: 2,494
- Watchers: 30
- Forks: 447
- Open Issues: 68
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
Awesome Lists containing this project
- awesome-ChatGPT-repositories - whisper-asr-webservice - OpenAI Whisper ASR Webservice API (Openai)
README



# Whisper ASR Box
Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition as well as speech translation and language identification.
## Features
The current release (v1.8.2) supports the following Whisper models:
- [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/releases/tag/v20240930)
- [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.1.0)
- [whisperX](https://github.com/m-bain/whisperX)@[v3.1.1](https://github.com/m-bain/whisperX/releases/tag/v3.1.1)

## Quick Usage
### CPU
```shell
docker run -d -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest
```

### GPU
```shell
docker run -d --gpus all -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest-gpu
```
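The GPU image assumes a host with a working NVIDIA driver and the NVIDIA Container Toolkit installed (an assumption based on standard Docker GPU setups; the README does not spell this out). A quick, hypothetical smoke test that containers can see the GPU:

```shell
# Hypothetical check that Docker can pass the GPU through to containers
# (requires the NVIDIA Container Toolkit on the host).
docker run --rm --gpus all ubuntu nvidia-smi
```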
#### Cache

To reduce container startup time by avoiding repeated model downloads, you can persist the cache directory (a Compose equivalent is sketched after this command):
```shell
docker run -d -p 9000:9000 \
-v $PWD/cache:/root/.cache/ \
onerahmet/openai-whisper-asr-webservice:latest
```
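The same setup can be written as a Compose file. The sketch below is only an illustrative equivalent of the `docker run` commands above and is not shipped with this repository; the image tag, environment variables, and cache path mirror the earlier examples:

```shell
# Illustrative sketch: Compose equivalent of the cache example above
# (this docker-compose.yml is not part of the repository).
cat > docker-compose.yml <<'EOF'
services:
  whisper-asr:
    image: onerahmet/openai-whisper-asr-webservice:latest
    ports:
      - "9000:9000"
    environment:
      - ASR_MODEL=base
      - ASR_ENGINE=openai_whisper
    volumes:
      - ./cache:/root/.cache    # persist downloaded models between restarts
EOF

docker compose up -d
```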
## Key Features

- Multiple ASR engines (OpenAI Whisper, Faster Whisper, WhisperX)
- Multiple output formats (text, JSON, VTT, SRT, TSV)
- Word-level timestamps support
- Voice activity detection (VAD) filtering
- Speaker diarization (with WhisperX)
- FFmpeg integration for broad audio/video format support
- GPU acceleration support
- Configurable model loading/unloading
- REST API with Swagger documentation

## Environment Variables
Key configuration options (combined in the example after this list):
- `ASR_ENGINE`: Engine selection (openai_whisper, faster_whisper, whisperx)
- `ASR_MODEL`: Model selection (tiny, base, small, medium, large-v3, etc.)
- `ASR_MODEL_PATH`: Custom path to store/load models
- `ASR_DEVICE`: Device selection (cuda, cpu)
- `MODEL_IDLE_TIMEOUT`: Timeout for model unloading
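For example, the variables can be combined to run the Faster Whisper engine with a larger model, a custom model directory, and automatic unloading of idle models. The values below are illustrative only; the mount path for `ASR_MODEL_PATH` and the unit of `MODEL_IDLE_TIMEOUT` (assumed here to be seconds) should be checked against the documentation linked below:

```shell
# Illustrative combination of the variables above; values are examples only.
docker run -d -p 9000:9000 \
  -e ASR_ENGINE=faster_whisper \
  -e ASR_MODEL=large-v3 \
  -e ASR_MODEL_PATH=/data/whisper \
  -e MODEL_IDLE_TIMEOUT=300 \
  -v $PWD/models:/data/whisper \
  onerahmet/openai-whisper-asr-webservice:latest
```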
## Documentation

For complete documentation, visit:
[https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)

## Development
```shell
# Install poetry
pip3 install poetry

# Install dependencies
poetry install

# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000
```

After starting the service, visit `http://localhost:9000` or `http://0.0.0.0:9000` in your browser to access the Swagger UI documentation and try out the API endpoints.
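The endpoints can also be exercised from the command line. The request below is a sketch: the path and parameter names (`/asr`, `audio_file`, `task`, `output`) are assumptions to confirm against the Swagger UI, and `sample.wav` is a placeholder for a local audio file:

```shell
# Illustrative request: transcribe a local file and return SRT subtitles.
# Confirm the endpoint path and parameter names in the Swagger UI.
curl -X POST "http://localhost:9000/asr?task=transcribe&output=srt" \
  -F "audio_file=@sample.wav"
```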
## Credits
- This software uses libraries from the [FFmpeg](http://ffmpeg.org) project under the [LGPLv2.1](http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html) license.