https://github.com/runpod-workers/worker-faster_whisper

faster-whisper as serverless endpoint

Host: GitHub
URL: https://github.com/runpod-workers/worker-faster_whisper
Owner: runpod-workers
License: mit
Created: 2023-05-30T16:17:36.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2025-11-21T06:33:09.000Z (8 months ago)
Last Synced: 2025-11-21T08:30:20.344Z (8 months ago)
Topics: ai, docker, faster-whsiper, runpod, whisper
Language: Python
Homepage: https://runpod.io
Size: 1.98 MB
Stars: 125
Watchers: 4
Forks: 109
Open Issues: 18
Metadata Files:
- Readme: README.md
- License: LICENSE

![Faster Whisper Logo](https://5ccaof7hvfzuzf4p.public.blob.vercel-storage.com/banner-pjbGKw0buxbWGhMVC165Gf9qgqWo7I.jpeg)

[Faster Whisper](https://github.com/guillaumekln/faster-whisper) is designed to process audio files using various Whisper models, with options for transcription formatting, language translation and more.

---

[![RunPod](https://api.runpod.io/badge/runpod-workers/worker-faster_whisper)](https://www.runpod.io/console/hub/runpod-workers/worker-faster_whisper)

---

## Models

- tiny
- base
- small
- medium
- large-v1
- large-v2
- large-v3
- distil-large-v2
- distil-large-v3
- turbo

## Input

| Input                               | Type  | Description                                                                                                                                                             |
| ----------------------------------- | ----- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `audio`                             | Path  | URL to audio file                                                                                                                                                       |
| `audio_base64`                      | str   | Base64-encoded audio file                                                                                                                                               |
| `model`                             | str   | Choose a Whisper model. Choices: "tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3", "distil-large-v2", "distil-large-v3", "turbo". Default: "base" |
| `transcription`                     | str   | Choose the format for the transcription. Choices: "plain_text", "formatted_text", "srt", "vtt". Default: "plain_text"                                                  |
| `translate`                         | bool  | Translate the text to English when set to True. Default: False                                                                                                          |
| `translation`                       | str   | Choose the format for the translation. Choices: "plain_text", "formatted_text", "srt", "vtt". Default: "plain_text"                                                    |
| `language`                          | str   | Language spoken in the audio. Specify None to perform language detection. Default: None                                                                                 |
| `temperature`                       | float | Temperature to use for sampling. Default: 0                                                                                                                             |
| `best_of`                           | int   | Number of candidates when sampling with non-zero temperature. Default: 5                                                                                                |
| `beam_size`                         | int   | Number of beams in beam search, only applicable when temperature is zero. Default: 5                                                                                    |
| `patience`                          | float | Optional patience value to use in beam decoding. Default: None                                                                                                          |
| `length_penalty`                    | float | Optional token length penalty coefficient (alpha). Default: None                                                                                                        |
| `suppress_tokens`                   | str   | Comma-separated list of token ids to suppress during sampling. Default: "-1"                                                                                            |
| `initial_prompt`                    | str   | Optional text to provide as a prompt for the first window. Default: None                                                                                                |
| `condition_on_previous_text`        | bool  | If True, provide the previous output of the model as a prompt for the next window. Default: True                                                                        |
| `temperature_increment_on_fallback` | float | Amount by which to increase the temperature when decoding fails and falls back. Default: 0.2                                                                            |
| `compression_ratio_threshold`       | float | If the gzip compression ratio is higher than this value, treat the decoding as failed. Default: 2.4                                                                     |
| `logprob_threshold`                 | float | If the average log probability is lower than this value, treat the decoding as failed. Default: -1.0                                                                    |
| `no_speech_threshold`               | float | If the probability of the no-speech token is higher than this value, consider the segment as silence. Default: 0.6                                                      |
| `enable_vad`                        | bool  | If True, use voice activity detection (VAD) to filter out parts of the audio without speech. This step uses the Silero VAD model. Default: False                        |
| `word_timestamps`                   | bool  | If True, include word timestamps in the output. Default: False                                                                                                          |
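
The inputs listed above are sent inside an `input` object. The following is a minimal sketch of assembling and checking such a request body in Python before sending it. The helper names `build_payload` and `build_request` are hypothetical, and the `https://api.runpod.ai/v2/<endpoint_id>/runsync` URL with a bearer token is an assumption about the RunPod serverless API that this README does not state, so check it against your endpoint's details.

```python
import json
import urllib.request

# Choices copied from the input table.
MODELS = {
    "tiny", "base", "small", "medium", "large-v1", "large-v2",
    "large-v3", "distil-large-v2", "distil-large-v3", "turbo",
}
FORMATS = {"plain_text", "formatted_text", "srt", "vtt"}


def build_payload(audio=None, audio_base64=None, **options):
    """Return a request body of the form {"input": {...}} (hypothetical helper)."""
    if audio is None and audio_base64 is None:
        raise ValueError("provide 'audio' (a URL) or 'audio_base64'")
    if options.get("model", "base") not in MODELS:
        raise ValueError(f"unknown model: {options['model']!r}")
    for key in ("transcription", "translation"):
        if options.get(key, "plain_text") not in FORMATS:
            raise ValueError(f"unknown {key} format: {options[key]!r}")
    fields = {"audio": audio, "audio_base64": audio_base64, **options}
    # Leave out unset fields so the worker falls back to its documented defaults.
    return {"input": {k: v for k, v in fields.items() if v is not None}}


def build_request(endpoint_id, api_key, payload):
    """Wrap the payload in a POST request (URL scheme is an assumption)."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Validating the `model` and format choices locally is only a convenience for catching typos before a job is queued.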

### Example

The following inputs can be used for testing the model:

```json
{
  "input": {
    "audio": "https://github.com/runpod-workers/sample-inputs/raw/main/audio/gettysburg.wav",
    "model": "turbo"
  }
}
```

producing an output like this:

```json
{
  "segments": [
    {
      "id": 1,
      "seek": 106,
      "start": 0.11,
      "end": 3.11,
      "text": " Hello and welcome!",
      "tokens": [50364, 25, 7, 287, 50514],
      "temperature": 0.1,
      "avg_logprob": -0.8348079785480325,
      "compression_ratio": 0.5789473684210527,
      "no_speech_prob": 0.1453857421875
    }
  ],
  "detected_language": "en",
  "transcription": "Hello and welcome!",
  "translation": null,
  "device": "cuda",
  "model": "turbo",
  "translation_time": 0.3796223163604736
}
```
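
Each entry in `segments` carries `start` and `end` times in seconds alongside its `text`. As a rough sketch of consuming that output, the snippet below turns the segments into timestamped lines. The function names `format_timestamp` and `segment_lines` are hypothetical, and the code assumes the result has already been parsed into a Python dict shaped like the example.

```python
def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segment_lines(result):
    """Return one '[start -> end] text' line per segment (hypothetical helper)."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Segment text in the example has a leading space, so strip it.
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return lines


# Trimmed copy of the example output.
example = {
    "segments": [
        {"id": 1, "start": 0.11, "end": 3.11, "text": " Hello and welcome!"}
    ],
    "transcription": "Hello and welcome!",
}

for line in segment_lines(example):
    print(line)
```

If subtitle files are the goal, requesting `"transcription": "srt"` or `"vtt"` from the worker is likely simpler than formatting segments by hand.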
