
# Whisper.cpp API Webserver in Docker

Whisper.cpp HTTP transcription server with an OpenAI-like API in Docker.

This project provides a Dockerized transcription server based on the
[whisper.cpp server example](https://github.com/ggerganov/whisper.cpp/tree/master/examples/server).

[Русский](./README.md) | [中文](./README.zh.md) | **English**

## Features

- Dockerized whisper.cpp HTTP server for audio transcription
- Configurable via environment variables
- Automatically converts audio to WAV format
- Automatically downloads required model on startup
- Can quantize any Whisper model to the required type on startup

## Requirements

Before you begin, ensure you have a machine with a GPU that supports a modern version of CUDA, as the Docker image is
computationally demanding.

* Nvidia GPU
* CUDA
* Docker
* Docker Compose
* Nvidia Docker Runtime

For detailed instructions on preparing a Linux machine for running neural networks, including installation of CUDA,
Docker, and the Nvidia Docker Runtime, see the
publication "[How to Prepare Linux for Running and Training Neural Networks? (+ Docker)](https://dzen.ru/a/ZVt9kRBCTCGlQqyP)"
(in Russian).
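
To verify that Docker can see the GPU before building, you can run a throwaway CUDA container (the image tag below is only an example; any recent `nvidia/cuda` tag will do):

```shell
# Should print the same table as running nvidia-smi on the host;
# failure here usually means the Nvidia Docker Runtime is misconfigured
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
```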

## Installation

1. Clone the repository and switch to its root directory:

```shell
git clone https://github.com/EvilFreelancer/docker-whisper-server.git
cd docker-whisper-server
```

2. Copy the provided Docker Compose template:

```shell
cp docker-compose.dist.yml docker-compose.yml
```

3. Build the Docker image:

```shell
docker-compose build
```

4. Start the services:

```shell
docker-compose up -d
```

5. Navigate to http://localhost:8080 in your browser:

![Swagger UI](./assets/swagger.png)
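
If the page does not come up right away, the container may still be downloading or quantizing the model; you can follow its progress in the logs:

```shell
docker-compose logs -f
```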

## Endpoints

### /inference

Transcribe an audio file:

```shell
curl 127.0.0.1:9000/inference \
    -H "Content-Type: multipart/form-data" \
    -F file="@" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json"
```
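
For example, transcribing a local recording (the path `samples/jfk.wav` is only a placeholder for any audio file ffmpeg can decode):

```shell
curl 127.0.0.1:9000/inference \
    -H "Content-Type: multipart/form-data" \
    -F file="@samples/jfk.wav" \
    -F response_format="json"
# With response_format="json" the reply is a JSON object whose
# "text" field contains the transcription
```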

### /load

Load a new Whisper model:

```shell
curl 127.0.0.1:9000/load \
    -H "Content-Type: multipart/form-data" \
    -F model=""
```
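
For example, to switch to a model file already present inside the container (the path below follows the default `/app/models/` layout from the configuration table in the next section):

```shell
curl 127.0.0.1:9000/load \
    -H "Content-Type: multipart/form-data" \
    -F model="/app/models/ggml-base.en.bin"
```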

## Environment variables

**Basic configuration**

| Name | Default | Description |
|------------------------------|---------------------------------------|----------------------------------------------------------------------------------|
| `WHISPER_MODEL` | base.en | The default Whisper model to use |
| `WHISPER_MODEL_PATH` | /app/models/ggml-${WHISPER_MODEL}.bin | The default path to the Whisper model file |
| `WHISPER_MODEL_QUANTIZATION` |                                       | Quantization level (applied only if `WHISPER_MODEL_PATH` is left at its default) |
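
For example, to start the stack with a quantized `small` model (a sketch; it assumes your `docker-compose.yml` passes these variables through to the container, and `q5_0` is one of the quantization types supported by whisper.cpp):

```shell
# Override the model and quantization level for this run only
WHISPER_MODEL=small \
WHISPER_MODEL_QUANTIZATION=q5_0 \
docker-compose up -d
```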

**Advanced configuration**

| Name | Default | Description |
|---------------------------|------------|-----------------------------------------------------|
| `WHISPER_THREADS` | 4 | Number of threads to use for inference |
| `WHISPER_PROCESSORS` | 1 | Number of processors to use for inference |
| `WHISPER_HOST` | 0.0.0.0 | Host IP or hostname to bind the server to |
| `WHISPER_PORT` | 9000 | Port number to listen on |
| `WHISPER_INFERENCE_PATH` | /inference | Inference path for all requests |
| `WHISPER_PUBLIC_PATH` | | Path to the public folder |
| `WHISPER_REQUEST_PATH` | | Request path for all requests |
| `WHISPER_OV_E_DEVICE`     | CPU        | OpenVINO device to use for encoder inference        |
| `WHISPER_OFFSET_T`        | 0          | Time offset in milliseconds                         |
| `WHISPER_OFFSET_N`        | 0          | Segment index offset                                |
| `WHISPER_DURATION`        | 0          | Duration of audio to process in milliseconds        |
| `WHISPER_MAX_CONTEXT` | -1 | Maximum context size for inference |
| `WHISPER_MAX_LEN` | 0 | Maximum length of output text |
| `WHISPER_BEST_OF`         | 2          | Number of best candidates to keep during sampling   |
| `WHISPER_BEAM_SIZE` | -1 | Beam size for search |
| `WHISPER_AUDIO_CTX` | 0 | Audio context to use for inference |
| `WHISPER_WORD_THOLD`      | 0.01       | Word timestamp probability threshold                |
| `WHISPER_ENTROPY_THOLD`   | 2.40       | Entropy threshold for decoder failure detection     |
| `WHISPER_LOGPROB_THOLD`   | -1.00      | Log probability threshold for decoder failure       |
| `WHISPER_LANGUAGE`        | en         | Spoken language code (`auto` to auto-detect)        |
| `WHISPER_PROMPT` | | Initial prompt |
| `WHISPER_DTW` | | Compute token-level timestamps |
| `WHISPER_CONVERT` | true | Convert audio to WAV, requires ffmpeg on the server |
| `WHISPER_SPLIT_ON_WORD` | false | Split on word rather than on token |
| `WHISPER_DEBUG_MODE` | false | Enable debug mode |
| `WHISPER_TRANSLATE`       | false      | Translate from source language to English           |
| `WHISPER_DIARIZE` | false | Stereo audio diarization |
| `WHISPER_TINYDIARIZE` | false | Enable tinydiarize (requires a tdrz model) |
| `WHISPER_NO_FALLBACK` | false | Do not use temperature fallback while decoding |
| `WHISPER_PRINT_SPECIAL` | false | Print special tokens |
| `WHISPER_PRINT_COLORS` | false | Print colors |
| `WHISPER_PRINT_REALTIME` | false | Print output in realtime |
| `WHISPER_PRINT_PROGRESS` | false | Print progress |
| `WHISPER_NO_TIMESTAMPS` | false | Do not print timestamps |
| `WHISPER_DETECT_LANGUAGE` | false | Exit after automatically detecting language |
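
As a sketch of per-run tuning, the same variables can be passed directly with `docker run` (the image name `whisper-server` below is a placeholder for whatever tag your build produced):

```shell
docker run --rm --gpus all -p 9000:9000 \
    -e WHISPER_THREADS=8 \
    -e WHISPER_BEAM_SIZE=5 \
    -e WHISPER_TRANSLATE=true \
    whisper-server
```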

## Links

- [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
- [server example](https://github.com/ggerganov/whisper.cpp/tree/master/examples/server) of whisper.cpp

## Citing

```text
[Pavel Rykov]. (2024). Whisper.cpp API Webserver in Docker. GitHub. https://github.com/EvilFreelancer/docker-whisper-server
```

```text
@misc{pavelrykov2024whisperapi,
  author = {Pavel Rykov},
  title  = {Whisper.cpp API Webserver in Docker},
  year   = {2024},
  url    = {https://github.com/EvilFreelancer/docker-whisper-server}
}
```