{"id":21081788,"url":"https://github.com/openvoiceos/ovos-docker-stt","last_synced_at":"2025-06-26T01:36:15.056Z","repository":{"id":171172614,"uuid":"647543947","full_name":"OpenVoiceOS/ovos-docker-stt","owner":"OpenVoiceOS","description":"Open Voice OS Speech-to-Text (STT) container images and docker-compose.yml file for x86_64 CPU architecture.","archived":false,"fork":false,"pushed_at":"2025-04-23T16:36:28.000Z","size":55,"stargazers_count":13,"open_issues_count":4,"forks_count":2,"subscribers_count":5,"default_branch":"dev","last_synced_at":"2025-06-19T20:49:17.081Z","etag":null,"topics":["fasterwhisper","openvoiceos","ovos","speech-to-text","stt","whisper"],"latest_commit_sha":null,"homepage":"https://openvoiceos.org","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenVoiceOS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["OpenVoiceOS"],"patreon":"openvoiceos","liberapay":"OpenVoiceOS-Foundation","custom":"https://paypal.me/openvoiceos"}},"created_at":"2023-05-31T02:30:07.000Z","updated_at":"2025-05-18T23:07:30.000Z","dependencies_parsed_at":"2024-05-09T19:29:17.622Z","dependency_job_id":"33d910c0-1451-4c38-b97c-de16e6dae477","html_url":"https://github.com/OpenVoiceOS/ovos-docker-stt","commit_stats":null,"previous_names":["openvoiceos/ovos-docker-stt"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/OpenVoiceOS/ovos-docker-stt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenVoiceOS%2Fovos-docker-stt","tags_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/OpenVoiceOS%2Fovos-docker-stt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenVoiceOS%2Fovos-docker-stt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenVoiceOS%2Fovos-docker-stt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenVoiceOS","download_url":"https://codeload.github.com/OpenVoiceOS/ovos-docker-stt/tar.gz/refs/heads/dev","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenVoiceOS%2Fovos-docker-stt/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261983495,"owners_count":23240217,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fasterwhisper","openvoiceos","ovos","speech-to-text","stt","whisper"],"created_at":"2024-11-19T20:10:59.778Z","updated_at":"2025-06-26T01:36:15.030Z","avatar_url":"https://github.com/OpenVoiceOS.png","language":"Dockerfile","funding_links":["https://github.com/sponsors/OpenVoiceOS","https://patreon.com/openvoiceos","https://liberapay.com/OpenVoiceOS-Foundation","https://paypal.me/openvoiceos"],"categories":[],"sub_categories":[],"readme":"# Open Voice OS Speech-to-Text (STT) on Docker or Podman\n\n## What's a Speech-to-Text (STT)?\n\n*According to \u003chttps://aws.amazon.com/what-is/speech-to-text\u003e:*\n\n\u003e Speech to text is a speech recognition software that enables the recognition and translation of spoken language into text through computational linguistics. 
It is also known as speech recognition or computer speech recognition. Specific applications, tools, and devices can transcribe audio streams in real-time to display text and act on it.\n\nOpen Voice OS provides support for different STT engines via a plugin mechanism exposing HTTP endpoints to be consumed by the voice assistant.\n\n## Containerized STT plugins\n\nTo make local Speech-to-Text engines easier to install and adopt, we build a set of OCI images compatible with Docker, Podman and Kubernetes.\n\n| Image                                | Port | Description                                                                                                                                                          |\n|--------------------------------------| ---  | ---                                                                                                                                                                  |\n| `ovos-stt-plugin-chromium`           | 8082 | An STT plugin for OVOS using the Google Chrome browser API                                                                                                           |\n| `ovos-stt-plugin-deepgram`           | 8083 | Unmatched accuracy. Blazing fast. Enterprise scale. Hands-down the best price. 
Everything developers need to build with confidence and ship faster                   |\n| `ovos-stt-plugin-fasterwhisper`      | 8080 | High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model                                                                              |\n| `ovos-stt-plugin-fasterwhisper-cuda` | 8080 | High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model supporting Nvidia CUDA                                                       |\n| `ovos-stt-plugin-citrinet`           | 8084 | Conversational AI toolkit built for researchers working on automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech synthesis (TTS) |\n| `ovos-stt-plugin-vosk`               | 8081 | Vosk is a speech recognition toolkit supporting more than 20 languages and dialects, works offline and able to run on lightweight devices                            |\n\nThis approach also lets you decentralize the STT server: it doesn't have to run locally on the voice assistant and can instead run on a remote server with more compute power, using CPU and/or GPU.\n\n### Image alternatives\n\nThere are two *(2)* different implementations of the Faster Whisper STT plugin.\n\n- `ovos-stt-plugin-fasterwhisper` image using only the CPU to transcribe *(default)*\n- `ovos-stt-plugin-fasterwhisper-cuda` image using only the GPU to transcribe\n\nTo use `ovos-stt-plugin-fasterwhisper-cuda`, please review the `docker-compose.yml` file.\n\n**Only one implementation can be selected at a time.**\n\n## Requirements\n\n### Docker or Podman\n\nDocker or Podman *(rootless)* is required. `docker compose` *(not `docker-compose`!!)* or `podman-compose` is nice to have, because it simplifies deploying the whole stack with the `docker-compose.yml` files *(for Docker, this command is embedded depending on the version; for Podman, the `podman-compose` command comes from a 
different package)*.\n\n**If you plan to pass through GPUs in order to leverage Nvidia CUDA with Docker or Podman, please make sure you have configured your container engine properly to support GPUs.**\n\n## How to build these images\n\nThe `base` image is the main layer for the other images. For example, the `fasterwhisper` image requires the `base` image to be built first.\n\n```bash\ngit clone https://github.com/OpenVoiceOS/ovos-docker-stt.git\ncd ovos-docker-stt\ndocker buildx build fasterwhisper/ -t smartgic/ovos-stt-server-fasterwhisper:alpha --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') --no-cache\n  # Or:\npodman buildx build fasterwhisper/ -t smartgic/ovos-stt-server-fasterwhisper:alpha --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') --no-cache\n```\n\n### Arguments\n\nThe following arguments can be used during the image build process.\n\n| Name         | Value                              | Default   | Description                                                           |\n| ---          | ---                                |-----------| ---                                                                   |\n| `ALPHA`      | `true`                             | `false`   | Using the alpha releases from PyPI built from the `dev` branches      |\n| `BUILD_DATE` | `$(date -u +'%Y-%m-%dT%H:%M:%SZ')` | `unknown` | Used as `LABEL` within the Dockerfile to determine the build date     |\n| `TAG`        | `dev`                              | `dev`     | OCI image tag, (e.g. 
`docker pull smartgic/ovos-stt-server-base:dev`) |\n| `VERSION`    | `0.0.8a`                           | `unknown` | Used as `LABEL` within the Dockerfile to determine the version        |\n\nPre-built images are already available [here](https://hub.docker.com/u/smartgic) and are referenced by default in the `docker-compose.yml` file.\n\n## How to use these images\n\nThe `docker-compose.yml` file provides an easy way to provision the container stack *(volumes and services)* with the required configuration for each of them. `docker compose` and `podman-compose` both support environment files; check the `.env` file.\n\n```bash\ngit clone https://github.com/OpenVoiceOS/ovos-docker-stt.git\nmkdir -p ~/ovos-tts-stt/config\nchown ${USER}:${USER} -R ~/ovos-tts-stt\ncd ovos-docker-stt\ndocker compose up -d\n  # Or:\npodman-compose up -d\n```\n\nTo reduce the potential overhead of downloading and extracting the images, the `--parallel` option can be used to process the images in batches of `x` *(where `x` is an integer)*.\n\n```bash\ndocker compose --parallel 3 up -d\n  # Or:\npodman-compose --parallel 3 up -d\n```\n\nIf you only plan to use the Faster Whisper STT server, you can name that service on the command line.\n\n```bash\ndocker compose up -d ovos_stt_fasterwhisper\n  # Or:\npodman-compose up -d ovos_stt_fasterwhisper\n```\n\nSome variables might need to be tuned to match your setup, such as the timezone, the directories, *etc...* Have a look at the `.env` files before running `docker compose` or `podman-compose`.\n\nThe `OVOS_USER` variable should be changed **only** if you build the Docker images with a different user than `ovos`.\n\n## How to update the current stack\n\nThe easiest way to update a stack already deployed by `docker compose` or `podman-compose` is to use `docker compose` or `podman-compose`. 
:relaxed:\n\nBecause the `pull_policy` option of each service is set to `always`, every time a new image is uploaded with the same tag, `docker compose` or `podman-compose` will pull it and re-create the container based on this new image.\n\n```bash\ndocker compose up -d\n  # Or:\npodman-compose up -d\n```\n\nIf you want to change the tag to deploy, update the `.env` file with the new value.\n\n## Configure the STT plugins\n\nThe `~/ovos/config/mycroft.conf` configuration file is used to configure the STT plugins. Make sure to adapt the sample below to fit your requirements.\n\n```json\n{\n  \"logs\": {\n    \"path\": \"stdout\"\n  },\n  \"stt\": {\n    \"module\": \"ovos-stt-plugin-fasterwhisper\",\n    \"ovos-stt-plugin-fasterwhisper\": {\n        \"model\": \"whisper-large-v3-turbo\",\n        \"compute_type\": \"float16\",\n        \"use_cuda\": true,\n        \"cpu_thread\": 8\n    },\n    \"ovos-stt-plugin-vosk-streaming\": {\n        \"model\": \"https://alphacephei.com/vosk/models/vosk-model-en-us-0.42-gigaspeech.zip\",\n        \"verbose\": false\n    },\n    \"ovos-stt-plugin-vosk\": {\n        \"model\": \"http://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip\",\n        \"verbose\": false\n    },\n    \"ovos-stt-plugin-deepgram\": {\n      \"key\": \"GET A KEY FROM DEEPGRAM WEBSITE :)\"\n    },\n    \"ovos-stt-plugin-chromium\": {\n        \"lang\": \"en-US\",\n        \"pfilter\": false,\n        \"debug\": false\n    }\n  }\n}\n```\n\nIf you don't plan to use Nvidia CUDA with the Faster Whisper STT plugin, set `use_cuda` to `false` and `compute_type` to `int8`.\n\n## Configure the voice assistant\n\nOnce the STT servers are up and running, the voice assistant must be configured to reference them. 
Please make sure to add the section below to your `~/ovos/config/mycroft.conf` configuration file.\n\n```json\n{\n  \"stt\": {\n      \"module\": \"ovos-stt-plugin-server\",\n      \"fallback_module\": \"ovos-stt-plugin-vosk\",\n      \"ovos-stt-plugin-server\": {\n        \"urls\": [\n          \"http://192.168.1.227:8080/stt\",\n          \"http://192.168.1.227:8081/stt\",\n          \"http://192.168.1.227:8082/stt\",\n          \"http://192.168.1.227:8083/stt\",\n          \"http://192.168.1.227:8084/stt\",\n          \"https://stt.openvoiceos.org/stt\"\n        ]\n      }\n  }\n}\n```\n\nThis configuration means that `ovos-stt-plugin-server` will be used as the default STT plugin. The plugin has a list of six *(6)* STT servers; if one is down, the plugin moves on to the next one, and so on.\n\nIf all the STT servers from `ovos-stt-plugin-server` are down, the voice assistant will fall back to the `ovos-stt-plugin-vosk` STT plugin running locally on the voice assistant.\n\n## Debug\n\n### Is the STT alive?\n\nTo check whether an STT server is up and running, call the `/status` endpoint *(the `jq` command is not mandatory, just nice to have)*.\n\n```bash\ncurl -v http://192.168.1.227:8080/status | jq\n```\n\n### Logging\n\nEnable debug mode in `~/ovos/config/mycroft.conf` to get more verbosity from the logs. 
All containers will have to be restarted to receive the configuration change.\n\n```json\n{\n  \"debug\": true,\n  \"log_level\": \"DEBUG\",\n  \"logs\": {\n    \"path\": \"stdout\"\n  }\n}\n```\n\n### Container debugging\n\nTo access all the container logs at the same time, run the following command *(make sure it matches the `docker compose` or `podman-compose` command you ran to deploy the stack)*:\n\n```bash\ndocker compose logs -f --tail 200\n  # Or:\npodman-compose logs -n -f --tail 200\n```\n\nTo access the logs of a specific container, run the following command:\n\n```bash\ndocker logs -f --tail 200 ovos_stt_fasterwhisper\n  # Or:\npodman logs -f --tail 200 ovos_stt_fasterwhisper\n```\n\nTo go inside a container and run multiple commands, run the following command *(where `bash` is the shell available in there)*:\n\n```bash\ndocker exec -ti ovos_stt_fasterwhisper bash\n  # Or:\npodman exec -ti ovos_stt_fasterwhisper bash\n```\n\nTo get the CPU, memory and I/O consumption per container, run the following command:\n\n```bash\ndocker stats -a --no-trunc\n  # Or:\npodman stats -a --no-trunc\n```\n\n### Validate configuration\n\nMake sure the `mycroft.conf` configuration file is valid JSON by using the `jq` command.\n\n```bash\njq . ~/ovos/config/mycroft.conf\n```\n\nIf the configuration file is not valid JSON, `jq` will return something like this:\n\n```text\nparse error: Expected another key-value pair at line 81, column 3\n```\n\n## Support\n\n- [Matrix channel](https://matrix.to/#/#openvoiceos:matrix.org)\n- [Open Voice OS documentation](https://openvoiceos.github.io/community-docs/)\n- [Contribute to Open Voice OS](https://openvoiceos.github.io/community-docs/contributing/)\n- [Report bugs related to these Docker 
images](https://github.com/OpenVoiceOS/ovos-docker-stt/issues)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenvoiceos%2Fovos-docker-stt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenvoiceos%2Fovos-docker-stt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenvoiceos%2Fovos-docker-stt/lists"}