{"id":16947015,"url":"https://github.com/mainro/deepspeech-server","last_synced_at":"2025-04-05T21:05:51.808Z","repository":{"id":41249710,"uuid":"110953941","full_name":"MainRo/deepspeech-server","owner":"MainRo","description":"A testing server for a speech to text service based on coqui.ai","archived":false,"fork":false,"pushed_at":"2022-07-12T21:45:16.000Z","size":82,"stargazers_count":215,"open_issues_count":0,"forks_count":71,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-03-29T20:05:22.659Z","etag":null,"topics":["coqui-ai","deepspeech","reactive-extensions","reactivex","rxpy","speech-recognition","speech-to-text"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MainRo.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-16T09:53:40.000Z","updated_at":"2025-02-17T12:35:50.000Z","dependencies_parsed_at":"2022-07-10T02:46:02.644Z","dependency_job_id":null,"html_url":"https://github.com/MainRo/deepspeech-server","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MainRo%2Fdeepspeech-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MainRo%2Fdeepspeech-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MainRo%2Fdeepspeech-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MainRo%2Fdeepspeech-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MainRo","download_url":"https
://codeload.github.com/MainRo/deepspeech-server/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399871,"owners_count":20932876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coqui-ai","deepspeech","reactive-extensions","reactivex","rxpy","speech-recognition","speech-to-text"],"created_at":"2024-10-13T21:45:35.088Z","updated_at":"2025-04-05T21:05:51.781Z","avatar_url":"https://github.com/MainRo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"==================\nDeepSpeech Server\n==================\n\n.. image:: https://github.com/MainRo/deepspeech-server/actions/workflows/pythonpackage.yml/badge.svg\n    :target: https://github.com/MainRo/deepspeech-server/actions/workflows/pythonpackage.yml\n\n.. image:: https://badge.fury.io/py/deepspeech-server.svg\n    :target: https://badge.fury.io/py/deepspeech-server\n\nKey Features\n============\n\nThis is an http server that can be used to test the Coqui STT project (the\nsuccessor of the Mozilla DeepSpeech project). You need an environment with\nDeepSpeech or Coqui to run this server.\n\nThis code uses the Coqui STT 1.0 APIs.\n\nInstallation\n=============\n\nThe server is available on pypi, so you can install it with pip:\n\n.. code-block:: console\n\n    pip3 install deepspeech-server\n\n\nYou can also install deepspeech server from sources:\n\n.. 
code-block:: console\n\n    python3 setup.py install\n\nNote that Python 3.6 is the minimum version required to run the server.\n\nStarting the server\n====================\n\n.. code-block:: console\n\n    deepspeech-server --config config.yaml\n\nWhat is an STT model?\n---------------------\n\nThe quality of the speech-to-text engine depends heavily on which models it\nloads at runtime. Think of them as a sort of pattern that controls how the\nengine works.\n\nHow to use a specific STT model\n-------------------------------\n\nYou can use Coqui STT without training a model yourself. Pre-trained models are\navailable from the Coqui Model Zoo (make sure the STT Models tab is selected):\n\nhttps://coqui.ai/models\n\nOnce you've downloaded a pre-trained model, make a copy of the sample\nconfiguration file. Edit the `\"model\"` and `\"scorer\"` fields in your new file\nfor the engine you want to use so that they match the downloaded files:\n\n.. code-block:: console\n\n    cp config.sample.yaml config.yaml\n    $EDITOR config.yaml\n\nLastly, start the server:\n\n.. code-block:: console\n\n    deepspeech-server --config config.yaml\n\nServer configuration\n=====================\n\nThe server is configured with a YAML file, passed with the \"--config\" argument.\nIt has the following structure:\n\n.. code-block:: yaml\n\n    coqui:\n      model: coqui-1.0.tflite\n      scorer: huge-vocabulary.scorer\n      beam_width: 500\n    server:\n      http:\n        host: \"0.0.0.0\"\n        port: 8080\n        request_max_size: 1048576\n    log:\n      level:\n        - logger: deepspeech_server\n          level: DEBUG\n\nThe configuration file contains several sections and sub-sections.\n\ncoqui section configuration\n---------------------------\n\nThe \"coqui\" section configures the coqui-stt engine:\n\n**model**: The trained Coqui STT model. Must be a tflite (TensorFlow Lite) file.\n\n**scorer**: [Optional] The scorer file. Use this to tune the STT engine to recognize certain phrases better.\n\n
**lm_alpha**: [Optional] The alpha hyperparameter for the scorer.\n\n**lm_beta**: [Optional] The beta hyperparameter for the scorer.\n\n**beam_width**: [Optional] The beam width used by the decoder. Decoding time increases with this value.\n\nhttp section configuration\n--------------------------\n\n**request_max_size**: The maximum payload size accepted by the server (default\nvalue: 1048576, i.e. 1 MiB). A request whose payload exceeds this threshold is\nrejected with a \"413: Request Entity Too Large\" error.\n\n**host**: The listen address of the HTTP server.\n\n**port**: The listening port of the HTTP server.\n\nlog section configuration\n-------------------------\n\nThe log section can be used to set the log levels of the server. This section\ncontains a list of log entries. Each log entry contains the name of a **logger**\nand its **level**. Both follow the conventions of the Python logging module.\n\n\nUsing the server\n================\n\nInference is done with HTTP POST requests, for example with the\nfollowing curl command:\n\n.. code-block:: console\n\n    curl -X POST --data-binary @testfile.wav http://localhost:8080/stt\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmainro%2Fdeepspeech-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmainro%2Fdeepspeech-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmainro%2Fdeepspeech-server/lists"}