{"id":17464311,"url":"https://github.com/daanzu/deepspeech-websocket-server","last_synced_at":"2025-09-06T09:44:30.958Z","repository":{"id":79700601,"uuid":"152767647","full_name":"daanzu/deepspeech-websocket-server","owner":"daanzu","description":"Server \u0026 client for DeepSpeech using WebSockets for real-time speech recognition in separate environments","archived":false,"fork":false,"pushed_at":"2020-05-29T05:57:54.000Z","size":37,"stargazers_count":102,"open_issues_count":10,"forks_count":32,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-16T06:55:50.454Z","etag":null,"topics":["deepspeech","deepspeech-server","speech-recognition","speech-to-text","websocket"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daanzu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"daanzu","patreon":"daanzu","custom":"https://paypal.me/daanzu"}},"created_at":"2018-10-12T15:02:54.000Z","updated_at":"2024-12-04T18:15:20.000Z","dependencies_parsed_at":"2023-02-28T21:46:26.833Z","dependency_job_id":null,"html_url":"https://github.com/daanzu/deepspeech-websocket-server","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/daanzu/deepspeech-websocket-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fdeepspeech-websocket-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fdeepspeech-websocket-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fdeepspeech-websocket-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fdeepspeech-websocket-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daanzu","download_url":"https://codeload.github.com/daanzu/deepspeech-websocket-server/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fdeepspeech-websocket-server/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263480755,"owners_count":23473158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deepspeech","deepspeech-server","speech-recognition","speech-to-text","websocket"],"created_at":"2024-10-18T10:45:20.631Z","updated_at":"2025-07-04T09:02:41.608Z","avatar_url":"https://github.com/daanzu.png","language":"Python","funding_links":["https://github.com/sponsors/daanzu","https://patreon.com/daanzu","https://paypal.me/daanzu","https://www.patreon.com/daanzu"],"categories":[],"sub_categories":[],"readme":"# DeepSpeech WebSocket Server\n\n[![Donate](https://img.shields.io/badge/donate-GitHub-pink.svg)](https://github.com/sponsors/daanzu)\n[![Donate](https://img.shields.io/badge/donate-Patreon-orange.svg)](https://www.patreon.com/daanzu)\n[![Donate](https://img.shields.io/badge/donate-PayPal-green.svg)](https://paypal.me/daanzu)\n[![Donate](https://img.shields.io/badge/preferred-GitHub-black.svg)](https://github.com/sponsors/daanzu)\n[**GitHub** is currently matching all my donations $-for-$.]\n\nThis is a [WebSocket](https://en.wikipedia.org/wiki/WebSocket) server (\u0026 client) for Mozilla's [DeepSpeech](https://github.com/mozilla/DeepSpeech), to allow easy real-time speech recognition, using a separate client \u0026 server that can be run in different environments, either locally or remotely.\n\nWork in progress. Developed to quickly test new models running DeepSpeech in [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/about) using microphone input from host Windows. Available to save others some time.\n\n## Features\n\n* Server\n    - Tested and works with DeepSpeech v0.7 (thanks [@Kai-Karren](https://github.com/Kai-Karren))\n    - Streaming inference via DeepSpeech v0.2+\n    - Streams raw audio data from client via WebSocket\n    - Multi-user (only decodes one stream at a time, but can block until decoding is available)\n* Client\n    - Streams raw audio data from microphone to server via WebSocket\n    - Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances\n    - Hypnotizing spinner to indicate voice activity is detected!\n    - Option to automatically save each utterance to a separate .wav file, for later testing\n    - Need to pause/unpause listening? [See here](https://github.com/daanzu/deepspeech-websocket-server/issues/6).\n\n## Installation\n\nThis package is developed in Python 3.\nActivate a virtualenv, then install the requirements for the server and/or client, depending on usage:\n\n```bash\npip install -r requirements-server.txt\n### AND/OR ###\npip install -r requirements-client.txt\n```\n\nTo run the server in an environment, you also need to install DeepSpeech, which requires choosing either the CPU xor GPU version:\n\n```bash\npip install deepspeech\n### XOR ###\npip install deepspeech-gpu\n```\n\nUpgrade to the latest DeepSpeech with `pip install deepspeech --upgrade` (or gpu version). This package works with v0.3.0.\n\nThe client uses `pyaudio` and `portaudio` for microphone access. In my experience, this works out of the box on Windows. \nOn Linux, you may need to install portaudio header files to compile the pyaudio package: `sudo apt install portaudio19-dev` .\nOn MacOS, try installing portaudio with brew: `brew install portaudio` .\n\n## Server\n\n```\n\u003e python server.py --model ../models/daanzu-6h-512l-0001lr-425dr/ -l -t\nInitializing model...\n2018-10-06 AM 05:55:16.357: __main__: INFO: \u003cmodule\u003e(): args.model: ../models/daanzu-6h-512l-0001lr-425dr/output_graph.pb\n2018-10-06 AM 05:55:16.357: __main__: INFO: \u003cmodule\u003e(): args.alphabet: ../models/daanzu-6h-512l-0001lr-425dr/alphabet.txt\nTensorFlow: v1.6.0-18-g5021473\nDeepSpeech: v0.2.0-0-g009f9b6\nWarning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.\n2018-10-06 05:55:16.358385: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA\n2018-10-06 AM 05:55:16.395: __main__: INFO: \u003cmodule\u003e(): args.lm: ../models/daanzu-6h-512l-0001lr-425dr/lm.binary\n2018-10-06 AM 05:55:16.395: __main__: INFO: \u003cmodule\u003e(): args.trie: ../models/daanzu-6h-512l-0001lr-425dr/trie\nBottle v0.12.13 server starting up (using GeventWebSocketServer())...\nListening on http://127.0.0.1:8080/\nHit Ctrl-C to quit.\n\n2018-10-06 AM 05:55:30.194: __main__: INFO: echo(): recognized: 'alpha bravo charlie'\n2018-10-06 AM 05:55:32.297: __main__: INFO: echo(): recognized: 'delta echo foxtrot'\n2018-10-06 AM 05:55:54.747: __main__: INFO: echo(): dead websocket\n^CKeyboardInterrupt\n```\n\n```\n\u003e python server.py -h\nusage: server.py [-h] -m MODEL [-a [ALPHABET]] [-l [LM]] [-t [TRIE]] [--lw LW]\n                 [--vwcw VWCW] [--bw BW] [-p PORT]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -m MODEL, --model MODEL\n                        Path to the model (protocol buffer binary file, or\n                        directory containing all files for model)\n  -a [ALPHABET], --alphabet [ALPHABET]\n                        Path to the configuration file specifying the alphabet\n                        used by the network. Default: alphabet.txt\n  -l [LM], --lm [LM]    Path to the language model binary file. Default:\n                        lm.binary\n  -t [TRIE], --trie [TRIE]\n                        Path to the language model trie file created with\n                        native_client/generate_trie. Default: trie\n  --lw LW               The alpha hyperparameter of the CTC decoder. Language\n                        Model weight. Default: 1.5\n  --vwcw VWCW           Valid word insertion weight. This is used to lessen\n                        the word insertion penalty when the inserted word is\n                        part of the vocabulary. Default: 2.25\n  --bw BW               Beam width used in the CTC decoder when building\n                        candidate transcriptions. Default: 1024\n  -p PORT, --port PORT  Port to run server on. Default: 8080\n```\n\n## Client\n\n```\nλ py client.py\nListening...\nRecognized: alpha bravo charlie\nRecognized: delta echo foxtrot\n^C\n```\n\n```\nλ py client.py -h\nusage: client.py [-h] [-s SERVER] [-a AGGRESSIVENESS] [--nospinner]\n                 [-w SAVEWAV]\n\nStreams raw audio data from microphone with VAD to server via WebSocket\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -s SERVER, --server SERVER\n                        Default: ws://localhost:8080/recognize\n  -a AGGRESSIVENESS, --aggressiveness AGGRESSIVENESS\n                        Set aggressiveness of VAD: an integer between 0 and 3,\n                        0 being the least aggressive about filtering out non-\n                        speech, 3 the most aggressive. Default: 3\n  --nospinner           Disable spinner\n  -w SAVEWAV, --savewav SAVEWAV\n                        Save .wav files of utterences to given directory\n```\n\n## Contributions\n\nPull requests welcome.\n\nContributors:\n* [@Zeddy913](https://github.com/Zeddy913)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaanzu%2Fdeepspeech-websocket-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaanzu%2Fdeepspeech-websocket-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaanzu%2Fdeepspeech-websocket-server/lists"}