{"id":20581638,"url":"https://github.com/rhasspy/wyoming","last_synced_at":"2025-04-12T23:32:59.770Z","repository":{"id":197266659,"uuid":"698319158","full_name":"rhasspy/wyoming","owner":"rhasspy","description":"Peer-to-peer protocol for voice assistants","archived":false,"fork":false,"pushed_at":"2024-08-09T20:41:47.000Z","size":79,"stargazers_count":103,"open_issues_count":12,"forks_count":17,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-09-04T10:45:05.226Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rhasspy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-29T16:37:53.000Z","updated_at":"2024-08-29T16:27:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"79a9dcbc-8b76-4c6c-856f-f8c80e2b604f","html_url":"https://github.com/rhasspy/wyoming","commit_stats":null,"previous_names":["rhasspy/wyoming"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhasspy%2Fwyoming","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhasspy%2Fwyoming/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhasspy%2Fwyoming/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhasspy%2Fwyoming/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rhasspy","download_url":"https://codeload.github.com/rhasspy/wyoming/tar.gz/refs
/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248647257,"owners_count":21139081,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T06:29:48.234Z","updated_at":"2025-04-12T23:32:59.741Z","avatar_url":"https://github.com/rhasspy.png","language":"Python","funding_links":[],"categories":["Install from Source","1. Local Agents"],"sub_categories":["Voice Assistants","Audio / Voice Agents"],"readme":"# Wyoming Protocol\n\nA peer-to-peer protocol for voice assistants (basically [JSONL](https://jsonlines.org/) + PCM audio)\n\n``` text\n{ \"type\": \"...\", \"data\": { ... }, \"data_length\": ..., \"payload_length\": ... 
}\\n\n\u003cdata_length bytes (optional)\u003e\n\u003cpayload_length bytes (optional)\u003e\n```\n\nUsed in [Rhasspy](https://github.com/rhasspy/rhasspy3/) and [Home Assistant](https://www.home-assistant.io/integrations/wyoming) for communication with voice services.\n\n[![An open standard from the Open Home Foundation](https://www.openhomefoundation.org/badges/ohf-open-standard.png)](https://www.openhomefoundation.org/)\n\n## Wyoming Projects\n\n* Voice satellites\n    * [Satellite](https://github.com/rhasspy/wyoming-satellite) for Home Assistant \n* Audio input/output\n    * [mic-external](https://github.com/rhasspy/wyoming-mic-external)\n    * [snd-external](https://github.com/rhasspy/wyoming-snd-external)\n    * [SDL2](https://github.com/rhasspy/wyoming-sdl2)\n* Wake word detection\n    * [openWakeWord](https://github.com/rhasspy/wyoming-openwakeword)\n    * [porcupine1](https://github.com/rhasspy/wyoming-porcupine1)\n    * [snowboy](https://github.com/rhasspy/wyoming-snowboy)\n    * [microWakeWord](https://github.com/rhasspy/wyoming-microwakeword)\n* Speech-to-text\n    * [Faster Whisper](https://github.com/rhasspy/wyoming-faster-whisper)\n    * [Vosk](https://github.com/rhasspy/wyoming-vosk)\n    * [Whisper.cpp](https://github.com/rhasspy/wyoming-whisper-cpp)\n* Text-to-speech\n    * [Piper](https://github.com/rhasspy/wyoming-piper)\n* Intent handling\n    * [handle-external](https://github.com/rhasspy/wyoming-handle-external)\n\n## Format\n\n1. A JSON object header as a single line with `\\n` (UTF-8, required)\n    * `type` - event type (string, required)\n    * `data` - event data (object, optional)\n    * `data_length` - bytes of additional data (int, optional)\n    * `payload_length` - bytes of binary payload (int, optional)\n2. Additional data (UTF-8, optional)\n    * JSON object with additional event-specific data\n    * Merged on top of header `data`\n    * Exactly `data_length` bytes long\n    * Immediately follows header `\\n`\n3. 
Payload\n    * Typically PCM audio but can be any binary data\n    * Exactly `payload_length` bytes long\n    * Immediately follows additional data or header `\\n` if no additional data\n\n\n## Event Types\n\nAvailable events with `type` and fields.\n\n### Audio\n\nSend raw audio and indicate begin/end of audio streams.\n\n* `audio-chunk` - chunk of raw PCM audio\n    * `rate` - sample rate in hertz (int, required)\n    * `width` - sample width in bytes (int, required)\n    * `channels` - number of channels (int, required)\n    * `timestamp` - timestamp of audio chunk in milliseconds (int, optional)\n    * Payload is raw PCM audio samples\n* `audio-start` - start of an audio stream\n    * `rate` - sample rate in hertz (int, required)\n    * `width` - sample width in bytes (int, required)\n    * `channels` - number of channels (int, required)\n    * `timestamp` - timestamp in milliseconds (int, optional)\n* `audio-stop` - end of an audio stream\n    * `timestamp` - timestamp in milliseconds (int, optional)\n    \n    \n### Info\n\nDescribe available services.\n\n* `describe` - request for available voice services\n* `info` - response describing available voice services\n    * `asr` - list speech recognition services (optional)\n        * `models` - list of available models (required)\n            * `name` - unique name (required)\n            * `languages` - supported languages by model (list of string, required)\n            * `attribution` (required)\n                * `name` - name of creator (required)\n                * `url` - URL of creator (required)\n            * `installed` - true if currently installed (bool, required)\n            * `description` - human-readable description (string, optional)\n            * `version` - version of the model (string, optional)\n    * `tts` - list text to speech services (optional)\n        * `models` - list of available models\n            * `name` - unique name (required)\n            * `languages` - supported languages 
by model (list of string, required)\n            * `speakers` - list of speakers (optional)\n                * `name` - unique name of speaker (required)\n            * `attribution` (required)\n                * `name` - name of creator (required)\n                * `url` - URL of creator (required)\n            * `installed` - true if currently installed (bool, required)\n            * `description` - human-readable description (string, optional)\n            * `version` - version of the model (string, optional)\n    * `wake` - list wake word detection services (optional)\n        * `models` - list of available models (required)\n            * `name` - unique name (required)\n            * `languages` - supported languages by model (list of string, required)\n            * `attribution` (required)\n                * `name` - name of creator (required)\n                * `url` - URL of creator (required)\n            * `installed` - true if currently installed (bool, required)\n            * `description` - human-readable description (string, optional)\n            * `version` - version of the model (string, optional)\n    * `handle` - list intent handling services (optional)\n        * `models` - list of available models (required)\n            * `name` - unique name (required)\n            * `languages` - supported languages by model (list of string, required)\n            * `attribution` (required)\n                * `name` - name of creator (required)\n                * `url` - URL of creator (required)\n            * `installed` - true if currently installed (bool, required)\n            * `description` - human-readable description (string, optional)\n            * `version` - version of the model (string, optional)\n    * `intent` - list intent recognition services (optional)\n        * `models` - list of available models (required)\n            * `name` - unique name (required)\n            * `languages` - supported languages by model (list of string, 
required)\n            * `attribution` (required)\n                * `name` - name of creator (required)\n                * `url` - URL of creator (required)\n            * `installed` - true if currently installed (bool, required)\n            * `description` - human-readable description (string, optional)\n            * `version` - version of the model (string, optional)\n    * `satellite` - information about voice satellite (optional)\n        * `area` - name of area where satellite is located (string, optional)\n        * `has_vad` - true if the end of voice commands will be detected locally (boolean, optional)\n        * `active_wake_words` - list of wake words that are actively being listened for (list of string, optional)\n        * `max_active_wake_words` - maximum number of local wake words that can be run simultaneously (number, optional)\n        * `supports_trigger` - true if satellite supports remotely-triggered pipelines\n    * `mic` - list of audio input services (optional)\n        * `mic_format` - audio input format (required)\n            * `rate` - sample rate in hertz (int, required)\n            * `width` - sample width in bytes (int, required)\n            * `channels` - number of channels (int, required)\n    * `snd` - list of audio output services (optional)\n        * `snd_format` - audio output format (required)\n            * `rate` - sample rate in hertz (int, required)\n            * `width` - sample width in bytes (int, required)\n            * `channels` - number of channels (int, required)\n    \n### Speech Recognition\n\nTranscribe audio into text.\n\n* `transcribe` - request to transcribe an audio stream\n    * `name` - name of model to use (string, optional)\n    * `language` - language of spoken audio (string, optional)\n    * `context` - context from previous interactions (object, optional)\n* `transcript` - response with transcription\n    * `text` - text transcription of spoken audio (string, required)\n    * `context` - 
context for next interaction (object, optional)\n\n### Text to Speech\n\nSynthesize audio from text.\n\n* `synthesize` - request to generate audio from text\n    * `text` - text to speak (string, required)\n    * `voice` - use a specific voice (optional)\n        * `name` - name of voice (string, optional)\n        * `language` - language of voice (string, optional)\n        * `speaker` - speaker of voice (string, optional)\n    \n### Wake Word\n\nDetect wake words in an audio stream.\n\n* `detect` - request detection of specific wake word(s)\n    * `names` - wake word names to detect (list of string, optional)\n* `detection` - response when detection occurs\n    * `name` - name of wake word that was detected (string, optional)\n    * `timestamp` - timestamp of audio chunk in milliseconds when detection occurred (int, optional)\n* `not-detected` - response when audio stream ends without a detection\n\n### Voice Activity Detection\n\nDetects speech and silence in an audio stream.\n\n* `voice-started` - user has started speaking\n    * `timestamp` - timestamp of audio chunk when speaking started in milliseconds (int, optional)\n* `voice-stopped` - user has stopped speaking\n    * `timestamp` - timestamp of audio chunk when speaking stopped in milliseconds (int, optional)\n    \n### Intent Recognition\n\nRecognizes intents from text.\n\n* `recognize` - request to recognize an intent from text\n    * `text` - text to recognize (string, required)\n    * `context` - context from previous interactions (object, optional)\n* `intent` - response with recognized intent\n    * `name` - name of intent (string, required)\n    * `entities` - list of entities (optional)\n        * `name` - name of entity (string, required)\n        * `value` - value of entity (any, optional)\n    * `text` - response for user (string, optional)\n    * `context` - context for next interactions (object, optional)\n* `not-recognized` - response indicating no intent was recognized\n    * `text` - response 
for user (string, optional)\n    * `context` - context for next interactions (object, optional)\n\n### Intent Handling\n\nHandle structured intents or text directly.\n\n* `handled` - response when intent was successfully handled\n    * `text` - response for user (string, optional)\n    * `context` - context for next interactions (object, optional)\n* `not-handled` - response when intent was not handled\n    * `text` - response for user (string, optional)\n    * `context` - context for next interactions (object, optional)\n\n### Audio Output\n\nPlay audio stream.\n\n* `played` - response when audio finishes playing\n\n### Voice Satellite\n\nControl of one or more remote voice satellites connected to a central server.\n\n* `run-satellite` - informs satellite that server is ready to run pipelines\n* `pause-satellite` - informs satellite that server is not ready anymore to run pipelines\n* `satellite-connected` - satellite has connected to the server\n* `satellite-disconnected` - satellite has been disconnected from the server\n* `streaming-started` - satellite has started streaming audio to the server\n* `streaming-stopped` - satellite has stopped streaming audio to the server\n\nPipelines are run on the server, but can be triggered remotely from the server as well.\n\n* `run-pipeline` - runs a pipeline on the server or asks the satellite to run it when possible\n    * `start_stage` - pipeline stage to start at (string, required)\n    * `end_stage` - pipeline stage to end at (string, required)\n    * `wake_word_name` - name of detected wake word that started this pipeline (string, optional)\n        * From client only\n    * `wake_word_names` - names of wake words to listen for (list of string, optional)\n        * From server only\n        * `start_stage` must be \"wake\"\n    * `announce_text` - text to speak on the satellite\n        * From server only\n        * `start_stage` must be \"tts\"\n    * `restart_on_end` - true if the server should re-run the pipeline 
after it ends (boolean, default is false)\n        * Only used for always-on streaming satellites\n\n### Timers\n\n* `timer-started` - a new timer has started\n    * `id` - unique id of timer (string, required)\n    * `total_seconds` - number of seconds the timer should run for (int, required)\n    * `name` - user-provided name for timer (string, optional)\n    * `start_hours` - hours the timer should run for as spoken by user (int, optional)\n    * `start_minutes` - minutes the timer should run for as spoken by user (int, optional)\n    * `start_seconds` - seconds the timer should run for as spoken by user (int, optional)\n    * `command` - optional command that the server will execute when the timer is finished\n        * `text` - text of command to execute (string, required)\n        * `language` - language of the command (string, optional)\n* `timer-updated` - timer has been paused/resumed or time has been added/removed\n    * `id` - unique id of timer (string, required)\n    * `is_active` - true if timer is running, false if paused (bool, required)\n    * `total_seconds` - number of seconds that the timer should run for now (int, required)\n* `timer-cancelled` - timer was cancelled\n    * `id` - unique id of timer (string, required)\n* `timer-finished` - timer finished without being cancelled\n    * `id` - unique id of timer (string, required)\n\n## Event Flow\n\n* \u0026rarr; is an event from client to server\n* \u0026larr; is an event from server to client\n\n\n### Service Description\n\n1. \u0026rarr; `describe` (required) \n2. \u0026larr; `info` (required)\n\n\n### Speech to Text\n\n1. \u0026rarr; `transcribe` event with `name` of model to use or `language` (optional)\n2. \u0026rarr; `audio-start` (required)\n3. \u0026rarr; `audio-chunk` (required)\n    * Send audio chunks until silence is detected\n4. \u0026rarr; `audio-stop` (required)\n5. \u0026larr; `transcript`\n    * Contains text transcription of spoken audio\n\n### Text to Speech\n\n1. 
\u0026rarr; `synthesize` event with `text` (required)\n2. \u0026larr; `audio-start`\n3. \u0026larr; `audio-chunk`\n    * One or more audio chunks\n4. \u0026larr; `audio-stop`\n\n### Wake Word Detection\n\n1. \u0026rarr; `detect` event with `names` of wake words to detect (optional)\n2. \u0026rarr; `audio-start` (required)\n3. \u0026rarr; `audio-chunk` (required)\n    * Keep sending audio chunks until a `detection` is received\n4. \u0026larr; `detection`\n    * Sent for each wake word detection \n5. \u0026rarr; `audio-stop` (optional)\n    * Manually end audio stream\n6. \u0026larr; `not-detected`\n    * Sent after `audio-stop` if no detections occurred\n    \n### Voice Activity Detection\n\n1. \u0026rarr; `audio-chunk` (required)\n    * Send audio chunks until silence is detected\n2. \u0026larr; `voice-started`\n    * When speech starts\n3. \u0026larr; `voice-stopped`\n    * When speech stops\n    \n### Intent Recognition\n\n1. \u0026rarr; `recognize` (required)\n2. \u0026larr; `intent` if successful\n3. \u0026larr; `not-recognized` if not successful\n\n### Intent Handling\n\nFor structured intents:\n\n1. \u0026rarr; `intent` (required)\n2. \u0026larr; `handled` if successful\n3. \u0026larr; `not-handled` if not successful\n\nFor text only:\n\n1. \u0026rarr; `transcript` with `text` to handle (required)\n2. \u0026larr; `handled` if successful\n3. \u0026larr; `not-handled` if not successful\n    \n### Audio Output\n\n1. \u0026rarr; `audio-start` (required)\n2. \u0026rarr; `audio-chunk` (required)\n    * One or more audio chunks\n3. \u0026rarr; `audio-stop` (required)\n4. \u0026larr; `played`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhasspy%2Fwyoming","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frhasspy%2Fwyoming","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhasspy%2Fwyoming/lists"}