{"id":29175845,"url":"https://github.com/deepgram/va-api-spec-v1","last_synced_at":"2026-02-04T05:01:21.064Z","repository":{"id":286030865,"uuid":"940331656","full_name":"deepgram/VA-API-Spec-v1","owner":"deepgram","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-24T18:57:30.000Z","size":40,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-04-24T19:45:02.092Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deepgram.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-28T01:53:28.000Z","updated_at":"2025-04-24T18:57:33.000Z","dependencies_parsed_at":"2025-04-04T00:22:12.079Z","dependency_job_id":"5443fdd2-176c-4068-a221-a21aedcaf92a","html_url":"https://github.com/deepgram/VA-API-Spec-v1","commit_stats":null,"previous_names":["deepgram/va-api-spec-v1"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/deepgram/VA-API-Spec-v1","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2FVA-API-Spec-v1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2FVA-API-Spec-v1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2FVA-API-Spec-v1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2FVA-API-Spec-v1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dee
pgram","download_url":"https://codeload.github.com/deepgram/VA-API-Spec-v1/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2FVA-API-Spec-v1/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262996821,"owners_count":23396910,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-01T16:11:47.409Z","updated_at":"2026-02-04T05:01:16.039Z","avatar_url":"https://github.com/deepgram.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"### Endpoint\n\n- `wss://agent.deepgram.com/v1/agent/converse`\n\n### Connection\n\nWebSocket-based, real-time bidirectional communication.\n\n---\n\n### Client Messages\n\n| Type                 | Structure                                                                                                                    | Notes                                                                                    |\n| -------------------- | ---------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |\n| Settings             | `{ \"type\": \"Settings\", ...Settings }`                                                                                        | Initializes the voice agent and sets up audio transmission formats                       |\n| UpdatePrompt         | `{ \"type\": \"UpdatePrompt\", \"prompt\": \"\" }`                                   
                                                | Allows giving additional instructions to the Think model in the middle of a conversation. Passed as the system_prompt to LLMs |\n| UpdateSpeak          | `{ \"type\": \"UpdateSpeak\", \"speak\": { \"provider\": { \"type\": \"\", \"model\": \"\" }, \"endpoint\": { \"url\": \"\", \"headers\": {} } } }`  | Enables changing the Speak model during the conversation                                 |\n| InjectAgentMessage   | `{ \"type\": \"InjectAgentMessage\", \"content\": \"\" }`                                                                            | Triggers an immediate statement from the agent                                           |\n| FunctionCallResponse | `{ \"type\": \"FunctionCallResponse\", \"id\": \"\", \"name\": \"\", \"content\": \"\" }`                                                    | Sends the result of a function call back to the server                                   |\n| KeepAlive            | `{ \"type\": \"KeepAlive\" }`                                                                                                    | Instructs the server to keep the connection open even if the client isn't sending audio  |\n| Binary Audio         | `[binary data]`                                                                                                              | Audio input per settings                                                                 |\n\n#### Settings Example\n\n```json\n{\n  \"type\": \"Settings\",\n  \"experimental\": false, // default is false\n  \"audio\": {\n    \"input\": { // optional, default is 16kHz linear16\n      \"encoding\": \"\", // string\n      \"sample_rate\": // int\n    },\n    \"output\": { // optional\n      \"encoding\": \"\",\n      \"sample_rate\": ,\n      \"bitrate\": ,\n      \"container\": \"\"\n    }\n  },\n  \"agent\": {\n    \"language\": \"\", // optional\n    \"listen\": { // optional, defaults to deepgram's latest model\n      \"provider\": {\n     
   \"type\": \"\",\n        \"model\": \"\",\n        \"keyterms\": [\"\"]\n      }\n    },\n    \"think\": {\n      \"provider\": {\n        \"type\": \"\",\n        \"model\": \"\",\n        \"temperature\": // float, optional\n      },\n      \"endpoint\": { // optional\n        \"url\": \"\",\n        \"headers\": { // optional\n          \"key1\": \"val1\",\n          \"key2\": \"val2\",\n          ...\n        }\n      },\n      \"functions\": [ // optional\n        {\n          \"name\": \"\",\n          \"description\": \"\",\n          \"parameters\": {},\n          \"endpoint\": { // optional, if not passed, function called client-side\n            \"url\": \"\",\n            \"method\": \"\",\n            \"headers\": { // optional\n              \"key1\": \"val1\",\n              \"key2\": \"val2\",\n              ...\n            }\n          }\n        }\n      ],\n      \"prompt\": \"\" // optional\n    },\n    \"speak\": { // optional, defaults to latest deepgram TTS model\n      \"provider\": {\n        \"type\": \"\",\n        ... 
// provider specific fields\n      },\n      \"endpoint\": { // optional if provider.type = 'deepgram', required for non-deepgram TTS providers\n        \"url\": \"\", // pass a `ws` or `wss` url to use the provider's websocket API\n        \"headers\": { // optional\n          \"key1\": \"val1\",\n          \"key2\": \"val2\",\n          ...\n        }\n      }\n    },\n    \"greeting\": \"\" // optional\n  }\n}\n```\n\n#### Function Call Response Example\n\n```json\n{\n  \"type\": \"FunctionCallResponse\",\n  \"id\": \"\",\n  \"name\": \"\",\n  \"content\": \"\"\n}\n```\n\n---\n\n### Server Messages\n\n| Type                         | Structure                                                                                                             | Notes                                                                                  |\n| ---------------------------- | --------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |\n| **`Welcome`**                | `{ \"type\": \"Welcome\", \"request_id\": \"\" }`                                                                             | Confirms that the WebSocket connection has been successfully opened.                   |\n| **`SettingsApplied`**        | `{ \"type\": \"SettingsApplied\" }`                                                                                       | Confirms that the configuration settings have been applied.                            |\n| **`PromptUpdated`**          | `{ \"type\": \"PromptUpdated\" }`                                                                                         | Confirms that an `UpdatePrompt` message from the client has been applied.              
|\n| **`SpeakUpdated`**           | `{ \"type\": \"SpeakUpdated\" }`                                                                                          | Confirms that an `UpdateSpeak` message from the client has been applied.               |\n| **`ConversationText`**       | `{ \"type\": \"ConversationText\", \"role\": \"user\" \\| \"assistant\", \"content\": \"\" }`                                        | Provides the text of what was spoken by either the user or the agent.                  |\n| **`UserStartedSpeaking`**    | `{ \"type\": \"UserStartedSpeaking\" }`                                                                                   | Notifies the client that the user has begun speaking.                                  |\n| **`AgentThinking`**          | `{ \"type\": \"AgentThinking\", \"content\": \"\" }`                                                                          | Informs the client that the agent is processing information.                           |\n| **`FunctionCallRequest`**    | `{ \"type\": \"FunctionCallRequest\", \"functions\": [{ \"id\": \"\", \"name\": \"\", \"arguments\": \"\", \"client_side\": false }] }`   | Sent when the agent makes a function call; may request client-side function execution. |\n| **`FunctionCallResponse`**   | `{ \"type\": \"FunctionCallResponse\", \"id\": \"\", \"name\": \"\", \"content\": \"\" }`                                             | Sent when the agent makes a server-side function call; purely informational.           |\n| **`AgentStartedSpeaking`**   | `{ \"type\": \"AgentStartedSpeaking\" }`                                                                                  | Signals that the server has begun streaming the agent’s audio response.                
|\n| **`AgentAudioDone`**         | `{ \"type\": \"AgentAudioDone\" }`                                                                                        | Indicates that the server has finished sending the final audio segment to the client.  |\n| **`Error`**                  | `{ \"type\": \"Error\", \"description\": \"\", \"code\": \"\" }`                                                                                  | Notifies the client of fatal errors that occurred on the server side.                  |\n| **`Warning`**                | `{ \"type\": \"Warning\", \"description\": \"\", \"code\": \"\" }`                                                                                | Notifies the client of non-fatal errors or warnings.                                   |\n| **`Binary Audio`**           | `[binary data]`                                                                                                       | Audio output sent as binary data, per the settings configuration.                      
|\n\n---\n\n#### Function Call Request Example\n\n```json\n{\n  \"type\": \"FunctionCallRequest\",\n  \"functions\": [\n    {\n      \"id\": \"\",\n      \"name\": \"\",\n      \"arguments\": \"\",\n      \"client_side\": false\n    }\n  ]\n}\n```\n\n### Settings\n\n| Parameter                      | Type/Details                                                            | Notes                                          |\n| ------------------------------ | ----------------------------------------------------------------------- | ---------------------------------------------- |\n| type                           | String, \"Settings\"                                                      | Identifies config type                         |\n| experimental                   | Boolean, default false                                                  | Enables undocumented features                  |\n| audio.input.encoding           | String                                                                  | Input audio encoding                           |\n| audio.input.sample_rate        | Integer                                                                 | Input audio sample rate                        |\n| audio.output.encoding          | String, optional                                                        | Output audio encoding                          |\n| audio.output.sample_rate       | Integer, optional                                                       | Output audio sample rate                       |\n| audio.output.bitrate           | Integer, optional                                                       | Output audio bitrate                           |\n| audio.output.container         | String, optional                                                        | Output file container                          |\n| agent.greeting                 | String, optional                                                        | Message that agent will speak 
at the start     |\n| agent.listen.provider.type     | \"deepgram\"                                                              | STT provider                                   |\n| agent.listen.provider.model    | String                                                                  | STT model                                      |\n| agent.listen.provider.keyterms | Array of strings, optional                                              | Prompt key-term recognition (nova-3 'en' only) |\n| agent.think                    | can be object or list of objects to support fallback                    | LLM settings                                   |\n| agent.think.provider.type      | \"open_ai\", \"anthropic\", \"x_ai\"                                          | LLM provider (supports name aliases)           |\n| agent.think.provider.model     | String                                                                  | LLM model                                      |\n| agent.think.provider.temperature | Number, optional (0-2 OpenAI, 0-1 Anthropic)                          | Response randomness                            |\n| agent.think.endpoint.url       | String                                                                  | Custom LLM endpoint                            |\n| agent.think.functions          | Array of Function objects                                               | Callable functions                             |\n| agent.think.prompt             | String, optional                                                        | LLM system prompt                              |\n| agent.think.endpoint.headers   | Object, optional                                                        | Custom headers for LLM                         |\n| agent.speak                    | can be object or list of objects to support fallback                    | TTS configuration                              |\n| agent.speak.provider.type      | \"deepgram\", 
\"eleven_labs\", \"cartesia\", \"open_ai\"                        | TTS provider                                   |\n| agent.speak.endpoint.url       | String                                                                  | Custom TTS endpoint                            |\n| agent.speak.endpoint.headers   | Object, optional                                                        | Custom headers for TTS                         |\n| agent.language                 | String, optional, default \"en\"                                          | Agent language                                 |\n\n#### Provider-Specific Speak Parameters\n\n- **deepgram**: `model`\n- **eleven_labs**: `model_id`, `language_code` (optional)\n- **cartesia**: `model_id`, `voice`, `language` (optional), `mode: \"id\"`, `id`\n- **open_ai**: `model`, `voice`\n\n---\n\n### Notes\n\n- Audio: Binary messages will match `audio.input`/`audio.output` settings.\n- Rollout: 2-week dev/test, then 2-week migration period announced via email/Slack.\n- Internal Features: Hidden unless `experimental: true`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepgram%2Fva-api-spec-v1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeepgram%2Fva-api-spec-v1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepgram%2Fva-api-spec-v1/lists"}