{"id":28074232,"url":"https://github.com/resemble-ai/resemble-live-sts-socket","last_synced_at":"2025-05-12T23:32:55.789Z","repository":{"id":242776973,"uuid":"810418939","full_name":"resemble-ai/resemble-live-sts-socket","owner":"resemble-ai","description":null,"archived":false,"fork":false,"pushed_at":"2024-09-05T02:06:20.000Z","size":54,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-10T06:03:08.877Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/resemble-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-04T16:55:38.000Z","updated_at":"2025-02-08T14:31:31.000Z","dependencies_parsed_at":"2024-06-04T23:57:55.078Z","dependency_job_id":"ef363568-d3a0-49be-881f-ff3287a514a3","html_url":"https://github.com/resemble-ai/resemble-live-sts-socket","commit_stats":null,"previous_names":["resemble-ai/resemble-live-sts-socket"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/resemble-ai%2Fresemble-live-sts-socket","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/resemble-ai%2Fresemble-live-sts-socket/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/resemble-ai%2Fresemble-live-sts-socket/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/resemble-ai%2Fresemble-live-sts-socket/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/resemble-ai","download_url":"https://codeload.github.com/resemble-ai/resemble-live-sts-socket/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253840939,"owners_count":21972569,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-12T23:32:20.123Z","updated_at":"2025-05-12T23:32:55.787Z","avatar_url":"https://github.com/resemble-ai.png","language":"Python","readme":"# Resemble.AI Live STS Socket Client\n\nThis repository contains a fully featured sample script to demonstrate how to connect to the Resemble.AI Live STS server using Socket.IO for real-time speech-to-speech.\n\n## Table of Contents\n- [Installation](#installation)\n- [Usage](#usage)\n- [Arguments](#arguments)\n- [Developing Custom Socket Client](#developing-custom-socket-client)\n- [License](#license)\n\n## Installation\n\n1. Clone the repository:\n    ```sh\n    git clone https://github.com/resemble-ai/resemble-live-sts-socket.git  \n    cd resemble-live-sts-socket\n    ```\n\n2. 
Install the required dependencies:\n    ```sh\n    conda create -n socket_demo python=3.11.4\n    conda activate socket_demo\n    pip install -r requirements.txt\n    ```\n\n## Usage\nRunning the script:\n```sh\npython main.py --url \u003cserver_url\u003e --voice \u003cvoice\u003e\n# Or if you are using authentication\npython main.py --url \u003cserver_url\u003e --voice \u003cvoice\u003e --auth \u003cusername:password\u003e\n```\nIf you do not want to select your microphone and speaker each time, use the following command with your chosen device IDs:\n```sh\npython main.py --url \u003cserver_url\u003e --voice \u003cvoice\u003e \\\n               --input_device \u003cmicrophone id\u003e \\\n               --output_device \u003cspeaker id\u003e\n```\n\n## Arguments\n```yaml\nusage: main.py [-h] --url URL [--auth AUTH] [--debug] [--num_chunks NUM_CHUNKS] [--wave_file_path WAVE_FILE_PATH]\n[--voice VOICE] [--vad VAD] [--gpu GPU] [--extra_convert_size EXTRA_CONVERT_SIZE] [--pitch PITCH]\n[--crossfade_offset_rate CROSSFADE_OFFSET_RATE] [--crossfade_end_rate CROSSFADE_END_RATE]\n[--crossfade_overlap_size CROSSFADE_OVERLAP_SIZE] [--input_device INPUT_DEVICE] [--output_device OUTPUT_DEVICE]\n\nResemble.AI LiveVC socket sample script. 
Press Ctrl+C to stop.\n\noptions:\n  -h, --help            show this help message and exit\n  --url URL             URL of the server (required)\n  --auth AUTH           ngrok `username:password` for authentication.\n  --debug               Enable debug mode for logging.\n\nclient parameters:\n  --num_chunks NUM_CHUNKS           Number of 2880-frame chunks to send to the server (default: 8).\n  --wave_file_path WAVE_FILE_PATH   Path to save the WAV file (default: output.wav).\n\nvoice parameters:\n  --voice VOICE                                         Name of the voice to use for synthesis.\n  --vad VAD                                             VAD level (0: off, 1: low, 2: medium, 3: high) (default: 1).\n  --gpu GPU                                             CUDA device ID (default: 0).\n  --extra_convert_size EXTRA_CONVERT_SIZE               Amount of context for the server to use (4096, 8192, 16384,\n                                                                             32768, 65536, 131072) (default: 4096).\n  --pitch PITCH                                         Pitch factor (default: 0).\n  --crossfade_offset_rate CROSSFADE_OFFSET_RATE         Crossfade offset rate (0.0 - 1.0) (default: 0.1).\n  --crossfade_end_rate CROSSFADE_END_RATE               Crossfade end rate (0.0 - 1.0) (default: 0.9).\n  --crossfade_overlap_size CROSSFADE_OVERLAP_SIZE       Crossfade overlap size (default: 2048).\n\naudio device selection (if not specified, the user will be prompted to choose from a list of devices):\n  --input_device INPUT_DEVICE, -i INPUT_DEVICE          Index of the input audio device.\n  --output_device OUTPUT_DEVICE, -o OUTPUT_DEVICE       Index of the output audio device.\n```\n\n## Developing Custom Socket Client\nThis sample client uses Socket.IO to connect to the server and achieves real-time voice conversion on an M2 MacBook Air client machine, but a lower-level WebSocket implementation can be built as well. 
\n\n### Events\n- **request_conversion**:\n    - Type: `Emit`\n    - Description: Sends audio data to the server.\n    - Data type: `AudioData`\n    - Triggers: `on_response`\n- **request_conversion_debug**:\n    - Type: `Emit`\n    - Description: Identical to `request_conversion`, but asks the server to return the unconverted audio. This is useful for testing the effect of server latency on local audio stitching with clean audio.\n    - Data type: `AudioData`\n    - Triggers: `on_response`\n- **update_model_settings**:\n    - Type: `Emit`\n    - Description: Sends updated settings to the server.\n    - Data type: `VoiceSettings`\n    - Triggers: `on_message`\n- **get_settings**:\n    - Type: `Emit`\n    - Description: Requests the current settings dict from the server.\n    - Triggers: `on_message`\n- **get_voices**:\n    - Type: `Emit`\n    - Description: Requests a list of available voices from the server.\n    - Triggers: `on_message`\n- **get_gpus**:\n    - Type: `Emit`\n    - Description: Requests a list of available GPUs from the server.\n    - Triggers: `on_message`\n- **on_connect**:\n    - Type: `Response`\n    - Description: Callback for when a connection to the server is established.\n- **on_disconnect**:\n    - Type: `Response`\n    - Description: Callback for when the connection to the server is closed.\n- **on_response**:\n    - Type: `Response`\n    - Description: Called when the client receives the audio response from the server.\n    - Data type: `MessageResponse`\n- **on_message**:\n    - Type: `Response`\n    - Description: Called when the client receives a message from the server.\n    - Data type: `MessageResponse`\n\n### Data types (`datatypes.py`)\nAll data sent to and from the server uses the following data types.\n```py\nclass MessageResponse(TypedDict):\n    status: HTTPStatus\n    message: str | dict\n    endpoint: str\n\nclass AudioData(TypedDict):\n    timestamp: int      # milliseconds\n    audio_data: bytes   # packed little-endian short 
integers\n\nclass VoiceSettings(TypedDict):\n    voice: str                 # The name of the voice being used\n    crossFadeOffsetRate: float # 0.0 - 1.0\n    crossFadeEndRate: float    # 0.0 - 1.0\n    crossFadeOverlapSize: int  # 2048\n    extraConvertSize: Literal[4096, 8192, 16384, 32768, 65536, 131072]\n    gpu: int    # CUDA device ID\n    pitch: float\n    vad: Literal[0, 1, 2, 3] # 0: off, 1: low, 2: medium, 3: high\n```\n\n### Constants (`constants.py`)\nThese constants match exactly what the server expects and must not be changed.\n\n- **SAMPLERATE = 48000**: Sample rate for audio processing.\n- **CHUNK_SIZE_MULT = 2880**: Chunk size multiplier.\n- **AUDIO_FORMAT = 'int16'**: Audio format.\n- **ENDPOINT = '/synthesize'**: Socket endpoint.\n\n---\n\nFor more information, please refer to the code comments and docstrings within the scripts.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fresemble-ai%2Fresemble-live-sts-socket","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fresemble-ai%2Fresemble-live-sts-socket","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fresemble-ai%2Fresemble-live-sts-socket/lists"}