# S2S

A high-performance, lightweight API server written in Rust that provides local, privacy-conscious Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities. Built on the `sherpa-onnx` framework, S2S runs inference locally with low latency and no external cloud dependencies.

The project aims to be a drop-in local alternative for speech processing, with an API modeled on the OpenAI audio endpoints.

## Key Features

- **Local Inference:** All processing happens locally on your hardware.
- **Request Tracing:** Every request is logged with the client IP address, status code, and latency.
- **Automated Model Management:** With the `--auto` flag, the server downloads any missing models at startup.
- **Flexible Service Fallbacks:** The server starts as long as at least one model is present. If only one model is loaded, requests to the missing service return `404 Not Found`.
- **OpenAI-Compatible Voice Directory:** A `/v1/audio/voices` endpoint lists the available voices so that clients can discover them dynamically.
- **Broad STT Language Support:** Transcription in 25 languages, including English, Spanish, German, French, and Russian.
- **Flexible TTS:** Integration with the **Kokoro** model, offering over 50 distinct voices across 9 languages and regional variants.
- **Robust STT:** Powered by the **NVIDIA Parakeet TDT** model for accurate transcriptions.

---

## Getting Started

### Installation
Download the latest executable for your platform from the [Releases](https://github.com/eja/s2s/releases) page.

### Running the Server
At least one of the two models (Kokoro for TTS, Parakeet for STT) must be present locally. Execute the binary to start the server:

```bash
./s2s
```

If neither model is found, the server reports this and exits. To have it download and configure the required ONNX models (~1GB total) automatically, pass the `--auto` flag:

```bash
./s2s --auto
```

### Configuration Options
The server can be customized via command-line arguments:

| Argument | Description | Default |
| :--- | :--- | :--- |
| `--host` | IP address to bind the server to | `127.0.0.1` |
| `--port` | Port to listen on | `35248` |
| `--kokoro` | Path to the Kokoro TTS model directory | `./models/kokoro...` |
| `--parakeet` | Path to the Parakeet STT model directory | `./models/sherpa...` |
| `--threads` | Number of threads used for inference | `4` |
| `--auto` | Automatically download missing models | `false` |
| `--log` | Path to a file for persistent logging | `stderr` |

---

## API Reference

> **Note:** If the TTS or STT model is missing at startup, the server still launches (provided the other model is present), but requests to the missing service's endpoints return `404 Not Found`.

### 1. Speech-to-Text (STT)
**Endpoint:** `POST /v1/audio/transcriptions`

Transcribes an audio file to text. The endpoint expects a `multipart/form-data` request with the WAV file in the `file` field. The model detects the spoken language automatically; it must be one of the supported languages listed under Language & Voice Support.

**Request:**
```bash
curl http://127.0.0.1:35248/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav"
```

**Response:**
```json
{
  "text": "Hello world, this is a local transcription."
}
```

### 2. Text-to-Speech (TTS)
**Endpoint:** `POST /v1/audio/speech`

Synthesizes text into speech and returns the audio, which the example below saves as `output.wav`.

**Request Body:**
| Field | Type | Description |
| :--- | :--- | :--- |
| `input` | String | The text to synthesize |
| `voice` | String | (Optional) The voice ID (default: `af_alloy`) |

**Example:**
```bash
curl http://127.0.0.1:35248/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, I am a locally hosted voice.",
    "voice": "af_bella"
  }' --output output.wav
```

### 3. Voice Discovery
**Endpoint:** `GET /v1/audio/voices`

Returns the available TTS voices, sorted alphabetically.

**Request:**
```bash
curl http://127.0.0.1:35248/v1/audio/voices
```

**Response:**
```json
{
  "voices": [
    { "id": "af_alloy", "name": "af_alloy" },
    { "id": "af_aoede", "name": "af_aoede" }
  ]
}
```

---

## Language & Voice Support

### Speech-to-Text (STT) Languages
S2S supports transcription in the following 25 languages:

| | | | | |
| :--- | :--- | :--- | :--- | :--- |
| Bulgarian (`bg`) | Croatian (`hr`) | Czech (`cs`) | Danish (`da`) | Dutch (`nl`) |
| English (`en`) | Estonian (`et`) | Finnish (`fi`) | French (`fr`) | German (`de`) |
| Greek (`el`) | Hungarian (`hu`) | Italian (`it`) | Latvian (`lv`) | Lithuanian (`lt`) |
| Maltese (`mt`) | Polish (`pl`) | Portuguese (`pt`) | Romanian (`ro`) | Russian (`ru`) |
| Slovak (`sk`) | Slovenian (`sl`) | Spanish (`es`) | Swedish (`sv`) | Ukrainian (`uk`) |

### Text-to-Speech (TTS) Voices
For TTS, the language is determined automatically by the prefix of the selected voice ID.

| Language | Voice Prefix | Examples |
| :--- | :--- | :--- |
| **English (US)** | `af_`, `am_` | `af_alloy`, `af_sky`, `am_adam`, `am_echo` |
| **English (UK)** | `bf_`, `bm_` | `bf_alice`, `bm_daniel` |
| **Spanish** | `ef_`, `em_` | `ef_dora`, `em_alex` |
| **French** | `ff_` | `ff_siwis` |
| **Hindi** | `hf_`, `hm_` | `hf_alpha`, `hm_psi` |
| **Italian** | `if_`, `im_` | `if_sara`, `im_nicola` |
| **Japanese** | `jf_`, `jm_` | `jf_alpha`, `jm_kumo` |
| **Portuguese** | `pf_`, `pm_` | `pf_dora`, `pm_santa` |
| **Chinese** | `zf_`, `zm_` | `zf_xiaobei`, `zm_yunxi` |

---

## Requirements

- **Operating System:** Linux, macOS, or Windows.
- **Audio Format:** For STT, input must be in **WAV** format (16 kHz mono recommended).
- **Disk Space:** Approximately 1.5GB for models and dependencies.

## Acknowledgments

- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) for the underlying inference engine.
- [Kokoro](https://github.com/hexgrad/Kokoro) for the TTS weights.
- [NVIDIA](https://nvidia.com) for the Parakeet TDT ASR models.
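The curl commands in the API Reference map directly onto other HTTP clients. The following is a minimal sketch that assumes Python 3.7+ with only its standard library and a server on the default `127.0.0.1:35248`. The project does not ship a Python client, so the helper names (`speech_request`, `transcription_request`, `parse_voices`, `list_voices`, `synthesize`, `transcribe`) are hypothetical. The `audio/wav` type on the uploaded file part is also an assumption. The README says only that a WAV file is expected. The sketch treats a 404 from `/v1/audio/voices` as "TTS model not loaded", following the fallback rule above, though the README does not spell this out for the voice list specifically.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request
import uuid

# Assumption: the server runs with the default --host and --port.
BASE_URL = "http://127.0.0.1:35248"


def speech_request(text: str, voice: str = "af_alloy", base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /v1/audio/speech with the same JSON body as the curl TTS example."""
    payload = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcription_request(wav: bytes, filename: str = "audio.wav", base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /v1/audio/transcriptions as multipart/form-data with one `file` part,
    a hand-rolled equivalent of curl's -F "file=@audio.wav"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: audio/wav\r\n\r\n",  # assumed part type (hypothetical)
        wav,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def parse_voices(status: int, body: bytes) -> list[str] | None:
    """Voice IDs from a /v1/audio/voices response, or None on 404 (assumed: TTS not loaded)."""
    if status == 404:
        return None
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    return [voice["id"] for voice in json.loads(body)["voices"]]


# The three helpers below need a running server; they only wrap urlopen.
def list_voices(base_url: str = BASE_URL) -> list[str] | None:
    """Ask the server for its voices; None means the TTS service is unavailable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/audio/voices") as resp:
            return parse_voices(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; reuse parse_voices so 404 maps to None.
        return parse_voices(err.code, b"")


def synthesize(text: str, voice: str = "af_alloy") -> bytes:
    """Return the raw audio bytes produced by the TTS endpoint."""
    with urllib.request.urlopen(speech_request(text, voice)) as resp:
        return resp.read()


def transcribe(wav: bytes) -> str:
    """Return the `text` field of the STT endpoint's JSON response."""
    with urllib.request.urlopen(transcription_request(wav)) as resp:
        return json.load(resp)["text"]
```

With a server running, `list_voices()` returning `None` lets a client disable speech output instead of failing on every synthesis call. To transcribe a file, call `transcribe(open("audio.wav", "rb").read())`. Feeding `synthesize()` output straight into `transcribe()` has not been verified here. The TTS output format is not documented and may not match the 16 kHz mono recommendation for STT input.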