{"id":50412315,"url":"https://github.com/cskwork/supertonic-tts","last_synced_at":"2026-05-31T04:04:54.122Z","repository":{"id":358481957,"uuid":"1240815943","full_name":"cskwork/supertonic-tts","owner":"cskwork","description":"Local Supertonic 3 text-to-speech: web app (WebGPU/WASM, EN/KO/JA) + cross-platform CLI (supertts) with auto-play","archived":false,"fork":false,"pushed_at":"2026-05-17T15:02:55.000Z","size":73,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-17T17:26:22.765Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://cskwork.github.io/supertonic-tts/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cskwork.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-16T15:52:53.000Z","updated_at":"2026-05-17T15:12:40.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cskwork/supertonic-tts","commit_stats":null,"previous_names":["cskwork/supertonic-tts"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/cskwork/supertonic-tts","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Fsupertonic-tts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Fsupertonic-tts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Fsupertonic-tts/releases",
"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Fsupertonic-tts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cskwork","download_url":"https://codeload.github.com/cskwork/supertonic-tts/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cskwork%2Fsupertonic-tts/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33718496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-31T04:04:53.547Z","updated_at":"2026-05-31T04:04:54.117Z","avatar_url":"https://github.com/cskwork.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Supertonic TTS — Web App + CLI\n\nA clean, beginner-friendly text-to-speech project built on\n[Supertonic 3](https://github.com/supertone-inc/supertonic). Two ways to use it:\n\n- **Web app** — Three ready-made UI languages (English, Korean, Japanese), six\n  preset voices with one-tap preview, paste-or-upload input (`.txt` / `.docx`),\n  instant WAV download. Runs entirely in your browser via WebGPU/WebAssembly.\n- **CLI** — `supertonic-tts \"hello\"` from any terminal on macOS, Windows, or\n  Linux. 
Installed globally with `npm`, native ONNX runtime, no GPU required.\n\nNo accounts, no API keys, no cloud round-trips.\n\n## Features\n\n- **3 UI languages**: English, Korean, Japanese\n- **32 TTS language tags** available in the underlying Supertonic text processor\n- **6 voice styles** with click-to-preview\n- **Paste or upload**: drop in `.txt` or `.docx`\n- **Sample text presets** per language\n- **One-tap \"Speak\"** with autoplay + transcript view\n- **WAV download** of any generated audio\n- **WebGPU acceleration** with automatic WASM fallback\n- **Fully local**: text never leaves the browser\n\n## Supported TTS options\n\nThe app has two language layers:\n\n- **Current UI choices**: English (`en`), Korean (`ko`), Japanese (`ja`).\n  These are the languages with ready-made sample text, preview text, and UI\n  tabs in `app/main.js`.\n- **Underlying Supertonic language tags**: `en`, `ko`, `ja`, `ar`, `bg`, `cs`,\n  `da`, `de`, `el`, `es`, `et`, `fi`, `fr`, `hi`, `hr`, `hu`, `id`, `it`,\n  `lt`, `lv`, `nl`, `pl`, `pt`, `ro`, `ru`, `sk`, `sl`, `sv`, `tr`, `uk`,\n  `vi`, `na`. These are accepted by the text processor in `app/helper.js`.\n\nTo expose another language in the UI, add an entry to `LANGS` in `app/main.js`\nwith preview and preset text, then add or render the matching language tab.\n\n### Voice styles\n\nEvery voice style can be used with every supported TTS language tag:\n\n| ID | Display name | Type | Style file |\n| --- | --- | --- | --- |\n| `F1` | Mina | Female | `voice_styles/F1.json` |\n| `F2` | Sora | Female | `voice_styles/F2.json` |\n| `F3` | Yuna | Female | `voice_styles/F3.json` |\n| `M1` | Aiden | Male | `voice_styles/M1.json` |\n| `M2` | Hiro | Male | `voice_styles/M2.json` |\n| `M3` | Leo | Male | `voice_styles/M3.json` |\n\n`F1` / Mina is the default voice. 
Voice styles are downloaded from the\n`Supertone/supertonic-3` Hugging Face repository and loaded on demand from\n`assets/voice_styles/` in development, or from the Hugging Face CDN in production.\n\n### Model/runtime options\n\n- **TTS model family**: Supertonic 3 from `Supertone/supertonic-3`.\n- **ONNX model files**: `duration_predictor.onnx`, `text_encoder.onnx`,\n  `vector_estimator.onnx`, `vocoder.onnx`.\n- **Runtime**: WebGPU first, with a WebAssembly fallback.\n- **Generation controls**: quality steps from 4 to 16, and speed from 0.7 to\n  1.8.\n- **Output**: mono 44.1 kHz, 16-bit PCM WAV generated locally in the browser.\n\n## CLI\n\nA standalone Node CLI ships in this package. Install it once and run it from any\ndirectory. Two equivalent commands are exposed: short (`supertts`) and full\n(`supertonic-tts`).\n\n```bash\n# global install — Windows, macOS, Linux\nnpm install -g supertonic-tts\n\n# simplest form — positional text, auto-detects KO/JA/EN\nsupertts \"Hello from Supertonic!\"\nsupertts \"안녕하세요\"\nsupertts \"こんにちは\" --voice M1\n\n# explicit flags\nsupertts -t \"Hi there\" -o hi.wav --voice F2\nsupertts -f input.txt --lang ko -o out.wav\necho \"piped text\" | supertts -o piped.wav\n```\n\nOn the first synthesis, the model assets (~380 MB) are downloaded automatically\nfrom Hugging Face into a platform-appropriate user cache:\n\n| Platform | Default assets directory |\n| --- | --- |\n| Windows | `%LOCALAPPDATA%\\supertonic-tts\\assets` |\n| macOS   | `~/Library/Caches/supertonic-tts/assets` |\n| Linux   | `$XDG_CACHE_HOME/supertonic-tts/assets` (or `~/.cache/...`) |\n\nOverride the location with `--assets \u003cdir\u003e` or the `SUPERTONIC_ASSETS` environment variable. 
Pre-fetch\nthe assets without synthesizing anything via `supertonic-tts --download`.\n\n### CLI flags\n\n| Flag | Default | Description |\n| --- | --- | --- |\n| `-t, --text \u003cs\u003e` | — | inline text |\n| `-f, --file \u003cp\u003e` | — | read text from a `.txt` file |\n| `-o, --out \u003cp\u003e` | `./out-\u003ctimestamp\u003e.wav` | output WAV path |\n| `-l, --lang \u003cc\u003e` | auto | language tag (auto-detects ko/ja/en; see `--list-langs`) |\n| `-v, --voice \u003cid\u003e` | `F1` | voice id: `F1`–`F3`, `M1`–`M3` |\n| `-s, --speed \u003cn\u003e` | `1.05` | speaking speed, 0.7–1.8 |\n| `--steps \u003cn\u003e` | `8` | quality steps, 4–16 |\n| `--silence \u003cs\u003e` | `0.3` | pause between text chunks, in seconds |\n| `--assets \u003cdir\u003e` | auto | override the assets directory |\n| `--download` | — | only fetch / verify assets |\n| `--no-play` | — | don't auto-play the generated WAV |\n| `--list-voices` | — | print the voice catalog |\n| `--list-langs` | — | print the supported language tags |\n| `-q, --quiet` | — | suppress progress logs |\n| `-h, --help` | — | show help |\n\nBy default the generated WAV plays back immediately\n(macOS `afplay`, Windows `Media.SoundPlayer`, Linux `paplay`/`aplay`/`play`/\n`ffplay`). Playback is blocking: the command returns only after the audio has\nfinished. Pass `--no-play` for batch or scripted usage.\n\nThe CLI prints the output path on `stdout` (one line, easy to pipe). All\nprogress and status messages go to `stderr`.\n\n```bash\n# capture the output path without playback\nOUT=$(supertts \"audio test\" --quiet --no-play)\necho \"wrote $OUT\"\n```\n\n## Web app quick start\n\nRequires only Node.js 18+. 
Model assets (~380 MB) are downloaded directly from\nHugging Face during `npm install`, so no `git-lfs` is needed.\n\n```bash\n# Install + auto-download the model assets\nnpm install\n\n# Start the dev server (opens http://localhost:3000)\nnpm run dev\n```\n\nIf the asset download was interrupted, just re-run it; existing files are\nskipped automatically:\n\n```bash\nnpm run assets\n```\n\n## Production build\n\n```bash\nnpm run build     # outputs to ./dist\nnpm start         # serves ./dist on http://localhost:3000\n```\n\nIn production builds, the app **fetches model weights directly from the\nHugging Face CDN at runtime** (`huggingface.co/Supertone/supertonic-3`),\nso deployments don't have to ship the ~380 MB of model files. The CDN sets\nproper CORS headers and long cache lifetimes.\n\n## Deploying\n\n### GitHub Pages (zero-config)\n\nA workflow at `.github/workflows/deploy.yml` builds and publishes the app on every\npush to `main`.\n\n1. Push the repo to GitHub\n2. In repo settings → Pages → Build and deployment → Source: **GitHub Actions**\n3. Push to `main` (or trigger the workflow manually)\n4. The app is live at `https://\u003cuser\u003e.github.io/\u003crepo\u003e/`\n\nThe workflow sets `VITE_BASE=/\u003crepo\u003e/` so all relative URLs resolve under\nthe subpath. No model files are uploaded to Pages.\n\n### Vercel\n\n```bash\nvercel --prod\n```\n\n`vercel.json` already sets:\n\n- `Cross-Origin-Opener-Policy: same-origin`\n- `Cross-Origin-Embedder-Policy: credentialless`\n- Long-cache headers for `/assets/*`\n\nTogether, the two cross-origin headers enable faster multi-threaded WASM where\nsupported. A separate `.vercelignore` file keeps the local `assets/` directory\nout of the upload.\n\n### Self-hosting\n\n`npm run build` emits a fully static `./dist` directory; serve it with any\nstatic host (nginx, Caddy, Cloudflare Pages, S3 + CloudFront, etc.). If you
If you\nalso want multi-threaded WASM acceleration, send these response headers:\n\n```\nCross-Origin-Opener-Policy: same-origin\nCross-Origin-Embedder-Policy: credentialless\n```\n\n## Project layout\n\n```\n.\n├── app/                  # Vite project root (the web app)\n│   ├── index.html\n│   ├── main.js           # UI + synthesis orchestration\n│   ├── helper.js         # Supertonic ONNX runtime helpers\n│   └── style.css\n├── assets/               # Model weights \u0026 voice styles (downloaded)\n│   ├── onnx/*.onnx\n│   ├── onnx/tts.json\n│   ├── onnx/unicode_indexer.json\n│   └── voice_styles/*.json\n├── scripts/\n│   └── download-assets.mjs\n├── vite.config.js\n└── package.json\n```\n\n## How it works\n\n1. The browser loads four ONNX models (duration predictor, text encoder,\n   vector estimator, vocoder) and a voice style tensor.\n2. Your text is preprocessed (NFKD-normalised, emoji-stripped, wrapped with\n   the language tag) and converted to token IDs.\n3. A short diffusion loop denoises a latent audio representation.\n4. The vocoder synthesises 44.1 kHz, 16-bit PCM. The WAV file is built\n   client-side and offered for playback / download.\n\nEvery step runs locally — your text and the generated audio never leave\nthe device.\n\n## Troubleshooting\n\n- **\"Loading model\" stays forever**: open DevTools → Network. If the model\n  files (`.onnx`) 404, run `npm run assets` again.\n- **WebGPU disabled**: only modern Chrome / Edge / Safari Tech Preview\n  support WebGPU. The app silently falls back to WebAssembly — slower but\n  works everywhere.\n- **DOCX upload fails**: complex DOCX files with embedded objects may not\n  parse cleanly. Save as plain `.txt` as a fallback.\n- **Korean / Japanese sound rushed**: drop \"Speed\" in Advanced options\n  to ~0.95.\n\n## License\n\nApp code: MIT. 
Supertonic model weights are subject to\n[Supertone's license](https://huggingface.co/Supertone/supertonic-3).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcskwork%2Fsupertonic-tts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcskwork%2Fsupertonic-tts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcskwork%2Fsupertonic-tts/lists"}