https://github.com/shreyaskarnik/svara-tts-webgpu
Browser-native multilingual Indic TTS via WebGPU + transformers.js v4 (Svara + SNAC)
- Host: GitHub
- URL: https://github.com/shreyaskarnik/svara-tts-webgpu
- Owner: shreyaskarnik
- Created: 2026-04-26T23:02:32.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-06T05:51:57.000Z (about 1 month ago)
- Last Synced: 2026-05-06T07:30:45.816Z (about 1 month ago)
- Language: JavaScript
- Size: 685 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
---
title: Svāra TTS WebGPU
emoji: 🗣️
colorFrom: yellow
colorTo: red
sdk: static
pinned: false
license: apache-2.0
short_description: Multilingual Indic TTS in your browser, via WebGPU
models:
- shreyask/svara-tts-v1-ONNX
- onnx-community/snac_24khz-ONNX
tags:
- text-to-speech
- indic
- webgpu
- on-device
---
# Svāra TTS · WebGPU
Browser-native multilingual TTS for **19 Indian languages** powered by [Svara](https://huggingface.co/kenpath/svara-tts-v1), [SNAC](https://huggingface.co/hubertsiuzdak/snac_24khz), and [Transformers.js v4](https://huggingface.co/docs/transformers.js). Runs 100% locally in the browser after the one-time model download.
This build adds an explicit model load step, browser-side caching, multilingual voice switching, prompt presets, and a WebGPU worker tuned around the ONNX-exported Svāra model.
## Architecture
```
text → tokenizer → Llama-3.2-3B (q4f16, transformers.js v4 + WebGPU) →
audio token IDs in [128266, 156938) →
group every 7 → SNAC frame (3 hierarchical levels) →
SNAC decoder ONNX (q4f16/fp16 from onnx-community/snac_24khz-ONNX) →
24 kHz mono PCM → WAV blob
```
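Two of the steps in this pipeline, regrouping audio tokens into SNAC frames and packing PCM into a WAV file, can be sketched in plain JavaScript. This is a minimal illustration, not the code used in this repository. The function names `tokensToSnacCodes` and `encodeWav` are hypothetical. The codebook size of 4096 is inferred from the token range (156938 - 128266 = 28672 = 7 × 4096), and the slot-to-level mapping follows the convention of the Orpheus reference decoder, which is assumed to carry over to Svara.

```javascript
// Assumed constants, inferred from the range [128266, 156938) in the diagram.
const AUDIO_TOKEN_START = 128266;
const CODEBOOK_SIZE = 4096;
const SLOTS_PER_FRAME = 7;
// Assumed Orpheus-style mapping of the 7 slots onto SNAC's 3 levels (1 + 2 + 4 codes).
const SLOT_TO_LEVEL = [0, 1, 2, 2, 1, 2, 2];

// Regroup a flat list of generated token IDs into three SNAC code levels.
function tokensToSnacCodes(tokenIds) {
  const end = AUDIO_TOKEN_START + SLOTS_PER_FRAME * CODEBOOK_SIZE;
  // Keep only audio tokens; text and control tokens fall outside the range.
  const audio = tokenIds.filter((t) => t >= AUDIO_TOKEN_START && t < end);
  // Any trailing partial frame is dropped.
  const frames = Math.floor(audio.length / SLOTS_PER_FRAME);
  const levels = [[], [], []];
  for (let f = 0; f < frames; f++) {
    for (let s = 0; s < SLOTS_PER_FRAME; s++) {
      const token = audio[f * SLOTS_PER_FRAME + s];
      const code = token - AUDIO_TOKEN_START - s * CODEBOOK_SIZE;
      // A code outside [0, 4096) means the token does not belong in this slot.
      if (code < 0 || code >= CODEBOOK_SIZE) {
        throw new RangeError(`token ${token} is misaligned in slot ${s}`);
      }
      levels[SLOT_TO_LEVEL[s]].push(code);
    }
  }
  return levels;
}

// Pack mono float samples in [-1, 1] into a 16-bit PCM WAV file (ArrayBuffer).
function encodeWav(samples, sampleRate = 24000) {
  const dataBytes = samples.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataBytes, true); // remaining file size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);            // fmt chunk size
  view.setUint16(20, 1, true);             // format 1 = integer PCM
  view.setUint16(22, 1, true);             // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // bytes per second
  view.setUint16(32, 2, true);             // bytes per sample frame
  view.setUint16(34, 16, true);            // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp before scaling
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the result can be wrapped as `new Blob([encodeWav(pcm)], { type: 'audio/wav' })` for playback or download. Throwing on a misaligned token is a design choice of this sketch; a production decoder might skip the affected frame instead.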
## Models
| Repo | Size | Notes |
|------|------|-------|
| [`shreyask/svara-tts-v1-ONNX`](https://huggingface.co/shreyask/svara-tts-v1-ONNX) | ~1.95 GB | Llama-3.2-3B q4f16, GQA, KV-cache |
| [`onnx-community/snac_24khz-ONNX`](https://huggingface.co/onnx-community/snac_24khz-ONNX) | ~26 MB (fp16) | SNAC decoder |
## Run locally
```sh
npm install
npm run dev # http://localhost:5173
```
First run downloads the selected model into the browser cache (LM + codec + tokenizer). Subsequent runs reuse the cached weights.
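Because the first run stores roughly 2 GB in the browser, it can help to check the available storage quota before starting the download. The sketch below is one possible approach and is not part of this project. The helper names `fitsInQuota` and `checkModelStorage` are hypothetical, the byte counts are the approximate sizes from the Models table, and `navigator.storage.estimate()` is the standard Storage API call, which returns estimates rather than exact figures.

```javascript
// Approximate download sizes from the Models table (decimal units, assumed).
const LM_BYTES = 1.95e9;   // shreyask/svara-tts-v1-ONNX, q4f16
const CODEC_BYTES = 26e6;  // onnx-community/snac_24khz-ONNX, fp16

// Pure helper: compare a storage estimate against the space the models need.
function fitsInQuota({ usage = 0, quota = 0 }) {
  const needed = LM_BYTES + CODEC_BYTES;
  const free = quota - usage;
  return { needed, free, ok: free >= needed };
}

// Browser wrapper: resolves to null when the Storage API is unavailable.
async function checkModelStorage(storage = globalThis.navigator?.storage) {
  if (!storage || typeof storage.estimate !== 'function') return null;
  return fitsInQuota(await storage.estimate());
}
```

This check is only meaningful before the first download: once the weights are cached, they count toward `usage`, so the result would be pessimistic on later runs.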
## Voices
Use a voice string of the form `"<Language> (<Gender>)"`. **38 voices across 19 languages**: Hindi, Bengali, Marathi, Telugu, Kannada, Tamil, Malayalam, Gujarati, Punjabi, Assamese, Bhojpuri, Magahi, Maithili, Chhattisgarhi, Bodo, Dogri, Nepali, Sanskrit, English (Indian), with one male and one female voice each.
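The voice list can be enumerated programmatically, for example to populate a dropdown. The sketch below assumes the voice string is the language name followed by the gender in parentheses; the exact spelling and capitalization the model expects (including how the English voice is labeled) are assumptions, and `listVoices` is a hypothetical helper rather than a function from this project.

```javascript
// Language names as listed above; the exact labels the model expects are assumed.
const LANGUAGES = [
  'Hindi', 'Bengali', 'Marathi', 'Telugu', 'Kannada', 'Tamil', 'Malayalam',
  'Gujarati', 'Punjabi', 'Assamese', 'Bhojpuri', 'Magahi', 'Maithili',
  'Chhattisgarhi', 'Bodo', 'Dogri', 'Nepali', 'Sanskrit', 'English (Indian)',
];
// Gender labels are an assumption about capitalization.
const GENDERS = ['Male', 'Female'];

// Build every language/gender combination: 19 languages x 2 genders = 38 voices.
function listVoices() {
  return LANGUAGES.flatMap((language) =>
    GENDERS.map((gender) => `${language} (${gender})`)
  );
}
```

Calling `listVoices()` yields 38 strings, starting with `Hindi (Male)` and `Hindi (Female)`.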
## Notes
- `q4f16` is the fastest cold-start option and works well for short prompts.
- `q8` is heavier but can sound cleaner on more difficult prompts.
- Emotion tags can be appended at the end of a line.
- Everything stays local to the browser after the model has loaded.
## Credits
- [Kenpath](https://huggingface.co/kenpath): Svara TTS v1 base model.
- [Canopy Labs](https://huggingface.co/canopylabs): Orpheus 3B Hindi base.
- [Hugging Face](https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu): the original `text-to-speech-webgpu` scaffold this project was forked from.
- License: Apache 2.0.