https://github.com/codebysonu95/voxsherpa-tts
VoxSherpa TTS: Offline Neural Text-to-Speech Engine for Android · Sherpa-ONNX powered · Natural voice synthesis · Fully offline processing · No cloud • No limits
android android-ai android-app hindi-tts kokoro-82m kokoro-onnx kokoro-tts local-ai local-first offline-tts on-device-ai piper-tts sherpa-onnx text-to-speech tts-kokoro-android
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/codebysonu95/voxsherpa-tts
- Owner: CodeBySonu95
- License: gpl-3.0
- Created: 2026-03-01T15:34:09.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-25T08:32:04.000Z (about 1 month ago)
- Last Synced: 2026-04-25T09:28:40.580Z (about 1 month ago)
- Topics: android, android-ai, android-app, hindi-tts, kokoro-82m, kokoro-onnx, kokoro-tts, local-ai, local-first, offline-tts, on-device-ai, piper-tts, sherpa-onnx, text-to-speech, tts-kokoro-android
- Language: Java
- Homepage: https://codebysonu95.github.io/VoxSherpa-TTS/
- Size: 47.3 MB
- Stars: 58
- Watchers: 2
- Forks: 9
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README

[Google Play](https://play.google.com/store/apps/details?id=com.CodeBySonu.VoxSherpa) · [Support](https://codebysonu95.github.io/VoxSherpa-TTS/assets/support.html) · [Android](https://android.com) · [License](LICENSE) · [Sherpa-ONNX](https://github.com/k2-fsa/sherpa-onnx) · [Releases](https://github.com/CodeBySonu95/VoxSherpa-TTS/releases)
VoxSherpa TTS
Studio-quality offline neural text-to-speech for Android.
Hindi · English · British · Japanese · Chinese · and more. No cloud. No limits. No compromise.
---
## Featured In
> VoxSherpa TTS is listed in the **official README** of [k2-fsa/sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx), the core inference library powering this app.

[Listing in the sherpa-onnx README](https://github.com/k2-fsa/sherpa-onnx#voxsherpa-tts) · [VoxSherpa-TTS on Hugging Face](https://huggingface.co/CodeBySonu95/VoxSherpa-TTS)
---
## Why VoxSherpa?
Most TTS apps make you choose between **quality** and **privacy**. Cloud-based tools like ElevenLabs sound incredible, but they require internet, send your text to remote servers, and charge per character.
**VoxSherpa breaks that tradeoff.**
It runs two professional-grade neural engines entirely on your device:
| Engine | Quality | Speed | Best For |
|--------|---------|-------|----------|
| **Kokoro-82M** | Studio-grade · rivals ElevenLabs | Slower on budget hardware | Audiobooks, voiceovers, professional content |
| **Piper / VITS** | Natural · clear | Fast on any device | Daily use, quick synthesis |
---
## Screenshots
*Screenshots of the four main screens: Generate, Models, Library, and Settings.*
---
## Features
### Dual Neural Engine
- **Kokoro-82M**: 82-million-parameter neural model. Multilingual support including Hindi, English, British English, French, Spanish, Chinese, Japanese and 50+ more languages. Same architecture used by top-tier commercial TTS services.
- **Piper / VITS**: Fast, lightweight, natural. Generates speech in seconds on any Android device.
### 100% Offline & Private
- All processing happens on your device
- No internet required after model download
- No account, no telemetry, no data collection
- Your text never leaves your phone
### Model Management
- Download models directly from the app
- Import your own `.onnx` models from local storage
- Multiple models installed simultaneously
- Smart storage tracking
### Audio Controls
- Real-time waveform visualization
- Adjustable speed and pitch
- Interactive audio seeking with mini player controls
- MediaStyle notification with full playback controls
- Play, pause, and replay generated audio
- Export as WAV with correct sample rate per model
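Exporting "with correct sample rate per model" matters because the rate is written into the WAV header: if it does not match the rate the model produced, playback comes out too fast or too slow. The sketch below shows one way to write a mono 16-bit PCM WAV from float samples. `WavWriter` and `encode` are hypothetical names for illustration, not taken from the VoxSherpa source.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the VoxSherpa codebase.
final class WavWriter {
    /** Encodes mono float samples in [-1, 1] as a 16-bit PCM WAV byte array. */
    static byte[] encode(float[] samples, int sampleRate) {
        int dataSize = samples.length * 2; // 2 bytes per 16-bit sample
        ByteBuffer buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN);

        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataSize);          // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                     // fmt chunk size for plain PCM
        buf.putShort((short) 1);            // audio format: PCM
        buf.putShort((short) 1);            // channels: mono
        buf.putInt(sampleRate);             // must match the model's output rate
        buf.putInt(sampleRate * 2);         // byte rate = rate * channels * bytes per sample
        buf.putShort((short) 2);            // block align = channels * bytes per sample
        buf.putShort((short) 16);           // bits per sample
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataSize);

        for (float s : samples) {
            // Clamp before scaling so out-of-range values do not wrap around.
            float clamped = Math.max(-1f, Math.min(1f, s));
            buf.putShort((short) Math.round(clamped * 32767f));
        }
        return buf.array();
    }
}
```

The caller passes whatever sample rate the active engine reports, so the same routine can serve models with different output rates.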
### Speech Library
- Save all generated audio locally
- Favorites system for quick access
- View generation history with timestamps
- Voice model attribution per recording
### Smart Settings
- **Smart Punctuation**: natural pauses after sentence breaks
- **Emotion Tags**: `[whisper]`, `[angry]`, `[happy]` support
- Per-model voice selection (Kokoro supports 100+ speakers)
- System-wide TTS engine with pitch & speed control
- Theme-aware UI
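The README does not describe how emotion tags are processed internally. As a rough illustration only, the sketch below splits input text into segments, assuming that a tag applies to the text that follows it until the next tag. `EmotionTagParser`, `Segment`, and the `"neutral"` default are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration; the app's real handling may differ.
final class EmotionTagParser {
    static final class Segment {
        final String emotion; // "neutral" when no tag precedes the text
        final String text;

        Segment(String emotion, String text) {
            this.emotion = emotion;
            this.text = text;
        }
    }

    // Only the three tags named in the README are recognized here.
    private static final Pattern TAG = Pattern.compile("\\[(whisper|angry|happy)\\]");

    /** Splits the input at each tag; a tag is assumed to govern the text after it. */
    static List<Segment> parse(String input) {
        List<Segment> out = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        String emotion = "neutral";
        int start = 0;
        while (m.find()) {
            addIfNotBlank(out, emotion, input.substring(start, m.start()));
            emotion = m.group(1);
            start = m.end();
        }
        addIfNotBlank(out, emotion, input.substring(start));
        return out;
    }

    private static void addIfNotBlank(List<Segment> out, String emotion, String text) {
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            out.add(new Segment(emotion, trimmed));
        }
    }
}
```

Under these assumptions, `parse("Hello. [whisper] Come closer.")` yields a neutral segment followed by a whisper segment, which a synthesis loop could then render one at a time.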
---
## Technical Architecture
```
User Text
    │
    ├──► Kokoro Engine (KokoroEngine.java)
    │      ├── Sherpa-ONNX JNI → ONNX Runtime → CPU/NNAPI
    │      └── kokoro-multi-lang-v1_0 (82M params, FP32)
    │
    └──► Piper / VITS Engine (VoiceEngine.java)
           ├── Sherpa-ONNX JNI → ONNX Runtime → CPU
           └── VITS model (language-specific)
```
**Built with:**
- [Sherpa-ONNX](https://github.com/k2-fsa/sherpa-onnx): on-device neural inference
- [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M): multilingual neural TTS model
- [Piper](https://github.com/rhasspy/piper): fast local TTS
- Android AudioTrack API: low-latency PCM playback
---
## Performance
Generation speed depends entirely on your device's processor:
| Device Tier | Kokoro | Piper |
|-------------|--------|-------|
| Flagship (Snapdragon 8 Gen 3) | ~20–40 sec/min audio | ~5 sec/min audio |
| Mid-range (8-core) | ~60–90 sec/min audio | ~10 sec/min audio |
| Budget (6-core) | ~2–3 min/min audio | ~20 sec/min audio |
> Kokoro prioritizes **quality over speed** by design. It uses the same 82M parameter architecture that powers premium commercial TTS; running it entirely offline on a mobile CPU is genuinely pushing the hardware limits.
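The table's "sec/min audio" figures can be read as a real-time factor: seconds of processing per second of generated audio. The sketch below turns a table entry into an estimated wait for a given audio length. `GenerationEstimate` and its methods are hypothetical names, and the result is only as accurate as the approximate figures above.

```java
// Hypothetical helper for illustration; figures come from the table above.
final class GenerationEstimate {
    /** Real-time factor: seconds of processing per second of audio. */
    static double realTimeFactor(double secondsPerMinuteOfAudio) {
        return secondsPerMinuteOfAudio / 60.0;
    }

    /** Estimated processing time in seconds for the given audio length. */
    static double estimateSeconds(double audioSeconds, double secondsPerMinuteOfAudio) {
        return audioSeconds * realTimeFactor(secondsPerMinuteOfAudio);
    }
}
```

For example, a 10-minute chapter (600 seconds) on a mid-range device with Kokoro at the upper figure of 90 sec/min gives `estimateSeconds(600, 90)`, about 900 seconds or 15 minutes. A factor above 1.0 means generation is slower than playback.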
---
## Installation
### Now Live on Google Play!
VoxSherpa TTS is officially available on the **Google Play Store**. No forms, no waitlists: just tap and install.
**Requirements:** Android 11+ · ARM64
---
## Changelog
### V2.6: Media Notification *(Latest)*
- MediaStyle notification with full playback controls
- Pitch control in System TTS
- Speed control in System TTS
- Improved performance and stability
- Bug fixes and optimizations
- Minor UI improvements
### V2.5: Stability
- Bug fixes and stability improvements
- Improved overall performance
### V2.4: Bug Fixes
- Improved System TTS support with better language detection
- Enhanced UI & overall app experience
- Improved compatibility for large screen devices
- Various bug fixes
### V2.3: Playback Upgrade
- Interactive audio seeking
- New mini player controls
- Smoother and faster UI performance
- Fixed cancel generation delay issue
### V2.2: Core Improvements
- Regenerate audio on voice change
- Improved smart punctuation
- Improved emotion tags
- Pitch control added
- Send feedback feature
- UI/UX improvements
### V1.0: Foundation
- Text to Audio
- Piper (fast models) + Kokoro (high-quality voices)
- Save audio (.wav) · Favorites support
- Speed control · Model downloads · Import Custom Model
- Chunk-based playback · Smart pause handling
- System TTS integration · PDF to Audio · TXT to Audio
---
## Model Import (Technical Users)
VoxSherpa supports importing custom `.onnx` models without any server:
1. Place your `.onnx` model + `tokens.txt` on device storage
2. Open the **Models** tab → tap **+** → **Import Local Model**
3. Select your files

Import works with any TTS model that Sherpa-ONNX supports.
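Before copying files to the device, it can help to confirm that the folder holds the two pieces the import steps call for. The sketch below is a desktop-side check, assuming the files sit together in one directory. `ModelImportCheck` and `looksImportable` are hypothetical names, and the check does not verify that the model itself is valid.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical pre-flight check for illustration; not part of the app.
final class ModelImportCheck {
    /** Returns true if the directory holds a tokens.txt and at least one .onnx file. */
    static boolean looksImportable(Path dir) throws IOException {
        if (!Files.isRegularFile(dir.resolve("tokens.txt"))) {
            return false;
        }
        // Files.list must be closed, hence the try-with-resources.
        try (Stream<Path> files = Files.list(dir)) {
            return files.anyMatch(p -> p.getFileName().toString().endsWith(".onnx"));
        }
    }
}
```

Some models may need additional data files beyond these two; that depends on the model and is outside what this check covers.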
---
## Contributing
VoxSherpa is open source. Contributions welcome:
- Bug reports via [Issues](../../issues)
- Feature requests via [Discussions](../../discussions)
- Pull requests for fixes and improvements
---
## License
```
Copyright (C) 2025 CodeBySonu95
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
https://www.gnu.org/licenses/gpl-3.0.html
```
---
## Acknowledgements
- [k2-fsa/sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx): the inference engine that makes this possible
- [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M): the neural model behind studio-quality synthesis
- [rhasspy/piper](https://github.com/rhasspy/piper): fast local TTS engine
---
**Built with obsession. Runs without internet.**
*VoxSherpa: because your voice deserves to stay yours.*