https://github.com/diffusionstudio/vits-web
Web api for using VITS based models in the browser!
- Host: GitHub
- URL: https://github.com/diffusionstudio/vits-web
- Owner: diffusionstudio
- Created: 2024-07-05T15:52:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-09T08:10:42.000Z (over 1 year ago)
- Last Synced: 2025-03-28T12:53:43.427Z (10 months ago)
- Language: JavaScript
- Homepage: https://huggingface.co/spaces/diffusionstudio/vits-web
- Size: 8.72 MB
- Stars: 193
- Watchers: 3
- Forks: 23
- Open Issues: 12
Metadata Files:
- Readme: README.md
README
# Run VITS based text-to-speech in the browser powered by the [ONNX Runtime](https://onnxruntime.ai/)
A big shout-out goes to [Rhasspy Piper](https://github.com/rhasspy/piper), who open-sourced all the currently available models (MIT License), and to [@jozefchutka](https://github.com/jozefchutka), who came up with the WASM build steps.
## Usage
First, install the library:
```bash
npm i @diffusionstudio/vits-web
```
Then import the library (ES modules only):
```typescript
import * as tts from '@diffusionstudio/vits-web';
```
Now you can start synthesizing speech!
```typescript
const wav = await tts.predict({
  text: "Text to speech in the browser is amazing!",
  voiceId: 'en_US-hfc_female-medium',
});
const audio = new Audio();
audio.src = URL.createObjectURL(wav);
audio.play();
// as seen in /example with Web Worker
```
The first call to `predict` downloads the model and stores it in your [Origin private file system](https://developer.mozilla.org/en-US/docs/Web/API/File_System_API/Origin_private_file_system). You can also download the model manually in advance *(recommended)*:
```typescript
await tts.download('en_US-hfc_female-medium', (progress) => {
  console.log(`Downloading ${progress.url} - ${Math.round(progress.loaded * 100 / progress.total)}%`);
});
```
The predict function also accepts a download progress callback as the second argument (`tts.predict(..., console.log)`).
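The progress callback can be factored into a small reusable helper. The sketch below assumes the progress object carries the `url`, `loaded`, and `total` fields used in the snippet above; the `DownloadProgress` interface and the `formatProgress` name are hypothetical and not part of the library. It also guards against a missing or zero `total`, which would otherwise print `NaN%` or `Infinity%`.

```typescript
// Assumed shape of the progress object, inferred from the fields used in the
// README snippet (url, loaded, total). Not the library's official type.
interface DownloadProgress {
  url: string;
  loaded: number;
  total: number;
}

// Format a progress update as a readable line. Falls back to the raw
// `loaded` value when `total` is missing, zero, or not finite.
function formatProgress(progress: DownloadProgress): string {
  if (!Number.isFinite(progress.total) || progress.total <= 0) {
    return `Downloading ${progress.url} - ${progress.loaded} loaded (total unknown)`;
  }
  // Clamp to 100 in case `loaded` overshoots `total`.
  const percent = Math.min(100, Math.round((progress.loaded * 100) / progress.total));
  return `Downloading ${progress.url} - ${percent}%`;
}

// Hypothetical usage with the library (not executed here):
// await tts.download('en_US-hfc_female-medium', (p) => console.log(formatProgress(p)));
```

Since `predict` takes the same kind of callback as its second argument, the helper should work there as well, under the same assumption about the progress shape.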
To see which models have already been stored:
```typescript
console.log(await tts.stored());
// will log ['en_US-hfc_female-medium']
```
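One way to combine `stored()` with `download()` is to fetch only the voices that are not cached yet. The helper below is a minimal sketch of that check as a pure function; `missingVoices` is a hypothetical name, and it assumes `stored()` resolves to an array of voice ID strings, as the logged output above suggests.

```typescript
// Return the voice IDs from `wanted` that are not yet in `stored`,
// preserving order and dropping duplicates.
function missingVoices(wanted: readonly string[], stored: readonly string[]): string[] {
  const seen = new Set<string>(stored);
  const missing: string[] = [];
  for (const id of wanted) {
    if (!seen.has(id)) {
      seen.add(id); // avoid listing the same ID twice
      missing.push(id);
    }
  }
  return missing;
}

// Hypothetical usage with the library (not executed here):
// const wanted = ['en_US-hfc_female-medium'];
// for (const id of missingVoices(wanted, await tts.stored())) {
//   await tts.download(id, console.log);
// }
```

Keeping the comparison separate from the library calls makes it easy to test without a browser or a network connection.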
To remove models from the origin private file system (OPFS):
```typescript
await tts.remove('en_US-hfc_female-medium');
// alternatively delete all
await tts.flush();
```
Finally, to retrieve all available voices:
```typescript
console.log(await tts.voices());
// Hint: the key can be used as voiceId
```
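Voice IDs such as `en_US-hfc_female-medium` appear to follow a `<locale>-<name>-<quality>` pattern. That pattern is an assumption based on the example ID in this README, not something the library documents here. Under that assumption, a small parser can help group or filter voices by language; `parseVoiceId` and `VoiceIdParts` are hypothetical names.

```typescript
interface VoiceIdParts {
  locale: string;  // e.g. "en_US"
  name: string;    // e.g. "hfc_female"
  quality: string; // e.g. "medium"
}

// Split a voice ID at its first and last hyphen. Returns null when the ID
// does not consist of three non-empty parts.
function parseVoiceId(voiceId: string): VoiceIdParts | null {
  const first = voiceId.indexOf('-');
  const last = voiceId.lastIndexOf('-');
  if (first <= 0 || last <= first + 1 || last === voiceId.length - 1) {
    return null;
  }
  return {
    locale: voiceId.slice(0, first),
    name: voiceId.slice(first + 1, last),
    quality: voiceId.slice(last + 1),
  };
}
```

For example, `parseVoiceId('en_US-hfc_female-medium')` yields the locale `en_US`, the name `hfc_female`, and the quality `medium`, while an ID without two hyphens yields `null`.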
### **That's it!** Happy coding :)