https://github.com/lifeiteng/soundstorm



# SoundStorm: Efficient Parallel Audio Generation
![](assets/soundstorm.png)

# Demo Page
* [https://lifeiteng.github.io/SoundStorm/index.html](https://lifeiteng.github.io/SoundStorm/index.html)

# Objective Evaluation
* Test set: `LibriTTS test-clean`
* ASR WER: computed on transcripts from `whisper large-v2`
* Speaker embedding: `WavLMForXVector` [https://huggingface.co/docs/transformers/model_doc/wavlm#transformers.WavLMForXVector](https://huggingface.co/docs/transformers/model_doc/wavlm#transformers.WavLMForXVector)
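
For readers who want to see what the two headline metrics measure, here is a minimal sketch in plain Python. It is not the evaluation script used for the table below: the function names `word_error_rate` and `cosine_similarity` are hypothetical, text normalization (casing, punctuation) is assumed to have been applied beforehand, and the embeddings are assumed to be plain sequences of floats, such as x-vectors exported from a speaker model.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

In a setup like the one listed above, `reference` would be the ground-truth transcript, `hypothesis` the ASR transcript of the generated audio, and `a` / `b` the speaker embeddings of the prompt and the generated utterance. Whether the table averages per-utterance scores or pools errors over the whole test set is not stated here.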

| Prompt | WER | Speaker Cosine Similarity | Utterance-Level Pitch Mean MAE | Utterance-Level Pitch Std MAE | Utterance-Level Duration Diff |
| ---- | ---- | ---- | ---- | ---- | ---- |
| Ground Truth | 0.86 | - | - | - | - |
| 2 Seconds | 2.32 | 0.8670 | 20.1407 | 17.4387 | - |
| 4 Seconds | 2.10 | 0.8817 | 21.1379 | 19.3733 | - |
| 6 Seconds | 1.95 | 0.8905 | 17.2253 | 15.3792 | - |
| 8 Seconds | 2.33 | 0.8895 | 18.5837 | 15.9667 | - |
| 4 Seconds (PrefixPrompt) | 1.83 | 0.9351 | 12.0929 | 14.3814 | `1.5564 / 12.7153` (avg. utterance duration) |
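
The utterance-level pitch columns can be read as follows; this is a sketch of one plausible definition, not the script that produced the numbers. It assumes each utterance comes with an F0 contour in Hz in which unvoiced frames are marked as `0`, that the population standard deviation is used, and that reference and generated utterances are paired one to one. The names `utterance_pitch_stats` and `pitch_mae` are hypothetical.

```python
from statistics import fmean, pstdev


def utterance_pitch_stats(f0_hz):
    """Return (mean, std) of F0 over the voiced frames of one utterance."""
    voiced = [f for f in f0_hz if f > 0]  # assumption: 0 marks unvoiced frames
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    return fmean(voiced), pstdev(voiced)


def pitch_mae(reference_f0s, generated_f0s):
    """Mean absolute error of per-utterance pitch mean and pitch std."""
    if not reference_f0s or len(reference_f0s) != len(generated_f0s):
        raise ValueError("need the same non-zero number of utterances")
    mean_errors, std_errors = [], []
    for ref_f0, gen_f0 in zip(reference_f0s, generated_f0s):
        ref_mean, ref_std = utterance_pitch_stats(ref_f0)
        gen_mean, gen_std = utterance_pitch_stats(gen_f0)
        mean_errors.append(abs(ref_mean - gen_mean))
        std_errors.append(abs(ref_std - gen_std))
    return fmean(mean_errors), fmean(std_errors)
```

Because the statistics are computed per utterance before differencing, the two contours do not need to be time-aligned or of equal length, which suits generated speech whose duration differs from the reference.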