https://github.com/lifeiteng/naturalspeech2
- Host: GitHub
- URL: https://github.com/lifeiteng/naturalspeech2
- Owner: lifeiteng
- Created: 2023-05-10T15:33:02.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-29T02:09:56.000Z (almost 2 years ago)
- Last Synced: 2025-01-15T01:18:00.123Z (5 months ago)
- Size: 4.88 KB
- Stars: 33
- Watchers: 22
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
README
# NaturalSpeech2
# Progress
- [x] Align datasets
- [x] Implement modules
- [x] Training
- [x] End-To-End Synthesizer
- [x] Add Loss CE RVQ
- [x] Subjective Evaluation
- [x] Objective Evaluation
- [ ] Demo Page

# Objective Evaluation
* Test set: LibriTTS `test-clean`
* WER: computed on transcripts from the ASR model Whisper `large-v2`
* Speaker similarity: cosine similarity between speaker embeddings from WavLM x-vector ([https://huggingface.co/docs/transformers/model_doc/wavlm#transformers.WavLMForXVector](https://huggingface.co/docs/transformers/model_doc/wavlm#transformers.WavLMForXVector))

| Prompt | WER | Speaker Cosine Similarity | Utterance-Level Pitch Mean MAE | Utterance-Level Pitch Std MAE | Utterance-Level Duration Diff |
| ---- | ---- | ---- | ---- | ---- | ---- |
| Ground Truth | 0.86 | - | - | - | - |
| 2 Seconds | | | | | |
| 4 Seconds | | | | | |
| 6 Seconds | | | | | |
| 8 Seconds | | | | | |
| 4 Seconds (Prefix Prompt) | | | | | (avg. utterance duration) |
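The README names the objective metrics but not how each number is computed. Below is a minimal, stdlib-only sketch of how the per-utterance values could be derived once the inputs exist (Whisper `large-v2` transcripts, `WavLMForXVector` embeddings, F0 contours, utterance durations). It is not the repository's evaluation code: the function names are hypothetical, and it assumes text normalization is only lowercasing plus whitespace splitting, unvoiced F0 frames are stored as 0, pitch std is the population std, and "Duration Diff" is the mean absolute duration difference in seconds.

```python
import math
from statistics import mean, pstdev
from typing import Sequence, Tuple

# Hypothetical helpers for illustration; not taken from the naturalspeech2 repo.


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: normalization is only lowercasing and whitespace splitting;
    a real pipeline would normalize Whisper transcripts more carefully.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two speaker embeddings (e.g. x-vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def voiced_stats(f0: Sequence[float]) -> Tuple[float, float]:
    """Mean and population std of an F0 contour, ignoring unvoiced frames.

    Assumption: unvoiced frames are encoded as 0 and at least one frame is voiced.
    """
    voiced = [v for v in f0 if v > 0]
    return mean(voiced), pstdev(voiced)


def pitch_mae(
    ref_contours: Sequence[Sequence[float]],
    gen_contours: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """MAE of utterance-level pitch mean and pitch std over paired utterances."""
    mean_errors, std_errors = [], []
    for ref, gen in zip(ref_contours, gen_contours):
        ref_mean, ref_std = voiced_stats(ref)
        gen_mean, gen_std = voiced_stats(gen)
        mean_errors.append(abs(ref_mean - gen_mean))
        std_errors.append(abs(ref_std - gen_std))
    return mean(mean_errors), mean(std_errors)


def duration_diff(
    ref_durations: Sequence[float], gen_durations: Sequence[float]
) -> float:
    """Mean absolute difference of utterance durations (assumed in seconds)."""
    return mean(abs(r - g) for r, g in zip(ref_durations, gen_durations))
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words, giving 2/6. Averaging per-utterance values this way is one choice; a corpus-level WER would instead sum edit distances and reference lengths across all utterances before dividing.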