https://github.com/x-lance/storytts
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
- Host: GitHub
- URL: https://github.com/x-lance/storytts
- Owner: X-LANCE
- License: other
- Created: 2023-09-07T11:04:15.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-27T14:49:36.000Z (over 1 year ago)
- Last Synced: 2025-01-11T21:11:02.921Z (9 months ago)
- Language: HTML
- Homepage: https://goarsenal.github.io/StoryTTS/
- Size: 25.7 MB
- Stars: 137
- Watchers: 17
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
# StoryTTS
> [STORYTTS: A HIGHLY EXPRESSIVE TEXT-TO-SPEECH DATASET WITH RICH TEXTUAL EXPRESSIVENESS ANNOTATIONS](https://ieeexplore.ieee.org/document/10446023)
StoryTTS is a highly expressive text-to-speech dataset that is rich in both acoustic and textual expressiveness. It is built from recordings of a Mandarin storytelling show (评书) delivered by a female artist, Lian Liru (连丽如). It contains 61 hours of consecutive, highly prosodic speech with accurate text transcriptions and rich textual expressiveness annotations.
[Demos](https://goarsenal.github.io/StoryTTS/)
## Dataset Statistics
## Download
* Please download the speech data from [Hugging Face](https://huggingface.co/datasets/Arsenal/StoryTTS) or [ModelScope](https://modelscope.cn/api/v1/datasets/CantabileKwok/StoryTTS/repo?Revision=master&FilePath=StoryTTS.zip).
### Note
* The dataset is **ONLY** for research purposes.
* The ownership of the speech data remains with the original owner. By downloading this dataset, you agree to our [licensing agreement](storytts_license_agreement.pdf). Note that these materials may be removed at any time at the request of the original owner.
## File Description
* `dataset/transcript`: The transcripts of StoryTTS in simplified Chinese with punctuation.
* `dataset/utt2dur`: The duration (in seconds) of each utterance.
* `dataset/utt2spk`: The speaker name of each utterance, i.e. the name of the only speaker in StoryTTS.
* `dataset/label` : The annotation labels of StoryTTS. The format of this file is as follows:
```
utt-ID 句式(Sentence Pattern)|修辞手法(Rhetoric Device)|场景(Scene)|情感色彩(Emotional colors)|模仿人物(Imitated Characters)
```
* `dataset/prompt_claude2`: Prompt and instruction for Claude2.
* `dataset/prompt_gpt4`: Prompt and instruction for GPT4.
* `dataset/wav.scp`: Paths to the wav files. Note: these paths may need to be updated to match where you store the speech data.
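
The files above can be read with a few lines of Python. The following is a minimal sketch, not part of the official release. It assumes that `dataset/label` follows the `utt-ID field1|...|field5` format shown above, and that `utt2dur`, `utt2spk`, and `wav.scp` are Kaldi-style two-column files (`utt-ID value`), which is an assumption based on their names. The helper names `Label`, `parse_label_line`, and `read_two_column` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Label:
    """The five annotation fields of one utterance, in file order."""
    sentence_pattern: str
    rhetoric_device: str
    scene: str
    emotional_color: str
    imitated_character: str


def parse_label_line(line: str) -> Tuple[str, Label]:
    """Split 'utt-ID a|b|c|d|e' into the utterance ID and its five fields."""
    # The utterance ID ends at the first whitespace; the rest is the label string.
    utt_id, fields = line.strip().split(maxsplit=1)
    parts = [p.strip() for p in fields.split("|")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 '|'-separated fields, got {len(parts)}: {line!r}")
    return utt_id, Label(*parts)


def read_two_column(path: str) -> Dict[str, str]:
    """Read an 'utt-ID value' file into a dict (assumed Kaldi-style layout)."""
    table: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            utt_id, value = line.split(maxsplit=1)
            table[utt_id] = value
    return table
```

For example, `read_two_column("dataset/utt2dur")` would return a mapping from utterance ID to duration string, and calling `parse_label_line` on each line of `dataset/label` yields the annotations keyed by the same IDs, so the two can be joined per utterance.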
## Citation
```
@inproceedings{storytts,
author={Sen Liu and Yiwei Guo and Xie Chen and Kai Yu},
title={{StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations}},
year={2024},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={11521-11525},
doi={10.1109/ICASSP48485.2024.10446023}
}
```