https://github.com/x-lance/storytts

[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

# StoryTTS

> [STORYTTS: A HIGHLY EXPRESSIVE TEXT-TO-SPEECH DATASET WITH RICH TEXTUAL EXPRESSIVENESS ANNOTATIONS](https://ieeexplore.ieee.org/document/10446023)

StoryTTS is a highly expressive text-to-speech dataset that is rich in expressiveness from both the acoustic and the textual perspective. It is built from recordings of a Mandarin storytelling show (评书, pingshu) performed by a female artist, Lian Liru (连丽如). It contains 61 hours of consecutive, highly prosodic speech with accurate text transcriptions and rich textual expressiveness annotations.

[Demos](https://goarsenal.github.io/StoryTTS/)

## Dataset Statistics

Table 1 and Table 2 (dataset statistics; shown as images in the original README).

## Download

* Please download the speech data from [Hugging Face](https://huggingface.co/datasets/Arsenal/StoryTTS) or [ModelScope](https://modelscope.cn/api/v1/datasets/CantabileKwok/StoryTTS/repo?Revision=master&FilePath=StoryTTS.zip).

### Note

* The dataset is **ONLY** for research purposes.
* The ownership of the speech data remains with the original owner. By downloading this dataset, you are deemed to have agreed to our [licensing agreement](storytts_license_agreement.pdf). Note that these materials may be removed at any time at the request of the original owner.

## File Description

* `dataset/transcript`: The transcripts of StoryTTS in simplified Chinese, with punctuation.

* `dataset/utt2dur`: The duration (in seconds) of each utterance.

* `dataset/utt2spk`: The speaker name of each utterance, i.e. the name of the only speaker in StoryTTS.

* `dataset/label`: The annotation labels of StoryTTS. Each line of this file has the following format:

```
utt-ID 句式(Sentence Pattern)|修辞手法(Rhetoric Device)|场景(Scene)|情感色彩(Emotional colors)|模仿人物(Imitated Characters)
```

* `dataset/prompt_claude2`: The prompt and instructions for Claude 2.

* `dataset/prompt_gpt4`: The prompt and instructions for GPT-4.

* `dataset/wav.scp`: Paths to the wav files. Note: you may need to update these paths to match where you store the speech data.
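
The files above can be read with a few lines of Python. The following is a minimal sketch, not part of the StoryTTS repository. It assumes that `utt2dur`, `utt2spk` and `wav.scp` are two-column, whitespace-separated text files (`utt-ID value`), and that each `label` line is an utterance ID followed by five `|`-separated fields. The English field names, the helper functions, the sample utterance IDs, label values and paths are all hypothetical and chosen for illustration only.

```python
from __future__ import annotations

# English shorthand for the five label columns (our naming, not the dataset's).
LABEL_FIELDS = (
    "sentence_pattern",
    "rhetoric_device",
    "scene",
    "emotional_color",
    "imitated_character",
)


def parse_label_line(line: str) -> tuple[str, dict[str, str]]:
    """Split one `utt-ID a|b|c|d|e` line into (utt_id, {field: value})."""
    parts = line.strip().split(None, 1)  # utt-ID, then everything after it
    if len(parts) != 2:
        raise ValueError(f"expected 'utt-ID labels', got: {line!r}")
    utt_id, raw = parts
    values = [v.strip() for v in raw.split("|")]
    if len(values) != len(LABEL_FIELDS):
        raise ValueError(f"expected {len(LABEL_FIELDS)} fields, got {len(values)}")
    return utt_id, dict(zip(LABEL_FIELDS, values))


def parse_table(text: str) -> dict[str, str]:
    """Parse `key value` lines (assumed layout of utt2dur, utt2spk, wav.scp)."""
    table = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:  # skip blank or malformed lines
            table[parts[0]] = parts[1]
    return table


def total_hours(utt2dur: dict[str, str]) -> float:
    """Sum per-utterance durations (seconds) and convert to hours."""
    return sum(float(d) for d in utt2dur.values()) / 3600.0


def rebase_paths(wav_scp: dict[str, str], old_root: str, new_root: str) -> dict[str, str]:
    """Swap a leading old_root for new_root in every wav path; leave others as-is."""
    return {
        utt: new_root + path[len(old_root):] if path.startswith(old_root) else path
        for utt, path in wav_scp.items()
    }


# Made-up sample content; real files would be read with open(..., encoding="utf-8").
utt_id, fields = parse_label_line("utt_0001 A|B|C|D|E")
durations = parse_table("utt_0001 1800.0\nutt_0002 1800.0\n")
wav_scp = parse_table("utt_0001 /old/root/wavs/utt_0001.wav\n")

print(utt_id, fields["scene"])
print(total_hours(durations))
print(rebase_paths(wav_scp, "/old/root", "/data/StoryTTS"))
```

`rebase_paths` replaces a path prefix rather than rebuilding each path from its file name, so any subdirectory structure below the data root is preserved when `wav.scp` is adapted to a new storage location.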

## Citation

```
@inproceedings{storytts,
author={Sen Liu and Yiwei Guo and Xie Chen and Kai Yu},
title={{StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations}},
year={2024},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={11521-11525},
doi={10.1109/ICASSP48485.2024.10446023}
}
```