Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/centre-for-humanities-computing/literary_evocation
Contains the Fiction4 corpus data and the data for our experiment on literary sentiment evocation
- Host: GitHub
- URL: https://github.com/centre-for-humanities-computing/literary_evocation
- Owner: centre-for-humanities-computing
- Created: 2024-07-06T06:37:40.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-07-10T12:21:14.000Z (4 months ago)
- Last Synced: 2024-07-11T14:28:56.958Z (4 months ago)
- Topics: implicitness, literary-analysis, literary-language, roberta-model, sentiment-analysis
- Language: Python
- Homepage:
- Size: 22.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Fiction4 sentiment evocation
Data and code for studying how textual features influence human sentiment perception in literary texts.

## 🔬 Data

| Genre | No. texts | No. annotations | No. words | Period |
|-------------|-----|------|--------|------------|
| **Fairy tales** | 3 | 772 | 18,597 | 1837-1847 |
| **Hymns** | 65 | 2,026 | 12,798 | 1798-1873 |
| **Prose** | 1 | 1,923 | 30,279 | 1952 |
| **Poetry** | 40 | 1,579 | 11,576 | 1965 |

We present the **Fiction4 corpus** of literary texts, spanning 109 individual texts across 4 genres and two languages (English and Danish) in the 19th and 20th centuries.
The corpus centers on three main authors: Sylvia Plath for poetry, Ernest Hemingway for prose, and H.C. Andersen for fairy tales. The hymns are a heterogeneous collection from official Danish church hymnbooks from 1798-1873.
The corpus was annotated for valence at the sentence level, with at least 2 annotators per sentence.

The full Fiction4 corpus data is in `\data\fiction4_data.json`.
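
As a starting point for working with the corpus file, here is a minimal sketch that loads it and reports its top-level structure. The schema of the JSON file is not described in this README, so the hypothetical helper `describe_json` makes no assumptions about field names; it only assumes the file is standard JSON and that the script is run from the repository root.

```python
import json
from pathlib import Path


def describe_json(path):
    """Load a JSON file and summarize its top-level structure.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, dict):
        # Show at most the first 10 keys (alphabetically) to keep output short.
        return {"type": "dict", "size": len(data), "keys": sorted(data.keys())[:10]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "keys": []}
    return {"type": type(data).__name__, "size": 1, "keys": []}


if __name__ == "__main__":
    # Path from this README, written in a platform-independent form.
    corpus_path = Path("data") / "fiction4_data.json"
    if corpus_path.exists():
        print(describe_json(corpus_path))
    else:
        print(f"{corpus_path} not found; run this from the repository root.")
```

Inspecting the reported keys first should make it clearer how sentences and their valence annotations are stored before writing any analysis code.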
We compare this fiction corpus against nonfiction texts (across genres).
The nonfiction datasets considered are:
1. EmoBank (from this paper: [https://aclanthology.org/E17-2092/](https://aclanthology.org/E17-2092/); repo [here](https://github.com/JULIELab/EmoBank/tree/master)). These are multi-genre sentences (n = 10,062; range = 1 to 674 tokens; mean length = 87.8 tokens).
2. Facebook posts (from this paper: [https://aclanthology.org/W16-0404.pdf](https://aclanthology.org/W16-0404.pdf); repo [here](https://github.com/wwbp/additional_data_sets/tree/master/valence_arousal)). These are Facebook posts, each containing multiple sentences (n = 2,895; range = 2 to 445 tokens; mean length = 86.7 tokens).

## 💻 Code

All code for our study on human/model sentiment perception across these corpora is available in this repository; see primarily feature extraction (`get_features.py`) and analysis (`analysis.py`).

The annotator agreement calculation for each subcategory of the Fiction4 corpus is in `/annotation/annotator_agreement.py`.
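
To illustrate what an agreement calculation between two annotators can look like, here is a minimal sketch using Spearman's rank correlation on per-sentence valence scores. This is an assumption for illustration only: this README does not state which agreement metric `annotator_agreement.py` uses, and the function names and scores below are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y):
        raise ValueError("both annotators must score the same sentences")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up valence scores from two annotators for the same five sentences.
    annotator_a = [2, 5, 7, 4, 9]
    annotator_b = [3, 5, 8, 5, 9]
    print(round(spearman(annotator_a, annotator_b), 3))
```

A rank-based measure is one reasonable choice when annotators use the valence scale differently but order sentences similarly; other measures, such as Krippendorff's alpha, may suit sentences with more than two annotators better.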