# NLP Affective Computing - text-based emotion recognition with Deep Learning and LLMs
https://github.com/suzana-ilic/nlp_affective_computing

This thesis aims to contribute to research in the field of affective computing and to provide a holistic analysis of text-based emotion recognition from the perspective of Applied and Computational Linguistics. We will examine linguistic features, annotation schemes, categorical and dimensional emotion models, and commonly used research datasets with different linguistic styles. The main prediction systems are deep neural network architectures, since deep learning has achieved major breakthroughs and state-of-the-art results on a large number of Natural Language Processing (NLP) tasks (Young et al. 2018). The thesis overview spans analyses, tasks and implications for (1) datasets, (2) emotion models and (3) algorithms.

# Overview

## Emotion Models

- Categorical emotion models – emotions are represented as distinct, mutually exclusive categories (e.g. the basic emotions anger, fear, joy, ...)
- Dimensional emotion models – emotions are represented in a two- or multidimensional space (e.g. valence and arousal)
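To make the distinction concrete, here is a minimal sketch of how the two kinds of annotation are commonly encoded for a model: a categorical label becomes a one-hot vector over a fixed label set, while a dimensional annotation is a point in valence-arousal space. The label set reuses the seven classes of the demo model described below; `one_hot` and `DimensionalEmotion` are hypothetical illustrations, not code from this repository.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set: Ekman's six basic emotions plus a neutral class
LABELS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def one_hot(label: str, labels: List[str] = LABELS) -> List[int]:
    """Categorical model: exactly one position is 1, all others are 0."""
    if label not in labels:
        raise ValueError(f"unknown emotion label: {label!r}")
    return [1 if name == label else 0 for name in labels]


@dataclass(frozen=True)
class DimensionalEmotion:
    """Dimensional model: a point in a continuous valence-arousal space."""
    valence: float  # unpleasant (low) to pleasant (high)
    arousal: float  # calm (low) to excited (high)


categorical = one_hot("joy")
dimensional = DimensionalEmotion(valence=7.5, arousal=6.0)
```

The categorical encoding can only say *which* emotion is present, while the dimensional one can express graded differences, such as a mildly pleasant versus a very pleasant post.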

### Model demo: Predicting basic emotions

This demo lets you try [a fine-tuned checkpoint of DistilRoBERTa-base](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base) by Jochen Hartmann directly in the browser. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions plus a neutral class:

- anger 🤬
- disgust 🤢
- fear 😨
- joy 😀
- neutral 😐
- sadness 😭
- surprise 😲

[Demo](https://huggingface.co/spaces/Suzana/text_basic_emotions) by Suzana Ilic

Model reference: Jochen Hartmann, "Emotion English DistilRoBERTa-base". https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/, 2022.
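Outside the browser, the same checkpoint can presumably be loaded with the Hugging Face `transformers` `pipeline` API. The sketch below assumes a recent `transformers` release in which `top_k=None` returns a score for every label (older releases used `return_all_scores=True` instead). `load_classifier` and `top_emotion` are hypothetical helper names, not part of the model's API; the import is deferred so the post-processing helper works without `transformers` installed.

```python
from typing import Any, Dict, List, Tuple


def load_classifier():
    """Build a text-classification pipeline for the emotion checkpoint.

    Assumption: a recent transformers release where top_k=None
    returns scores for all seven labels.
    """
    from transformers import pipeline  # deferred: heavy, optional dependency

    return pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
        top_k=None,
    )


def top_emotion(scores: List[Any]) -> Tuple[str, float]:
    """Return the highest-scoring (label, score) pair from pipeline output."""
    # Depending on the transformers version, a single input may come back
    # wrapped in an extra list: [[{...}, ...]] instead of [{...}, ...].
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    best: Dict[str, Any] = max(scores, key=lambda item: item["score"])
    return best["label"], float(best["score"])
```

Calling `top_emotion(load_classifier()("I can't believe I finally got the job!"))` downloads the model on first use and returns a `(label, score)` pair such as the predicted emotion and its probability.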

## Datasets

- Dataset I – Facebook posts
- Dataset II – Reddit comments (GoEmotions)

## Notebooks

- EDA
- Transformers

# Exploratory data analysis for emotion datasets (text)

Exploratory data analysis (EDA) of an emotion dataset aims to build an understanding of the corpus: its linguistic style, lexical elements and syntax, as well as its annotation scheme and label distribution, including a check for class imbalance (or, for dimensional annotations, an analysis of the score distributions).
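A minimal sketch of the class-distribution, imbalance and text-length checks described above, using only the Python standard library. The toy texts and labels are made up for illustration; on a real corpus you would pass in the dataset's text and label columns instead. Whitespace splitting is an assumption standing in for a proper tokenizer.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, Iterable, Sequence


def class_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each label, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


def imbalance_ratio(labels: Iterable[str]) -> float:
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def length_summary(texts: Sequence[str]) -> Dict[str, float]:
    """Token-length statistics, using whitespace splitting as a rough tokenizer."""
    lengths = [len(text.split()) for text in texts]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
    }


# Made-up toy corpus, for illustration only
texts = ["I love this!", "This is awful", "Why would they do that?", "ok"]
labels = ["joy", "anger", "anger", "neutral"]

print(class_distribution(labels))
print(imbalance_ratio(labels))
print(length_summary(texts))
```

An imbalance ratio well above 1 suggests that accuracy alone will be misleading and that per-class or macro-averaged metrics are worth reporting.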


## Dataset I

- **Dataset:** 2,894 Facebook posts, each annotated with a valence score and an arousal score on an integer scale from 1 to 9
- [EDA](https://github.com/suzana-ilic/EDA_nlp_emotion_datasets/blob/master/notebooks/)
- Models (BERT, RoBERTa) trained with [Simple Transformers](https://simpletransformers.ai/)

**Task:** Regression\
**Paper:** [Modelling valence and arousal in Facebook posts (2016)](https://www.semanticscholar.org/paper/Modelling-Valence-and-Arousal-in-Facebook-posts-Preotiuc-Pietro-Schwartz/5b9f7b419766a35c9ee4a37d5338fa557bbbea47)\
**References:**\
Preoţiuc-Pietro, D., Schwartz, H. A., Park, G., Eichstaedt, J., Kern, M., Ungar, L., & Shulman, E. (2016): Modelling valence and arousal in Facebook posts. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (pp. 9-15).

Dimensional emotion model: based on the circumplex model of affect (valence and arousal) by James A. Russell (1980): A Circumplex Model of Affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
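In the circumplex model, valence is the horizontal axis and arousal the vertical axis, so an annotated post can be read as an angle (the kind of affect) and a distance from the centre (its intensity). The sketch below maps 1-9 scores onto that plane. Treating the scale midpoint 5 as the neutral origin is an assumption, and the quadrant descriptions are illustrative examples, not labels from Dataset I.

```python
import math
from typing import Tuple

# Assumption: on a 1-9 scale, the midpoint 5 is treated as the neutral origin
MIDPOINT = 5.0

# Illustrative example emotions for each quadrant of the circumplex
QUADRANTS = {
    (True, True): "pleasant, activated (e.g. excited)",
    (True, False): "pleasant, deactivated (e.g. relaxed)",
    (False, True): "unpleasant, activated (e.g. distressed)",
    (False, False): "unpleasant, deactivated (e.g. sad)",
}


def circumplex_position(valence: float, arousal: float) -> Tuple[float, float, str]:
    """Map 1-9 valence/arousal scores to polar coordinates and a quadrant.

    Returns (angle in degrees, distance from the origin, quadrant description).
    Angle 0 points to maximal pleasantness, 90 to maximal arousal.
    Scores exactly at the midpoint count as pleasant / activated.
    """
    x = valence - MIDPOINT  # horizontal axis: valence
    y = arousal - MIDPOINT  # vertical axis: arousal
    angle = math.degrees(math.atan2(y, x)) % 360
    intensity = math.hypot(x, y)
    quadrant = QUADRANTS[(x >= 0, y >= 0)]
    return angle, intensity, quadrant
```

For example, `circumplex_position(2, 3)` lands in the unpleasant, deactivated quadrant, at an angle between 180 and 270 degrees.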

**Dataset Access**: The original authors have taken the dataset offline; it is currently not available.
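Since the dataset is offline, the following is only a sketch of how a regression model of this kind could be set up with Simple Transformers, which handles regression through `ClassificationModel` with `num_labels=1` and `regression = True`. Valence and arousal would each get their own model. The `text`/`labels` column names follow Simple Transformers' DataFrame convention; the epoch count, `train_regressor` and `pearson_r` are hypothetical and not the thesis' actual configuration. Pearson correlation between predicted and gold scores is a common metric for this kind of task.

```python
import math
from typing import Sequence


def train_regressor(train_df, model_type: str = "roberta", model_name: str = "roberta-base"):
    """Fine-tune a transformer to predict one continuous score (e.g. valence).

    Assumes train_df is a pandas DataFrame with a "text" column and a
    numeric "labels" column, as Simple Transformers expects.
    """
    from simpletransformers.classification import ClassificationArgs, ClassificationModel

    args = ClassificationArgs()
    args.regression = True  # single continuous output instead of classes
    args.num_train_epochs = 3  # hypothetical setting, tune as needed
    model = ClassificationModel(model_type, model_name, num_labels=1, args=args, use_cuda=False)
    model.train_model(train_df)
    return model


def pearson_r(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation between predicted and gold scores."""
    n = len(gold)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_g)
```

With a trained model, `predictions, _ = model.predict(test_texts)` yields one score per text, which can then be compared to the gold annotations with `pearson_r`.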

## Dataset II

For the experimental setup comparing different predictive models, we use GoEmotions (Demszky et al. 2020), a research dataset for text-based emotion recognition consisting of 58k carefully curated Reddit comments, each annotated by humans with one or more of 27 emotion categories or Neutral.

- Number of examples: 58,009.
- Number of emotion labels: 27 + Neutral.
- Maximum sequence length in training and evaluation datasets: 30.
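Because a GoEmotions comment can carry several labels at once, targets are usually encoded as multi-hot vectors over all 28 labels rather than as a single class index. The sketch assumes labels are given as comma-separated integer ids from 0 to 27 (for example `"2,14"`); `multi_hot` is a hypothetical helper name.

```python
from typing import List

NUM_LABELS = 28  # 27 emotion categories + Neutral


def multi_hot(label_ids: str, num_labels: int = NUM_LABELS) -> List[int]:
    """Turn a comma-separated id string such as "2,14" into a multi-hot vector.

    Assumption: labels are integer ids 0..num_labels-1 separated by commas.
    """
    vector = [0] * num_labels
    for raw in label_ids.split(","):
        idx = int(raw)
        if not 0 <= idx < num_labels:
            raise ValueError(f"label id out of range: {idx}")
        vector[idx] = 1
    return vector
```

With multi-hot targets, models for this dataset typically use an independent sigmoid output per label instead of a softmax across labels, so that more than one emotion can be predicted for the same comment.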

In addition to the raw data, the authors provide a version filtered by rater agreement, with the following split:

- Size of training dataset: 43,410.
- Size of test dataset: 5,427.
- Size of validation dataset: 5,426.
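A quick arithmetic check on the split sizes listed above: the filtered version contains 54,263 examples in total, which corresponds to roughly an 80/10/10 train/test/validation split.

```python
# Split sizes of the rater-agreement-filtered GoEmotions data, as listed above
splits = {"train": 43_410, "test": 5_427, "validation": 5_426}

total = sum(splits.values())
shares = {name: size / total for name, size in splits.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```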

**Dataset Access**
- [GoEmotions](https://github.com/google-research/google-research/tree/master/goemotions/data)