https://github.com/nicolay-r/thor-ecac

The official fork of THoR Chain-of-Thought framework, enhanced and adapted for Emotion Cause Analysis (ECAC-2024)
https://github.com/nicolay-r/thor-ecac

chainofthought colab-notebook emotion-analysis emotion-cause-pair-extraction fine-tuning flan-t5 framework llm notebook-jupyter reasoning semeval semeval-2024 transformers-models


Host: GitHub
URL: https://github.com/nicolay-r/thor-ecac
Owner: nicolay-r
License: apache-2.0
Created: 2024-02-06T20:38:47.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2025-07-13T21:17:44.000Z (12 months ago)
Last Synced: 2025-07-13T23:30:32.369Z (12 months ago)
Topics: chainofthought, colab-notebook, emotion-analysis, emotion-cause-pair-extraction, fine-tuning, flan-t5, framework, llm, notebook-jupyter, reasoning, semeval, semeval-2024, transformers-models
Language: Python
Homepage: https://arxiv.org/abs/2404.03361
Size: 5.15 MB
Stars: 12
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGESET.md
- License: LICENSE.txt


## THOR: Three-hop Reasoning for Emotion Cause Analysis in Context • [![twitter](https://img.shields.io/twitter/url/https/shields.io.svg?style=social)](https://x.com/nicolayr_/status/1803705227108847866)

![](https://img.shields.io/badge/Python-3.10-lightgreen.svg)

[![arXiv](https://img.shields.io/badge/arXiv-2404.03361-b31b1b.svg)](https://arxiv.org/abs/2404.03361)

[![twitter](https://img.shields.io/twitter/url/https/shields.io.svg?style=social)](https://x.com/nicolayr_/status/1803705227108847866)

[![Youtube badge](https://img.shields.io/badge/-Youtube-Cc4c4c?style=flat-square&logo=Youtube&logoColor=white&link=https://twitter.com/nicolayr_)](https://youtu.be/vRVDQa7vfkU)

**The PyTorch reforged and forked version of the official [THoR-framework](https://github.com/scofield7419/THOR-ISA), enhanced and adapted for the SemEval-2024 paper [nicolay-r at SemEval-2024 Task 3: Using Flan-T5 for Reasoning Emotion Cause in Conversations with Chain-of-Thought on Emotion States](https://arxiv.org/abs/2404.03361)**



[![](https://markdown-videos-api.jorgenkh.no/youtube/vRVDQa7vfkU)](https://youtu.be/vRVDQa7vfkU)



### **Latest Updates**   

> **Update February 23 2025:** 🔥 **BATCHING MODE SUPPORT**. See 🌌 [Flan-T5 provider](https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_flan_t5.py) for [bulk-chain](https://github.com/nicolay-r/bulk-chain) project. Test [is available here](https://github.com/nicolay-r/bulk-chain/blob/master/test/test_provider_batching.py)

> **Update 06 June 2024:** 🤗 The model and its description / usage tutorial have been uploaded to `huggingface`: [🤗 nicolay-r/flan-t5-emotion-cause-thor-base](https://huggingface.co/nicolay-r/flan-t5-emotion-cause-thor-base)

> **Update 06 March 2024**: 🔓 `attrdict` was the main obstacle to launching the code in `Python 3.10` and has therefore been replaced with `addict` (see [Issue#2](https://github.com/nicolay-r/THOR-ECAC/issues/2)).

> **Update 05 March 2024**: A quick breakdown 🔨 of the [arXiv paper](https://arxiv.org/abs/2404.03361) is available in this [Twitter/X post](https://twitter.com/nicolayr_/status/1777005686611751415)

> **Update 17 February 2024**: We support `--bf16` mode for launching Flan-T5 with the `torch.bfloat16` type; this feature allows launching `xl`-sized model training on just a single NVIDIA A100 (40GB)

> **NOTE:** Since the existing fork is aimed at a variety of non-commercial project applications, this repository represents **a copy** of the originally published code with the following [🔧 enhancements and changes](CHANGESET.md)

> **NOTE:** [List of the changes](CHANGESET.md) from the original THoR

## Contents

* [Overview](#overview)

* [📙 **Quickstart in GoogleColab**](#quickstart)

* [Usage](#code)

  * [Requirement](#requirement)

  * [Dataset Preparation](#data)

  * [**Prompts and CoT**](#prompts-and-engines)

  * [Training / Inferring](#runt5)

  * [**Submitting Results on Codalab**](#submitting-results-on-codalab)

* [References](#references)  

## Overview

* **Input:** a conversation containing the speaker and the text of each utterance.

* **Output:** all emotion-cause pairs, where each pair contains an emotion utterance along with its emotion category and the textual cause span in a specific cause utterance, e.g.:

  * (`U3_Joy`, `U2_“You made up!”`)

> The complete description of the task is [available here](https://nustm.github.io/SemEval-2024_ECAC/).
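
The output format above can be made concrete with a small data structure. The following is a minimal sketch, assuming the `U<index>_<value>` notation used in the example; the class and field names are illustrative and are not the official task schema or part of this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionCausePair:
    """One emotion-cause pair; field names are illustrative assumptions."""
    emotion_utterance: int   # index of the utterance that expresses the emotion
    emotion: str             # emotion category, e.g. "Joy"
    cause_utterance: int     # index of the utterance that contains the cause
    cause_span: str          # textual cause span inside the cause utterance


def parse_pair(emotion_part: str, cause_part: str) -> EmotionCausePair:
    """Parse the notation from the example, e.g. ("U3_Joy", "U2_You made up!")."""
    emo_id, emotion = emotion_part.split("_", 1)
    cause_id, span = cause_part.split("_", 1)
    return EmotionCausePair(
        emotion_utterance=int(emo_id.lstrip("U")),
        emotion=emotion,
        cause_utterance=int(cause_id.lstrip("U")),
        # Drop surrounding straight or curly quotes around the span.
        cause_span=span.strip("“”\""),
    )


pair = parse_pair("U3_Joy", "U2_“You made up!”")
print(pair)
```

Here the pair reads as: utterance 3 expresses `Joy`, and the cause is the span `You made up!` in utterance 2.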

> Framework illustration.

## Quickstart

### 🧪 [Quick start with Emotion CoT ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nicolay-r/THOR-ECAC/blob/master/SemEval_2024_Task_3_FlanT5_Finetuned_Model_Usage.ipynb)

### 🔥 [Experiments Reproduction ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nicolay-r/THOR-ECAC/blob/master/THoR_Finetuning_SemEval2023_t3_1_public.ipynb)

We provide a 🔥 notebook that downloads all the necessary data and then launches the experiments on an `NVidia-V100` or `NVidia-A100`.

**Codalab Submission formation**: please [follow this section](#submitting-results-on-codalab).

## Usage

### Requirement

![](https://img.shields.io/badge/Python-3.10-lightgreen.svg)

This project has been tested under **Python-3.8** and [adapted](https://github.com/nicolay-r/THOR-ECAC/issues/2) for **Python-3.10**.

Using `pip`, you can install the necessary dependencies as follows:

``` bash 

pip install -r requirements.txt

```

### Datasets

### 👉 [Compile datasets manually](https://github.com/nicolay-r/SemEval2024-Task3) 👈

**Serialize datasets**: We provide the `download_data.py` script for downloading and serializing the manually compiled datasets (`D_state` and `D_cause`).

```bash
python download_data.py \
  --cause-test "https://www.dropbox.com/scl/fi/4b2ouqdhgifqy3pmopq08/cause-mult-test.csv?rlkey=tkw0p1e01vezrjbou6v7qh36a&dl=1" \
  --cause-train "https://www.dropbox.com/scl/fi/0tlkwbe5awcss2qmihglf/cause-mult-train.csv?rlkey=x9on1ogzn5kigx7c32waudi21&dl=1" \
  --cause-valid "https://www.dropbox.com/scl/fi/8zjng2uyghbkpbfcogj6o/cause-mult-valid.csv?rlkey=91dgg4ly7p23e3id2230lqsoi&dl=1" \
  --state-train "https://www.dropbox.com/scl/fi/0lokgaeo973wo82ig01hy/state-mult-train.csv?rlkey=tkt1oyo8kwgqs6gp79jn5vbh8&dl=1" \
  --state-valid "https://www.dropbox.com/scl/fi/eu4yuk8n61izygnfncnbo/state-mult-valid.csv?rlkey=tlg8rac4ofkbl9o4ipq6dtyos&dl=1"
```

For reproduction purposes you may refer to the **[code of this supplementary repository](https://github.com/nicolay-r/SemEval2024-Task3)**. 

### LLMs



  



Use the **Flan-T5** as the backbone LLM reasoner:

  * **[google/flan-t5-base](https://huggingface.co/google/flan-t5-base)**

  * [google/flan-t5-large](https://huggingface.co/google/flan-t5-large)

  * [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl)

  * [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl)

> **NOTE**: We set up the `base` reasoner in [config.yaml](https://github.com/nicolay-r/THOR-ECAC/blob/23a2add3d77f251dfca5241153815d76eb4dee6b/config/config.yaml#L4-L5).

However, **it is highly recommended** to choose the largest reasoning model you can afford (`xl` or higher) for fine-tuning.
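
To judge which model size is affordable, a back-of-the-envelope memory estimate can help. The sketch below is only a rough lower bound under stated assumptions: the parameter counts are approximate round figures, the optimizer is assumed to keep two moment buffers per parameter, and all tensors are assumed to share one dtype. Whether this repository stores optimizer state in `bfloat16` under `--bf16` is not confirmed here.

```python
# Bytes used to store one value of each dtype.
BYTES_PER_VALUE = {"float32": 4, "bfloat16": 2}

# Approximate parameter counts (assumed round figures, not measured from checkpoints).
APPROX_PARAMS = {"base": 250e6, "large": 780e6, "xl": 3e9, "xxl": 11e9}


def training_memory_gb(size: str, dtype: str, copies_per_param: int = 4) -> float:
    """Rough lower bound: weights + gradients + two optimizer moment buffers.

    Activations and framework overhead are ignored, so real usage is higher.
    """
    n_bytes = APPROX_PARAMS[size] * BYTES_PER_VALUE[dtype] * copies_per_param
    return n_bytes / 1e9


for dtype in ("float32", "bfloat16"):
    print(f"xl / {dtype}: ~{training_memory_gb('xl', dtype):.0f} GB")
```

Under these assumptions the `xl` model needs roughly 48 GB in `float32` but about 24 GB in `bfloat16`, which is consistent with the `--bf16` note about fitting `xl` training on a single 40GB A100.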

### Prompts and Engines

We provide separate engines; the source of the prompts for each engine is listed below:

* `prompt_state`: [instruction](https://github.com/nicolay-r/THOR-ECAC/blob/39b768cba5a652bc207725d707b5c41dece574ac/main.py#L143) wrapped into the [prompt](https://github.com/nicolay-r/THOR-ECAC/blob/39b768cba5a652bc207725d707b5c41dece574ac/src/utils.py#L9-L14)

* `prompt_cause`: [instruction](https://github.com/nicolay-r/THOR-ECAC/blob/39b768cba5a652bc207725d707b5c41dece574ac/main.py#L142) wrapped into the [prompt](https://github.com/nicolay-r/THOR-ECAC/blob/39b768cba5a652bc207725d707b5c41dece574ac/src/utils.py#L9-L14)

* `thor_state`: [Class of the prompts](src/cot_state.py)

* `thor_cause`: [Class of the prompts](src/cot_cause.py)

* `thor_cause_rr`: [Class of the prompts](src/cot_cause.py) same as `thor_cause`
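
The difference between the one-step `prompt_*` engines and the multi-step `thor_*` engines can be sketched as a chain in which every hop sees the earlier questions and answers. This is a minimal illustration of the chaining idea only: the question templates, the `three_hop` function, and the `fake_llm` stand-in are hypothetical and are not the prompts defined in `src/cot_state.py` or `src/cot_cause.py`.

```python
from typing import Callable, List


def three_hop(context: str, target: str, ask: Callable[[str], str]) -> List[str]:
    """Ask three questions in sequence, passing earlier Q&A pairs to later hops."""
    # Placeholder questions (assumptions), not the repository's prompts.
    templates = [
        'Given the conversation "{context}", what is said in "{target}"?',
        "What is the underlying opinion of the speaker?",
        'Which emotion does "{target}" therefore express?',
    ]
    answers: List[str] = []
    history = ""
    for template in templates:
        question = template.format(context=context, target=target)
        answer = ask(history + question)
        answers.append(answer)
        # Later hops receive the earlier question-answer pairs as extra context.
        history += f"{question} {answer}\n"
    return answers


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: reports how many earlier answers it can see."""
    return f"answer-{prompt.count('answer-') + 1}"


print(three_hop("A: You made up! B: Yes!", "Yes!", fake_llm))
```

In a real setup, `ask` would wrap a Flan-T5 generation call; the stand-in only demonstrates that the third hop receives both earlier answers.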

### Training and Evaluating with Flan-T5

Use the [main.py](main.py) script with command-line arguments to run the 

**Flan-T5-based** THOR system. 

```bash
python main.py -c <cuda_index> \
    -m "google/flan-t5-base" \
    -r [prompt|thor_state|thor_cause|thor_cause_rr] \
    -d [state_se24|cause_se24] \
    -lf "optional/path/to/the/pretrained/state" \
    -es <epoch_size> \
    -bs <batch_size> \
    -f <config>
```

### Parameters list

* `-c`, `--cuda_index`: Index of the GPU to use for computation (default: `0`).

* `-d`, `--data_name`: Name of the dataset. Choices are `state_se24` or `cause_se24`.

* `-m`, `--model_path`: Path to the model on Hugging Face.

* `-r`, `--reasoning`: Specifies the reasoning mode, with one-step prompt or multi-step thor mode.

* `-li`, `--load_iter`: load the state at a specific index from the same `data_name` resource (default: `-1`, i.e. not applicable).

* `-lp`, `--load_path`: load a state from a specific path.

* `-p`, `--instruct`: instructive prompt for the `prompt` training engine, which involves the `target` parameter only.

* `-es`, `--epoch_size`: number of training epochs (default: `1`).

* `-bs`, `--batch_size`: batch size (default: `None`).

* `-lr`, `--bert_lr`: learning rate (default: `2e-4`).

* `-t`, `--temperature`: temperature (default: `gen_config.temperature`).

* `-v`, `--validate`: run in zero-shot mode on the `valid` set.

* `-i`, `--infer_iter`: run inference on the `test` dataset to form answers.

* `-f`, `--config`: Specifies the location of [config.yaml](config/config.yaml) file.

Configure more parameters in [config.yaml](config/config.yaml) file.
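
For readers who want to see how the flags and defaults listed above fit together, here is a minimal `argparse` sketch. It mirrors only the options whose types and defaults are stated in the list; it is an illustration, not the actual parser in `main.py`, and the value types (`int`, `float`) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented parameter list (a sketch, not main.py)."""
    p = argparse.ArgumentParser(description="Flan-T5 THoR launcher (illustrative)")
    p.add_argument("-c", "--cuda_index", type=int, default=0)
    p.add_argument("-d", "--data_name", choices=["state_se24", "cause_se24"])
    p.add_argument("-m", "--model_path")
    p.add_argument("-r", "--reasoning",
                   choices=["prompt", "thor_state", "thor_cause", "thor_cause_rr"])
    p.add_argument("-li", "--load_iter", type=int, default=-1)
    p.add_argument("-lp", "--load_path", default=None)
    p.add_argument("-es", "--epoch_size", type=int, default=1)
    p.add_argument("-bs", "--batch_size", type=int, default=None)
    p.add_argument("-lr", "--bert_lr", type=float, default=2e-4)
    p.add_argument("-f", "--config")
    return p


# Example invocation with explicit epoch and batch sizes; other values use defaults.
args = build_parser().parse_args(
    ["-m", "google/flan-t5-base", "-r", "thor_cause", "-d", "cause_se24",
     "-es", "3", "-bs", "8"]
)
print(args.reasoning, args.data_name, args.epoch_size, args.batch_size, args.bert_lr)
```

Restricting `-r` and `-d` with `choices` makes a mistyped engine or dataset name fail at parse time instead of midway through a training run.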

## Submitting Results on Codalab

### 📊 [Codalab Competition Page](https://codalab.lisn.upsaclay.fr/competitions/16141)

All services that are not related to Codalab are part of **another repository** (link below 👇).

Once the results have been inferred ([`THOR-cause-rr` results example](data/google_flan-t5-base-thor_cause_rr-output-sample.csv)), you may refer to the following code to form a submission:

### 👉 [Codalab Service Repository](https://github.com/nicolay-r/SemEval2024-Task3) 👈

## References

The original THoR project:

```bibtex

@inproceedings{FeiAcl23THOR,

  title={Reasoning Implicit Sentiment with Chain-of-Thought Prompting},

  author={Hao, Fei and Bobo, Li and Qian, Liu and Lidong, Bing and Fei, Li and Tat-Seng, Chua},

  booktitle = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",

  pages = "1171--1182",

  year={2023}

}

```

You can cite this work as follows:

```bibtex

@article{rusnachenko2024nicolayr,

  title={nicolay-r at SemEval-2024 Task 3: Using Flan-T5 for Reasoning Emotion Cause in Conversations with Chain-of-Thought on Emotion States},

  booktitle = "Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics",

  author={Nicolay Rusnachenko and Huizhi Liang},

  year= "2024",

  month= jun,

  address = "Mexico City, Mexico",

  publisher = "Annual Conference of the North American Chapter of the Association for Computational Linguistics"

}

```

## Acknowledgement

This code draws on the following projects:

[CoT](https://arxiv.org/abs/2201.11903);

[Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5);

[Transformers](https://github.com/huggingface/transformers).

## License

The code is released under Apache License 2.0 for Noncommercial use only.
