https://github.com/hiborn4/tensorfusion_network_for_multimodal_sentiment_analysis

This repository implements the Tensor Fusion Network (TFN) for multimodal sentiment analysis using the CMU-MOSI dataset. TFN integrates language, visual, and acoustic modalities to predict sentiment intensity, enhancing sentiment prediction accuracy by modeling unimodal, bimodal, and trimodal interactions.
cmu-mosi deep deep-learning fusion lrmf regression tensorflow unimodals

Host: GitHub
URL: https://github.com/hiborn4/tensorfusion_network_for_multimodal_sentiment_analysis
Owner: HiBorn4
Created: 2024-05-21T08:19:44.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-05-21T08:26:17.000Z (about 2 years ago)
Last Synced: 2025-03-30T21:33:20.535Z (over 1 year ago)
Topics: cmu-mosi, deep, deep-learning, fusion, lrmf, regression, tensorflow, unimodals
Language: Jupyter Notebook
Homepage:
Size: 389 KB
Stars: 5
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

# Tensor Fusion Network (TFN) for Multimodal Sentiment Analysis

This repository contains the implementation of the Tensor Fusion Network (TFN) for multimodal sentiment analysis using the CMU-MOSI dataset. The TFN architecture incorporates language, visual, and acoustic modalities to predict sentiment intensity.

## Dataset: CMU-MOSI

CMU-MOSI is an annotated dataset of opinion videos drawn from YouTube movie reviews. Each opinion utterance is annotated for sentiment on a seven-step Likert scale from very negative to very positive. The dataset comprises 2,199 opinion utterances from 93 videos by 89 distinct speakers, with an average utterance length of 4.2 seconds.

### Dataset Features

- **Language Modality**: Uses GloVe word vectors for spoken words.
- **Visual Modality**: Extracts facial expressions and action units using the FACET framework and OpenFace.
- **Acoustic Modality**: Extracts acoustic features using the COVAREP framework.

### Sentiment Prediction Tasks

1. **Binary Sentiment Classification**
2. **Five-Class Sentiment Classification**
3. **Sentiment Regression**
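
The three tasks can all be derived from the same continuous sentiment annotation. The sketch below assumes scores in the range [-3, 3] (the seven-step scale), a binary threshold at 0, and five classes obtained by rounding and clipping to [-2, 2]. The helper names `to_binary` and `to_five_class` and the exact binning are illustrative assumptions, not taken from this repository.

```python
def to_binary(score: float) -> int:
    """Return 1 for non-negative sentiment and 0 for negative (assumed threshold at 0)."""
    return 1 if score >= 0 else 0


def to_five_class(score: float) -> int:
    """Map a score in [-3, 3] to a class index 0..4 (assumed binning).

    The score is rounded to the nearest integer (Python rounds exact .5
    ties to the nearest even integer), clipped to [-2, 2], then shifted.
    """
    clipped = max(-2, min(2, round(score)))
    return clipped + 2


scores = [-2.6, -0.4, 0.0, 1.2, 3.0]          # toy regression targets
print([to_binary(s) for s in scores])         # [0, 0, 1, 1, 1]
print([to_five_class(s) for s in scores])     # [0, 2, 2, 3, 4]
```

For the regression task the raw score is used directly as the target.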

## Tensor Fusion Network (TFN)

TFN consists of three main components:

1. **Modality Embedding Subnetworks**: Extract features from the language, visual, and acoustic modalities.
2. **Tensor Fusion Layer**: Explicitly models unimodal, bimodal, and trimodal interactions.
3. **Sentiment Inference Subnetwork**: Performs sentiment inference based on the fused multimodal tensor.

### Modality Embedding Subnetworks

- **Language Embedding Subnetwork**: Uses LSTM to learn time-dependent representations of spoken words.
- **Visual Embedding Subnetwork**: Uses a deep neural network to process visual features extracted from facial expressions.
- **Acoustic Embedding Subnetwork**: Uses a deep neural network to process acoustic features extracted from audio signals.

### Tensor Fusion Layer

The Tensor Fusion Layer models interactions between modalities with a three-fold Cartesian product of the modality embeddings. Each embedding is first extended with a constant 1, so the resulting multimodal tensor contains the unimodal and bimodal subtensors alongside the trimodal one.
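
As a rough illustration of the fusion step, the sketch below builds the outer product of three toy embeddings in plain Python, prefixing each with a constant 1 so that unimodal and bimodal terms appear in the result. The function name `tensor_fusion` and the toy dimensions are assumptions for illustration. An actual implementation would more likely compute this over batches with a tensor library (for example, an einsum-style outer product in TensorFlow).

```python
def tensor_fusion(z_l, z_v, z_a):
    """Return the 3-D outer product of three embeddings, each prefixed with 1.0.

    Entry [i][j][k] equals l[i] * v[j] * a[k], where l, v, a are the
    extended language, visual, and acoustic vectors.
    """
    l = [1.0] + list(z_l)
    v = [1.0] + list(z_v)
    a = [1.0] + list(z_a)
    return [[[x * y * w for w in a] for y in v] for x in l]


# Toy embeddings (hypothetical sizes: language=2, visual=1, acoustic=1).
z_l, z_v, z_a = [2.0, 3.0], [5.0], [7.0]
fused = tensor_fusion(z_l, z_v, z_a)

shape = (len(fused), len(fused[0]), len(fused[0][0]))
print(shape)            # (3, 2, 2)
print(fused[0][0][0])   # 1.0  -> constant term
print(fused[1][0][0])   # 2.0  -> unimodal (language) term
print(fused[1][1][0])   # 10.0 -> bimodal (language x visual) term
print(fused[1][1][1])   # 70.0 -> trimodal term
```

The index 0 along each axis corresponds to the constant 1, which is why slices with one or two zero indices recover the bimodal and unimodal terms.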

### Sentiment Inference Subnetwork

A fully connected deep neural network that takes the multimodal tensor as input and performs sentiment classification or regression.
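
To make the data flow concrete, the sketch below flattens a small fused tensor and passes it through two fully connected layers in plain Python. The layer sizes, the random initialization, and the helper names (`flatten`, `dense`, `init_layer`) are assumptions for illustration only. The weights are untrained, and the repository's actual layer configuration is not shown in this README.

```python
import random


def flatten(tensor):
    """Flatten a 3-D nested list into a flat list of floats."""
    return [x for plane in tensor for row in plane for x in row]


def dense(inputs, weights, biases, relu=True):
    """Apply one fully connected layer, optionally followed by ReLU."""
    outputs = []
    for w_row, b in zip(weights, biases):
        s = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(max(0.0, s) if relu else s)
    return outputs


def init_layer(rng, n_in, n_out):
    """Create small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)

# A 3 x 2 x 2 toy multimodal tensor standing in for the fusion layer output.
fused = [
    [[1.0, 7.0], [5.0, 35.0]],
    [[2.0, 14.0], [10.0, 70.0]],
    [[3.0, 21.0], [15.0, 105.0]],
]

x = flatten(fused)                               # 12 input features
w1, b1 = init_layer(rng, len(x), 4)              # hidden layer (hypothetical size 4)
w2, b2 = init_layer(rng, 4, 1)                   # single regression output
hidden = dense(x, w1, b1)
score = dense(hidden, w2, b2, relu=False)[0]     # unbounded sentiment score
print(len(x), len(hidden), score)
```

For classification, the final layer would instead produce one output per class followed by a softmax.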

## Experiments

The original TFN paper reports three sets of experiments:

1. **Multimodal Sentiment Analysis**: Compared TFN with state-of-the-art multimodal sentiment analysis models.
2. **Tensor Fusion Evaluation**: Analyzed the importance of subtensors and the impact of each modality.
3. **Modality Embedding Subnetworks Evaluation**: Compared TFN's modality-specific networks with state-of-the-art unimodal sentiment analysis models.

## Results

In the original paper, TFN outperformed the state-of-the-art approaches of the time on all three tasks: binary sentiment classification, five-class sentiment classification, and sentiment regression. The ablation study showed that explicitly modeling trimodal dynamics, in addition to unimodal and bimodal ones, improves performance.

## How to Use

### Prerequisites

- Python 3.x
- TensorFlow or PyTorch (depending on the implementation)
- Required Python libraries (listed in `requirements.txt`)

### Installation

1. Clone the repository:
```bash
git clone https://github.com/hiborn4/tensorfusion_network_for_multimodal_sentiment_analysis.git
cd tensorfusion_network_for_multimodal_sentiment_analysis
```

2. Install the required Python libraries:
```bash
pip3 install -r requirements.txt
```

### Dataset Preparation

1. Download the CMU-MOSI dataset from the official source.
2. Extract the dataset and place it in the `data` directory.

### Training the Model

1. Preprocess the dataset:
```bash
python3 preprocess.py --data_dir data/CMU-MOSI
```

2. Train the TFN model:
```bash
python3 train.py --config configs/tfn_config.json
```

### Evaluation

Evaluate the trained model on the test set:
```bash
python3 evaluate.py --model_dir models/tfn --data_dir data/CMU-MOSI
```
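
The README does not document what `evaluate.py` reports. As a hedged sketch, the metrics below are ones commonly reported on CMU-MOSI: mean absolute error and Pearson correlation for regression, and binary accuracy obtained by thresholding scores at 0. The function names and the toy predictions are illustrative assumptions.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (std_t * std_p)


def binary_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted sign matches the target (threshold 0)."""
    hits = sum((t >= 0) == (p >= 0) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy targets and predictions on the [-3, 3] sentiment scale.
y_true = [-2.0, -1.0, 0.5, 1.5, 3.0]
y_pred = [-1.5, 0.5, 1.0, 1.0, 2.0]

mae = mean_absolute_error(y_true, y_pred)
corr = pearson_r(y_true, y_pred)
acc = binary_accuracy(y_true, y_pred)
print(mae, acc)   # 0.8 0.8
print(corr)       # positive correlation for these toy values
```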

### Configuration

Modify the configuration file `configs/tfn_config.json` to change hyperparameters, model settings, and dataset paths.

## Citation

If you use this code or the dataset in your research, please cite the CMU-MOSI dataset paper:

```bibtex
@article{zadeh2016mosi,
title={Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages},
author={Zadeh, Amir and Zellers, Rowan and Pincus, Eli and Morency, Louis-Philippe},
journal={IEEE Intelligent Systems},
volume={31},
number={6},
pages={82--88},
year={2016}
}
```

The TFN architecture itself was introduced in "Tensor Fusion Network for Multimodal Sentiment Analysis" (Zadeh, Chen, Poria, Cambria, and Morency, EMNLP 2017).

## License

This project is licensed under the MIT License.

---

Feel free to open an issue if you have any questions or need further assistance. Happy researching!
