https://github.com/akshaysinhaaa/emova




Host: GitHub
URL: https://github.com/akshaysinhaaa/emova
Owner: akshaysinhaaa
Created: 2025-04-06T17:39:12.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-05-24T19:32:32.000Z (about 1 year ago)
Last Synced: 2025-05-24T20:31:42.915Z (about 1 year ago)
Topics: bert, cnn, cuda, deep-learning, multimodal, python, pytorch, resnet-18, tensorboard, transformers
Language: Python
Homepage:
Size: 31.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# ConvEmoSentNet: A Parameter-Efficient Framework for Multimodal Emotion and Sentiment Analysis in Social Media Conversations

A deep learning framework designed for **emotion and sentiment recognition** using **text**, **audio**, and **video** modalities. This project leverages the **MELD (Multimodal EmotionLines Dataset)** to train a robust and flexible model that reflects human communication more accurately than unimodal models.

---

## 📦 Dataset: MELD

**Multimodal EmotionLines Dataset (MELD)** is a large-scale, multi-party conversation dataset derived from the TV series *Friends*. It provides aligned and synchronized **text**, **audio**, and **video** data, annotated with both **emotion** and **sentiment** labels.

- **Modalities**:
  - `Text`: Dialogues (utterances)
  - `Audio`: Speaker voice tone
  - `Video`: Speaker facial expressions and posture

- **Emotion Labels**:
  - Anger
  - Disgust
  - Fear
  - Joy
  - Neutral
  - Sadness
  - Surprise

- **Sentiment Labels**:
  - Positive
  - Negative
  - Neutral

🔗 [MELD Dataset GitHub](https://github.com/declare-lab/MELD)
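
For training, the string labels above have to be mapped to integer class IDs. The following is a minimal sketch of one way to do that; the index order and the helper name `encode_labels` are assumptions made for illustration, not taken from the repository.

```python
# Label vocabularies as listed above; the index order is an arbitrary choice.
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
SENTIMENTS = ["positive", "negative", "neutral"]

EMOTION_TO_ID = {name: i for i, name in enumerate(EMOTIONS)}
SENTIMENT_TO_ID = {name: i for i, name in enumerate(SENTIMENTS)}


def encode_labels(emotion: str, sentiment: str) -> tuple[int, int]:
    """Map one utterance's string labels to (emotion_id, sentiment_id)."""
    return (
        EMOTION_TO_ID[emotion.strip().lower()],
        SENTIMENT_TO_ID[sentiment.strip().lower()],
    )


print(encode_labels("Joy", "positive"))  # (3, 0)
```

Normalizing case and whitespace before the lookup keeps the mapping tolerant of small formatting differences in the annotation files.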

---

## 🧠 Model Architecture

The model is **modular**: it can be trained on a single modality or on any fused combination of `Text`, `Audio`, and `Video`. It is designed to keep performing well when one or more modalities are unavailable.

### 🔹 Individual Modality Encoders

| Modality | Model Used | Preprocessing |
|----------|--------------------|--------------------------------|
| Text | BERT | Tokenization, Padding |
| Audio | CNN | MFCC / Log-Mel Spectrogram |
| Video | ResNet18 / 3D-CNN | Face Extraction, Frame Sampling|
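
The table lists frame sampling as a video preprocessing step without saying how frames are chosen. One common option is uniform sampling, sketched below; the function name `sample_frame_indices` and the midpoint rule are assumptions for illustration, not the repository's method.

```python
def sample_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return up to num_samples evenly spaced frame indices in [0, num_frames)."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_frames <= num_samples:
        # Short clip: keep every frame.
        return list(range(num_frames))
    # Split the clip into num_samples equal segments and take each midpoint.
    return [(2 * i + 1) * num_frames // (2 * num_samples) for i in range(num_samples)]


print(sample_frame_indices(100, 4))  # [12, 37, 62, 87]
```

Integer arithmetic keeps the indices deterministic and always inside the valid frame range.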

### 🔹 Multimodal Fusion Strategy

- Concatenation of latent vectors from each modality
- Optional **attention mechanism** to weight more informative modalities
- Final **fully connected layers** leading to a softmax classification head

```
┌────────────┐   ┌────────────┐   ┌────────────┐
│    Text    │   │   Audio    │   │   Video    │
└─────┬──────┘   └─────┬──────┘   └─────┬──────┘
      │                │                │
     BERT             CNN        3D CNN / ResNet
      │                │                │
      └────────────────┼────────────────┘
                   ┌───┴────┐
                   │ Fusion │
                   └───┬────┘
                       │
                Fully Connected
                       │
                    Softmax
```
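
The fusion strategy above (concatenation, optional attention over modalities, fully connected layers, softmax) could be sketched in PyTorch roughly as follows. This is a hypothetical illustration, not the repository's code: the class name `FusionClassifier`, the shared projection size, the audio feature size of 128, and the attention form are all assumptions. Only the 768-dimensional BERT-base output and the 512-dimensional ResNet18 feature size are standard values.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Concatenation fusion with optional per-modality attention weights.

    Hypothetical sketch: dimensions and layer sizes are placeholders.
    """

    def __init__(self, text_dim=768, audio_dim=128, video_dim=512,
                 hidden_dim=256, num_classes=7, use_attention=True, dropout=0.3):
        super().__init__()
        # Project each modality's latent vector to a shared size.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(text_dim, hidden_dim),
            "audio": nn.Linear(audio_dim, hidden_dim),
            "video": nn.Linear(video_dim, hidden_dim),
        })
        self.use_attention = use_attention
        self.score = nn.Linear(hidden_dim, 1)  # one relevance score per modality
        self.head = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),  # logits; softmax applied by the caller
        )

    def forward(self, text, audio, video):
        # Stack projected features: (batch, 3, hidden_dim)
        feats = torch.stack([
            self.proj["text"](text),
            self.proj["audio"](audio),
            self.proj["video"](video),
        ], dim=1)
        if self.use_attention:
            # Softmax over the modality axis gives weights that sum to 1.
            weights = torch.softmax(self.score(torch.tanh(feats)), dim=1)
            feats = feats * weights
        # Concatenate the (weighted) modality vectors and classify.
        return self.head(feats.flatten(start_dim=1))


model = FusionClassifier().eval()
logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 512))
probs = torch.softmax(logits, dim=-1)
print(probs.shape)  # torch.Size([4, 7])
```

The head returns raw logits so that it can be paired with `nn.CrossEntropyLoss` during training, which applies log-softmax internally.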

---

## 🧪 Training Details

- **Optimizer**: Adam
- **Scheduler**: ReduceLROnPlateau
- **Loss Function**:
  - CrossEntropyLoss for multiclass emotion classification
  - Label Smoothing (0.1) to prevent overconfidence
- **Regularization**:
  - Dropout in FC layers (0.3–0.5)
  - Early Stopping based on validation loss
- **Batch Size**: 16–32
- **Epochs**: 15–25
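
The settings listed above can be wired together in PyTorch roughly as follows. This is a minimal sketch under stated assumptions: the tiny stand-in model, the random tensors used in place of MELD features, the learning rate of `1e-4`, the scheduler's `factor` and `patience`, and the early-stopping patience of 5 are all placeholders, not the repository's values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the fused model: any nn.Module that returns class logits.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 7))

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)

# Random tensors in place of MELD features and labels (assumption).
x_train, y_train = torch.randn(64, 32), torch.randint(0, 7, (64,))
x_val, y_val = torch.randn(32, 32), torch.randint(0, 7, (32,))

best_val, bad_epochs, patience = float("inf"), 0, 5
for epoch in range(20):
    model.train()
    for i in range(0, len(x_train), 16):  # batch size 16
        optimizer.zero_grad()
        loss = criterion(model(x_train[i:i + 16]), y_train[i:i + 16])
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()
    scheduler.step(val_loss)  # lower the LR when validation loss plateaus

    # Early stopping on validation loss.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```

In a real run, the best checkpoint would typically be saved whenever `best_val` improves, so that early stopping returns the best model and not the last one.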

### 🧵 Hyperparameter Tuning

- Performed manually (grid search) on:
  - Learning rate (1e-3 to 1e-5)
  - Hidden layer sizes
  - Dropout rates
  - Fusion strategies (early vs late fusion)
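
A grid over these four hyperparameters can be enumerated with the standard library, as sketched below. The specific values in `search_space` are illustrative assumptions (only the learning-rate and dropout ranges appear in this README), and `evaluate` stands for a hypothetical function that trains one configuration and returns a validation score.

```python
from itertools import product

# Hypothetical search space; hidden sizes are placeholders.
search_space = {
    "lr": [1e-3, 1e-4, 1e-5],
    "hidden_dim": [128, 256],
    "dropout": [0.3, 0.5],
    "fusion": ["early", "late"],
}


def iter_configs(space):
    """Yield one dict per point in the Cartesian product of the search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(space, evaluate):
    """Return (best_config, best_score); evaluate(config) returns a score to maximize."""
    scored = ((cfg, evaluate(cfg)) for cfg in iter_configs(space))
    return max(scored, key=lambda pair: pair[1])


configs = list(iter_configs(search_space))
print(len(configs))  # 24
```

With 24 configurations and 15 to 25 epochs each, the full grid is small enough to run exhaustively, which fits the manual search described above.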

---

## 📈 Performance Snapshot

| Configuration | Emotion Precision | Emotion Accuracy | Sentiment Precision | Sentiment Accuracy |
|---------------|-------------------|------------------|---------------------|--------------------|
| Fused Model   | 53.50%            | 54.90%           | 64.40%              | 64.60%             |
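
For reference, accuracy and precision for a multiclass task can be computed as sketched below. The table does not state how precision is averaged across classes, so this sketch supports both macro and support-weighted averaging as an assumption; the toy labels are made up for illustration and are not MELD results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, average="weighted"):
    """Per-class precision, combined by a 'macro' or support-'weighted' mean."""
    classes = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    support = Counter(y_true)
    # A class that is never predicted contributes a precision of 0.
    per_class = {
        c: (correct[c] / predicted[c] if predicted[c] else 0.0) for c in classes
    }
    if average == "macro":
        return sum(per_class.values()) / len(classes)
    return sum(per_class[c] * support[c] for c in classes) / len(y_true)


y_true = ["joy", "neutral", "anger", "neutral"]
y_pred = ["joy", "neutral", "neutral", "neutral"]
print(accuracy(y_true, y_pred))  # 0.75
```

Because MELD's emotion classes are imbalanced, macro and weighted precision can differ noticeably, so it helps to state which one a reported number uses.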

---

## 🧑‍💻 Authors

**Akshay Sinha, Gauri Saksena, Yash Chandel**
_Deep Learning | Multimodal AI | Emotion Recognition_
