An open API service indexing awesome lists of open source software.

https://github.com/lalitdotdev/neurosense-fusion

Multimodal Affect : Unified deep learning framework for emotion and sentiment recognition from video, audio, and text. Powered by BERT, ResNet3D, and CNNs. End-to-end training, robust evaluation β€”built for research and real-world affective computing.
https://github.com/lalitdotdev/neurosense-fusion

Last synced: 12 months ago
JSON representation

Multimodal Affect : Unified deep learning framework for emotion and sentiment recognition from video, audio, and text. Powered by BERT, ResNet3D, and CNNs. End-to-end training, robust evaluation β€”built for research and real-world affective computing.

Awesome Lists containing this project

README

          

# 🧠 NeuroSense: Multimodal Sentiment & Emotion Classifier

![neurosense-fusion](https://socialify.git.ci/lalitdotdev/neurosense-fusion/image?description=1&font=Inter&language=1&logo=https%3A%2F%2Fstellarix.com%2Fwp-content%2Fuploads%2F2024%2F07%2FMultimodal-AI.jpg&name=1&owner=1&pattern=Solid&stargazers=1&theme=Light)

**Decode human emotion and sentiment from video, audio, and textβ€”at scale, in real-time, and with research-grade accuracy.**

---

## πŸ“– Overview

**NeuroSense** is a next-generation multimodal AI framework that fuses video, audio, and text to recognize emotions and sentiments in human communication. Designed for research, real-world deployment, and SaaS applications, NeuroSense combines the power of deep learning, cloud scalability, and a modern web interface.

---

## πŸš€ Features

- πŸŽ₯ **Video Frame Analysis** β€” Extracts facial and contextual cues using ResNet3D.
- πŸŽ™οΈ **Audio Feature Extraction** β€” Captures vocal emotion with Mel spectrograms and CNNs.
- πŸ“ **Text Embeddings with BERT** β€” Understands semantic sentiment from transcripts.
- πŸ”— **Multimodal Fusion** β€” Late fusion of 128D features from each modality for robust affect detection.
- πŸ“Š **Dual Head Classification** β€” Simultaneous prediction of 7 emotion classes and 3 sentiment classes.
- πŸ§ͺ **Model Training & Evaluation** β€” Efficient PyTorch pipeline with TensorBoard logging.
- ☁️ **Scalable Cloud Deployment** β€” AWS SageMaker for training, S3 for data, and real-time inference endpoints.
- πŸ” **Authentication & API Keys** β€” Auth.js and secure key management for SaaS users.
- πŸ“ˆ **Usage Quota Tracking** β€” Monitor and limit API usage per user.
- 🌐 **Modern Frontend** β€” Next.js, Tailwind CSS, and T3 Stack for a seamless user experience.
- πŸ–ΌοΈ **Rich Visualizations** β€” Confusion matrices, training curves, and interactive analytics.

---

## πŸ—οΈ Model Architecture

```
Video Frames ─┐
β”‚
[ResNet3D]──┐
Text ─────[BERT]─────┼─► [Fusion Layer] ──► [Emotion Classifier] ─► 7 Emotions
β”‚ β”‚ └─► [Sentiment Classifier] ─► 3 Sentiments
Audio ──[CNN+Mel]β”€β”€β”€β”€β”˜
```

![Model Architecture](assets/architecture.png)

### Key Components

- **Input Modalities**: Video frames, audio clips, and text transcripts
- **Feature Extraction**:
- Video: ResNet3D processes frames to extract spatial-temporal features.
- Audio: CNN processes Mel spectrograms for vocal emotion.
- Text: BERT generates contextual embeddings from transcripts.
- **Fusion Layer**: Concatenates features from all modalities into a unified representation.
- **Classification Heads**:
- Emotion Classifier: 7-way softmax for emotions (e.g., happy, sad, angry).
- Sentiment Classifier: 3-way softmax for sentiment (positive, negative, neutral).
- **Output**: Real-time predictions for both emotion and sentiment.

### Model Details

- **Encoders**: BERT (text), ResNet3D (video), CNN (audio)
- **Fusion**: Concatenates 128D features from each encoder (total 384D), then projects to 256D
- **Heads**: Two classifiers for emotion (7-way) and sentiment (3-way)

---

## πŸ› οΈ Tech Stack

| Layer | Technologies |
| ------------- | ---------------------------------------------------------------------------- |
| **AI/ML** | PyTorch, HuggingFace Transformers (BERT), TorchVision (ResNet3D), torchaudio |
| **Cloud** | AWS SageMaker, S3, IAM, CloudWatch, Docker |
| **Web** | Next.js, React, Tailwind CSS, tRPC, Prisma, Auth.js |
| **Dev Tools** | TensorBoard, Matplotlib, Seaborn, Docker |

---

## πŸ“¦ Installation & Setup

### 1. Clone the Repository

```bash
git clone https://github.com/yourusername/neurosense.git
cd neurosense
```

### 2. Install Python Dependencies

```bash
pip install -r requirements.txt
```

### 3. Install Frontend Dependencies

```bash
cd frontend
npm install
cd ..
```

### 4. Prepare Dataset

- Download the [MELD Dataset](https://affective-meld.github.io)
- Extract and place it in the `data/` directory as follows:
```
data/
β”œβ”€β”€ train_splits/
β”œβ”€β”€ test_splits/
└── dev_splits/
```

---

## 🧠 Model Training (Local or SageMaker)

### Local Training

```bash
python train.py --model-dir ./output --epochs 25 --data-dir ./data
```

### AWS SageMaker Training

1. **Increase Quota** for your desired instance type (e.g., `ml.g5.xlarge`).
2. **Upload Dataset** to your S3 bucket:
```bash
aws s3 sync ./data s3://your-bucket/data
```
3. **Create IAM Role** with S3 and SageMaker permissions.
4. **Start Training Job**:
```bash
python train_sagemaker.py --role-arn
```

---

## πŸ›°οΈ Model Deployment

1. **Upload Model Artifacts** to your S3 bucket after training.
2. **Deploy Endpoint**:
```bash
python deploy_endpoint.py --model-s3-uri s3://your-bucket/model.tar.gz
```
3. **Configure IAM for Inference** (see `deployment/README.md` for details).

---

## πŸ“ž Inference API

- **REST API**: Real-time predictions via SageMaker endpoint.
- **API Key Management**: Secure access for frontend and external clients.
- **Example Usage**:
```python
import requests
response = requests.post(
"https://api.neurosense.app/infer",
headers={"x-api-key": ""},
files={"video": open("sample.mp4", "rb")}
)
print(response.json())
```

---

## πŸ“Š TensorBoard & Visualization

- **Training Metrics**:
```bash
tensorboard --logdir output/tensorboard
```
- **Confusion Matrices & Curves**:
Check `output/` or `results/` for PNGs and CSVs.

---

## πŸ–₯️ Frontend Web App

**NeuroSense** includes a modern SaaS dashboard built with **Next.js** and **Tailwind CSS**.

### **Frontend Features**

- 🎬 **Media Upload**: Drag-and-drop video/audio files
- πŸ“ **Text Input**: Paste or type transcript for analysis
- ⚑ **Real-Time Inference**: See emotion & sentiment predictions instantly
- πŸ“ˆ **Interactive Visualizations**: Explore confusion matrices, training curves, and usage analytics
- πŸ”‘ **Authentication**: Secure sign-in with Auth.js (Google, GitHub, etc.)
- πŸ“Š **Usage Dashboard**: Track API calls and quota per user
- πŸ›‘οΈ **API Key Management**: Generate and manage API keys for secure access

### **Run the Frontend Locally**

```bash
cd frontend
npm run dev
# Visit http://localhost:3000
```

---

## πŸ“Š Example Results

| Modality | Emotion Accuracy | Sentiment F1 |
| -------------------- | ---------------- | ------------ |
| Video + Audio + Text | 0.82 | 0.87 |

- **Confusion matrices** and **classification reports** are auto-generated and saved for every evaluation.

---

## 🌐 Applications

- **Conversational AI**: Enhance chatbots and virtual assistants with emotional intelligence
- **Customer Experience**: Analyze emotions in support calls and video chats
- **Content Moderation**: Detect toxic or harmful sentiments in user-generated content
- **Mental Health**: Monitor mood and affect in telehealth sessions
- **Education**: Track student engagement and sentiment in e-learning

---

## πŸ’‘ Inspiration

NeuroSense is built to demonstrate the full lifecycle of a multimodal AI applicationβ€”from deep learning model training, to scalable cloud deployment, to a beautiful SaaS web interface. It’s ideal for researchers, engineers, and product teams exploring the future of affective computing.

---

## πŸ“¬ Contributions

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

---

## πŸ“„ License

[MIT](LICENSE)

---

## ⭐️ Give NeuroSense a Star!

If you find this project useful or inspiring, please consider starring the repo and sharing it with your network!

---

**Decode the unspoken. Understand the unseen. Welcome to the future of emotion-aware AI.**

---