# 🎙️ Speaker Change Detection using Deep Learning

A Jupyter notebook implementation of speaker change detection using LSTM-based deep learning models on the IEMOCAP dataset.

## 📋 Overview

This project implements a speaker change detection system using LSTM networks, packaged as a single Jupyter notebook. The system processes audio features (MFCCs and F0) to identify the points in a conversation where speaker transitions occur.

## 🔧 Prerequisites

- Python 3.8+
- Jupyter Notebook/Lab
- TensorFlow 2.x
- librosa
- parselmouth (installed as the `praat-parselmouth` package)
- numpy
- pandas
- matplotlib
- scikit-learn
- seaborn

## 📦 Setup

1. Clone the repository:
```bash
git clone https://github.com/danishayman/Speaker-Change-Detection.git
cd Speaker-Change-Detection
```

2. Install required packages:
```bash
pip install -r requirements.txt
```

3. Download the IEMOCAP dataset:
   - The dataset can be obtained from [Kaggle](https://www.kaggle.com/datasets/dejolilandry/iemocapfullrelease/data)
   - Place the downloaded dataset in your working directory

## 📓 Notebook Structure

The project is contained in a single Jupyter notebook with the following sections:

1. **Import Libraries**: Setting up the necessary Python packages
2. **Feature Extraction** (first sketch after this list):
   - Loading audio files
   - Extracting MFCC and F0 features
   - Defining sliding window parameters
3. **Data Preprocessing** (second sketch after this list):
   - RTTM parsing
   - Label generation
   - Dataset splitting
4. **Model Development**:
   - Building the LSTM model
   - Training with different window sizes
   - Performance evaluation
5. **Results and Analysis**:
   - Visualization of results
   - Confusion matrix analysis
   - Comprehensive performance metrics
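
For the feature-extraction step, a minimal sketch of how MFCCs and F0 might be pulled from one audio file is shown below, using librosa for MFCCs and Praat-Parselmouth for pitch (both in the prerequisites). The `extract_features` helper, the 16 kHz sample rate, and the 10 ms hop are illustrative assumptions, not necessarily the notebook's exact settings:

```python
import numpy as np
import librosa
import parselmouth

def extract_features(wav_path, sr=16000, n_mfcc=13, hop_length=160):
    """Per-frame features: n_mfcc MFCCs stacked with one F0 value."""
    y, sr = librosa.load(wav_path, sr=sr)

    # MFCCs -> shape (n_mfcc, n_frames), one column per 10 ms hop
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)

    # F0 track from Praat via parselmouth, stepped to match the MFCC hop
    snd = parselmouth.Sound(y, sampling_frequency=sr)
    pitch = snd.to_pitch(time_step=hop_length / sr)
    f0 = pitch.selected_array['frequency']  # 0.0 where unvoiced

    # The two tools can disagree by a few frames, so align on the shorter
    n = min(mfcc.shape[1], len(f0))
    features = np.vstack([mfcc[:, :n], f0[None, :n]])
    return features.T  # (n_frames, n_mfcc + 1)
```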
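
For the RTTM parsing step, the standard NIST RTTM layout (`SPEAKER <file> <chan> <onset> <duration> <NA> <NA> <speaker> ...`) can be mapped to one speaker index per frame. The `parse_rttm` helper and the 10 ms frame hop are illustrative assumptions:

```python
import numpy as np

def parse_rttm(rttm_path, n_frames, frame_hop_s=0.01):
    """Frame-level speaker indices from a NIST-format RTTM file."""
    labels = np.full(n_frames, -1)  # -1 marks frames with no speaker
    spk_to_idx = {}
    with open(rttm_path) as f:
        for line in f:
            parts = line.split()
            if not parts or parts[0] != 'SPEAKER':
                continue
            onset, dur, spk = float(parts[3]), float(parts[4]), parts[7]
            idx = spk_to_idx.setdefault(spk, len(spk_to_idx))
            start = int(onset / frame_hop_s)
            end = min(n_frames, int((onset + dur) / frame_hop_s))
            labels[start:end] = idx
    return labels
```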

## 🚀 Features

- 🎵 Audio feature extraction (MFCC and F0)
- 🪟 Sliding window analysis with various sizes (3, 5, 7, and 9 frames); see the windowing sketch below
- 🤖 LSTM-based architecture with batch normalization
- 📊 Comprehensive evaluation metrics and visualizations
- 📈 Experiment analysis with different window sizes
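
The sliding-window step turns the per-frame features into short sequences the LSTM can consume. A minimal sketch follows, assuming per-frame speaker indices such as those from `parse_rttm` above; labeling a window positive whenever the speaker changes inside it is an illustrative choice, not necessarily the notebook's exact rule:

```python
import numpy as np

def make_windows(features, speaker_ids, window_size=7):
    """Slice (n_frames, n_feats) features into overlapping windows.

    A window gets label 1 if more than one speaker appears in it
    (i.e. it contains a change point), else 0.
    """
    X, y = [], []
    for start in range(len(features) - window_size + 1):
        end = start + window_size
        X.append(features[start:end])
        y.append(int(len(set(speaker_ids[start:end])) > 1))
    return np.stack(X), np.array(y)

# X has shape (n_windows, window_size, n_feats), matching the LSTM input
```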

## 💻 Usage

1. Open the Jupyter notebook:
```bash
jupyter notebook speaker_change_detection.ipynb
```

2. Ensure your IEMOCAP dataset path is correctly set in the notebook (a quick path check is sketched after this list):
```python
base_path = "path/to/your/IEMOCAP/dataset"
```

3. Run all cells sequentially to:
   - Extract features
   - Process data
   - Train models
   - Visualize results (a confusion-matrix sketch also follows below)
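
As a quick sanity check on the dataset path, something like the following can confirm that the audio files are visible. The `Session*/dialog/wav` glob reflects the usual IEMOCAP release layout, but adjust it if your copy is organized differently:

```python
import glob
import os

wav_files = sorted(glob.glob(os.path.join(base_path, 'Session*', 'dialog', 'wav', '*.wav')))
print(f"Found {len(wav_files)} dialog recordings")
```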
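
For the visualization step, a confusion matrix rendered with scikit-learn and seaborn (both in the prerequisites) might look like the sketch below; `model`, `X_test`, and `y_test` are assumed to be the trained network and held-out data from earlier cells:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix

# Threshold the sigmoid outputs at 0.5 to get binary predictions
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['no change', 'change'],
            yticklabels=['no change', 'change'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

print(classification_report(y_test, y_pred, digits=4))
```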

## 📊 Results

The best-performing configuration across the tested window sizes:

- Best Window Size: 7 frames
- Peak Accuracy: 66.94%
- Precision: 0.0047
- Recall: 0.6593
- F1-Score: 0.0093

The near-zero precision alongside moderate recall reflects the severe class imbalance: genuine change points are rare, so most positive predictions are false alarms (see the note below).

## 🔄 Model Architecture

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, BatchNormalization, Dropout, Dense

# input_shape = (window_size, n_features), e.g. (7, 14) for a 7-frame window
model = Sequential([
    Input(shape=input_shape),
    LSTM(128, return_sequences=True),  # per-frame temporal features
    BatchNormalization(),
    Dropout(0.3),
    LSTM(64),                          # summarizes the whole window
    BatchNormalization(),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')     # probability of a speaker change
])
```
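
The single sigmoid output makes binary cross-entropy the natural loss. A hedged training sketch follows; the balanced class weights are an assumed mitigation for the class imbalance discussed in the note below, not a setting confirmed by the notebook:

```python
import numpy as np
import tensorflow as tf
from sklearn.utils.class_weight import compute_class_weight

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy',
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])

# Assumption: up-weight the rare "change" class with inverse-frequency weights
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_train)

history = model.fit(X_train, y_train,
                    validation_split=0.1,
                    epochs=20,
                    batch_size=64,
                    class_weight={0: weights[0], 1: weights[1]})
```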

## 🛠️ Future Improvements

- [ ] Implement data augmentation techniques
- [ ] Explore attention mechanisms
- [ ] Add residual connections
- [ ] Implement curriculum learning
- [ ] Experiment with additional acoustic features
- [ ] Optimize batch size and training epochs with better hardware

## 📚 Citation
```bibtex
@article{busso2008iemocap,
  title     = {IEMOCAP: Interactive emotional dyadic motion capture database},
  author    = {Busso, Carlos and Bulut, Murtaza and Lee, Chi-Chun and
               Kazemzadeh, Abe and Mower, Emily and Kim, Samuel and
               Chang, Jeannette and Lee, Sungbok and Narayanan, Shrikanth S.},
  journal   = {Speech Communication},
  volume    = {50},
  number    = {11},
  pages     = {1150--1162},
  year      = {2008},
  publisher = {Elsevier}
}
```

## ⚠️ Note

The current implementation is constrained by severe class imbalance (speaker-change frames are far rarer than non-change frames, as the precision and F1 figures above show) and by limited compute. Future improvements should focus on addressing these limitations to achieve better performance.