https://github.com/fazledyn/gender-classification-from-audio-clips

In this project, we built a machine learning model that can identify the gender of a person from their voice recording.
deep-learning gender-classification machine-learning tensorflow

Host: GitHub
URL: https://github.com/fazledyn/gender-classification-from-audio-clips
Owner: fazledyn
Created: 2023-02-20T16:29:24.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2023-03-31T10:19:21.000Z (over 2 years ago)
Last Synced: 2024-11-12T09:26:39.088Z (11 months ago)
Topics: deep-learning, gender-classification, machine-learning, tensorflow
Language: Jupyter Notebook
Homepage:
Size: 654 KB
Stars: 6
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

![Project Cover Image](/docs/cover.svg)

# Overview
In this project, we aim to build a machine learning model that identifies a person's gender from a voice recording. Along the way, we use two intermediate representations of the audio clips: **Mel Spectrogram** (Mel) and **Mel-Frequency Cepstral Coefficients** (MFCC).
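
To make the two representations concrete, here is a minimal NumPy-only sketch of how a log-Mel spectrogram and MFCCs can be derived from a waveform. It is not taken from the notebooks: the function names, the frame size (`n_fft=512`), hop length, number of Mel bands (40) and number of coefficients (13) are all illustrative assumptions, and the DCT is left unnormalized for brevity.

```python
import numpy as np

def hz_to_mel(hz):
    # Common (HTK-style) Hz -> mel conversion.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Return a log-power Mel spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return (10.0 * np.log10(mel_power + 1e-10)).T

def mfcc(log_mel, n_mfcc=13):
    """Unnormalized DCT-II along the mel axis, keeping the first n_mfcc coefficients."""
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return basis @ log_mel                                 # (n_mfcc, n_frames)

# Demo on a synthetic one-second 220 Hz tone (stand-in for a real voice clip).
sr = 16000
tone = np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr)
mel = log_mel_spectrogram(tone, sr)
coeffs = mfcc(mel)
print(mel.shape, coeffs.shape)
```

Both outputs are 2-D arrays (feature bins by time frames), which is what allows them to be treated as image-like inputs to a convolutional network. In practice an audio library would normally be used instead of hand-written filters.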

# Datasets
**[MCV]** Common Voice by Mozilla.org (https://www.kaggle.com/datasets/mozillaorg/common-voice)

**[DLS]** Bengali Common Voice Speech Dataset (https://www.kaggle.com/competitions/dlsprint)

# Proposed Solution
## Mel-Frequency Cepstral Coefficients (MFCC)
![MFCC](/docs/mfcc.png)
## Mel Spectrogram
![Mel Spectrogram](/docs/mel.png)

# Notebook Details
## Training
The `training` folder contains four notebooks. Each notebook is named following the pattern `[Data-Type]_[Dataset]_[Model]`. These notebooks are used to train individual models on the training datasets.

```
└── training
    ├── mel_dls_resnet50_train.ipynb
    ├── mel_mcv_resnet50.ipynb
    ├── mfcc_dls_train_resnet50.ipynb
    └── mfcc_mcv_resnet50.ipynb
```

## Evaluation
The `evaluation` folder contains four notebooks. Each notebook is named following the pattern `[Data-Type]_[Dataset#1]_on_[Dataset#2]`. The model trained on `Dataset#1` is evaluated on `Dataset#2`.

```
└── evaluation
    ├── mel_dls_on_mcv.ipynb
    ├── mel_mcv_on_dls.ipynb
    ├── mfcc_dls_on_mcv.ipynb
    └── mfcc_mcv_on_dls.ipynb
```
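
The evaluation naming pattern can be read mechanically. The sketch below splits a notebook filename into its feature type, training dataset and evaluation dataset. The `EvalRun` type and `parse_eval_notebook` function are hypothetical helpers for illustration and are not part of the repository.

```python
from pathlib import Path
from typing import NamedTuple

class EvalRun(NamedTuple):
    feature: str       # data type, e.g. "mel" or "mfcc"
    trained_on: str    # dataset the model was trained on
    evaluated_on: str  # dataset the model is evaluated on

def parse_eval_notebook(path: str) -> EvalRun:
    """Parse a name of the form [Data-Type]_[Dataset#1]_on_[Dataset#2].ipynb."""
    parts = Path(path).stem.split("_")
    if len(parts) != 4 or parts[2] != "on":
        raise ValueError(f"unexpected evaluation notebook name: {path!r}")
    feature, trained_on, _, evaluated_on = parts
    return EvalRun(feature, trained_on, evaluated_on)

run = parse_eval_notebook("evaluation/mfcc_dls_on_mcv.ipynb")
print(run)
```

Here `mfcc_dls_on_mcv.ipynb` reads as: MFCC features, model trained on DLS, evaluated on MCV.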

The comparison between the models is shown in the [presentation report](#presentation-report).

# Model Details
- Architecture: ResNet50
- Learning Rate: 0.0001
- Optimizer: Adam
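
As a rough illustration of these settings, the following Keras sketch builds a ResNet50 backbone and compiles it with Adam at a learning rate of 0.0001. Only the architecture, optimizer and learning rate come from the list above. The input shape `(128, 128, 1)`, the random initialization (`weights=None`), the single sigmoid output, the loss and the `build_classifier` name are assumptions made for the example and may differ from what the notebooks do.

```python
import tensorflow as tf

def build_classifier(input_shape=(128, 128, 1)):
    """ResNet50 backbone with a binary head (head and input shape are assumed)."""
    backbone = tf.keras.applications.ResNet50(
        include_top=False,   # drop the ImageNet classification head
        weights=None,        # assumption: train from scratch
        input_shape=input_shape,
        pooling="avg",       # global average pooling -> one feature vector per clip
    )
    # Assumption: one sigmoid unit for the two-class gender label.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(backbone.output)
    model = tf.keras.Model(backbone.input, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

model = build_classifier()
print(model.output_shape)
```

The same builder could be reused for both Mel and MFCC inputs by changing `input_shape` to match the feature array.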

# Presentation Report
https://docs.google.com/presentation/d/14BWOq6YSmO3GqZHEvCou43Z5A4dlOmKq4pjqUqgZALU/

# References
[1] **Speaker Gender Recognition Based on Deep Neural Networks and ResNet50** (https://doi.org/10.1155/2022/4444388)

[2] **A Machine Learning Approach to Automating Bengali Voice Based Gender Classification** (https://ieeexplore.ieee.org/document/9117385)
