Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fazledyn/gender-classification-from-audio-clips
In this project, we built a machine learning model that can identify the gender of a person from their voice recording.
deep-learning gender-classification machine-learning tensorflow
- Host: GitHub
- URL: https://github.com/fazledyn/gender-classification-from-audio-clips
- Owner: fazledyn
- Created: 2023-02-20T16:29:24.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2023-03-31T10:19:21.000Z (almost 2 years ago)
- Last Synced: 2024-03-17T14:01:11.668Z (10 months ago)
- Topics: deep-learning, gender-classification, machine-learning, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 654 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
![Project Cover Image](/docs/cover.svg)
# Overview
In this project, we aim to build a machine learning model that can identify the gender of a person from their voice recording. In the process, we use two intermediate representations of the audio clips: the **Mel Spectrogram** (Mel) and **Mel-Frequency Cepstral Coefficients** (MFCC).
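As a minimal sketch (assuming the features are computed with `librosa`; the parameter values below are illustrative, not necessarily the ones used in the notebooks), both representations can be extracted from a clip like this:

```python
# Hedged sketch: computing the two intermediate representations with librosa.
import librosa
import numpy as np

def extract_features(path, sr=16000, n_mels=128, n_mfcc=40):
    """Return the log-scaled Mel spectrogram and the MFCCs of one audio clip."""
    y, sr = librosa.load(path, sr=sr)
    # Mel spectrogram, converted to decibels for a log-scaled representation
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Mel-Frequency Cepstral Coefficients
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mel_db, mfcc
```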
# Datasets

- **[MCV]** Common Voice by Mozilla.org (https://www.kaggle.com/datasets/mozillaorg/common-voice)
- **[DLS]** Bengali Common Voice Speech Dataset (https://www.kaggle.com/competitions/dlsprint)
# Proposed Solution
## Mel-Frequency Cepstral Coefficients (MFCC)
![MFCC](/docs/mfcc.png)
## Mel Spectrogram
![Mel Spectrogram](/docs/mel.png)

# Notebook Details
## Training
The `training` folder contains four notebooks, each named `[Data-Type]_[Dataset]_[Model]`. These notebooks are used to train individual models on the training datasets.

```
└── training
├── mel_dls_resnet50_train.ipynb
├── mel_mcv_resnet50.ipynb
├── mfcc_dls_train_resnet50.ipynb
└── mfcc_mcv_resnet50.ipynb
```

## Evaluation
The `evaluation` folder contains four notebooks, each named `[Data-Type]_[Dataset#1]_on_[Dataset#2]`. A model trained on `Dataset#1` is evaluated on `Dataset#2`.

```
└── evaluation
├── mel_dls_on_mcv.ipynb
├── mel_mcv_on_dls.ipynb
├── mfcc_dls_on_mcv.ipynb
└── mfcc_mcv_on_dls.ipynb
```

The comparison between the models is shown in the report linked in the [presentation](#presentation-report) section.
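For illustration, the sketch below shows the cross-dataset evaluation step: loading a model trained on one dataset and evaluating it on features from the other. The checkpoint and feature-array file names are placeholders, not the notebooks' actual files.

```python
# Hedged sketch of cross-dataset evaluation (file names are placeholders).
import numpy as np
import tensorflow as tf

# Model trained on Dataset#1 (e.g. MFCC features from DLS)
model = tf.keras.models.load_model("mfcc_dls_resnet50.h5")

# Features and labels extracted from Dataset#2 (e.g. MCV)
x_test = np.load("mfcc_mcv_features.npy")
y_test = np.load("mfcc_mcv_labels.npy")

loss, accuracy = model.evaluate(x_test, y_test, batch_size=32)
print(f"Cross-dataset accuracy: {accuracy:.3f}")
```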
# Model Details
- Architecture: ResNet50
- Learning Rate: 0.0001
- Optimizer: Adam
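To make these hyperparameters concrete, here is a minimal sketch of a ResNet50 binary classifier compiled with Adam at the listed learning rate; the input shape, random weight initialization, and sigmoid head are assumptions not specified above.

```python
# Hedged sketch: ResNet50 with Adam at learning rate 1e-4, matching the values
# listed above. Input shape and classification head are illustrative assumptions.
import tensorflow as tf

def build_model(input_shape=(224, 224, 3)):
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_shape=input_shape, pooling="avg")
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```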
# Presentation Report

https://docs.google.com/presentation/d/14BWOq6YSmO3GqZHEvCou43Z5A4dlOmKq4pjqUqgZALU/

# References
[1] **Speaker Gender Recognition Based on Deep Neural Networks and ResNet50** (https://doi.org/10.1155/2022/4444388)

[2] **A Machine Learning Approach to Automating Bengali Voice Based Gender Classification** (https://ieeexplore.ieee.org/document/9117385)