https://github.com/oscarknagg/raw-audio-gender-classification
Machine learning experiment to perform gender classification from raw audio.
audio convolutional-neural-networks gender-classification machine-learning pytorch speech
- Host: GitHub
- URL: https://github.com/oscarknagg/raw-audio-gender-classification
- Owner: oscarknagg
- Created: 2018-05-12T15:01:06.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-09-01T20:59:38.000Z (almost 8 years ago)
- Last Synced: 2025-04-07T08:14:32.117Z (about 1 year ago)
- Topics: audio, convolutional-neural-networks, gender-classification, machine-learning, pytorch, speech
- Language: Python
- Homepage:
- Size: 3.83 MB
- Stars: 23
- Watchers: 2
- Forks: 7
- Open Issues: 1
# raw-audio-gender-classification
This project contains the code to train a gender classification model that takes raw audio as input.
The weights of the model from the article can be found in the `models/` directory.
See my Medium article for more discussion.
## Instructions
#### Requirements
Create a new virtualenv and install the requirements from `requirements.txt` with:
```
pip install -r requirements.txt
```
This project was written in Python 2.7.12, so I cannot guarantee that it works on
any other version.
#### Run tests
```
python -m unittest tests
```
#### Data
Get training data here: http://www.openslr.org/12
- train-clean-100.tar.gz
- train-clean-360.tar.gz
- dev-clean.tar.gz
Place the extracted training data into the `data/` folder so that the file structure is as follows:
```
data/
    LibriSpeech/
        dev-clean/
        train-clean-100/
        train-clean-360/
        SPEAKERS.TXT
```
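As a quick sanity check before training, the layout above can be verified with a few lines of Python. This is only a minimal sketch: the helper name `missing_librispeech_entries` and the constant `EXPECTED_ENTRIES` are hypothetical and are not part of this repo.

```python
import os

# Entries expected under data/LibriSpeech/, taken from the layout shown above.
EXPECTED_ENTRIES = ["dev-clean", "train-clean-100", "train-clean-360", "SPEAKERS.TXT"]


def missing_librispeech_entries(data_dir="data"):
    """Return the expected entries that are absent from <data_dir>/LibriSpeech.

    Hypothetical helper for illustration only; it is not part of this repo.
    """
    root = os.path.join(data_dir, "LibriSpeech")
    return [name for name in EXPECTED_ENTRIES
            if not os.path.exists(os.path.join(root, name))]
```

Calling `missing_librispeech_entries()` from the repo root should return an empty list once everything is in place.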
Please use the `SPEAKERS.TXT` supplied in the repo as I've made a few corrections to the one found at openslr.org.
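For reference, here is a rough sketch of how the gender labels could be read from `SPEAKERS.TXT`. It assumes the usual LibriSpeech layout (comment lines starting with `;`, then pipe-separated `ID | SEX | SUBSET | MINUTES | NAME` rows). The function `parse_speakers` and the sample rows are hypothetical and are not necessarily how this repo loads the file.

```python
def parse_speakers(lines):
    """Map speaker ID -> sex ('M' or 'F') from SPEAKERS.TXT-style lines.

    Hypothetical helper; assumes ';' comment lines and '|' separated fields.
    """
    sexes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment/header lines
        # Split at most 4 times so a '|' inside the NAME field is harmless.
        fields = [field.strip() for field in line.split("|", 4)]
        if len(fields) < 2:
            continue  # not a data row
        sexes[int(fields[0])] = fields[1]
    return sexes


# Made-up rows for illustration; these are not real LibriSpeech speakers.
SAMPLE_LINES = [
    ";ID  |SEX| SUBSET           |MINUTES| NAME",
    "101  | F | dev-clean        | 8.00  | Example Reader A",
    "102  | M | train-clean-100  | 20.00 | |XYZ|Example Reader B",
]
speaker_sex = parse_speakers(SAMPLE_LINES)  # {101: 'F', 102: 'M'}
```

The same function accepts an open file handle, since it only iterates over lines.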
#### Training
Run `run_experiment.py` with the default parameters to train a model and reproduce the performance discussed in the article.
## Processing audio
Run `process_audio.py`, specifying the model and audio file to use. The audio file must be a `.flac` file.
This script makes many predictions on different fragments of the target audio file and saves the results to
`data/results.csv`.
I used this script to produce the data for the video embedded in the Medium article.
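To illustrate the idea of predicting on many fragments, the sketch below computes the sample ranges of fixed-length, overlapping fragments. It is an assumption about the general approach, not the actual logic of `process_audio.py`: the function name `fragment_bounds`, the 3 second fragment length and the 1 second stride are all made up for the example. Only the 16 kHz sampling rate comes from LibriSpeech itself.

```python
def fragment_bounds(n_samples, fragment_len, stride):
    """Return (start, end) sample indices of fixed-length fragments.

    Hypothetical helper; fragments that would run past the end are dropped.
    """
    if fragment_len <= 0 or stride <= 0:
        raise ValueError("fragment_len and stride must be positive")
    return [(start, start + fragment_len)
            for start in range(0, n_samples - fragment_len + 1, stride)]


SAMPLING_RATE = 16000  # LibriSpeech audio is sampled at 16 kHz

# Assumed values for illustration: a 10 s clip, 3 s fragments, 1 s stride.
bounds = fragment_bounds(n_samples=10 * SAMPLING_RATE,
                         fragment_len=3 * SAMPLING_RATE,
                         stride=1 * SAMPLING_RATE)
# len(bounds) == 8; the first fragment is (0, 48000), the last (112000, 160000)
```

Each `(start, end)` pair would then be used to slice the waveform, and the model's prediction for that slice becomes one row of output.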
## Notebooks
I have uploaded two notebooks with this project.
`Model_Performance_Investigation` gives a breakdown of the performance of the model over the different speakers in the
LibriSpeech dataset.
`Interview_Segmentation` is where I analysed the results of the `process_audio.py` script on an interview between Elton
John and Kirsty Wark.