https://github.com/oscarknagg/raw-audio-gender-classification


# raw-audio-gender-classification

This project contains the code to train a gender classification model that takes raw audio as input.

The weights of the model from the article can be found in the `models/` directory.

See my Medium article for more discussion.

## Instructions
#### Requirements
Create a new virtualenv and install the dependencies listed in `requirements.txt`:
```
pip install -r requirements.txt
```
This project was written in Python 2.7.12, so I cannot guarantee that it works on
any other version.

#### Run tests

```
python -m unittest tests
```

#### Data
Download the following LibriSpeech archives from http://www.openslr.org/12:
- train-clean-100.tar.gz
- train-clean-360.tar.gz
- dev-clean.tar.gz

Extract the archives into the `data/` folder so that the file structure is as follows:
```
data/
    LibriSpeech/
        dev-clean/
        train-clean-100/
        train-clean-360/
        SPEAKERS.TXT
```
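Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repo: the helper name `missing_entries` is hypothetical, and it assumes the script is run from the project root so that `data/` resolves correctly.

```python
import os

# Entries expected under data/LibriSpeech/, taken from the layout shown above.
EXPECTED_ENTRIES = ["dev-clean", "train-clean-100", "train-clean-360", "SPEAKERS.TXT"]


def missing_entries(data_dir="data"):
    """Return the expected paths that do not exist under data_dir.

    Hypothetical helper for illustration; the repo does not ship it.
    """
    root = os.path.join(data_dir, "LibriSpeech")
    return [
        os.path.join(root, name)
        for name in EXPECTED_ENTRIES
        if not os.path.exists(os.path.join(root, name))
    ]
```

Calling `missing_entries()` returns an empty list when all four entries are present; otherwise it lists the paths that still need to be extracted or copied.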

Please use the `SPEAKERS.TXT` supplied in the repo as I've made a few corrections to the one found at openslr.org.
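For reference, `SPEAKERS.TXT` is a pipe-separated file in which comment lines start with `;` and each data row begins with the speaker ID followed by the speaker's sex (`M` or `F`). The sketch below shows one way to read those two fields. It is an illustration, not the repo's own loader: the function name `parse_speakers` is hypothetical and the sample rows are made up.

```python
def parse_speakers(lines):
    """Map speaker ID -> sex ('M' or 'F') from SPEAKERS.TXT-style lines.

    Hypothetical helper for illustration; not the repo's own loader.
    """
    sexes = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and ';' comment/header lines.
        if not line or line.startswith(";"):
            continue
        # Split at most 4 times so a '|' inside the name field is harmless.
        fields = [field.strip() for field in line.split("|", 4)]
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        sexes[int(fields[0])] = fields[1]
    return sexes


# Made-up rows in the same layout as SPEAKERS.TXT (not real speakers).
sample = [
    "; comment lines start with a semicolon",
    ";ID  |SEX| SUBSET           |MINUTES| NAME",
    "101  | F | train-clean-100  | 12.34 | Example Reader A",
    "102  | M | dev-clean        | 5.67  | Example |Reader| B",
]
labels = parse_speakers(sample)
```

With a real file, the same function can be applied to `open("data/LibriSpeech/SPEAKERS.TXT")`, assuming the layout described above.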

#### Training

Run `run_experiment.py` with the default parameters to train a model that reproduces the performance discussed in the article.

## Processing audio

Run `process_audio.py`, specifying the model and audio file to use. The audio file must be a `.flac` file.

This script makes many predictions on different fragments of the target audio file and saves the results to
`data/results.csv`.
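To illustrate the idea of predicting on many fragments of one file, the sketch below computes the sample ranges of fixed-length, overlapping fragments. It is a simplified assumption of how such windowing can work, not the logic in `process_audio.py`: the function name `fragment_bounds`, the 3-second fragment length and the 1-second stride are all hypothetical. The 16 kHz sampling rate matches LibriSpeech audio.

```python
def fragment_bounds(n_samples, fragment_len, stride):
    """Return (start, end) sample indices of fixed-length fragments.

    Hypothetical helper: fragments that would run past the end of the
    audio are dropped, so every fragment has exactly fragment_len samples.
    """
    if fragment_len <= 0 or stride <= 0:
        raise ValueError("fragment_len and stride must be positive")
    last_start = n_samples - fragment_len
    return [(start, start + fragment_len) for start in range(0, last_start + 1, stride)]


SAMPLE_RATE = 16000  # LibriSpeech audio is sampled at 16 kHz

# Assumed values for illustration: 10 s of audio, 3 s fragments, 1 s stride.
bounds = fragment_bounds(
    n_samples=10 * SAMPLE_RATE,
    fragment_len=3 * SAMPLE_RATE,
    stride=1 * SAMPLE_RATE,
)
```

Each `(start, end)` pair selects one slice of the waveform to pass to the model, giving one prediction per fragment; a smaller stride yields more predictions at the cost of more computation.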

I used this script to produce the data for the video embedded in the Medium article.

## Notebooks

I have uploaded two notebooks with this project.

`Model_Performance_Investigation` gives a breakdown of the performance of the model over the different speakers in the
LibriSpeech dataset.

`Interview_Segmentation` is where I analysed the results of the `process_audio.py` script on an interview between Elton
John and Kirsty Wark.