https://github.com/oscarknagg/raw-audio-gender-classification

Machine learning experiment to perform gender classification from raw audio.

audio convolutional-neural-networks gender-classification machine-learning pytorch speech

Host: GitHub
URL: https://github.com/oscarknagg/raw-audio-gender-classification
Owner: oscarknagg
Created: 2018-05-12T15:01:06.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2018-09-01T20:59:38.000Z (almost 8 years ago)
Last Synced: 2025-04-07T08:14:32.117Z (about 1 year ago)
Topics: audio, convolutional-neural-networks, gender-classification, machine-learning, pytorch, speech
Language: Python
Homepage:
Size: 3.83 MB
Stars: 23
Watchers: 2
Forks: 7
Open Issues: 1
Metadata Files:
- Readme: README.md

# raw-audio-gender-classification

This project contains the code to train a gender classification model that takes raw audio as input.

The weights of the model from the article can be found in the `models/` directory.

See my Medium article for more discussion.

## Instructions
#### Requirements
Make a new virtualenv and install the requirements from `requirements.txt` with:
```
pip install -r requirements.txt
```
This project was written in Python 2.7.12, so I cannot guarantee that it works on
any other version.

#### Run tests

```
python -m unittest tests
```

#### Data
Get training data here: http://www.openslr.org/12
- train-clean-100.tar.gz
- train-clean-360.tar.gz
- dev-clean.tar.gz

Place the unzipped training data into the `data/` folder so the file structure is as follows:
```
data/
    LibriSpeech/
        dev-clean/
        train-clean-100/
        train-clean-360/
        SPEAKERS.TXT
```

Please use the `SPEAKERS.TXT` supplied in the repo as I've made a few corrections to the one found at openslr.org.
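
For reference, the speaker labels can be read out of `SPEAKERS.TXT` with a few lines of Python. This is a minimal sketch, not code from this repo. It assumes the usual LibriSpeech layout, in which comment lines start with `;` and each data row reads `ID | SEX | SUBSET | MINUTES | NAME`. The helper name `parse_speakers` and the sample rows are hypothetical.

```python
def parse_speakers(lines):
    """Map speaker ID -> (sex, subset) from the lines of a SPEAKERS.TXT file.

    Assumes rows look like 'ID | SEX | SUBSET | MINUTES | NAME' and that
    comment lines start with ';' (hypothetical helper, not part of the repo).
    """
    speakers = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and header comments
        # Split at most 4 times so a '|' inside the name field is preserved.
        fields = [f.strip() for f in line.split("|", 4)]
        if len(fields) < 5:
            continue  # ignore rows that do not match the assumed layout
        speaker_id, sex, subset = int(fields[0]), fields[1], fields[2]
        speakers[speaker_id] = (sex, subset)
    return speakers


# Made-up rows in the assumed format, for illustration only.
sample = [
    "; ID | SEX | SUBSET | MINUTES | NAME",
    "1 | F | train-clean-100 | 25.00 | Example Reader",
    "2 | M | dev-clean | 8.00 | Another | Reader",
]
labels = parse_speakers(sample)
```

With the real file, the same function can be given an open file handle, for example `parse_speakers(open("data/LibriSpeech/SPEAKERS.TXT"))`.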

#### Training

Run `run_experiment.py` with the default parameters to train a model that matches the performance discussed in the article.

## Processing audio

Run `process_audio.py`, specifying the model and audio file to use. The audio file must be a `.flac` file.

This script makes many predictions on different fragments of the target audio file and saves the results to
`data/results.csv`.
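
Predicting on many fragments can be pictured as sliding a fixed-length window over the audio samples. The sketch below is illustrative only and is written for Python 3. The fragment length, stride, the `time_seconds` and `prediction` column names, and the `predict` stand-in are all assumptions, not the values or the model that `process_audio.py` uses.

```python
import csv
import io


def iter_fragments(samples, fragment_length, stride):
    """Yield (start_index, fragment) for each full-length window."""
    last_start = len(samples) - fragment_length
    for start in range(0, last_start + 1, stride):
        yield start, samples[start:start + fragment_length]


def predict(fragment):
    """Hypothetical stand-in for the trained model: returns the mean sample."""
    return sum(fragment) / float(len(fragment))


def write_predictions(samples, sample_rate, fragment_length, stride, out):
    """Write one CSV row per fragment: start time in seconds and prediction."""
    writer = csv.writer(out)
    writer.writerow(["time_seconds", "prediction"])  # assumed column names
    for start, fragment in iter_fragments(samples, fragment_length, stride):
        writer.writerow([start / float(sample_rate), predict(fragment)])


# Toy signal: 10 samples at a pretend 2 Hz sample rate.
signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
fragments = list(iter_fragments(signal, fragment_length=4, stride=3))
buffer = io.StringIO()
write_predictions(signal, 2, 4, 3, buffer)
```

Overlapping windows (a stride smaller than the fragment length) give a smoother prediction track over time, at the cost of more model calls.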

I used this script to produce the data for the video embedded in the Medium article.

## Notebooks

I have uploaded two notebooks with this project.

`Model_Performance_Investigation` gives a breakdown of the performance of the model over the different speakers in the
LibriSpeech dataset.
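
A per-speaker breakdown of this kind amounts to grouping predictions by speaker and computing accuracy within each group. The following is a minimal sketch with made-up records. It is not taken from the notebook, and the `(speaker_id, true_label, predicted_label)` record layout and the name `accuracy_by_speaker` are assumptions.

```python
from collections import defaultdict


def accuracy_by_speaker(records):
    """Return {speaker_id: fraction of correct predictions}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for speaker_id, true_label, predicted_label in records:
        total[speaker_id] += 1
        if true_label == predicted_label:
            correct[speaker_id] += 1
    return {s: correct[s] / float(total[s]) for s in total}


# Made-up records: (speaker_id, true_label, predicted_label).
records = [
    (1, "F", "F"),
    (1, "F", "M"),
    (2, "M", "M"),
    (2, "M", "M"),
]
per_speaker = accuracy_by_speaker(records)
```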

`Interview_Segmentation` is where I analysed the results of the `process_audio.py` script on an interview between Elton
John and Kirsty Wark.
