https://github.com/mpolinowski/yolo-listen

Use an image classifier to predict audio file labels.

audio-labeling docker image-classifier pytorch yolov8n


Host: GitHub
URL: https://github.com/mpolinowski/yolo-listen
Owner: mpolinowski
Created: 2023-09-23T15:59:27.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2023-09-23T15:59:53.000Z (about 2 years ago)
Last Synced: 2024-11-30T11:10:32.734Z (10 months ago)
Topics: audio-labeling, docker, image-classifier, pytorch, yolov8n
Language: Jupyter Notebook
Homepage: https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-09-23--yolo8-listen/2023-09-23
Size: 2 MB
Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Audio Classification with Computer Vision

Use a PyTorch image classifier to predict audio file labels for the following dataset.

## Dataset

> The [ESC-50 dataset](https://github.com/karolpiczak/ESC-50) is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification.
>
> The dataset consists of 5-second-long recordings organized into 50 semantical classes (with 40 examples per class) loosely arranged into the following categories:

| class | instances |
| -- | -- |
| dog | 40 |
| glass_breaking | 40 |
| drinking_sipping | 40 |
| rain | 40 |
| insects | 40 |
| laughing | 40 |
| hen | 40 |
| engine | 40 |
| breathing | 40 |
| crying_baby | 40 |
| hand_saw | 40 |
| coughing | 40 |
| snoring | 40 |
| chirping_birds | 40 |
| toilet_flush | 40 |
| pig | 40 |
| washing_machine | 40 |
| clock_tick | 40 |
| sneezing | 40 |
| rooster | 40 |
| sea_waves | 40 |
| siren | 40 |
| cat | 40 |
| door_wood_creaks | 40 |
| helicopter | 40 |
| crackling_fire | 40 |
| car_horn | 40 |
| brushing_teeth | 40 |
| vacuum_cleaner | 40 |
| thunderstorm | 40 |
| door_wood_knock | 40 |
| can_opening | 40 |
| crow | 40 |
| clapping | 40 |
| fireworks | 40 |
| chainsaw | 40 |
| airplane | 40 |
| mouse_click | 40 |
| pouring_water | 40 |
| train | 40 |
| sheep | 40 |
| water_drops | 40 |
| church_bells | 40 |
| clock_alarm | 40 |
| keyboard_typing | 40 |
| wind | 40 |
| footsteps | 40 |
| frog | 40 |
| cow | 40 |
| crickets | 40 |
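
Each ESC-50 recording carries its label in the file name, which follows the pattern `{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav` described in the ESC-50 README. The sketch below shows one way to read the fold and the numeric class target from a path. The helper name `parse_esc50_name` and the `Esc50Clip` tuple are illustrative and not part of this repository; mapping the numeric target to a class name such as `dog` additionally needs the `meta/esc50.csv` file shipped with the dataset.

```python
from pathlib import Path
from typing import NamedTuple


class Esc50Clip(NamedTuple):
    """Fields encoded in an ESC-50 file name (hypothetical helper type)."""
    fold: int      # cross-validation fold, 1 to 5
    clip_id: str   # ID of the original Freesound clip
    take: str      # letter distinguishing fragments of the same clip
    target: int    # numeric class label, 0 to 49


def parse_esc50_name(filename):
    """Split '{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav' into its four fields."""
    stem = Path(filename).stem  # drop directories and the .wav extension
    fold, clip_id, take, target = stem.split("-")
    return Esc50Clip(int(fold), clip_id, take, int(target))


clip = parse_esc50_name("dataset/ESC-50/audio/1-100032-A-0.wav")
print(clip.fold, clip.target)
```

Keeping the fold around is useful if you later want to split by the dataset's predefined folds instead of splitting at random.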

## Data Preprocessing

Download the dataset, copy all `*.wav` files to `dataset/ESC-50/audio`, and run the pre-processing scripts to generate the corresponding spectrograms. The __Train/Val-Split__ step then copies all image files to `./data`:

```bash
├── data
│   ├── test
│   ├── train
│   ├── val
├── dataset
│   └── ESC-50
│       ├── audio
│       └── spectrogram
```
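
The split step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the spectrogram images are already sorted into one subfolder per class, and the function name `split_dataset`, the 80/10/10 ratios, and the fixed seed are all assumptions. The output layout (`train/<class>/`, `val/<class>/`, `test/<class>/`) matches what image classifiers that read labels from folder names typically expect.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, dst_dir, ratios=(0.8, 0.1, 0.1), seed=42):
    """Copy images from src_dir/<class>/ into dst_dir/{train,val,test}/<class>/.

    Hypothetical helper: ratios are (train, val, test); whatever is left
    after the train and val shares goes to test.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {"train": 0, "val": 0, "test": 0}

    for class_dir in sorted(p for p in src_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)

        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        subsets = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],
        }

        # Split per class so every class appears in every subset.
        for subset, subset_files in subsets.items():
            target = dst_dir / subset / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset_files:
                shutil.copy2(f, target / f.name)
            counts[subset] += len(subset_files)

    return counts
```

Called as `split_dataset("dataset/ESC-50/spectrogram", "data")`, this would place 32, 4, and 4 images per class into `train`, `val`, and `test` when each class has 40 examples.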

### Spectrograms

![Audio Classification with Computer Vision](./assets/class_label_crow.webp)

![Audio Classification with Computer Vision](./assets/class_label_toilet_flush.webp)

## Model Training

Run the YOLO model inside a PyTorch container image with [Jupyter Notebooks](https://github.com/mpolinowski/pytorch-jupyter):

```bash
# Mount the current directory into the container and expose Jupyter on port 8888
docker run --ipc=host --gpus all -ti --rm \
    -v "$(pwd)":/opt/app -p 8888:8888 \
    --name pytorch-jupyter \
    pytorch-jupyter:latest
```

![Audio Classification with Computer Vision](./assets/confusion_matrix_normalized.webp)
