https://github.com/mpolinowski/yolo-listen
Use an image classifier to predict audio file labels.
- Host: GitHub
- URL: https://github.com/mpolinowski/yolo-listen
- Owner: mpolinowski
- Created: 2023-09-23T15:59:27.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2023-09-23T15:59:53.000Z (about 2 years ago)
- Last Synced: 2024-11-30T11:10:32.734Z (10 months ago)
- Topics: audio-labeling, docker, image-classifier, pytorch, yolov8n
- Language: Jupyter Notebook
- Homepage: https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-09-23--yolo8-listen/2023-09-23
- Size: 2 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Audio Classification with Computer Vision

Use a PyTorch image classifier to predict audio file labels for the following dataset.
## Dataset
> The [ESC-50 dataset](https://github.com/karolpiczak/ESC-50) is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classification.
>
> The dataset consists of 5-second-long recordings organized into 50 semantic classes (with 40 examples per class) loosely arranged into the following categories:

| class | instances |
| -- | -- |
| dog | 40 |
| glass_breaking | 40 |
| drinking_sipping | 40 |
| rain | 40 |
| insects | 40 |
| laughing | 40 |
| hen | 40 |
| engine | 40 |
| breathing | 40 |
| crying_baby | 40 |
| hand_saw | 40 |
| coughing | 40 |
| snoring | 40 |
| chirping_birds | 40 |
| toilet_flush | 40 |
| pig | 40 |
| washing_machine | 40 |
| clock_tick | 40 |
| sneezing | 40 |
| rooster | 40 |
| sea_waves | 40 |
| siren | 40 |
| cat | 40 |
| door_wood_creaks | 40 |
| helicopter | 40 |
| crackling_fire | 40 |
| car_horn | 40 |
| brushing_teeth | 40 |
| vacuum_cleaner | 40 |
| thunderstorm | 40 |
| door_wood_knock | 40 |
| can_opening | 40 |
| crow | 40 |
| clapping | 40 |
| fireworks | 40 |
| chainsaw | 40 |
| airplane | 40 |
| mouse_click | 40 |
| pouring_water | 40 |
| train | 40 |
| sheep | 40 |
| water_drops | 40 |
| church_bells | 40 |
| clock_alarm | 40 |
| keyboard_typing | 40 |
| wind | 40 |
| footsteps | 40 |
| frog | 40 |
| cow | 40 |
| crickets | 40 |

## Data Preprocessing

Download all of the dataset's `*.wav` files to `dataset/ESC-50/audio` and run the pre-processing scripts to generate the corresponding spectrograms. The __Train/Val-Split__ then copies all image files to `./data`:
```bash
├── data
│ ├── test
│ ├── train
│ ├── val
├── dataset
│ └── ESC-50
│ ├── audio
│ └── spectrogram
```

### Spectrograms
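The repository's pre-processing scripts are not reproduced on this page. A minimal sketch of the spectrogram step, assuming `librosa` and `matplotlib` (neither is confirmed by the README), might look like the following. The `parse_esc50_name` helper relies on the ESC-50 filename scheme `{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav` documented in the dataset's README; the output path and image size are illustrative choices, not the repo's actual settings.

```python
from pathlib import Path


def parse_esc50_name(filename: str) -> dict:
    """Parse an ESC-50 filename of the form {FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav."""
    fold, clip_id, take, target = Path(filename).stem.split("-")
    return {"fold": int(fold), "clip_id": int(clip_id), "take": take, "target": int(target)}


def wav_to_spectrogram(wav_path: str, out_dir: str = "dataset/ESC-50/spectrogram") -> Path:
    """Render one 5-second ESC-50 clip as a mel-spectrogram PNG.

    Heavy dependencies are imported lazily so the filename helper above
    stays usable without them. This mirrors the pre-processing step in
    spirit only; the repo's own scripts may differ.
    """
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt
    import numpy as np

    y, sr = librosa.load(wav_path, sr=None)            # keep the native sample rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr)   # mel-scaled power spectrogram
    mel_db = librosa.power_to_db(mel, ref=np.max)      # convert to decibel scale

    out = Path(out_dir) / (Path(wav_path).stem + ".png")
    out.parent.mkdir(parents=True, exist_ok=True)
    fig, ax = plt.subplots(figsize=(2.56, 2.56))
    librosa.display.specshow(mel_db, sr=sr, ax=ax)     # draw without axes or labels
    ax.set_axis_off()
    fig.savefig(out, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    return out
```

Parsing the target class out of the filename is what lets a split script sort each generated image into the right class folder under `./data/train`, `./data/val`, and `./data/test`.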


## Model Training
Run the YOLO model inside a PyTorch container image with [Jupyter Notebooks](https://github.com/mpolinowski/pytorch-jupyter):
```bash
docker run --ipc=host --gpus all -ti --rm \
-v $(pwd):/opt/app -p 8888:8888 \
--name pytorch-jupyter \
pytorch-jupyter:latest
```
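Inside the container, the training step can be sketched with the `ultralytics` package (an assumption — the README does not show the notebook's exact code). The `yolov8n-cls` checkpoint is the classification variant matching the repo's `yolov8n` topic; epoch count, image size, and the `top_label` helper are illustrative.

```python
def top_label(probs, names):
    """Return the class name with the highest predicted probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return names[best]


def train_and_predict(data_dir: str = "data", spectrogram: str = "sample.png"):
    """Train yolov8n-cls on the spectrogram folders, then classify one image.

    A sketch assuming the `ultralytics` package is installed in the
    container; the repo's notebook may use different hyperparameters.
    """
    from ultralytics import YOLO  # lazy import: only available inside the container

    model = YOLO("yolov8n-cls.pt")                    # pretrained classification model
    model.train(data=data_dir, epochs=20, imgsz=256)  # expects data/train and data/val folders
    result = model(spectrogram)[0]                    # classify one spectrogram image
    return top_label(result.probs.data.tolist(), result.names)
```

The classification trainer infers the 50 class names from the sub-folder names under `data/train`, which is why the train/val split copies spectrograms into per-class directories.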