https://github.com/cyblx/cnn_urbansound8k
Full pipeline for urban sound classification using PyTorch and the UrbanSound8K dataset. Converts audio into MEL spectrograms, applies data augmentation, and trains a CNN to recognize sounds like horns, barks, and sirens.
- Host: GitHub
- URL: https://github.com/cyblx/cnn_urbansound8k
- Owner: CybLX
- License: mit
- Created: 2025-04-05T18:07:27.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-05T19:43:02.000Z (about 1 year ago)
- Last Synced: 2025-04-13T11:14:32.278Z (about 1 year ago)
- Topics: audio-classification, covolution-neural-network, deep-learning, pytorch, spectrogram, urbansound8k
- Language: Python
- Homepage:
- Size: 354 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Urban Sound Classifier - Deep Learning with PyTorch
This project implements a full pipeline for preprocessing, modeling, and metric visualization for **urban sound classification** using **PyTorch** and the **UrbanSound8K** dataset. The pipeline handles audio files, applies `data augmentation` techniques, and converts data into MEL spectrograms ready to feed into a CNN.
---
## Project Structure
```bash
.
├── checkpoint/                        # Checkpoints with models, metrics, scheduler and optimizer
│   └── train_and_val_metrics.png      # Plot of accuracy and loss
├── data analysis/                     # Notebook or scripts with data preprocessing analysis
├── src/                               # Source code
│   ├── inference.py                   # Inference routine for trained models
│   ├── model.py                       # CNN architecture
│   ├── training.py                    # Training loop, validation, early stopping
│   └── utils.py                       # Dataset class, preprocessing functions
├── UrbanSound8K/                      # Dataset folder
├── ForPrediction.py                   # Script for inference on new audio files
├── UrbanSound_Training.py             # Main training routine
└── README.md
```
---
## Objective
The goal of this project is to develop a classifier for **urban sounds** using **convolutional neural networks (CNNs)** with **PyTorch**. Sounds are extracted from the **UrbanSound8K** dataset, and the system can identify noise such as car horns, dog barks, sirens, and more, based on MEL spectrograms. The pipeline is complete: from raw audio loading to CNN training and results visualization.
---
## Preprocessing Pipeline
- Reading `.wav` files using `torchaudio`
- Transformation into **MelSpectrogram** with configurable parameters
- Application of **SpecAugment** (time and frequency masking)
- Normalization of spectrogram data
- Conversion to tensors and label pairing
Each sample is standardized to a fixed input size, making CNN training consistent.
---
## Custom Dataset
Custom dataset based on `torch.utils.data.Dataset` and `UrbanSound8K`:
- Reads from `metadata/UrbanSound8K.csv`
- Uses the `fold` column to split training and validation sets
- Lazy loading of `.wav` files
- Spectrogram normalization and caching
- Conditional `data augmentation` only during training
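A minimal sketch of how lazy loading, caching, and training-only augmentation can fit together. The class name `FoldDataset` and the injected `load_fn` / `augment_fn` callables are hypothetical and do not come from the repository. The class is written in plain Python so it runs without PyTorch; in the project it would subclass `torch.utils.data.Dataset`, which requires the same `__len__` and `__getitem__` methods.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Record = Dict[str, Any]  # one metadata row: slice_file_name, fold, classID


class FoldDataset:
    """Map-style dataset restricted to a set of folds (hypothetical sketch)."""

    def __init__(
        self,
        records: List[Record],
        folds: Set[int],
        load_fn: Callable[[Record], Any],
        augment_fn: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        # Keep only the rows whose fold belongs to this split.
        self.items = [r for r in records if r["fold"] in folds]
        self.load_fn = load_fn
        self.augment_fn = augment_fn      # pass None for validation
        self._cache: Dict[str, Any] = {}

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        record = self.items[index]
        key = record["slice_file_name"]
        if key not in self._cache:
            # Lazy loading: the file is read and converted on first access only.
            self._cache[key] = self.load_fn(record)
        features = self._cache[key]
        if self.augment_fn is not None:
            # Augment a fresh view each time; the cached value stays clean.
            features = self.augment_fn(features)
        return features, record["classID"]
```

Caching the un-augmented spectrogram and augmenting on retrieval keeps the random masks different on every epoch. For this to hold, `augment_fn` must return a new object rather than modify its input in place.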
---
## Training
Model training is handled by the `Trainer` class, which includes:
- Support for **early stopping** and automatic **checkpoints**
- Calculation of metrics like loss, accuracy, recall, and F1
- Logs saved in `.json` format
- Automatic plotting of training curves (loss and accuracy)
- Validation at the end of each epoch
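The early-stopping and checkpoint logic can be illustrated with a small helper. This is a sketch under assumptions: the `EarlyStopping` class, its `patience` and `min_delta` parameters, and the simulated loss values are hypothetical and are not taken from the project's `Trainer`.

```python
class EarlyStopping:
    """Track the best validation loss and signal when to stop (hypothetical helper)."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True when this epoch is the new best."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


# Simulated validation losses stand in for a real training loop.
simulated_val_losses = [0.90, 0.70, 0.72, 0.71, 0.69, 0.70, 0.70, 0.70]
stopper = EarlyStopping(patience=3)
best_epoch = None
stopped_at = None
for epoch, val_loss in enumerate(simulated_val_losses, start=1):
    if stopper.step(val_loss):
        best_epoch = epoch
        # A real loop would save a checkpoint here, for example
        # torch.save(model.state_dict(), "checkpoint/best.pt")  # path is hypothetical
    if stopper.should_stop:
        stopped_at = epoch
        break
```

Saving only when `step` returns `True` means the checkpoint on disk always matches the best validation loss seen so far, even if training continues for a few more epochs before stopping.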
CNN architecture includes:
- 4 convolutional blocks with `BatchNorm`, `ReLU`, `Dropout`
- `MaxPooling` between blocks
- `Flatten` + fully connected layers
- Final `Softmax` layer for 10-class classification
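An architecture of this shape might look like the sketch below. It is not the network in `src/model.py`: the names `SketchCNN` and `conv_block`, the channel widths, the dropout rate, and the adaptive pooling before `Flatten` are all assumptions. The sketch returns raw logits and applies softmax separately, because `nn.CrossEntropyLoss` already applies log-softmax internally; whether the repository does the same is not stated.

```python
import torch
from torch import nn


def conv_block(in_channels: int, out_channels: int, p_drop: float = 0.1) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> Dropout -> MaxPool (assumed layout)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
        nn.Dropout(p_drop),
        nn.MaxPool2d(2),
    )


class SketchCNN(nn.Module):
    """Four convolutional blocks followed by fully connected layers."""

    def __init__(self, num_classes: int = 10) -> None:
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 16),
            conv_block(16, 32),
            conv_block(32, 64),
            conv_block(64, 128),
        )
        self.classifier = nn.Sequential(
            # Assumption: adaptive pooling makes the head independent of input size.
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Return raw logits of shape (batch, num_classes)."""
        return self.classifier(self.features(x))

    @torch.no_grad()
    def predict_proba(self, x: torch.Tensor) -> torch.Tensor:
        """Class probabilities via softmax, intended for inference."""
        return torch.softmax(self.forward(x), dim=1)
```

A batch of spectrograms shaped `(batch, 1, n_mels, frames)` goes in; `predict_proba` returns one probability per class, and each row sums to 1.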
---
## Metrics Visualization
Automatic visualizations after training:
- Metric logs saved per epoch as CSV files
- Visualization script in `utils/visualization.py`
- Charts for:
  - Accuracy and loss per epoch
  - Execution time per epoch
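One way to keep per-epoch CSV logs and turn them into charts is sketched below. The column names, the function names, and the file layout are assumptions for illustration; the project's own visualization script may differ.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Assumed log columns; the real logs may use different names.
FIELDS = ["epoch", "train_loss", "val_loss", "val_accuracy", "seconds"]


def append_epoch(path: str, row: Dict[str, float]) -> None:
    """Append one epoch of metrics, writing the header on first use."""
    log_path = Path(path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_history(path: str) -> Dict[str, List[float]]:
    """Load the log back as one list of floats per column."""
    with Path(path).open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    return {field: [float(row[field]) for row in rows] for field in FIELDS}


def plot_history(history: Dict[str, List[float]], out_path: str) -> None:
    """Plot loss and accuracy per epoch (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history["epoch"], history["train_loss"], label="train")
    ax_loss.plot(history["epoch"], history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()
    ax_acc.plot(history["epoch"], history["val_accuracy"])
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("validation accuracy")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Appending one row per epoch keeps the log usable even if training is interrupted, and the `seconds` column can be plotted the same way to show execution time per epoch.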
---
## Dataset
Using the **UrbanSound8K** dataset:
- **8732 audio files** (`.wav`)
- **10 classes of urban sounds** (e.g., siren, bark, car horn)
- **Split into 10 folders** (`fold1` to `fold10`)
- Metadata file `metadata/UrbanSound8K.csv` contains:
  - `slice_file_name`
  - `fold`
  - `classID`

**Download link**: [UrbanSound8K](https://urbansounddataset.weebly.com/urbansound8k.html)
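The metadata columns listed above are enough to locate each clip and to build a fold-based split. The sketch below assumes the archive is extracted so that audio sits under `UrbanSound8K/audio/fold<N>/`; the function names are hypothetical, and the sample rows are made up for the example rather than copied from the real CSV.

```python
import csv
import io
from pathlib import Path
from typing import Any, Dict, List, TextIO, Tuple

Record = Dict[str, Any]


def load_metadata(handle: TextIO, audio_root: str = "UrbanSound8K/audio") -> List[Record]:
    """Parse metadata rows and attach the expected path of each clip."""
    records = []
    for row in csv.DictReader(handle):
        fold = int(row["fold"])
        records.append(
            {
                "path": Path(audio_root) / f"fold{fold}" / row["slice_file_name"],
                "fold": fold,
                "classID": int(row["classID"]),
            }
        )
    return records


def split_by_fold(records: List[Record], val_fold: int) -> Tuple[List[Record], List[Record]]:
    """Hold out one fold for validation and train on the rest."""
    train = [r for r in records if r["fold"] != val_fold]
    val = [r for r in records if r["fold"] == val_fold]
    return train, val


# Made-up rows with the three columns named in this README.
sample_csv = io.StringIO(
    "slice_file_name,fold,classID\n"
    "example_a.wav,5,3\n"
    "example_b.wav,10,1\n"
    "example_c.wav,5,8\n"
)
records = load_metadata(sample_csv)
train_records, val_records = split_by_fold(records, val_fold=10)
```

With the real file, `load_metadata(open("UrbanSound8K/metadata/UrbanSound8K.csv", newline=""))` would be the equivalent call. Splitting on the predefined `fold` column, rather than reshuffling the clips, keeps the folds exactly as the dataset defines them.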
---
## Requirements
Install the requirements with:
```bash
pip install -r requirements.txt
```
Key libraries:
- torch
- torchaudio
- scikit-learn
- matplotlib
- tqdm
- pandas
- numpy
---
## References
- [UrbanSound8K Dataset](https://urbansounddataset.weebly.com/urbansound8k.html)
- [SpecAugment: Data Augmentation for ASR](https://arxiv.org/abs/1904.08779)
- [PyTorch](https://pytorch.org/)
- [Torchaudio Docs](https://pytorch.org/audio/stable/index.html)
- [Scikit-learn Metrics](https://scikit-learn.org/stable/modules/model_evaluation.html)
- [Audio Deep Learning Made Simple](https://medium.com/data-science/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5) - Ketan Doshi
---
## Author
Developed by **Lucas Alves**
Email: [alves_lucasoliveira@usp.br](mailto:alves_lucasoliveira@usp.br)
GitHub: [cyblx](https://github.com/cyblx)
LinkedIn: [cyblx](https://www.linkedin.com/in/cyblx)