https://github.com/cyblx/cnn_urbansound8k

Full pipeline for urban sound classification using PyTorch and the UrbanSound8K dataset. Converts audio into MEL spectrograms, applies data augmentation, and trains a CNN to recognize sounds like horns, barks, and sirens.

# 🔊 Urban Sound Classifier - Deep Learning with PyTorch

This project implements a full pipeline for preprocessing, modeling, and metric visualization for **urban sound classification** using **PyTorch** and the **UrbanSound8K** dataset. The pipeline handles audio files, applies `data augmentation` techniques, and converts data into MEL spectrograms ready to feed into a CNN.

---

## 📁 Project Structure

```bash
.
├── checkpoint/                       # Checkpoints with models, metrics, scheduler and optimizer
│   └── train_and_val_metrics.png     # Plot of accuracy and loss
├── data analysis/                    # Notebook or scripts with data preprocessing analysis
├── src/                              # Source code
│   ├── inference.py                  # Inference routine for trained models
│   ├── model.py                      # CNN architecture
│   ├── training.py                   # Training loop, validation, early stopping
│   └── utils.py                      # Dataset class, preprocessing functions
├── UrbanSound8K/                     # Dataset folder
├── ForPrediction.py                  # Script for inference on new audio files
├── UrbanSound_Training.py            # Main training routine
└── README.md
```

---

## 📚 Objective

The goal of this project is to build a classifier for **urban sounds** using **convolutional neural networks (CNNs)** in **PyTorch**. Audio clips come from the **UrbanSound8K** dataset, and the model identifies sounds such as car horns, dog barks, and sirens from their MEL spectrograms. The pipeline covers every stage, from loading raw audio to training the CNN and visualizing the results.

---

## 🔄 Preprocessing Pipeline

- 🎵 Reading `.wav` files using `torchaudio`
- 🔁 Transformation into **MelSpectrogram** with configurable parameters
- 🧪 Application of **SpecAugment** (time and frequency masking)
- 🔢 Normalization of spectrogram data
- 🏷️ Conversion to tensors and label pairing

Each sample is standardized to a fixed input size, making CNN training consistent.

---

## 📦 Custom Dataset

Custom dataset based on `torch.utils.data.Dataset` and `UrbanSound8K`:

- Reads from `metadata/UrbanSound8K.csv`
- Uses the `fold` column to split training and validation sets
- Lazy loading of `.wav` files
- Spectrogram normalization and caching
- Conditional `data augmentation` only during training
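
A dataset with these properties might look like the sketch below. The class name `UrbanSoundDataset`, the helper `read_metadata`, and the `load_fn` / `augment_fn` callables are hypothetical and only illustrate the idea; the actual class in `src/utils.py` may be organized differently. The file path follows the standard UrbanSound8K layout (`audio/fold<N>/<slice_file_name>`).

```python
import csv
from typing import Callable, Dict, Iterable, List, Optional, Tuple

import torch
from torch.utils.data import Dataset


def read_metadata(csv_path: str) -> List[Dict[str, str]]:
    """Load metadata/UrbanSound8K.csv as a list of row dictionaries."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


class UrbanSoundDataset(Dataset):
    def __init__(
        self,
        rows: List[Dict[str, str]],
        root: str,
        folds: Iterable[int],
        load_fn: Callable[[str], torch.Tensor],
        augment_fn: Optional[Callable[[torch.Tensor], torch.Tensor]] = None,
    ):
        wanted = set(folds)
        # Keep only the rows whose `fold` value belongs to this split.
        self.rows = [r for r in rows if int(r["fold"]) in wanted]
        self.root = root
        self.load_fn = load_fn          # path -> normalized spectrogram
        self.augment_fn = augment_fn    # pass only for the training split
        self._cache: Dict[int, torch.Tensor] = {}

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, int]:
        row = self.rows[idx]
        # Lazy loading: the file is read on first access, then cached.
        if idx not in self._cache:
            path = f"{self.root}/audio/fold{row['fold']}/{row['slice_file_name']}"
            self._cache[idx] = self.load_fn(path)
        spec = self._cache[idx]
        # Augment a copy so the cached spectrogram stays clean.
        if self.augment_fn is not None:
            spec = self.augment_fn(spec.clone())
        return spec, int(row["classID"])
```

One possible split is `folds=range(1, 10)` with an `augment_fn` for training and `folds=[10]` without one for validation; which folds the project actually holds out is not stated here.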

---

## 🏋️ Training

Model training is handled by the `Trainer` class, which includes:

- ✅ Support for **early stopping** and automatic **checkpoints**
- 📉 Calculation of metrics like loss, accuracy, recall, and F1
- 📝 Logs saved in `.json` format
- 📊 Automatic plotting of training curves (loss and accuracy)
- 🧪 Validation at the end of each epoch
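
To illustrate how early stopping, checkpoint decisions, and JSON logging can fit together, here is a small sketch. It does not reproduce the `Trainer` class: `EarlyStopping`, `run`, the patience value, and the log fields are hypothetical, and the loop consumes precomputed validation losses instead of training a model.

```python
import json
from typing import Dict, List, Sequence


class EarlyStopping:
    """Track the best validation loss and count epochs without improvement."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Return True when val_loss is a new best (a good moment to checkpoint)."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


def run(val_losses: Sequence[float], patience: int, log_path: str) -> List[Dict]:
    """Stand-in for a training loop: one validation loss per epoch."""
    stopper = EarlyStopping(patience=patience)
    history: List[Dict] = []
    for epoch, loss in enumerate(val_losses, start=1):
        improved = stopper.step(loss)
        # In a real loop, `improved` would trigger saving model/optimizer state.
        history.append({"epoch": epoch, "val_loss": loss, "checkpoint": improved})
        if stopper.should_stop:
            break
    with open(log_path, "w") as f:
        json.dump(history, f, indent=2)
    return history
```

Keeping the stopping rule in its own object means the training loop only has to ask two questions per epoch: whether to save a checkpoint and whether to stop.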

CNN architecture includes:

- 🔹 4 convolutional blocks with `BatchNorm`, `ReLU`, `Dropout`
- 🔹 `MaxPooling` between blocks
- 🔹 `Flatten` + fully connected layers
- 🔹 Final `Softmax` layer for 10-class classification
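
A network with this overall shape could be sketched as below. Channel widths, dropout rate, kernel size, and the adaptive pooling step before `Flatten` are assumptions made so the example accepts any input size; they are not taken from `src/model.py`. The sketch also returns raw logits from `forward` and applies softmax in a separate `predict_proba` method, because `nn.CrossEntropyLoss` expects unnormalized logits; the project may place its softmax differently.

```python
import torch
from torch import nn


def conv_block(in_ch: int, out_ch: int, p_drop: float = 0.2) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> Dropout, followed by 2x2 max pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p_drop),
        nn.MaxPool2d(2),
    )


class UrbanSoundCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        # Four convolutional blocks with assumed channel widths.
        self.features = nn.Sequential(
            conv_block(1, 16),
            conv_block(16, 32),
            conv_block(32, 64),
            conv_block(64, 128),
        )
        # Assumption: pool to a fixed 2x2 map so Flatten has a known size.
        self.pool = nn.AdaptiveAvgPool2d((2, 2))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 2 * 2, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Raw logits, suitable for nn.CrossEntropyLoss during training.
        return self.classifier(self.pool(self.features(x)))

    @torch.no_grad()
    def predict_proba(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax over the class dimension, for inference only.
        self.eval()
        return torch.softmax(self(x), dim=1)
```

The input is expected as `[batch, 1, n_mels, frames]`, matching a single-channel mel spectrogram.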

---

## 📊 Metrics Visualization

Automatic visualizations after training:

- Metric logs saved per epoch as CSV files
- Visualization script in `utils/visualization.py`
- Charts for:
  - 🎯 Accuracy and loss per epoch
  - 🔄 Execution time per epoch
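
As an illustration of these charts, the following sketch plots loss, accuracy, and epoch duration from a list of per-epoch records. The function name `plot_history` and the record keys (`train_loss`, `val_loss`, `train_acc`, `val_acc`, `seconds`) are hypothetical; the project's own visualization script may use different names and layouts.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt


def plot_history(history, out_path):
    """Plot loss, accuracy, and time per epoch side by side and save to out_path."""
    epochs = [h["epoch"] for h in history]
    fig, (ax_loss, ax_acc, ax_time) = plt.subplots(1, 3, figsize=(15, 4))

    ax_loss.plot(epochs, [h["train_loss"] for h in history], label="train")
    ax_loss.plot(epochs, [h["val_loss"] for h in history], label="validation")
    ax_loss.set(title="Loss", xlabel="epoch")
    ax_loss.legend()

    ax_acc.plot(epochs, [h["train_acc"] for h in history], label="train")
    ax_acc.plot(epochs, [h["val_acc"] for h in history], label="validation")
    ax_acc.set(title="Accuracy", xlabel="epoch")
    ax_acc.legend()

    ax_time.bar(epochs, [h["seconds"] for h in history])
    ax_time.set(title="Execution time", xlabel="epoch", ylabel="seconds")

    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Calling `plot_history(history, "checkpoint/train_and_val_metrics.png")` would write a figure to the location shown in the project tree, assuming the `checkpoint/` folder exists.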

---

## 🎼 Dataset

Using the **UrbanSound8K** dataset:

- 🔊 **8732 audio files** (`.wav`)
- 🏷️ **10 classes of urban sounds** (e.g., siren, bark, car horn)
- 📁 **Split into 10 folders** (`fold1` to `fold10`)
- 🗂️ Metadata file `metadata/UrbanSound8K.csv` contains:
  - `slice_file_name`
  - `fold`
  - `classID`
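
A quick sanity check on the metadata can confirm the fold and class counts after download. The sketch below uses only the standard library and relies on the `fold` and `classID` columns listed above; the function name `summarize` is hypothetical.

```python
import csv
from collections import Counter
from typing import IO, Tuple


def summarize(metadata_file: IO[str]) -> Tuple[Counter, Counter]:
    """Count clips per fold and per class from an open metadata CSV file."""
    per_fold: Counter = Counter()
    per_class: Counter = Counter()
    for row in csv.DictReader(metadata_file):
        per_fold[int(row["fold"])] += 1
        per_class[int(row["classID"])] += 1
    return per_fold, per_class
```

For example, opening `UrbanSound8K/metadata/UrbanSound8K.csv` with `open(path, newline="")` and passing the handle to `summarize` should yield ten folds and ten class IDs whose counts add up to the total number of clips.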

🔗 **Download link**: [UrbanSound8K](https://urbansounddataset.weebly.com/urbansound8k.html)

---

## 📝 Requirements

Install the requirements with:

```bash
pip install -r requirements.txt
```

Key libraries:
- torch
- torchaudio
- scikit-learn
- matplotlib
- tqdm
- pandas
- numpy

---

## 📌 References

- [UrbanSound8K Dataset](https://urbansounddataset.weebly.com/urbansound8k.html)
- [SpecAugment: Data Augmentation for ASR](https://arxiv.org/abs/1904.08779)
- [PyTorch](https://pytorch.org/)
- [Torchaudio Docs](https://pytorch.org/audio/stable/index.html)
- [Scikit-learn Metrics](https://scikit-learn.org/stable/modules/model_evaluation.html)
- [Audio Deep Learning Made Simple](https://medium.com/data-science/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5) - Ketan Doshi

---

## 👨‍💻 Author

Developed by **Lucas Alves**

- 📧 Email: [alves_lucasoliveira@usp.br](mailto:alves_lucasoliveira@usp.br)
- 🐙 GitHub: [cyblx](https://github.com/cyblx)
- 💼 LinkedIn: [cyblx](https://www.linkedin.com/in/cyblx)