{"id":27373392,"url":"https://github.com/cyblx/cnn_urbansound8k","last_synced_at":"2026-04-30T10:08:23.584Z","repository":{"id":286338552,"uuid":"961084673","full_name":"CybLX/CNN_UrbanSound8K","owner":"CybLX","description":"Full pipeline for urban sound classification using PyTorch and the UrbanSound8K dataset. Converts audio into MEL spectrograms, applies data augmentation, and trains a CNN to recognize sounds like horns, barks, and sirens.","archived":false,"fork":false,"pushed_at":"2025-04-05T19:43:02.000Z","size":362,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T11:14:32.278Z","etag":null,"topics":["audio-classification","covolution-neural-network","deep-learning","pytorch","spectrogram","urbansound8k"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CybLX.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-04-05T18:07:27.000Z","updated_at":"2025-04-05T19:43:06.000Z","dependencies_parsed_at":"2025-04-05T20:38:13.631Z","dependency_job_id":null,"html_url":"https://github.com/CybLX/CNN_UrbanSound8K","commit_stats":null,"previous_names":["cyblx/cnn_urbansound8k"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CybLX%2FCNN_UrbanSound8K","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CybLX%2FCNN_UrbanSound8K/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/CybLX%2FCNN_UrbanSound8K/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CybLX%2FCNN_UrbanSound8K/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CybLX","download_url":"https://codeload.github.com/CybLX/CNN_UrbanSound8K/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248703200,"owners_count":21148118,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-classification","covolution-neural-network","deep-learning","pytorch","spectrogram","urbansound8k"],"created_at":"2025-04-13T11:14:35.270Z","updated_at":"2026-04-30T10:08:18.561Z","avatar_url":"https://github.com/CybLX.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🔊 Urban Sound Classifier - Deep Learning with PyTorch\n\nThis project implements a full pipeline for preprocessing, modeling, and metric visualization for **urban sound classification** using **PyTorch** and the **UrbanSound8K** dataset. 
The pipeline handles audio files, applies `data augmentation` techniques, and converts data into MEL spectrograms ready to feed into a CNN.\n\n---\n\n## 📁 Project Structure\n\n```bash\n.\n├── checkpoint/                     # Checkpoints with models, metrics, scheduler and optimizer\n│   └── train_and_val_metrics.png   # Plot of accuracy and loss\n├── data analysis/                  # Notebook or scripts with data preprocessing analysis\n├── src/                            # Source code\n│   ├── inference.py                # Inference routine for trained models\n│   ├── model.py                    # CNN architecture\n│   ├── training.py                 # Training loop, validation, early stopping\n│   └── utils.py                    # Dataset class, preprocessing functions\n├── UrbanSound8K/                   # Dataset folder\n├── ForPrediction.py                # Script for inference on new audio files\n├── UrbanSound_Training.py          # Main training routine\n└── README.md\n```\n\n---\n\n## 📚 Objective\n\nThe goal of this project is to develop a classifier for **urban sounds** using **convolutional neural networks (CNNs)** with **PyTorch**. Sounds are extracted from the **UrbanSound8K** dataset, and the system can identify noise such as car horns, dog barks, sirens, and more, based on MEL spectrograms. 
The pipeline is complete: from raw audio loading to CNN training and results visualization.\n\n---\n\n## 🔄 Preprocessing Pipeline\n\n- 🎵 Reading `.wav` files using `torchaudio`\n- 🔁 Transformation into **MelSpectrogram** with configurable parameters\n- 🧪 Application of **SpecAugment** (time and frequency masking)\n- 🔢 Normalization of spectrogram data\n- 🏷️ Conversion to tensors and label pairing\n\nEach sample is standardized to a fixed input size, making CNN training consistent.\n\n---\n\n## 📦 Custom Dataset\n\nCustom dataset based on `torch.utils.data.Dataset` and `UrbanSound8K`:\n\n- Reads from `metadata/UrbanSound8K.csv`\n- Uses the `fold` column to split training and validation sets\n- Lazy loading of `.wav` files\n- Spectrogram normalization and caching\n- Conditional `data augmentation` only during training\n\n---\n\n## 🏋️ Training\n\nModel training is handled by the `Trainer` class, which includes:\n\n- ✅ Support for **early stopping** and automatic **checkpoints**\n- 📉 Calculation of metrics like loss, accuracy, recall, and F1\n- 📝 Logs saved in `.json` format\n- 📊 Automatic plotting of training curves (loss and accuracy)\n- 🧪 Validation at the end of each epoch\n\nCNN architecture includes:\n\n- 🔹 4 convolutional blocks with `BatchNorm`, `ReLU`, `Dropout`\n- 🔹 `MaxPooling` between blocks\n- 🔹 `Flatten` + fully connected layers\n- 🔹 Final `Softmax` layer for 10-class classification\n\n---\n\n## 📊 Metrics Visualization\n\nAutomatic visualizations after training:\n\n- Metric logs saved per epoch as CSV files\n- Visualization script in `utils/visualization.py`\n- Charts for:\n  - 🎯 Accuracy and loss per epoch\n  - 🔄 Execution time per epoch\n\n---\n\n## 🎼 Dataset\n\nUsing the **UrbanSound8K** dataset:\n\n- 🔊 **8732 audio files** (`.wav`)\n- 🏷️ **10 classes of urban sounds** (e.g., siren, bark, car horn)\n- 📁 **Split into 10 folders** (`fold1` to `fold10`)\n- 🗂️ Metadata file `metadata/UrbanSound8K.csv` contains:\n  - `slice_file_name`\n  - `fold`\n  - 
`classID`\n\n🔗 **Download link**: [UrbanSound8K](https://urbansounddataset.weebly.com/urbansound8k.html)\n\n---\n\n## 📁 Requirements\n\nInstall the requirements with:\n\n```bash\npip install -r requirements.txt\n```\n\nKey libraries:\n- torch\n- torchaudio\n- scikit-learn\n- matplotlib\n- tqdm\n- pandas\n- numpy\n\n---\n\n## 📌 References\n\n- [UrbanSound8K Dataset](https://urbansounddataset.weebly.com/urbansound8k.html)  \n- [SpecAugment: Data Augmentation for ASR](https://arxiv.org/abs/1904.08779)  \n- [PyTorch](https://pytorch.org/)  \n- [Torchaudio Docs](https://pytorch.org/audio/stable/index.html)  \n- [Scikit-learn Metrics](https://scikit-learn.org/stable/modules/model_evaluation.html)  \n- [Audio Deep Learning Made Simple](https://medium.com/data-science/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5) - Ketan Doshi\n\n---\n\n## 👨‍💻 Author\n\nDeveloped by **Lucas Alves**  \n📧 Email: [alves_lucasoliveira@usp.br](mailto:alves_lucasoliveira@usp.br)  \n🐙 GitHub: [cyblx](https://github.com/cyblx)  \n💼 LinkedIn: [cyblx](https://www.linkedin.com/in/cyblx)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyblx%2Fcnn_urbansound8k","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyblx%2Fcnn_urbansound8k","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyblx%2Fcnn_urbansound8k/lists"}