{"id":19311610,"url":"https://github.com/rednafi/urban-sound-classification","last_synced_at":"2025-09-17T09:31:16.733Z","repository":{"id":49473731,"uuid":"184739935","full_name":"rednafi/urban-sound-classification","owner":"rednafi","description":"Urban sound source tagging from an aggregation of four second noisy audio clips via 1D and 2D CNN (Xception)","archived":false,"fork":false,"pushed_at":"2023-03-24T22:34:26.000Z","size":38722,"stargazers_count":60,"open_issues_count":20,"forks_count":15,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-02T00:36:14.592Z","etag":null,"topics":["audio-processing","audio-tagging","classification","machine-learning","mel-spectrogram","sound-classification","sound-classification-spectrograms","sound-processing","sound-synthesis","urban-sound-8k","urban-sound-classification"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rednafi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-05-03T10:50:31.000Z","updated_at":"2025-03-27T08:07:44.000Z","dependencies_parsed_at":"2022-09-11T19:01:03.455Z","dependency_job_id":"d191b318-0cce-4928-bdda-fe08da68bf4c","html_url":"https://github.com/rednafi/urban-sound-classification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rednafi/urban-sound-classification","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rednafi%2Furban-sound-classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rednafi%2Furban-sound-clas
sification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rednafi%2Furban-sound-classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rednafi%2Furban-sound-classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rednafi","download_url":"https://codeload.github.com/rednafi/urban-sound-classification/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rednafi%2Furban-sound-classification/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275568557,"owners_count":25488493,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-17T02:00:09.119Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-processing","audio-tagging","classification","machine-learning","mel-spectrogram","sound-classification","sound-classification-spectrograms","sound-processing","sound-synthesis","urban-sound-8k","urban-sound-classification"],"created_at":"2024-11-10T00:29:33.291Z","updated_at":"2025-09-17T09:31:13.704Z","avatar_url":"https://github.com/rednafi.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Urban Sound Classification\nUrban sound source tagging from an aggregation of four second noisy 
audio clips via 1D and 2D CNN (Xception)\n\n[![Dataset](https://img.shields.io/badge/Dataset-Urban8k-red.svg)](https://urbansounddataset.weebly.com/urbansound8k.html)\n[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)\n[![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/rednafi/urban-sound-classification/blob/master/LICENSE)\n![stability-experimental](https://img.shields.io/badge/stability-experimental-orange.svg)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)\n\n\u003c/div\u003e\n\n## Dataset Description\nThe Urban Sound Classification dataset contains 8732 labeled sound excerpts (\u003c=4s) of urban sounds from 10 classes, namely:\n\n* Air Conditioner\n* Car Horn\n* Children Playing\n* Dog Bark\n* Drilling\n* Engine Idling\n* Gun Shot\n* Jackhammer\n* Siren\n* Street Music\n\nThe data attributes are:\n* **ID** – unique ID of the sound excerpt\n* **Class** – type of sound\n\n![air_conditioner](https://github.com/rednafi/urban-sound-classification/blob/master/notebooks/eda_plots/amplitude_vs_time/air_conditioner.svg)\n![air_conditioner](https://user-images.githubusercontent.com/30027932/57352070-febe8a80-7185-11e9-8806-44ccfb79d986.png)\n\n## Project Organization\n### Folder Structure\n\n```\n.\n├── data\n│   ├── img\n│   │   ├── audio-features.png\n│   │   ├── sound.png\n│   │   └── time_freq.png\n│   ├── test\n│   │   └── Test\n|   |       ├── 1.wav\n|   |       ├── 2.wav\n|   |       ├── .............\n│   ├── test.csv\n│   ├── train\n│   │   └── Train\n|   |       ├── 1.wav\n|   |       ├── 2.wav\n|   |       ├── ............\n|   |\n│   └── train.csv\n├── LICENSE\n├── notebooks\n│   ├── eda_plots\n│   │   ├── amplitude_vs_time\n│   │   │   ├── air_conditioner.svg\n│   │   │   ├── car_horn.svg\n|   |   |   ├── ............\n│   │   └── mel_spectrum\n│   │       ├── air_conditioner.png\n│  
 │       ├── car_horn.png\n|   |       ├── ............\n│   └── Exploratory Data Analysis.ipynb\n├── README.md\n├── requirements.txt\n├── results\n│   ├── acc_model_1d.png\n│   ├── acc_model_2d.png\n│   ├── loss_model_1d.png\n│   ├── loss_model_2d.png\n│   ├── pred_1d.csv\n│   └── pred_2d.csv\n└── src\n    ├── test_1d.py\n    ├── test_2d.py\n    ├── train_1d.py\n    ├── train_2d.py\n    ├── utils_1d.py\n    └── utils_2d.py\n\n```\n\n### Workflow\n\n**Exploratory Data Analysis:**\n* Frequency normalization and amplitude vs time plot\n* Mel spectrogram plot\n\n**Audio Tagging:**\n\n* Normalizing the audio clips and passing them through stacks of 1D convolution layers for feature extraction. A stack of dense layers then performs the final classification.\n\n* Extracting features in the form of a [mel-spectrogram](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) and passing them\nthrough stacks of 2D convolution layers for further feature extraction. A dense layer stack performs the final classification. In this case, we trained an Xception model from scratch to achieve better generalization.\n\n### Result\n\nWe achieved 89% validation accuracy with the second approach.\n![xception_val_acc](https://github.com/rednafi/urban-sound-classification/blob/master/results/acc_model_2d.png)\n\n### Requirements\n```\npip install -r requirements.txt\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frednafi%2Furban-sound-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frednafi%2Furban-sound-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frednafi%2Furban-sound-classification/lists"}