# Sound event detection with depthwise separable and dilated convolutions

----

### Welcome to the repository of the DnD-SED method.

This is the repository for the method presented in the paper
"Sound Event Detection with Depthwise Separable and Dilated Convolutions", by
[K. Drossos](https://tutcris.tut.fi/portal/en/persons/konstantinos-drosos(b1070370-5156-4280-b354-6291618bb965).html),
[S. I. Mimilakis](https://www.idmt.fraunhofer.de/en/institute/doctorands/mimilakis.html),
[S. Gharib](https://scholar.google.com/citations?user=neb2vi0AAAAJ&hl=en),
[Y. Li](https://scholar.google.com/citations?user=ywDuJjEAAAAJ&hl=en),
and [T. Virtanen](https://tutcris.tut.fi/portal/en/persons/tuomas-virtanen(210e58bb-c224-40a9-bf6c-5b786297e841).html).

Our code is based on the [PyTorch framework](https://pytorch.org/)
and we use the publicly available dataset
[TUT-SED Synthetic 2016](http://www.cs.tut.fi/sgn/arg/taslp2017-crnn-sed/tut-sed-synthetic-2016).

Our paper has been submitted for review to the [IEEE World Congress on Computational
Intelligence/International Joint Conference on Neural Networks
(WCCI/IJCNN)](https://wcci2020.org/).

You can find an online version of [our paper at arXiv](https://arxiv.org/abs/2002.00476).

**If you use our method, please cite our paper.**

----

## Table of Contents
1. [Method introduction](#method-introduction)
2. [System set-up](#system-set-up)
3. [Conducting the experiments](#conducting-the-experiments)

----

## Method introduction

Methods for sound event detection (SED) are usually based on a composition
of three functions: a feature extractor, an identifier of long temporal context, and a
classifier. State-of-the-art SED methods use typical 2D convolutional neural networks (CNNs)
as the feature extractor and a recurrent neural network (RNN) for identifying long temporal
context, while a simple affine transform with a non-linearity serves as the classifier. This
set-up can yield a considerable number of parameters, up to a couple of millions (e.g. 4M).
Additionally, the use of an RNN hinders the training process and the parallelization
of the method.

With our DnD-SED method we propose replacing the typical 2D CNNs used as the
feature extractor with depthwise separable convolutions, and replacing the
RNN with dilated convolutions. We compare our method with the widely used CRNN method,
using the publicly available TUT-SED Synthetic 2016 dataset. We conduct a series of
10 experiments and report the mean values of the time needed for one training epoch, the
F1 score, the error rate, and the number of parameters.
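To make the parameter argument concrete, here is a minimal sketch in plain Python that counts the weights of a standard 2D convolution versus a depthwise separable one (a per-channel depthwise convolution followed by a 1x1 pointwise convolution), and computes the receptive field of a stack of stride-1 dilated convolutions along one axis (e.g. time). The function names and layer sizes are hypothetical and chosen only for illustration; they are not the configuration used in the paper.

```python
from typing import Sequence


def conv2d_params(c_in: int, c_out: int, kh: int, kw: int, bias: bool = True) -> int:
    """Parameters of a standard 2D convolution: one kh x kw kernel per (in, out) channel pair."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, kh: int, kw: int, bias: bool = True) -> int:
    """Parameters of a depthwise convolution (one kh x kw kernel per input channel)
    followed by a pointwise 1x1 convolution mapping c_in -> c_out channels."""
    depthwise = c_in * kh * kw + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def dilated_receptive_field(kernel_sizes: Sequence[int], dilations: Sequence[int]) -> int:
    """Receptive field (in frames) of stacked stride-1 convolutions along one axis."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have the same length")
    # Each layer widens the field by (k - 1) * d frames.
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


if __name__ == "__main__":
    # Illustrative sizes only (hypothetical), not taken from the paper.
    std = conv2d_params(128, 128, 5, 5)
    dws = depthwise_separable_params(128, 128, 5, 5)
    print(std, dws, f"{dws / std:.1%}")
    print(dilated_receptive_field([3, 3, 3], [1, 2, 4]))
```

Running it prints `409728 19840 4.8%` and then `15`: the separable layer needs under 5% of the weights of the standard one for these sizes, and three small dilated kernels already cover 15 frames of context without any recurrence.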
We achieve a considerable decrease in computational complexity and, simultaneously,
an increase in SED performance. Specifically, we reduce the number of
parameters and the mean time needed for one training epoch (by 85% and 72%,
respectively). We also increase the mean F1 score by 4.6% and reduce
the mean error rate by 3.8%.

You can find more information in [our paper](https://arxiv.org/abs/2002.00476)!

----

## System set-up

To run and use our method (or simply repeat the experiments), you need to set up
the code and obtain the dataset. We provide the full code of the
method, but you will have to get the audio files and extract the features yourself.

### Code set-up

To set up and run our code, clone this repository and
then install the dependencies using your favorite package manager. If you are
using Conda, you can do:

````shell script
$ conda env create --yes --file conda_dependencies.yml
````

This creates an environment named `dnd-sed` that uses Python 3.7. If
you prefer pip, you can do:

````shell script
$ pip install -r pip_dependencies.txt
````

And you will be good to go! If anything is not working, please let me know by
opening an issue in this repository.

### Data set-up

To set up the data, first follow the procedure described on the
[corresponding web page](http://www.cs.tut.fi/sgn/arg/taslp2017-crnn-sed/tut-sed-synthetic-2016)
and download the data. Then, create your input/output values (features and targets)
and use them with our method.

The code in this repository offers data handling functionality. The
`data_feders.get_tut_sed_data_loader` function returns a PyTorch data loader that uses
the `data_feders.TUTSEDSynthetic2016` class as its dataset.

To use your extracted features with this class, you should save the features
and the target values as separate files.
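How you create the target values is up to you. As one hedged illustration, assuming the annotations give each event as an `(onset, offset, label)` triple in seconds and that the features are computed with a fixed hop size, a frame-level multi-label target matrix could be built as below. The function name, the rounding rule, and the example labels are hypothetical; check the shapes and file format that `data_feders.TUTSEDSynthetic2016` actually expects before saving your own files.

```python
from typing import List, Sequence, Tuple

# Hypothetical annotation format: (onset in seconds, offset in seconds, class label).
Event = Tuple[float, float, str]


def events_to_frame_targets(
    events: Sequence[Event],
    classes: Sequence[str],
    n_frames: int,
    hop_seconds: float,
) -> List[List[int]]:
    """Return an n_frames x n_classes 0/1 matrix marking which events are active per frame."""
    class_index = {label: i for i, label in enumerate(classes)}
    targets = [[0] * len(classes) for _ in range(n_frames)]
    for onset, offset, label in events:
        col = class_index[label]
        # Round event boundaries to the nearest frame index and clip to the clip length.
        start = max(0, int(round(onset / hop_seconds)))
        end = min(n_frames, int(round(offset / hop_seconds)))
        for frame in range(start, end):
            targets[frame][col] = 1
    return targets


# Example with made-up events: "dog" active 0.0-1.0 s, "car" active 0.5-1.5 s, 0.5 s hop.
example = events_to_frame_targets(
    [(0.0, 1.0, "dog"), (0.5, 1.5, "car")], ["car", "dog"], n_frames=4, hop_seconds=0.5
)
```

Here `example` is `[[0, 1], [1, 1], [1, 0], [0, 0]]`: one row per frame and one column per class, with overlapping events marked simultaneously. A matrix like this is what would be stored in a file separate from the corresponding features.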
You can specify the file names and the directory containing these files in the
settings files.

----

## Conducting the experiments

In the `settings` directory you can find all the settings files that were used for the
results presented in the paper. We used each settings file 10 times and then
averaged the results. If you want to reproduce our results, please remember
to follow the same procedure.

To run the code, use the `main.py` script, passing the proper
arguments. The required arguments for the `main.py` script are:

 * The name of the model to be used, `-m`. Accepted values are:
   1. `baseline` -- This is the baseline CRNN model.
   2. `baseline_dilated` -- This is the baseline model, but with the RNN replaced
   by a CNN with dilated convolutions.
   3. `dessed` -- This is the baseline model, but with the CNNs replaced by
   depthwise separable convolutions.
   4. `dessed_dilated` -- This is our proposed model, with depthwise separable
   convolutions followed by dilated convolutions.
 * The name of the settings file to be used (without the `.yaml` extension), `-c`. For
 example, if the settings file `synthetic_2016_k_55_d_1_1.yaml` is to be used, then
 this argument has to be `synthetic_2016_k_55_d_1_1`.

There are also some optional arguments for the `main.py` script:

 * The extension of the settings file, `-e`. The default value is `.yaml`.
 * The directory containing the settings file, `-d`. The default value is
 `settings`.

Enjoy!
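For reference, the way the documented arguments combine into a settings-file path can be sketched with Python's `argparse`. This is a hypothetical re-creation based only on the flags listed above; the `dest` names and the path-joining logic are assumptions, and the actual `main.py` may differ.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence

# Model names accepted by `-m`, as listed in this README.
MODELS = ("baseline", "baseline_dilated", "dessed", "dessed_dilated")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the actual main.py code)."""
    parser = argparse.ArgumentParser(description="DnD-SED experiment runner (sketch)")
    parser.add_argument("-m", dest="model", required=True, choices=MODELS)
    parser.add_argument("-c", dest="config_file", required=True,
                        help="settings file name without its extension")
    parser.add_argument("-e", dest="file_ext", default=".yaml")
    parser.add_argument("-d", dest="file_dir", default="settings")
    return parser


def settings_path(argv: Optional[Sequence[str]] = None) -> Path:
    """Resolve the settings file path as <dir>/<name><extension>."""
    args = build_parser().parse_args(argv)
    return Path(args.file_dir) / f"{args.config_file}{args.file_ext}"
```

Under these assumptions, `-m dessed_dilated -c synthetic_2016_k_55_d_1_1` resolves to `settings/synthetic_2016_k_55_d_1_1.yaml`, and passing `-e`/`-d` only changes the extension and directory parts of that path.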