https://github.com/idiap/nnsslm

Neural Network based Sound Source Localization Models


Host: GitHub
URL: https://github.com/idiap/nnsslm
Owner: idiap
License: bsd-3-clause
Created: 2019-05-10T08:24:10.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2023-08-29T04:55:28.000Z (almost 2 years ago)
Last Synced: 2025-03-23T01:01:53.351Z (3 months ago)
Language: Python
Size: 3.66 MB
Stars: 38
Watchers: 5
Forks: 9
Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE


Neural Network based Sound Source Localization Models
=====================================================

This repository contains the programs to run and test the neural network models proposed in our publications on sound source localization, including:

* Deep learning based direction-of-arrival (DOA) estimation

* Joint DOA estimation and speech/non-speech classification

The networks have been trained to work with the microphone array of the Softbank Pepper robot (early version with directional microphones).

Please cite the relevant publications when using the code.

Dependencies
------------

* Python (version >= 3.8)

* NumPy (version >= 1.14)

* PyTorch (version >= 0.2)

* apkit (version >= 0.2)

Data
----

We use the SSLR dataset for the experiments.

Usage
-----

Training
********

This command reproduces the "ResNet-STFT" model described in "Deep Learning Approaches for Auditory Perception in Robotics" (Weipeng He, PhD Thesis, EPFL):

.. code::

  ssl_nn_v2/train_resnet_2stage.py --model=models/test --window-size=8192 --hop-size=4096 --n-doa=360 --output-act=5 --n-out-hidden=1 --sigma=8 --s1-epoch=4 --epoch=10 --lr=0.001 --ld=2 --batch-size=128 /lsp_train_*
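
The ``--n-doa=360`` and ``--sigma=8`` options suggest that the network predicts one score per azimuth direction, with targets shaped as a smooth bump around each true source direction. The following is only a minimal sketch of such a target encoding, not the code used by ``train_resnet_2stage.py``. It assumes a 1-degree grid, that sigma is given in degrees, and a Gaussian-like curve combined by a maximum over sources. The function names ``angular_distance_deg`` and ``encode_doa_targets`` are hypothetical.

.. code:: python

  import numpy as np


  def angular_distance_deg(a, b):
      """Smallest absolute difference between two angles, in degrees."""
      diff = np.abs(a - b) % 360.0
      return np.minimum(diff, 360.0 - diff)


  def encode_doa_targets(source_doas_deg, n_doa=360, sigma_deg=8.0):
      """Build a per-direction target vector for one frame (assumed form).

      Each source contributes exp(-d^2 / sigma^2), where d is the angular
      distance between a grid direction and the source. Contributions from
      several sources are merged with a maximum, so every source peaks at 1.
      """
      grid = np.arange(n_doa) * (360.0 / n_doa)  # candidate directions
      target = np.zeros(n_doa)
      for doa in source_doas_deg:
          d = angular_distance_deg(grid, doa)
          target = np.maximum(target, np.exp(-(d ** 2) / sigma_deg ** 2))
      return target


  # Two sources close to the 0/360 degree wrap-around point.
  target = encode_doa_targets([10.0, 350.0])

With no active source the target is all zeros, so the same output layer can also represent silence.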

Inference
*********

.. code::

  ssl_nn_v2/test_nn_raw.py --feature=stft --method=METHOD_NAME --window-size=8192 --hop-size=4096 --batch-size=100 /human models/thesis_resnet_act5_p1lsp_s1ep4_ep10_valid_b100
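
A network with one output per direction yields a score curve rather than a list of source directions, so some decoding step is needed when several sources may be active. The sketch below shows one plausible approach: keep the directions whose score exceeds a threshold and is the largest within a circular neighborhood. It is an illustration under assumptions, not the logic of ``test_nn_raw.py``. The name ``decode_doa_peaks`` and the default ``threshold`` and ``neighborhood_deg`` values are hypothetical.

.. code:: python

  import numpy as np


  def decode_doa_peaks(scores, threshold=0.5, neighborhood_deg=8):
      """Return directions (degrees) that are local maxima above threshold.

      The neighborhood wraps around, so index 0 and index n_doa - 1 are
      treated as adjacent. Exactly tied neighbors are all reported.
      """
      n_doa = len(scores)
      step = 360.0 / n_doa
      half_width = int(round(neighborhood_deg / step))
      peaks = []
      for i in range(n_doa):
          if scores[i] <= threshold:
              continue
          window = [scores[(i + k) % n_doa]
                    for k in range(-half_width, half_width + 1)]
          if scores[i] >= max(window):
              peaks.append(i * step)
      return peaks


  # Synthetic score curve: one peak at the wrap-around point, one at 180.
  scores = np.zeros(360)
  scores[0], scores[1], scores[359] = 0.9, 0.7, 0.6
  scores[180], scores[181] = 0.8, 0.3
  peaks = decode_doa_peaks(scores)

Raising the threshold trades recall for precision, which is the trade-off a precision-recall curve summarizes.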

Evaluation
**********

.. code::

  eval/gen_2tasks_report.py --method=METHOD_NAME --window-size=8192 --hop-size=4096 --output=REPORT_DIR --ssl-only /human

  gnuplot --persist REPORT_DIR/ssl_pr_plot
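
The plot name ``ssl_pr_plot`` indicates a precision-recall curve for localization. As a rough illustration of how precision and recall can be computed for one frame, the sketch below counts a predicted direction as correct when it lies within an angular tolerance of a ground-truth direction that has not been matched yet. This is not the procedure of ``gen_2tasks_report.py``. The greedy matching, the ``tolerance_deg`` value, and the convention of returning 1.0 for an empty list are assumptions made for this example.

.. code:: python

  def angular_error_deg(a, b):
      """Smallest absolute difference between two angles, in degrees."""
      diff = abs(a - b) % 360.0
      return min(diff, 360.0 - diff)


  def precision_recall(predicted, truth, tolerance_deg=5.0):
      """Greedy one-to-one matching of predictions to ground truth."""
      unmatched = list(truth)
      true_positives = 0
      for p in predicted:
          if not unmatched:
              break
          # Closest ground-truth direction that is still unmatched.
          best = min(unmatched, key=lambda t: angular_error_deg(p, t))
          if angular_error_deg(p, best) <= tolerance_deg:
              true_positives += 1
              unmatched.remove(best)
      precision = true_positives / len(predicted) if predicted else 1.0
      recall = true_positives / len(truth) if truth else 1.0
      return precision, recall


  # Two sources found (one across the 0/360 wrap) plus one false alarm.
  precision, recall = precision_recall([2.0, 181.0, 90.0], [358.0, 180.0])

Sweeping the detection threshold and accumulating these counts over all frames would give the points of a precision-recall curve.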

Publications
------------

The models and code in this repository are based on the work published in:

  | Deep Neural Networks for Multiple Speaker Detection and Localization
  | Weipeng He, Petr Motlicek, Jean-Marc Odobez
  | In *IEEE International Conference on Robotics and Automation (ICRA)*, 2018

  | Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network
  | Weipeng He, Petr Motlicek, Jean-Marc Odobez
  | In *INTERSPEECH*, 2018

  | Deep Learning Approaches for Auditory Perception in Robotics
  | Weipeng He
  | PhD Thesis, EPFL
