Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/idiap/nnsslm
Neural Network based Sound Source Localization Models
- Host: GitHub
- URL: https://github.com/idiap/nnsslm
- Owner: idiap
- License: bsd-3-clause
- Created: 2019-05-10T08:24:10.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-08-29T04:55:28.000Z (over 1 year ago)
- Last Synced: 2023-08-29T10:10:51.333Z (over 1 year ago)
- Language: Python
- Size: 3.66 MB
- Stars: 21
- Watchers: 6
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-sound-source-localization - GCC & GCC-Fbank
README
Neural Network based Sound Source Localization Models
=====================================================

This repository includes the programs to run and test the neural network models that we proposed in our publications on sound source localization, including:

* Deep learning based direction-of-arrival (DOA) estimation
* Joint DOA estimation and speech/non-speech classification

The networks have been trained to work with the microphone array of the Softbank Pepper robot (early version with directional microphones).
Please cite the relevant publications when using the code.

Dependency
----------

* `Python `_ (version >= 3.8)
* `NumPy `_ (version >= 1.14)
* `PyTorch `_ (version >= 0.2)
* `apkit `_ (version >= 0.2)
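
A quick way to confirm that the environment matches this list is to import each package and print its version. This is an optional, illustrative helper, not part of the repository; the version thresholds simply mirror the list above.

.. code:: python

    # Optional sanity check for the dependencies listed above
    # (illustrative helper, not part of this repository).
    import importlib

    REQUIRED = {"numpy": "1.14", "torch": "0.2", "apkit": "0.2"}

    for name, minimum in REQUIRED.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            print(f"{name}: not installed (need >= {minimum})")
            continue
        version = getattr(module, "__version__", "unknown")
        print(f"{name}: {version} (need >= {minimum})")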

Data
----

We use the `SSLR dataset `_ for the experiments.

Usage
-----

Training
********

This command reproduces the "ResNet-STFT" model described in `"Deep Learning Approaches for Auditory Perception in Robotics", Weipeng He, PhD Thesis, EPFL `_:

.. code::

    ssl_nn_v2/train_resnet_2stage.py --model=models/test --window-size=8192 --hop-size=4096 --n-doa=360 --output-act=5 --n-out-hidden=1 --sigma=8 --s1-epoch=4 --epoch=10 --lr=0.001 --ld=2 --batch-size=128 /lsp_train_*
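
The ``--n-doa=360`` and ``--sigma=8`` options correspond to the likelihood-based output coding used in the ICRA 2018 paper listed below, where each ground-truth source direction becomes a Gaussian-shaped peak on a grid of 360 candidate azimuths. The sketch below only illustrates that idea and is not code from this repository; the exact coding in ``train_resnet_2stage.py`` may differ.

.. code:: python

    # Illustration only (not from this repository): encode ground-truth
    # azimuths as a Gaussian-shaped likelihood vector over 360 candidate DOAs.
    import numpy as np

    def encode_doa_likelihood(azimuths_deg, n_doa=360, sigma=8.0):
        """Return an (n_doa,) vector with a Gaussian peak per source azimuth."""
        grid = np.arange(n_doa, dtype=float)          # candidate DOAs in degrees
        target = np.zeros(n_doa)
        for azimuth in azimuths_deg:
            # circular angular distance between each grid point and the source
            diff = np.abs(grid - azimuth)
            diff = np.minimum(diff, n_doa - diff)
            target = np.maximum(target, np.exp(-diff ** 2 / sigma ** 2))
        return target

    target = encode_doa_likelihood([90.0, 255.0])
    print(target.argmax(), round(target.max(), 3))    # 90 1.0
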
Inference
*********

.. code::

    ssl_nn_v2/test_nn_raw.py --feature=stft --method=METHOD_NAME --window-size=8192 --hop-size=4096 --batch-size=100 /human models/thesis_resnet_act5_p1lsp_s1ep4_ep10_valid_b100
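
The test script saves the frame-level network outputs for later evaluation. Turning such a 360-point likelihood vector into discrete DOA estimates is typically done by thresholded peak picking, as in the publications below. The sketch here only illustrates that decoding step under the assumed output format; it is not code from this repository, and the threshold is an example value.

.. code:: python

    # Illustration only (not from this repository): pick local maxima of a
    # 360-point likelihood vector that exceed a detection threshold.
    import numpy as np

    def decode_doas(likelihood, threshold=0.5):
        """Return azimuths (in degrees) of peaks above ``threshold``."""
        left = np.roll(likelihood, 1)
        right = np.roll(likelihood, -1)
        peaks = (likelihood >= left) & (likelihood > right) & (likelihood > threshold)
        return np.flatnonzero(peaks)

    frame_output = np.exp(-((np.arange(360) - 120.0) ** 2) / 64.0)
    print(decode_doas(frame_output))                  # [120]
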
Evaluation
**********

.. code::

    eval/gen_2tasks_report.py --method=METHOD_NAME --window-size=8192 --hop-size=4096 --output=REPORT_DIR --ssl-only /human
    gnuplot --persist REPORT_DIR/ssl_pr_plot
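
The report in ``REPORT_DIR`` contains precision-recall results for the localization task, which the gnuplot call above displays. As a rough illustration of what such numbers mean, precision and recall at a single operating point come from matching predicted and ground-truth DOAs within an angular tolerance; the sketch below is not the logic of ``gen_2tasks_report.py``, and the 5-degree tolerance is only an example value.

.. code:: python

    # Rough illustration (not from this repository): precision/recall at one
    # operating point, counting a prediction as correct if it lies within
    # ``tolerance`` degrees of a still-unmatched ground-truth DOA.
    def precision_recall(predicted, ground_truth, tolerance=5.0):
        matched = set()
        true_positives = 0
        for p in predicted:
            for i, g in enumerate(ground_truth):
                error = min(abs(p - g), 360.0 - abs(p - g))
                if i not in matched and error <= tolerance:
                    matched.add(i)
                    true_positives += 1
                    break
        precision = true_positives / len(predicted) if predicted else 1.0
        recall = true_positives / len(ground_truth) if ground_truth else 1.0
        return precision, recall

    print(precision_recall([88.0, 200.0], [90.0, 255.0]))   # (0.5, 0.5)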

Publications
------------

The models and code in this repository are based on the work published in:

Deep Neural Networks for Multiple Speaker Detection and Localization
Weipeng He, Petr Motlicek, Jean-Marc Odobez
In *IEEE International Conference on Robotics and Automation (ICRA)*, 2018

Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network
Weipeng He, Petr Motlicek, Jean-Marc Odobez
In *INTERSPEECH*, 2018

Deep Learning Approaches for Auditory Perception in Robotics
Weipeng He
PhD Thesis, EPFL