# doa-release
Keras (with TensorFlow backend) code accompanying our paper [Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks](https://arxiv.org/abs/1904.08452), Interspeech 2019.

## Dataset

As described in the paper, our training/validation set consists purely of simulated data generated with a geometric sound propagation engine, whereas the test set consists only of real-world recordings. To train/test our models, you need to download the following files (a download sketch follows the list):

- Training/validation set features and labels [33.4GB] https://obj.umiacs.umd.edu/gammadata/dataset/doa/features.zip
- Test set original wav files and labels [19.2GB] https://obj.umiacs.umd.edu/gammadata/dataset/doa/SOFA_DOA_test_set.zip
- Training/validation set original wav files (you can skip this one if you don't want to extract the features yourself) [43.8GB] https://obj.umiacs.umd.edu/gammadata/dataset/doa/wav.zip
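
If you prefer to script the downloads, a minimal Python sketch is shown below. This is our own illustration, not part of the repo; it only uses the standard library, and given the archive sizes a resumable tool such as wget may be more practical.
```
import urllib.request

# The URLs listed above; each archive is tens of GB, so expect long downloads.
urls = [
    "https://obj.umiacs.umd.edu/gammadata/dataset/doa/features.zip",
    "https://obj.umiacs.umd.edu/gammadata/dataset/doa/SOFA_DOA_test_set.zip",
    "https://obj.umiacs.umd.edu/gammadata/dataset/doa/wav.zip",
]
for url in urls:
    filename = url.rsplit("/", 1)[-1]
    print(f"Downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)
```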

We provide the test set as wav files instead of precomputed features because our models are trained assuming the ACN convention, whereas other models may assume the FuMa convention (see [Ambisonic formats](https://en.wikipedia.org/wiki/Ambisonic_data_exchange_formats)). In the test code, we provide a simple function for format conversion before feature extraction if needed.
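
For reference, a first-order ACN-to-FuMa conversion can be sketched as follows. This is an illustrative NumPy version, not the repo's actual function, and it assumes ACN channel order [W, Y, Z, X] with SN3D normalization (AmbiX); FuMa reorders the channels to [W, X, Y, Z] and attenuates W by 1/sqrt(2).
```
import numpy as np

def acn_to_fuma(sig):
    """Illustrative first-order ACN/SN3D -> FuMa conversion (hypothetical helper).

    `sig` is a (num_samples, 4) array in ACN channel order [W, Y, Z, X].
    FuMa uses channel order [W, X, Y, Z] with W scaled by 1/sqrt(2).
    """
    w, y, z, x = sig.T
    return np.stack([w / np.sqrt(2.0), x, y, z], axis=1)
```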

## Models

In this repo, we provide several models under the `models` folder:

| Model name | Explanation | Convention |
|--------------------------------------|------------------------------------------------------------------------------------------------|------------|
| cartesian_base_model.h5 | Initial (untrained) regression model | ACN |
| cartesian_trained_model.h5 | Trained regression model | ACN |
| categorical_trained_model.h5 | Trained classification model | ACN |
| Perotin_categorical_trained_model.h5 | Baseline classification model by [Perotin et al.](https://hal.inria.fr/hal-01840453/document) | FuMa |

Note that `categorical_trained_model.h5` was trained starting from `Perotin_categorical_trained_model.h5`; this is effectively no different from training from scratch, because their model uses the FuMa convention, which produces very different features from the ACN convention.
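
The `.h5` files can be loaded directly with Keras for inspection or fine-tuning. A minimal sketch, assuming the conda environment from the Usage section below is active (if a model used custom layers or losses, `load_model` would additionally need a `custom_objects` argument):
```
from keras.models import load_model

# Load one of the provided models and print its architecture.
model = load_model('models/cartesian_trained_model.h5')
model.summary()
```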

## Usage

0. Download the dataset and clone this repo.
1. Build conda environment and activate it:
```
conda env create --file=environment.yml
conda activate doa-release
```
2. Training
```
python3 train.py -i [train_feature_dir] -l [train_feature_dir]/train_labels.csv -o [output_dir] -lo [cartesian or categorical] -m models/[model_name]
```
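For example, assuming the training features were extracted to a `features` directory and outputs should go to `output` (both directory names here are hypothetical), a regression training run could look like:
```
python3 train.py -i features -l features/train_labels.csv -o output -lo cartesian -m models/cartesian_base_model.h5
```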
3. Testing
```
python3 test.py -i [test_wav_dir] -l [test_wav_dir]/test_labels.csv -m [trained_model_path] -lo [cartesian or categorical]
```
When testing a model that uses the FuMa convention, you must append `-c` to `test.py`'s argument list, which enables conversion from ACN to FuMa. For example:
```
python3 test.py -i [test_wav_dir] -l [test_wav_dir]/test_labels.csv -m models/Perotin_categorical_trained_model.h5 -lo categorical -c
```

## Citation
If you use our code or models, please consider citing:
```
@inproceedings{Tang2019,
author={Zhenyu Tang and John D. Kanu and Kevin Hogan and Dinesh Manocha},
title={{Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks}},
year=2019,
booktitle={Proc. Interspeech 2019},
pages={654--658},
doi={10.21437/Interspeech.2019-1111},
url={http://dx.doi.org/10.21437/Interspeech.2019-1111}
}
```
We also recommend citing the work of Perotin et al. if you also use their model:
```
@inproceedings{perotin2018crnn,
title={CRNN-based joint azimuth and elevation localization with the Ambisonics intensity vector},
author={Perotin, Laur{\'e}line and Serizel, Romain and Vincent, Emmanuel and Gu{\'e}rin, Alexandre},
booktitle={2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC)},
pages={241--245},
year={2018},
organization={IEEE}
}
```