https://github.com/apple/ml-spatial-librispeech
A large synthetic dataset of spatial audio with multiple labels
machine-learning spatial-audio
- Host: GitHub
- URL: https://github.com/apple/ml-spatial-librispeech
- Owner: apple
- License: other
- Created: 2023-08-18T21:29:08.000Z
- Default Branch: main
- Last Pushed: 2023-10-25T15:49:18.000Z
- Last Synced: 2025-01-30T07:33:11.246Z
- Topics: machine-learning, spatial-audio
- Homepage:
- Size: 11.7 KB
- Stars: 96
- Watchers: 17
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# Spatial LibriSpeech
Spatial LibriSpeech is a spatial audio dataset with over 650 hours of first-order
ambisonics and optional distractor noise (with raw 19-channel audio coming soon). It is designed for machine learning
model training and includes labels for source position, speaking direction, room acoustics, and
geometry. Spatial LibriSpeech was generated by augmenting LibriSpeech samples with 200k+ simulated
acoustic conditions across 8k+ synthetic rooms.
For more information, refer to our paper: https://doi.org/10.21437/Interspeech.2023-2117.
If you use Spatial LibriSpeech in a publication, please cite our paper:
```
@inproceedings{spatial_librispeech2023,
author={Miguel Sarabia and Elena Menyaylenko and Alessandro Toso and Skyler Seto
and Zakaria Aldeneh and Shadi Pirhosseinloo and Luca Zappella
and Barry-John Theobald and Nicholas Apostoloff and Jonathan Sheaffer},
title={{Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning}},
year={2023},
booktitle={Proc. Interspeech},
pages={3724--3728},
doi={10.21437/Interspeech.2023-2117}
}
```
## 📜 License
By downloading and using Spatial LibriSpeech, you are agreeing to comply with
the terms of its [LICENSE](LICENSE).
## 💾 Download
Our downloader script and PyTorch dataloader will be uploaded soon.
### Manual download
In the meantime, all our files are hosted here:
```python3
SLS_URI = "https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1"
```
You can manually download the metadata from the URL below. Refer to the [dataset schema](DATASET_SCHEMA.md)
for more information about how the data is structured.
```python3
f"{SLS_URI}/metadata.parquet"
```
Using the metadata, you can manually download individual samples from these URLs:
```python3
# speech first order ambisonics samples
f"{SLS_URI}/ambisonics/{sample_id:06}.flac"
# distractor noise first order ambisonics samples
f"{SLS_URI}/noise_ambisonics/{sample_id:06}.flac"
```
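The URL templates above can be wrapped in small helpers. This is a minimal sketch, not part of the official tooling: the function names `speech_url` and `noise_url` are hypothetical, and it assumes `sample_id` is an integer (or integer-like), as the `:06` zero-padding in the templates suggests.

```python3
SLS_URI = "https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1"


def speech_url(sample_id: int) -> str:
    """Build the URL of a speech first-order ambisonics sample (hypothetical helper)."""
    # int() guards against integer-like IDs (e.g. values read from a dataframe)
    return f"{SLS_URI}/ambisonics/{int(sample_id):06}.flac"


def noise_url(sample_id: int) -> str:
    """Build the URL of a distractor noise first-order ambisonics sample (hypothetical helper)."""
    return f"{SLS_URI}/noise_ambisonics/{int(sample_id):06}.flac"


print(speech_url(0))
print(noise_url(0))
```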
So, for instance, you may download the metadata with this command:
```bash
curl -O https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1/metadata.parquet
```
And the first speech sample with:
```bash
curl -O https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1/ambisonics/000000.flac
```
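If you prefer Python to `curl`, the same downloads can be sketched with the standard library alone. The `download` helper below is hypothetical (it is not the upcoming downloader script); it saves each file under its remote name, similar to `curl -O`, and does no retrying or integrity checking.

```python3
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest_dir: str = ".") -> Path:
    """Download `url` into `dest_dir`, keeping the remote file name (hypothetical helper)."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of loading the whole file into memory
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example usage (performs network requests, so it is left commented out):
# SLS_URI = "https://docs-assets.developer.apple.com/ml-research/datasets/spatial-librispeech/v1"
# download(f"{SLS_URI}/metadata.parquet")
# download(f"{SLS_URI}/ambisonics/000000.flac")
```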
⚠️ 19-channel speech and distractor noise samples are very large and we are evaluating how to best host them. If
you need them in the meantime, please contact us.
## ✉️ Contact
* [spatial-librispeech-dataset@group.apple.com](mailto:spatial-librispeech-dataset@group.apple.com)