
# soundata

Python library for downloading, loading & working with sound datasets. Check the [API documentation](https://soundata.readthedocs.io/) and the [contributing instructions](https://soundata.readthedocs.io/en/latest/source/contributing.html).

For Music Information Retrieval (MIR) datasets please check [mirdata](https://github.com/mir-dataset-loaders/mirdata).

![CI status](https://github.com/soundata/soundata/actions/workflows/ci.yml/badge.svg?branch=main)
![Formatting status](https://github.com/soundata/soundata/actions/workflows/formatting.yml/badge.svg?branch=main)
![Linting status](https://github.com/soundata/soundata/actions/workflows/lint-python.yml/badge.svg?branch=main)
[![Downloads](https://static.pepy.tech/badge/soundata)](https://pepy.tech/project/soundata)

[![codecov](https://codecov.io/gh/soundata/soundata/branch/master/graph/badge.svg)](https://codecov.io/gh/soundata/soundata)
[![Documentation Status](https://readthedocs.org/projects/soundata/badge/?version=latest)](https://soundata.readthedocs.io/en/latest/?badge=latest)
![GitHub](https://img.shields.io/github/license/soundata/soundata.svg)
[![PyPI version](https://badge.fury.io/py/soundata.svg)](https://badge.fury.io/py/soundata)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)

This library provides tools for working with common sound datasets, including tools for:
* Downloading datasets to a common location and format
* Validating that the files for a dataset are all present
* Loading annotation files to a common format
* Parsing clip-level metadata for detailed evaluations

Here's soundata's [list of currently supported datasets](https://soundata.readthedocs.io/en/latest/source/quick_reference.html).

### Installation

To install, simply run:

```bash
pip install soundata
```

### Quick example
```python
import soundata

dataset = soundata.initialize('urbansound8k')
dataset.download()  # download the dataset
dataset.validate()  # validate that all the expected files are there

example_clip = dataset.choice_clip()  # choose a random example clip
print(example_clip)  # see the available data
```
See the [documentation](https://soundata.readthedocs.io/) for more examples and the API reference.

### Contributing a new dataset loader

We welcome and encourage contributions to this library, especially new dataset loaders. Please see [contributing](https://soundata.readthedocs.io/en/latest/source/contributing.html) for guidelines. Feel free to [open an issue](https://github.com/soundata/soundata/issues) if you have any questions or run into problems when working on the library.

### Releases

The Soundata Zenodo repository is the preferred source for downloading the software releases.

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11518021.svg)](https://doi.org/10.5281/zenodo.11518021)

### Citing

If you use Soundata in your pipeline, please cite the version you used, with the DOI of the corresponding release on Zenodo. For Soundata v1.0.1:

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11580085.svg)](https://doi.org/10.5281/zenodo.11580085)

If you refer to Soundata's design principles, motivation, etc., please cite the JOSS article:

[![DOI](https://joss.theoj.org/papers/10.21105/joss.06634/status.svg)](https://doi.org/10.21105/joss.06634)

```bibtex
@article{Fuentes2024,
title = {{Soundata: Reproducible use of audio datasets}},
author = {Fuentes, Magdalena and Plaja-Roglans, Genís and Cortès-Sebastià, Guillem and Khandelwal, Tanmay and Miron, Marius and Serra, Xavier and Bello, Juan Pablo and Salamon, Justin},
year = 2024,
month = jun,
journal = {Journal of Open Source Software},
volume = 9,
number = 98,
pages = 6634,
doi = {10.21105/joss.06634},
url = {https://joss.theoj.org/papers/10.21105/joss.06634}
}
```

When working with datasets, please also cite the dataset itself. Its reference is available from the corresponding dataset loader via `cite()`.
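
As a minimal sketch of looking up a dataset's reference, the helper below initializes a loader and calls `cite()` on it. The function name `print_dataset_reference` is hypothetical, and the sketch assumes that `cite()` prints the reference and does not require the dataset to be downloaded first.

```python
def print_dataset_reference(dataset_name):
    """Print the reference for one dataset supported by soundata."""
    import soundata  # imported here so the helper can be defined without it

    dataset = soundata.initialize(dataset_name)
    # Assumption: cite() prints the dataset's reference to standard output.
    dataset.cite()
```

For example, `print_dataset_reference('urbansound8k')` would show the reference to include alongside the Soundata citation.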