https://github.com/soundata/soundata
Python library for downloading, loading & working with sound datasets
audio bioacoustics dataset environmental-sound python urban-sound
- Host: GitHub
- URL: https://github.com/soundata/soundata
- Owner: soundata
- License: bsd-3-clause
- Created: 2021-03-02T00:42:25.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-05-22T16:33:54.000Z (6 months ago)
- Last Synced: 2024-05-22T16:36:16.019Z (6 months ago)
- Topics: audio, bioacoustics, dataset, environmental-sound, python, urban-sound
- Language: Python
- Homepage: https://soundata.readthedocs.io/en/stable
- Size: 124 MB
- Stars: 270
- Watchers: 10
- Forks: 19
- Open Issues: 38
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- project-awesome - soundata/soundata - Python library for downloading, loading & working with sound datasets (Python)
README
# soundata
Python library for downloading, loading & working with sound datasets. Check the [API documentation](https://soundata.readthedocs.io/) and the [contributing instructions](https://soundata.readthedocs.io/en/latest/source/contributing.html).
For Music Information Retrieval (MIR) datasets, please check [mirdata](https://github.com/mir-dataset-loaders/mirdata).

![CI status](https://github.com/soundata/soundata/actions/workflows/ci.yml/badge.svg?branch=main)
![Formatting status](https://github.com/soundata/soundata/actions/workflows/formatting.yml/badge.svg?branch=main)
![Linting status](https://github.com/soundata/soundata/actions/workflows/lint-python.yml/badge.svg?branch=main)
[![Downloads](https://static.pepy.tech/badge/soundata)](https://pepy.tech/project/soundata)
[![codecov](https://codecov.io/gh/soundata/soundata/branch/master/graph/badge.svg)](https://codecov.io/gh/soundata/soundata)
[![Documentation Status](https://readthedocs.org/projects/soundata/badge/?version=latest)](https://soundata.readthedocs.io/en/latest/?badge=latest)
![GitHub](https://img.shields.io/github/license/soundata/soundata.svg)
[![PyPI version](https://badge.fury.io/py/soundata.svg)](https://badge.fury.io/py/soundata)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)

This library provides tools for working with common sound datasets, including tools for:
* Downloading datasets to a common location and format
* Validating that the files for a dataset are all present
* Loading annotation files to a common format
* Parsing clip-level metadata for detailed evaluations

Here's soundata's [list of currently supported datasets](https://soundata.readthedocs.io/en/latest/source/quick_reference.html).
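To give a rough idea of what validating a dataset involves, here is a minimal standard-library sketch that compares files on disk against an index of expected checksums. This is not soundata's implementation: the `{relative_path: md5}` index layout and the `check_files` and `md5_of` helpers are hypothetical and exist only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(data_home: Path, index: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare files under data_home against a {relative_path: md5} index.

    Returns two lists of relative paths: (missing, corrupted).
    """
    missing, corrupted = [], []
    for rel_path, expected in index.items():
        target = data_home / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif md5_of(target) != expected:
            corrupted.append(rel_path)
    return missing, corrupted


# Demo on a throwaway directory: one intact file, one altered, one absent.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / "a.wav").write_bytes(b"audio-a")
    (home / "b.wav").write_bytes(b"tampered")
    index = {
        "a.wav": hashlib.md5(b"audio-a").hexdigest(),
        "b.wav": hashlib.md5(b"audio-b").hexdigest(),
        "c.wav": hashlib.md5(b"audio-c").hexdigest(),
    }
    missing, corrupted = check_files(home, index)

print("missing:", missing)      # missing: ['c.wav']
print("corrupted:", corrupted)  # corrupted: ['b.wav']
```

In practice you would call `dataset.validate()` rather than write this yourself; the sketch only shows the kind of check such a step performs.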
### Installation
To install, simply run:
```bash
pip install soundata
```

### Quick example
```python
import soundata

dataset = soundata.initialize('urbansound8k')
dataset.download()  # download the dataset
dataset.validate()  # validate that all the expected files are there

example_clip = dataset.choice_clip()  # choose a random example clip
print(example_clip)  # see the available data
```
See the [documentation](https://soundata.readthedocs.io/) for more examples and the API reference.

### Contributing a new dataset loader
We welcome and encourage contributions to this library, especially new dataset loaders. Please see [contributing](https://soundata.readthedocs.io/en/latest/source/contributing.html) for guidelines. Feel free to [open an issue](https://github.com/soundata/soundata/issues) if you have any doubts or run into problems when working on the library.
### Releases
The Soundata Zenodo repository is the preferred source for downloading the software releases.
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11518021.svg)](https://doi.org/10.5281/zenodo.11518021)
### Citing
If you use Soundata in your pipeline, please cite the version you used, with the corresponding DOI of that release on Zenodo. For Soundata v1.0.1:
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11580085.svg)](https://doi.org/10.5281/zenodo.11580085)
If you refer to soundata's design principles, motivation etc., please cite the JOSS article:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.06634/status.svg)](https://doi.org/10.21105/joss.06634)
```bibtex
@article{Fuentes2024,
title = {{Soundata: Reproducible use of audio datasets}},
author = {Fuentes, Magdalena and Plaja-Roglans, Genís and Cortès-Sebastià, Guillem and Khandelwal, Tanmay and Miron, Marius and Serra, Xavier and Bello, Juan Pablo and Salamon, Justin},
year = 2024,
month = jun,
journal = {Journal of Open Source Software},
volume = 9,
number = 98,
pages = 6634,
doi = {10.21105/joss.06634},
url = {https://joss.theoj.org/papers/10.21105/joss.06634}
}
```

When working with datasets, please include the reference of the dataset, which can be found in the respective dataset loader using `cite()`.
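If you use several datasets, it can be convenient to gather their references in one place. The helper below is a sketch under assumptions, not part of soundata: `collect_citations` is a hypothetical name, and it assumes that `cite()` prints the reference to standard output rather than returning it. The `initialize` parameter exists only so the helper can be tried without soundata installed.

```python
import contextlib
import io


def collect_citations(dataset_names, initialize=None):
    """Return {dataset_name: citation text} by capturing what cite() prints.

    `initialize` defaults to soundata.initialize (hypothetical convenience
    parameter, see the note above).
    """
    if initialize is None:
        import soundata  # imported lazily so the helper has no hard dependency

        initialize = soundata.initialize

    citations = {}
    for name in dataset_names:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            returned = initialize(name).cite()
        # Assumption: cite() prints the reference; fall back to a return value if there is one.
        citations[name] = buffer.getvalue().strip() or str(returned or "")
    return citations
```

With soundata installed, `collect_citations(["urbansound8k"])` should give a dictionary whose single value is the text that `cite()` prints for that loader.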