Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/archinetai/audio-data-pytorch
A collection of useful audio datasets and transforms for PyTorch.
https://github.com/archinetai/audio-data-pytorch
artifical-intelligense audio-generation datasets deep-learning pytorch
Last synced: 4 days ago
JSON representation
A collection of useful audio datasets and transforms for PyTorch.
- Host: GitHub
- URL: https://github.com/archinetai/audio-data-pytorch
- Owner: archinetai
- License: mit
- Created: 2022-07-24T09:15:36.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-11T23:30:31.000Z (almost 2 years ago)
- Last Synced: 2025-01-07T22:08:29.384Z (12 days ago)
- Topics: artifical-intelligense, audio-generation, datasets, deep-learning, pytorch
- Language: Python
- Homepage:
- Size: 47.9 KB
- Stars: 137
- Watchers: 5
- Forks: 22
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- project-awesome - archinetai/audio-data-pytorch - A collection of useful audio datasets and transforms for PyTorch. (Python)
README
# Audio Data - PyTorch
A collection of useful audio datasets and transforms for PyTorch.
## Install
```bash
pip install audio-data-pytorch
```[![PyPI - Python Version](https://img.shields.io/pypi/v/audio-data-pytorch?style=flat&colorA=0f0f0f&colorB=0f0f0f)](https://pypi.org/project/audio-data-pytorch/)
## Datasets
### WAV Dataset
Load one or multiple folders of `.wav` files as dataset.
```py
from audio_data_pytorch import WAVDatasetdataset = WAVDataset(path=['my/path1', 'my/path2'])
```#### Full API:
```py
WAVDataset(
path: Union[str, Sequence[str]], # Path or list of paths from which to load files
recursive: bool = False # Recursively load files from provided paths
sample_rate: bool = False, # Specify sample rate to convert files to on read
random_crop_size: int = None, # Load small portions of files randomly
transforms: Optional[Callable] = None, # Transforms to apply to audio files
check_silence: bool = True # Discards silent samples if true
)
```### AudioWebDataset
A [`WebDataset`](https://webdataset.github.io/webdataset/) extension for audio data. Assumes that the `.tar` file comes with pairs of `.wav` (or `.flac`) and `.json` data.
```py
from audio_data_pytorch import AudioWebDatasetdataset = AudioWebDataset(
urls='mywebdataset.tar'
)waveform, info = next(iter(dataset))
print(waveform.shape) # torch.Size([2, 480000])
print(info.keys()) # dict_keys(['text'])
```#### Full API:
```py
dataset = AudioWebDataset(
urls: Union[str, Sequence[str]],
shuffle: Optional[int] = None,
batch_size: Optional[int] = None,
transforms: Optional[Callable] = None,# Transforms to apply to audio files
use_wav_processor: bool = False, # Set this to True if your tar files only use .wav
crop_size: Optional[int] = None,
max_crops: Optional[int] = None,
**kwargs, # Forwarded to WebDataset class)
```### LJSpeech Dataset
An unsupervised dataset for LJSpeech with voice-only data.
```py
from audio_data_pytorch import LJSpeechDatasetdataset = LJSpeechDataset(root='./data')
dataset[0] # (1, 158621)
dataset[1] # (1, 153757)
```#### Full API:
```py
LJSpeechDataset(
root: str = "./data", # The root where the dataset will be downloaded
transforms: Optional[Callable] = None, # Transforms to apply to audio files
)
```### LibriSpeech Dataset
Wrapper for the [LibriSpeech](https://www.openslr.org/12) dataset (EN only). Requires `pip install datasets`. Note that this dataset requires several GBs of storage.```py
from audio_data_pytorch import LibriSpeechDatasetdataset = LibriSpeechDataset(
root="./data",
)dataset[0] # (1, 222336)
```#### Full API:
```py
LibriSpeechDataset(
root: str = "./data", # The root where the dataset will be downloaded
with_info: bool = False, # Whether to return info (i.e. text, sampling rate, speaker_id)
transforms: Optional[Callable] = None, # Transforms to apply to audio files
)
```### Common Voice Dataset
Multilanguage wrapper for the [Common Voice](https://commonvoice.mozilla.org/). Requires `pip install datasets`. Note that each language requires several GBs of storage, and that you have to confirm access for each distinct version you use e.g. [here](https://huggingface.co/datasets/mozilla-foundation/common_voice_10_0), to validate your Huggingface access token. You can provide a list of `languages` and to avoid an unbalanced dataset the values will be interleaved by downsampling the majority language to have the same number of samples as the minority language.```py
from audio_data_pytorch import CommonVoiceDatasetdataset = CommonVoiceDataset(
auth_token="hf_xxx",
version=1,
root="../data",
languages=['it']
)
```#### Full API:
```py
CommonVoiceDataset(
auth_token: str, # Your Huggingface access token
version: int, # Common Voice dataset version
sub_version: int = 0, # Subversion: common_voice_{version}_{sub_version}
root: str = "./data", # The root where the dataset will be downloaded
languages: Sequence[str] = ['en'], # List of languages to include in the dataset
with_info: bool = False, # Whether to return info (i.e. text, sampling rate, age, gender, accent, locale)
transforms: Optional[Callable] = None, # Transforms to apply to audio files
)
```### Youtube Dataset
A wrapper around yt-dlp that automatically downloads the audio source of Youtube videos. Requires `pip install yt-dlp`.```py
from audio_data_pytorch import YoutubeDatasetdataset = YoutubeDataset(
root='./data',
urls=[
"https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"https://www.youtube.com/watch?v=BZ-_KQezKmU",
],
crop_length=10 # Crop source in 10s chunks (optional but suggested)
)
dataset[0] # (2, 480000)
```#### Full API:
```py
dataset = YoutubeDataset(
urls: Sequence[str], # The list of youtube urls
root: str = "./data", # The root where the dataset will be downloaded
crop_length: Optional[int] = None, # Crops the source into chunks of `crop_length` seconds
with_sample_rate: bool = False, # Returns sample rate as second argument
transforms: Optional[Callable] = None, # Transforms to apply to audio files
)
```### Clotho Dataset
A wrapper for the [Clotho](https://zenodo.org/record/3490684#.Y0VVVOxBwR0) dataset extending `AudioWebDataset`. Requires `pip install py7zr` to decompress `.7z` archive.```py
from audio_data_pytorch import ClothoDataset, Crop, Stereo, Monodataset = ClothoDataset(
root='./data/',
preprocess_sample_rate=48000, # Added to all files during preprocessing
preprocess_transforms=nn.Sequential(Crop(48000*10), Stereo()), # Added to all files during preprocessing
transforms=Mono() # Added dynamically at iteration time
)
```#### Full API:
```py
dataset = ClothoDataset(
root: str, # Path where the dataset is saved
split: str = 'train', # Dataset split, one of: 'train', 'valid'
preprocess_sample_rate: Optional[int] = None, # Preprocesses dataset to this sample rate
preprocess_transforms: Optional[Callable] = None, # Preprocesses dataset with the provided transfomrs
reset: bool = False, # Re-compute preprocessing if `true`
**kwargs # Forwarded to `AudioWebDataset`
)
```### MetaDataset
Extends `WAVDataset` with artist and genres read from ID3 tags and returned as string arrays or optionally mapped to integers stored in a json file at `metadata_mapping_path`.```py
from audio_data_pytorch import MetaDatasetdataset = MetaDataset(
path: Union[str, Sequence[str]], # Path or list of paths from which to load files
metadata_mapping_path: Optional[str] = None, # Path where mapping from artist/genres to numbers will be saved
)waveform, artists, genres = next(iter(dataset))
# Convert an artist ID back to a string
artist_name = dataset.mappings['artists'].invert[insert_artist_id]# Convert a genre ID back to a string
genre_name = dataset.mappings['genres'].invert[insert_genre_id]# If given a metadata_mapping_path, metadata is returned as an int Tensor
waveform, artist_genre_tensor = next(iter(dataset))
```#### Full API:
```py
dataset = MetaDataset(
path: Union[str, Sequence[str]], # Path or list of paths from which to load files
metadata_mapping_path: Optional[str] = None, # Path where mapping from artist/genres to numbers will be saved
max_artists: int = 4, # Max number of artists to return
max_genres: int = 4, # Max number of artists to return
**kwargs # Forwarded to `WAVDataset`
)
```## Transforms
You can use the following individual transforms, or merge them with `nn.Sequential()`:
```py
from audio_data_pytorch import Crop
crop = Crop(size=22050*2, start=0) # Crop 2 seconds at 22050 Hz from the start of the filefrom audio_data_pytorch import RandomCrop
random_crop = RandomCrop(size=22050*2) # Crop 2 seconds at 22050 Hz from a random positionfrom audio_data_pytorch import Resample
resample = Resample(source=48000, target=22050), # Resamples from 48kHz to 22kHzfrom audio_data_pytorch import Mono
overlap = Mono() # Overap channels by sum to get mono soruce (C, N) -> (1, N)from audio_data_pytorch import Stereo
stereo = Stereo() # Duplicate channels (1, N) -> (2, N) or (2, N) -> (2, N)from audio_data_pytorch import Scale
scale = Scale(scale=0.8) # Scale waveform amplitude by 0.8from audio_data_pytorch import Loudness
loudness = Loudness(sampling_rate=22050, target=-20) # Normalize loudness to -20dB, requires `pip install pyloudnorm`
```Or use this wrapper to apply a subset of them in one go, API:
```py
from audio_data_pytorch import AllTransformtransform = AllTransform(
source_rate: Optional[int] = None,
target_rate: Optional[int] = None,
crop_size: Optional[int] = None,
random_crop_size: Optional[int] = None,
loudness: Optional[int] = None,
scale: Optional[float] = None,
mono: bool = False,
stereo: bool = False,
)
```