{"id":16402520,"url":"https://github.com/spijkervet/torchaudio-augmentations","last_synced_at":"2025-12-14T13:28:16.736Z","repository":{"id":40202219,"uuid":"332333362","full_name":"Spijkervet/torchaudio-augmentations","owner":"Spijkervet","description":"Audio transformations library for PyTorch","archived":false,"fork":false,"pushed_at":"2022-04-19T12:45:17.000Z","size":1244,"stargazers_count":230,"open_issues_count":6,"forks_count":27,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-29T20:04:58.902Z","etag":null,"topics":["audio","machine-learning","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Spijkervet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-01-24T00:01:58.000Z","updated_at":"2025-01-28T22:55:38.000Z","dependencies_parsed_at":"2022-06-26T21:31:28.103Z","dependency_job_id":null,"html_url":"https://github.com/Spijkervet/torchaudio-augmentations","commit_stats":null,"previous_names":["spijkervet/audio-augmentations"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spijkervet%2Ftorchaudio-augmentations","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spijkervet%2Ftorchaudio-augmentations/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spijkervet%2Ftorchaudio-augmentations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Spijkervet%2Ftorchaudio-augmentations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Spijkervet","download_url":"https://codeload.github.com/Spijkervet/torchaudio-augmentations/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399878,"owners_count":20932880,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","machine-learning","pytorch"],"created_at":"2024-10-11T05:46:28.174Z","updated_at":"2025-12-14T13:28:11.426Z","avatar_url":"https://github.com/Spijkervet.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyTorch Audio Augmentations\n![CI status](https://github.com/spijkervet/torchaudio-augmentations/actions/workflows/ci.yml/badge.svg)\n[![codecov](https://codecov.io/gh/Spijkervet/torchaudio-augmentations/branch/master/graph/badge.svg?token=0DEFJYJH5K)](https://codecov.io/gh/Spijkervet/torchaudio-augmentations)\n[![Downloads](https://pepy.tech/badge/torchaudio-augmentations)](https://pepy.tech/project/torchaudio-augmentations)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4748582.svg)](https://zenodo.org/record/4748582#)\n\nAn audio data augmentation library for PyTorch that operates on time-domain audio. The focus of this repository is to:\n- Provide many audio transformations through a simple Python interface.\n- Maintain high test coverage.\n- Easily control stochastic (sequential) audio transformations.\n- Make every audio transformation differentiable by implementing it as a PyTorch `nn.Module`.\n- Optimise audio transformations for CPU and GPU.\n\nIt supports stochastic transformations, as commonly used in self-supervised and semi-supervised learning methods. 
One can apply a single stochastic augmentation, or generate any number of stochastically transformed versions of the same audio example, from a single interface.\n\nThis package follows the conventions set out by `torchvision` and `torchaudio`: audio is a tensor of shape `[channel, time]`, or `[batch, channel, time]` when batched. Each augmentation can be used on its own, or wrapped in a `RandomApply` interface, which applies it with probability `p`.\n\n## Usage\nWe can define one or more audio augmentations, which are applied sequentially to an audio waveform.\n```python\nimport torchaudio\nfrom audio_augmentations import *\n\naudio, sr = torchaudio.load(\"tests/classical.00002.wav\")\n\nnum_samples = sr * 5  # crop length: 5 seconds of audio\ntransforms = [\n    RandomResizedCrop(n_samples=num_samples),\n    RandomApply([PolarityInversion()], p=0.8),\n    RandomApply([Noise(min_snr=0.001, max_snr=0.005)], p=0.3),\n    RandomApply([Gain()], p=0.2),\n    HighLowPass(sample_rate=sr),  # not wrapped in RandomApply, so it is always applied in this augmentation chain\n    RandomApply([Delay(sample_rate=sr)], p=0.5),\n    RandomApply([PitchShift(\n        n_samples=num_samples,\n        sample_rate=sr\n    )], p=0.4),\n    RandomApply([Reverb(sample_rate=sr)], p=0.3)\n]\n```\n\nWe can also wrap multiple transformations in a single stochastic augmentation. 
The following applies polarity inversion and white noise together with a probability of 80%, gain with a probability of 20%, and delay and reverb together with a probability of 50%:\n```python\ntransforms = [\n    RandomResizedCrop(n_samples=num_samples),\n    RandomApply([PolarityInversion(), Noise(min_snr=0.001, max_snr=0.005)], p=0.8),\n    RandomApply([Gain()], p=0.2),\n    RandomApply([Delay(sample_rate=sr), Reverb(sample_rate=sr)], p=0.5)\n]\n```\n\nWe can return either one or many augmented versions of the same audio example:\n```python\ntransform = Compose(transforms=transforms)\ntransformed_audio = transform(audio)\n# transformed_audio.shape == [num_channels, num_samples]\n```\n\n```python\naudio, sr = torchaudio.load(\"tests/classical.00002.wav\")\ntransform = ComposeMany(transforms=transforms, num_augmented_samples=4)\ntransformed_audio = transform(audio)\n# transformed_audio.shape == [4, num_channels, num_samples]\n```\n\nSimilar to the `torchvision.datasets` interface, an instance of `Compose` or `ComposeMany` can be passed to any dataset that accepts a `transform=` argument.\n\n## Optional\nInstall WavAugment for reverberation and pitch shifting:\n```\npip install git+https://github.com/facebookresearch/WavAugment\n```\n\n## Cite\nYou can cite this work with the following BibTeX:\n```\n@misc{spijkervet_torchaudio_augmentations,\n  doi = {10.5281/ZENODO.4748582},\n  url = {https://zenodo.org/record/4748582},\n  author = {Spijkervet, Janne},\n  title = {Spijkervet/torchaudio-augmentations},\n  publisher = {Zenodo},\n  year = {2021},\n  copyright = {MIT License}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspijkervet%2Ftorchaudio-augmentations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspijkervet%2Ftorchaudio-augmentations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspijkervet%2Ftorchaudio-augmentations/lists"}