{"id":21428308,"url":"https://github.com/mlverse/torchaudio","last_synced_at":"2025-07-14T10:31:52.885Z","repository":{"id":40258243,"uuid":"295303992","full_name":"mlverse/torchaudio","owner":"mlverse","description":"R interface to torchaudio","archived":false,"fork":false,"pushed_at":"2023-04-27T16:29:08.000Z","size":12429,"stargazers_count":26,"open_issues_count":10,"forks_count":6,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-07T13:40:40.811Z","etag":null,"topics":["deep-learning","r","torch"],"latest_commit_sha":null,"homepage":"https://mlverse.github.io/torchaudio/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlverse.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-14T04:31:14.000Z","updated_at":"2024-10-20T17:07:24.000Z","dependencies_parsed_at":"2023-02-19T06:31:11.352Z","dependency_job_id":null,"html_url":"https://github.com/mlverse/torchaudio","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/mlverse/torchaudio","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Ftorchaudio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Ftorchaudio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Ftorchaudio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Ftorchaudio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlverse","download_url":"https://codeload.github.com/mlverse/torchaudio/tar.gz
/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlverse%2Ftorchaudio/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265280701,"owners_count":23739853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","r","torch"],"created_at":"2024-11-22T22:12:39.186Z","updated_at":"2025-07-14T10:31:47.872Z","avatar_url":"https://github.com/mlverse.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n# torchaudio \u003ca href='https://mlverse.github.io/torchaudio/'\u003e\u003cimg src=\"man/figures/torchaudio.png\" align=\"right\" height=\"139\"/\u003e\u003c/a\u003e\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n\u003c!-- badges: start --\u003e\n\n[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) [![R build status](https://github.com/mlverse/torchaudio/workflows/R-CMD-check/badge.svg)](https://github.com/mlverse/torchaudio/actions) [![CRAN status](https://www.r-pkg.org/badges/version/torchaudio)](https://CRAN.R-project.org/package=torchaudio) [![](https://cranlogs.r-pkg.org/badges/torchaudio)](https://cran.r-project.org/package=torchaudio)\n\n\u003c!-- badges: end --\u003e\n\n`torchaudio` is an extension for [`torch`](https://github.com/mlverse/torch) providing audio loading, transformations, common architectures for signal processing, pre-trained weights and access to commonly used datasets. The package is a port to R of [PyTorch's TorchAudio](https://pytorch.org/audio/stable/index.html).\n\n`torchaudio` was originally developed by [Athos Damiani](https://github.com/Athospd) as part of [Curso-R](https://github.com/curso-r) work. 
Development will continue under the roof of the *mlverse* organization, together with `torch` itself, [`torchvision`](https://github.com/mlverse/torchvision), [`luz`](https://github.com/mlverse/luz), and a number of extensions building on `torch`.\n\n## Installation\n\nThe CRAN release can be installed with:\n\n```{r, eval = FALSE}\ninstall.packages(\"torchaudio\")\n```\n\nYou can install the development version from GitHub with:\n\n```{r, eval = FALSE}\nremotes::install_github(\"mlverse/torchaudio\")\n```\n\n## A basic workflow\n\n`torchaudio` supports a variety of workflows -- such as training a neural network on a speech dataset -- but to get started, let's do something more basic: load a sound file, extract some information about it, convert it to something `torchaudio` can work with (a tensor), and display a spectrogram.\n\nHere is an example sound:\n\n```{r}\nlibrary(torchaudio)\nurl \u003c- \"https://pytorch.org/tutorials/_static/img/steam-train-whistle-daniel_simon-converted-from-mp3.wav\"\nsoundfile \u003c- tempfile(fileext = \".wav\")\nr \u003c- httr::GET(url, httr::write_disk(soundfile, overwrite = TRUE))\n```\n\nUsing `torchaudio_info()`, we obtain the number of channels, the number of samples, and the sampling rate:\n\n```{r}\ninfo \u003c- torchaudio_info(soundfile)\ncat(\"Number of channels: \", info$num_channels, \"\\n\")\ncat(\"Number of samples: \", info$num_frames, \"\\n\")\ncat(\"Sampling rate: \", info$sample_rate, \"\\n\")\n```\n\nTo read in the file, we call `torchaudio_load()`. `torchaudio_load()` itself delegates to the default (alternatively, the user-requested) backend.\n\nThe default backend is [`av`](https://docs.ropensci.org/av/), a fast and lightweight wrapper for [FFmpeg](https://ffmpeg.org/). As of this writing, an alternative is `tuneR`; it may be requested via the option `torchaudio.loader`. 
(Note though that with `tuneR`, only `wav` and `mp3` file extensions are supported.)\n\n```{r}\nwav \u003c- torchaudio_load(soundfile)\ndim(wav)\n```\n\nFor `torchaudio` to be able to process the sound object, we need to convert it to a tensor. This is achieved by means of a call to `transform_to_tensor()`, resulting in a list of two tensors: one containing the actual amplitude values, the other, the sampling rate.\n\n```{r, fig.height=3, fig.width=8}\nwaveform_and_sample_rate \u003c- transform_to_tensor(wav)\nwaveform \u003c- waveform_and_sample_rate[[1]]\nsample_rate \u003c- waveform_and_sample_rate[[2]]\n\npaste(\"Shape of waveform: \", paste(dim(waveform), collapse = \" \"))\npaste(\"Sample rate of waveform: \", sample_rate)\n\nplot(waveform[1], col = \"royalblue\", type = \"l\")\nlines(waveform[2], col = \"orange\")\n```\n\nFinally, let's create a spectrogram!\n\n```{r, fig.height=3, fig.width=8}\nspecgram \u003c- transform_spectrogram()(waveform)\n\npaste(\"Shape of spectrogram: \", paste(dim(specgram), collapse = \" \"))\n\nspecgram_as_array \u003c- as.array(specgram$log2()[1]$t())\nimage(specgram_as_array[,ncol(specgram_as_array):1], col = viridis::viridis(n = 257,  option = \"magma\"))\n```\n\n## Development status\n\n### Datasets ([go to issue](https://github.com/mlverse/torchaudio/issues/17))\n\n-   [x] CMUARCTIC\n-   [ ] COMMONVOICE\n-   [ ] GTZAN\n-   [ ] LIBRISPEECH\n-   [ ] LIBRITTS\n-   [ ] LJSPEECH\n-   [x] SPEECHCOMMANDS\n-   [ ] TEDLIUM\n-   [ ] VCTK\n-   [ ] VCTK_092\n-   [x] YESNO\n\n### Models ([go to issue](https://github.com/mlverse/torchaudio/issues/19))\n\n-   [ ] ConvTasNet\n-   [ ] Wav2Letter\n-   [x] WaveRNN\n\n## I/O Backends\n\n-   [x] {av} (default)\n-   [x] {tuneR}\n\n## Code of Conduct\n\nPlease note that the `torchaudio` project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). 
By contributing to this project, you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlverse%2Ftorchaudio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmlverse%2Ftorchaudio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlverse%2Ftorchaudio/lists"}