https://github.com/mlverse/torchaudio

R interface to torchaudio

Host: GitHub
URL: https://github.com/mlverse/torchaudio
Owner: mlverse
License: other
Created: 2020-09-14T04:31:14.000Z (almost 6 years ago)
Default Branch: main
Last Pushed: 2023-04-27T16:29:08.000Z (about 3 years ago)
Last Synced: 2025-07-07T13:40:40.811Z (12 months ago)
Topics: deep-learning, r, torch
Language: R
Homepage: https://mlverse.github.io/torchaudio/
Size: 11.9 MB
Stars: 26
Watchers: 4
Forks: 6
Open Issues: 10
Metadata Files:
- Readme: README.Rmd
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md


README

---
output: github_document
---

# torchaudio 

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) [![R build status](https://github.com/mlverse/torchaudio/workflows/R-CMD-check/badge.svg)](https://github.com/mlverse/torchaudio/actions) [![CRAN status](https://www.r-pkg.org/badges/version/torchaudio)](https://CRAN.R-project.org/package=torchaudio) [![](https://cranlogs.r-pkg.org/badges/torchaudio)](https://cran.r-project.org/package=torchaudio)

`torchaudio` is an extension for [`torch`](https://github.com/mlverse/torch) providing audio loading, transformations, common architectures for signal processing, pre-trained weights and access to commonly used datasets. The package is a port to R of [PyTorch's TorchAudio](https://pytorch.org/audio/stable/index.html).

`torchaudio` was originally developed by [Athos Damiani](https://github.com/Athospd) as part of [Curso-R](https://github.com/curso-r) work. Development will continue under the roof of the *mlverse* organization, together with `torch` itself, [`torchvision`](https://github.com/mlverse/torchvision), [`luz`](https://github.com/mlverse/luz), and a number of extensions building on `torch`.

## Installation

The CRAN release can be installed with:

```{r, eval = FALSE}
install.packages("torchaudio")
```

You can install the development version from GitHub with:

```{r, eval = FALSE}
remotes::install_github("mlverse/torchaudio")
```

## A basic workflow

`torchaudio` supports a variety of workflows, such as training a neural network on a speech dataset. To get started, though, let's do something more basic: load a sound file, extract some information about it, convert it to something `torchaudio` can work with (a tensor), and display a spectrogram.

Here is an example sound:

```{r}
library(torchaudio)

# download the example sound to a temporary .wav file
url <- "https://pytorch.org/tutorials/_static/img/steam-train-whistle-daniel_simon-converted-from-mp3.wav"
soundfile <- tempfile(fileext = ".wav")
r <- httr::GET(url, httr::write_disk(soundfile, overwrite = TRUE))
```

Using `torchaudio_info()`, we obtain the number of channels, the number of samples, and the sampling rate:

```{r}
info <- torchaudio_info(soundfile)
cat("Number of channels:", info$num_channels, "\n")
cat("Number of samples:", info$num_frames, "\n")
cat("Sampling rate:", info$sample_rate, "\n")
```
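These fields are enough to work out how long the clip is: the duration in seconds is the number of frames divided by the sampling rate. The sketch below is a minimal illustration in base R. `audio_duration()` is a hypothetical helper, not part of `torchaudio`, and it only assumes an object with `num_frames` and `sample_rate` elements, like the `info` object above.

```{r}
# Hypothetical helper (not part of torchaudio): clip length in seconds,
# computed as the number of frames divided by the sampling rate.
audio_duration <- function(info) {
  stopifnot(info$sample_rate > 0)
  info$num_frames / info$sample_rate
}

# Made-up example: 88200 frames at 44100 Hz last two seconds.
audio_duration(list(num_frames = 88200, sample_rate = 44100))
```

With the real object, `audio_duration(info)` would give the length of the downloaded whistle.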

To read in the file, we call `torchaudio_load()`, which delegates to the default backend (or to the backend the user has requested).

The default backend is [`av`](https://docs.ropensci.org/av/), a fast and lightweight wrapper for [FFmpeg](https://ffmpeg.org/). As of this writing, the alternative is `tuneR`, which can be requested via the option `torchaudio.loader`. Note, though, that `tuneR` only supports files with the `wav` and `mp3` extensions.

```{r}
wav <- torchaudio_load(soundfile)
dim(wav)
```
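Because `tuneR` only handles `wav` and `mp3`, one way to use it safely is to choose the backend from the file extension before loading. This is a hypothetical sketch: `choose_loader()` and `with_loader()` are not `torchaudio` functions, and the code assumes that the `torchaudio.loader` option takes the backend's package name as a string (`"av"` or `"tuneR"`). Check the package documentation before relying on that.

```{r}
# Hypothetical helper: use tuneR only for the extensions it supports,
# and fall back to the default av backend otherwise.
choose_loader <- function(path, preferred = "tuneR") {
  ext <- tolower(tools::file_ext(path))
  if (identical(preferred, "tuneR") && ext %in% c("wav", "mp3")) "tuneR" else "av"
}

# Hypothetical wrapper: set torchaudio.loader for a single call and restore
# the previous value afterwards (assumes the option accepts "av"/"tuneR").
with_loader <- function(path, load_fun) {
  old <- options(torchaudio.loader = choose_loader(path))
  on.exit(options(old), add = TRUE)
  load_fun(path)
}
```

For example, `with_loader(soundfile, torchaudio_load)` would route the whistle through `tuneR`, since the file is a `.wav`, and leave the global option unchanged afterwards.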

For `torchaudio` to be able to process the sound object, we need to convert it to a tensor. This is done by calling `transform_to_tensor()`, which returns a list of two elements: a tensor containing the actual amplitude values (one row per channel) and the sampling rate.

```{r, fig.height=3, fig.width=8}
waveform_and_sample_rate <- transform_to_tensor(wav)
waveform <- waveform_and_sample_rate[[1]]
sample_rate <- waveform_and_sample_rate[[2]]

paste("Shape of waveform:", paste(dim(waveform), collapse = " "))
paste("Sample rate of waveform:", sample_rate)

# plot both channels; convert each tensor row to an R array first
plot(as.array(waveform[1]), col = "royalblue", type = "l")
lines(as.array(waveform[2]), col = "orange")
```
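The plot above has sample indices on the x-axis. To show time in seconds instead, divide each zero-based sample index by the sampling rate. A small sketch: `sample_times()` is a hypothetical helper, and the commented usage assumes, as in the chunk above, that `waveform` has channels in its first dimension and that `sample_rate` is a plain number.

```{r}
# Hypothetical helper: time (in seconds) of each sample, starting at 0.
sample_times <- function(n_samples, sample_rate) {
  (seq_len(n_samples) - 1) / sample_rate
}

# Usage with the objects created above (requires torch and torchaudio):
# x <- as.array(waveform[1])
# plot(sample_times(length(x), sample_rate), x,
#      type = "l", col = "royalblue", xlab = "Time (s)", ylab = "Amplitude")
```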

Finally, let's create a spectrogram!

```{r, fig.height=3, fig.width=8}
specgram <- transform_spectrogram()(waveform)
paste("Shape of spectrogram:", paste(dim(specgram), collapse = " "))

specgram_as_array <- as.array(specgram$log2()[1]$t())
image(specgram_as_array[, ncol(specgram_as_array):1], col = viridis::viridis(n = 257, option = "magma"))
```
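The printed spectrogram shape follows from the STFT settings. The sketch below assumes the R port mirrors the defaults of PyTorch's `torchaudio.transforms.Spectrogram` (`n_fft = 400`, hop length `n_fft %/% 2`, centered frames). Under those assumptions, each channel yields `n_fft %/% 2 + 1` frequency bins and `n_samples %/% hop_length + 1` time frames. `spectrogram_shape()` is hypothetical, not a package function; comparing its result with `dim(specgram)` is a quick way to check the assumption.

```{r}
# Hypothetical helper: expected spectrogram dimensions (channel, freq, time),
# assuming a one-sided, centered STFT as in PyTorch's torchaudio defaults.
spectrogram_shape <- function(n_channels, n_samples, n_fft = 400,
                              hop_length = n_fft %/% 2) {
  n_freq <- n_fft %/% 2 + 1                 # bins of a one-sided spectrum
  n_frames <- n_samples %/% hop_length + 1  # frames of a centered STFT
  c(n_channels, n_freq, n_frames)
}

# expected: 2, 201, 41
spectrogram_shape(n_channels = 2, n_samples = 8000)
```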

## Development status

### Datasets ([go to issue](https://github.com/mlverse/torchaudio/issues/17))

-   [x] CMUARCTIC

-   [ ] COMMONVOICE

-   [ ] GTZAN

-   [ ] LIBRISPEECH

-   [ ] LIBRITTS

-   [ ] LJSPEECH

-   [x] SPEECHCOMMANDS

-   [ ] TEDLIUM

-   [ ] VCTK

-   [ ] VCTK_092

-   [x] YESNO

### Models ([go to issue](https://github.com/mlverse/torchaudio/issues/19))

-   [ ] ConvTasNet

-   [ ] Wav2Letter

-   [x] WaveRNN

## I/O Backends

-   [x] {av} (default)

-   [x] {tuneR}

## Code of Conduct

Please note that the `torchaudio` project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
