Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/yoyololicon/spectrogram-inversion

spectrogram inversion tools in PyTorch. Documentation: https://spectrogram-inversion.readthedocs.io
https://github.com/yoyololicon/spectrogram-inversion

Last synced: 22 days ago
JSON representation

spectrogram inversion tools in PyTorch. Documentation: https://spectrogram-inversion.readthedocs.io

Host: GitHub
URL: https://github.com/yoyololicon/spectrogram-inversion
Owner: yoyololicon
Created: 2019-06-15T13:04:16.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-07-08T15:36:16.000Z (over 1 year ago)
Last Synced: 2024-10-03T12:20:55.884Z (about 1 month ago)
Language: Python
Homepage:
Size: 75.2 KB
Stars: 46
Watchers: 2
Forks: 7
Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml

Awesome Lists containing this project

README

        # PyTorch Spectrogram Inversion Documentation

A major direction of Deep Learning in audio, especially generative models, is using features in frequency domain because

directly model raw time signal is hard.

But this require an extra process to convert the predicted spectrogram (magnitude-only in most situation) back to time domain.

To help researcher no need to care this post-precessing step, this package provide some useful and classic spectrogram

inversion algorithms. These algorithms are selected base on their performance and high parallelizability, and can even

be integrated in your model training process.

We hope this tool can serve as a standard, making fair comparison of different audio generation models.

## Installation

### PyPi

First [Install PyTorch](https://pytorch.org/get-started/locally/) with the desired cpu/gpu support and version >= 0.4.1.

Then install via pip

```

pip install torch_specinv

```

or

```

pip install git+https://github.com/yoyololicon/spectrogram-inversion

```

to get the latest version.

## Getting Started

The following example estimated the time signal given only the magnitude information of an audio file.

```python

import torch

import librosa

from torch_specinv import griffin_lim

from torch_specinv.metrics import spectral_convergence as SC

y, sr = librosa.load(librosa.util.example_audio_file())

y = torch.from_numpy(y)

windowsize = 2048

window = torch.hann_window(windowsize)

S = torch.stft(y, windowsize, window=window)

# discard phase information

mag = S.pow(2).sum(2).sqrt()

# move to gpu memory for faster computation

mag = mag.cuda()

yhat = griffin_lim(mag, maxiter=100, alpha=0.3, window=window)

# check convergence

mag_hat = torch.stft(yhat, windowsize, window=window).pow(2).sum(2).sqrt()

print(SC(mag_hat, mag))

```

Reconstruct from other spectral representation:

```python

from librosa.filters import mel

from torch_specinv import L_BFGS

filter_banks = torch.from_numpy(mel(sr, windowsize)).cuda()

def trsfn(x):

    S = torch.stft(x, windowsize, window=window).pow(2).sum(2).sqrt()

    mel_S = filter_banks @ S

    return torch.log1p(mel_S)

y = y.cuda()

mag = trsfn(y)

yhat = L_BFGS(mag, trsfn, len(y))       

```

## TODO

- [ ] Speed comparison on GPU. 

- [x] Documentation.

- [ ] Examples.