Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yoyololicon/spectrogram-inversion
spectrogram inversion tools in PyTorch. Documentation: https://spectrogram-inversion.readthedocs.io
- Host: GitHub
- URL: https://github.com/yoyololicon/spectrogram-inversion
- Owner: yoyololicon
- Created: 2019-06-15T13:04:16.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-07-08T15:36:16.000Z (over 1 year ago)
- Last Synced: 2024-10-03T12:20:55.884Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 75.2 KB
- Stars: 46
- Watchers: 2
- Forks: 7
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
README
# PyTorch Spectrogram Inversion Documentation
A major direction of deep learning in audio, especially in generative models, is to work with frequency-domain features, because modeling the raw time signal directly is hard.
This requires an extra step to convert the predicted spectrogram (magnitude-only in most cases) back to the time domain.

To spare researchers this post-processing step, this package provides several useful, classic spectrogram
inversion algorithms. They were selected for their performance and high parallelizability, and can even
be integrated into your model training process.

We hope this tool can serve as a standard that enables fair comparison of different audio generation models.
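
To make the inversion problem concrete, here is a minimal sketch of plain Griffin-Lim written directly against PyTorch. It is not this package's implementation: the names `stft_mag`, `spectral_convergence_sketch` and `griffin_lim_sketch` are hypothetical, the sketch assumes a recent PyTorch release with complex STFT support (`return_complex=True`, `torch.istft`, `torch.polar`), and the convergence measure uses the common relative Frobenius-norm definition, which is assumed here rather than taken from the package.

```python
import math
import torch


def stft_mag(x, n_fft, window):
    """Magnitude of the STFT of a 1-D signal (hypothetical helper)."""
    return torch.stft(x, n_fft, window=window, return_complex=True).abs()


def spectral_convergence_sketch(mag_hat, mag):
    """Relative Frobenius-norm error between two magnitude spectrograms."""
    return torch.linalg.norm(mag_hat - mag) / torch.linalg.norm(mag)


def griffin_lim_sketch(mag, n_fft, window, n_iter=50, length=None):
    """Plain Griffin-Lim: keep the target magnitude, iterate on the phase."""
    # Only the magnitude is known, so start from a random unit-modulus phase.
    angle = 2 * math.pi * torch.rand_like(mag)
    phase = torch.polar(torch.ones_like(mag), angle)
    for _ in range(n_iter):
        # Synthesize a signal from the target magnitude and the current phase.
        x = torch.istft(mag * phase, n_fft, window=window, length=length)
        # Re-analyze the signal and keep only the phase of its STFT.
        spec = torch.stft(x, n_fft, window=window, return_complex=True)
        phase = spec / spec.abs().clamp_min(1e-8)
    return torch.istft(mag * phase, n_fft, window=window, length=length)
```

Calling `griffin_lim_sketch(mag, n_fft, window, length=len(y))` returns a time signal whose magnitude spectrogram approximates `mag`; `spectral_convergence_sketch` can then be used to compare the re-analyzed magnitude against the target.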
## Installation
### PyPI
First, [install PyTorch](https://pytorch.org/get-started/locally/) with the desired CPU/GPU support and version >= 0.4.1.
Then install via pip:
```
pip install torch_specinv
```
or
```
pip install git+https://github.com/yoyololicon/spectrogram-inversion
```
to get the latest version.

## Getting Started
The following example estimates the time signal given only the magnitude information of an audio file.

```python
import torch
import librosa
from torch_specinv import griffin_lim
from torch_specinv.metrics import spectral_convergence as SC

# note: librosa.util.example_audio_file() is only available in older librosa releases
y, sr = librosa.load(librosa.util.example_audio_file())
y = torch.from_numpy(y)
windowsize = 2048
window = torch.hann_window(windowsize)
# written for the older torch.stft API, which returns a real tensor
# with a trailing (real, imaginary) axis of size 2
S = torch.stft(y, windowsize, window=window)

# discard phase information
mag = S.pow(2).sum(2).sqrt()

# move to gpu memory for faster computation
mag = mag.cuda()
# keep the window on the same device as the spectrogram
window = window.cuda()

yhat = griffin_lim(mag, maxiter=100, alpha=0.3, window=window)

# check convergence
mag_hat = torch.stft(yhat, windowsize, window=window).pow(2).sum(2).sqrt()
print(SC(mag_hat, mag))
```

Reconstruct from other spectral representations:
```python
from librosa.filters import mel
from torch_specinv import L_BFGS

# reuses sr, windowsize, window and y from the previous example
filter_banks = torch.from_numpy(mel(sr=sr, n_fft=windowsize)).cuda()

def trsfn(x):
    # log-compressed mel spectrogram; the window must be on the same device as x
    S = torch.stft(x, windowsize, window=window.to(x.device)).pow(2).sum(2).sqrt()
    mel_S = filter_banks @ S
    return torch.log1p(mel_S)

y = y.cuda()
mag = trsfn(y)
yhat = L_BFGS(mag, trsfn, len(y))
```

## TODO
- [ ] Speed comparison on GPU.
- [x] Documentation.
- [ ] Examples.