
## Audio texture synthesis (and a bit of stylization)

This is an extension of the [texture synthesis](https://arxiv.org/abs/1505.07376) and [style transfer](https://arxiv.org/abs/1508.06576) methods of Leon Gatys et al., based on [Justin Johnson's code](https://github.com/jcjohnson/neural-style) for neural style transfer.

To listen to examples, go to the [blog post](http://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/). An almost identical Lasagne implementation by [Vadim Lebedev](http://sites.skoltech.ru/compvision/members/vadim-lebedev/) can be found [here](https://github.com/vadim-v-lebedev/audio_style_tranfer). Also check out the [TensorFlow implementation](https://github.com/DmitryUlyanov/neural-style-audio-tf).

### Examples
As there is no way to embed an audio player in GitHub markdown, please follow [this link](http://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/) for examples of texture synthesis and style transfer.

### Prerequisites
- [Torch7](http://torch.ch/docs/getting-started.html#_) with the [npy4th](https://github.com/htwaijry/npy4th) and [cudnn.torch](https://github.com/soumith/cudnn.torch) packages
- Python (tested with 2.7) with numpy, [librosa](https://github.com/librosa/librosa), and [torchfile](https://github.com/bshillingford/python-torchfile)
- The pretrained network and mean file:

```
wget https://www.dropbox.com/s/xpyoehayuhxvibq/net.t7?dl=1 -O data/net.t7
wget https://www.dropbox.com/s/dwsq33r5bsgy9cd/mean.t7?dl=1 -O data/mean.t7
```
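As a quick sanity check that the downloads worked, the files can be opened from Python with the `torchfile` package listed above (a hypothetical snippet, assuming `mean.t7` stores a tensor; it is not part of the repo):

```
import torchfile

# Torch tensors are returned as numpy arrays; a serialized network
# comes back as a generic Torch object.
mean = torchfile.load('data/mean.t7')
net = torchfile.load('data/net.t7')
print(mean.shape, type(net))
```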

### Usage
##### 1. Convert raw audio to a spectrogram

```
python get_spectrogram.py --in_audio <input.mp3> --out_npy <output.npy>
```

Additional arguments:
- `--offset` and `--duration` control the boundaries of the segment that is converted to a spectrogram [by default, the first 10 seconds are used].
- `--sr`: sample rate in Hz [default = 44100].
- `--n_fft`: FFT window size in samples [default = 1024] (the sketch below shows how these flags map onto librosa calls).
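For reference, the conversion presumably amounts to a magnitude STFT along these lines (a minimal librosa sketch mirroring the flags above, not the repo's exact code):

```
import numpy as np
import librosa

def audio_to_spectrogram(in_audio, out_npy, sr=44100, offset=0.0,
                         duration=10.0, n_fft=1024):
    # Load the requested segment at the requested sample rate
    y, sr = librosa.load(in_audio, sr=sr, offset=offset, duration=duration)
    # Magnitude spectrogram: absolute value of the STFT
    np.save(out_npy, np.abs(librosa.stft(y, n_fft=n_fft)))

audio_to_spectrogram('data/inputs/keyboard2.mp3', 'data/inputs/keyboard2.npy')
```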

##### 2. Run texture synthesis or style transfer

This is the command used for all the texture synthesis examples:
```
th neural_style_audio.lua -content <content.npy> -style <style.npy> -content_layers <indices> -style_layers <indices> -style_weight <number> -content_weight <number>
```
Parameters:
- `-content`, `-style` are paths to `.npy` spectrogram files generated previously.
- `-content_layers`, `-style_layers`: comma-separated layer indices (the provided net has about 50 layers and is of encoder-decoder type).
- `-style_weight`, `-content_weight`: weights of the style and content terms (see the loss sketch after this list); set `-content_weight` to zero for texture synthesis.
- `-lowres`: the larger the receptive field, the larger the textures the network captures. The easiest way to increase the receptive field is to drop every other column from the input spectrogram (a kind of downscaling). Use this flag to process the spectrogram at both the original and the low resolution at the same time.
- `-how_div`, `-normalize_gradients`, `-loss`: change some details of the loss calculation.
- `-save`: folder where intermediate results are saved.
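The style term behind `-style_layers` and `-style_weight` is the Gram-matrix loss of Gatys et al.; in the same numpy terms, `-lowres` roughly corresponds to also matching the statistics of `spectrogram[:, ::2]`. A minimal sketch of the loss (feature shapes and names are illustrative assumptions, not the repo's code):

```
import numpy as np

def gram_matrix(features):
    # features: (channels, time) activations of one chosen layer.
    # Entries are channel-channel correlations, averaged over time.
    return features.dot(features.T) / features.shape[1]

def style_loss(gen_layers, style_layers):
    # Sum of squared Gram differences over the layers picked by
    # -style_layers; -style_weight scales this term in the total loss.
    return sum(np.sum((gram_matrix(g) - gram_matrix(s)) ** 2)
               for g, s in zip(gen_layers, style_layers))
```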

##### 3. Convert the spectrogram back to an audio file
```
python invert_spectrogram.py --spectrogram_t7 <spectrogram.t7> --out_audio keyboard2_texture.wav
```
Parameters:
- `--spectrogram_t7`: path to the `.t7` spectrogram written by the Lua script.
- `--out_audio`: path for the reconstructed audio file.
- `--n_iter`: number of iterations used to recover phase from the magnitude spectrogram (see the sketch below).
- `--n_fft`, `--sr`: must match the values passed to `get_spectrogram.py`.
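Since only the magnitude spectrogram is optimized, the phase has to be recovered at inversion time; `--n_iter` presumably controls a Griffin-Lim-style loop like the sketch below (function name and defaults are illustrative, not the repo's exact code):

```
import numpy as np
import librosa

def griffin_lim(magnitude, n_fft=1024, n_iter=500):
    # Start from a random phase estimate
    spectrum = magnitude * np.exp(2j * np.pi * np.random.rand(*magnitude.shape))
    for _ in range(n_iter):
        # Round-trip through the time domain, keep only the resulting
        # phase, and reimpose the target magnitude
        audio = librosa.istft(spectrum)
        rebuilt = librosa.stft(audio, n_fft=n_fft)
        spectrum = magnitude * np.exp(1j * np.angle(rebuilt))
    return librosa.istft(spectrum)
```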

### Command lines used for the texture examples

```
python get_spectrogram.py --out_npy data/inputs/keyboard2.npy --in_audio data/inputs/keyboard2.mp3
th neural_style_audio.lua -style data/inputs/keyboard2.npy -content_layers '' -style_layers 1,5,10,15,20,25 -style_weight 10000000 -optimizer lbfgs -learning_rate 1e-1 -num_iterations 5000 -lowres
python invert_spectrogram.py --spectrogram_t7 data/out/out.png.t7 --out_audio data/outputs/keyboard2_texture.wav
```

### Command lines used for style transfer

We have also implemented a minimalistic script identical to the TensorFlow and Lasagne scripts. Here is an example of how to use it:

```
python get_spectrogram.py --out_npy data/inputs/usa.npy --in_audio data/inputs/usa.mp3 --n_fft 2048 --sr 22050
python get_spectrogram.py --out_npy data/inputs/imperial.npy --in_audio data/inputs/imperial.mp3 --n_fft 2048 --sr 22050
th neural_style_audio_random.lua -alpha 1e-2
python invert_spectrogram.py --spectrogram_t7 data/out/out.t7 --out_audio out.wav --n_iter 500 --n_fft 2048 --sr 22050
```