https://github.com/yoyololicon/pytorch_fftnet

A pytorch implementation of FFTNet.

cnn fftnet vocoder

Host: GitHub
URL: https://github.com/yoyololicon/pytorch_fftnet
Owner: yoyololicon
Created: 2018-07-30T07:51:06.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2018-08-31T07:38:28.000Z (about 6 years ago)
Last Synced: 2024-10-03T12:37:48.918Z (about 1 month ago)
Topics: cnn, fftnet, vocoder
Language: Python
Homepage:
Size: 547 KB
Stars: 36
Watchers: 3
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md

README

This is a PyTorch implementation of FFTNet, as described [here](http://gfx.cs.princeton.edu/pubs/Jin_2018_FAR/).
Work in progress.

## Quick Start

1. Install requirements
```
pip install -r requirements.txt
```

2. Download [CMU_ARCTIC](http://festvox.org/cmu_arctic/) dataset.

3. Train the model and save it. The default parameters are mostly the same as in the original paper.
Pass the _--preprocess_ flag the first time you run it.

```
python train.py \
--preprocess \
--wav_dir your_downloaded_wav_dir \
--data_dir preprocessed_feature_dir \
--model_file saved_model_name
```

4. Use the trained model to decode (reconstruct) a wav file from its MCC features.

```
python decode.py \
--infile wav_file \
--outfile reconstruct_file_name \
--data_dir preprocessed_feature_dir \
--model_file saved_model_name
```

[FFTNet_generator](FFTNet_generator.py) and [FFTNet_vocoder](FFTNet_vocoder.py) are two files I used to check that the model works,
using the torchaudio yesno dataset.

## Current result

Some decoded files are in the [samples](samples) folder.

## Differences from paper

* Window size: 400 in the paper; here it depends on minimum_f0 (because I use pyworld to extract f0 and MCC coefficients).
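
The README doesn't spell out how minimum_f0 sets the window. Assuming the window in question is the FFT size used by WORLD's CheapTrick spectral-envelope estimator (which pyworld wraps), the dependence can be sketched in plain Python. `cheaptrick_fft_size` is a hypothetical name and the formula is my reading of the WORLD source, not code from this repository.

```python
import math


def cheaptrick_fft_size(fs: int, f0_floor: float) -> int:
    """Hypothetical helper: FFT (window) size as computed by WORLD's CheapTrick.

    The analysis covers about three pitch periods of the lowest expected F0,
    so the window must hold roughly 3 * fs / f0_floor samples. The result is
    the smallest power of two greater than 3 * fs / f0_floor + 1.
    """
    return 2 ** (1 + int(math.log2(3.0 * fs / f0_floor + 1)))


if __name__ == "__main__":
    # Lowering the minimum F0 makes the analysis window longer.
    for f0_floor in (71.0, 40.0):
        print(f"fs=16000, f0_floor={f0_floor}: {cheaptrick_fft_size(16000, f0_floor)}")
```

At 16 kHz this gives 1024 samples for f0_floor=71 and 2048 for f0_floor=40, so the window is no longer a fixed 400 samples.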

## TODO

- [x] Zero padding.
- [x] Injected noise.
- [x] Voiced/unvoiced conditional sampling.
- [x] Post-synthesis denoising.
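
Voiced/unvoiced conditional sampling can be sketched as follows: in voiced frames the output logits are scaled by a constant c > 1 before the softmax (a lower sampling temperature), while unvoiced frames are sampled from the plain softmax. This is a plain-Python illustration, not the repository's code; the function names and the value c = 2.0 are assumptions, and in practice the voiced flag would come from the f0 track (for example, f0 > 0).

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], scale: float = 1.0) -> List[float]:
    """Softmax of scale * logits; scale > 1 sharpens the distribution."""
    scaled = [scale * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_sample(logits: Sequence[float], voiced: bool,
                       c: float = 2.0,
                       rng: Optional[random.Random] = None) -> int:
    """Draw one quantized sample index from the network's output logits.

    Voiced frames: multiply the logits by c (> 1) before the softmax.
    Unvoiced frames: sample from the unscaled softmax.
    The default c = 2.0 is an assumption.
    """
    if rng is None:
        rng = random.Random()
    probs = softmax(logits, c if voiced else 1.0)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [0.5, 2.0, 1.0, -1.0]
    print("unvoiced probs:", [round(p, 3) for p in softmax(logits)])
    print("voiced probs:  ", [round(p, 3) for p in softmax(logits, 2.0)])
    rng = random.Random(0)
    print("voiced sample:", conditional_sample(logits, voiced=True, rng=rng))
```

Scaling keeps the most likely class the same but concentrates probability on it, which reduces noisy samples in voiced regions.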

## Notes

* I combine the two 1x1 convolution kernels into one 1x2 dilated kernel.
This removes the redundant bias parameters and speeds things up overall.
* The author said the channel size in the middle layers is 128, not 256.
* My model gets stuck at the beginning (loss around 4.x) for thousands of steps, then drops very quickly to 2.6 ~ 3.0.
Using a smaller learning rate helps a little.
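
The kernel-merging note relies on a simple identity: an FFTNet layer's two 1x1 convolutions, one applied to the left part of the input and one to the right part, sum to a single kernel-size-2 convolution with dilation d once their two biases are folded into one. The pure-Python check below illustrates this; the function names are hypothetical and this is a sketch, not the repository's code.

```python
from typing import List

Matrix = List[List[float]]  # weights: [out_channels][in_channels]
Signal = List[List[float]]  # activations: [channels][time]


def conv1x1(w: Matrix, b: List[float], x: Signal) -> Signal:
    """Pointwise conv: y[o][t] = sum_i w[o][i] * x[i][t] + b[o]."""
    return [[sum(w[o][i] * x[i][t] for i in range(len(x))) + b[o]
             for t in range(len(x[0]))]
            for o in range(len(w))]


def two_branch_layer(w_l: Matrix, b_l: List[float], w_r: Matrix,
                     b_r: List[float], x: Signal, d: int) -> Signal:
    """FFTNet-style split (d >= 1): one 1x1 conv on the left part x[:, :-d],
    another on the right part x[:, d:], summed. Uses two bias vectors."""
    left = conv1x1(w_l, b_l, [row[:-d] for row in x])
    right = conv1x1(w_r, b_r, [row[d:] for row in x])
    return [[p + q for p, q in zip(lo, ro)] for lo, ro in zip(left, right)]


def dilated_k2_layer(w_l: Matrix, w_r: Matrix, b: List[float],
                     x: Signal, d: int) -> Signal:
    """One kernel-size-2 conv with dilation d, no padding, one bias:
    y[o][t] = sum_i (w_l[o][i] * x[i][t] + w_r[o][i] * x[i][t + d]) + b[o]."""
    return [[sum(w_l[o][i] * x[i][t] + w_r[o][i] * x[i][t + d]
                 for i in range(len(x))) + b[o]
             for t in range(len(x[0]) - d)]
            for o in range(len(w_l))]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.5, -1.0, 2.0, 0.0, 1.5]]
    w_l, b_l = [[0.1, 0.2], [0.3, -0.4]], [0.1, 0.2]
    w_r, b_r = [[-0.5, 0.25], [0.0, 1.0]], [1.0, -1.0]
    a = two_branch_layer(w_l, b_l, w_r, b_r, x, d=2)
    # Folding the two biases into one gives the same output.
    c = dilated_k2_layer(w_l, w_r, [p + q for p, q in zip(b_l, b_r)], x, d=2)
    print(all(abs(u - v) < 1e-9 for ra, rc in zip(a, c) for u, v in zip(ra, rc)))
```

In PyTorch the merged layer would be a single `nn.Conv1d(in_channels, out_channels, kernel_size=2, dilation=d)`, which stores C_out bias values instead of 2 × C_out and runs one convolution instead of two. Plain lists are used here only so the check runs without torch.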

## Variations of FFTNet

### Radix-N FFTNet

Use the flag _--radixs_ to specify each layer's radix.

```
# a radix-4 FFTNet with a receptive field of 1024 (4^5)
python train.py --radixs 4 4 4 4 4
```

The original FFTNet uses a radix-2 structure. In my experiments, a radix-4 network still achieves similar results, and so does radix-8;
because they need fewer layers, they run faster.
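
The layer counts behind this trade-off are simple arithmetic: the receptive field is the product of the per-layer radixes, so radix-4 needs half as many layers as radix-2 for the same 1024-sample window. The sketch assumes the layer with the largest tap spacing sits next to the input, as in FFTNet; the function names are hypothetical and not taken from the repository.

```python
import math
from typing import List


def receptive_field(radixs: List[int]) -> int:
    """A stack of radix-r layers sees the product of its radixes (in samples)."""
    return math.prod(radixs)


def tap_spacings(radixs: List[int]) -> List[int]:
    """Spacing between the taps of each layer, input side first (FFTNet order).

    Assumption: layer k splits its input into radixs[k] parts, so its taps
    are spaced by the product of the radixes of all later layers.
    """
    return [math.prod(radixs[k + 1:]) for k in range(len(radixs))]


def layers_needed(radix: int, target_rf: int) -> int:
    """Fewest radix-`radix` layers whose receptive field reaches target_rf."""
    layers, rf = 0, 1
    while rf < target_rf:
        rf *= radix
        layers += 1
    return layers


if __name__ == "__main__":
    print(receptive_field([4, 4, 4, 4, 4]), tap_spacings([4, 4, 4, 4, 4]))
    for r in (2, 4, 8):
        print(f"radix-{r}: {layers_needed(r, 1024)} layers for 1024 samples")
```

This prints 10, 5 and 4 layers for radix 2, 4 and 8. Four radix-8 layers overshoot to 4096 samples; if _--radixs_ accepts mixed values, as the per-layer wording suggests, a list such as 8 8 4 4 would give exactly 1024.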

### Transposed FFTNet

Fig. 2 in the paper can be redrawn as a dilated structure with kernel size 2 (which also means radix 2).

![](images/fftnet_dilated.png)

If we draw all the lines:

![](images/fftnet_dilated2.png)

and transpose the graph so that the arrows go backward, you'll find the WaveNet dilated structure.

![](images/fftnet_wavenet.png)

Adding the flag __--transpose__ gives you a simplified version of WaveNet.
```
# a WaveNet-like model without gated/residual/skip units
python train.py --transpose
```
In my experiments, the transposed models are easier to train and reach slightly lower training loss than FFTNet.
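
Assuming "transposed" here amounts to reversing the dilation order (FFTNet applies the largest dilation next to the input, while WaveNet starts at 1 and doubles), a short dependency count shows that both networks see the same input window. This is an illustrative sketch with hypothetical names, not the repository's implementation; it ignores channels and nonlinearities and only tracks which inputs reach one output sample.

```python
from collections import Counter
from typing import List


def paths_to_output(dilations: List[int]) -> Counter:
    """For a stack of kernel-size-2 dilated convs (no padding) applied in the
    given order, count the paths from each input index to output index 0.
    A layer with dilation d reads positions p and p + d of its input."""
    paths: Counter = Counter({0: 1})
    for d in reversed(dilations):  # walk from the output back to the input
        nxt: Counter = Counter()
        for p, n in paths.items():
            nxt[p] += n
            nxt[p + d] += n
        paths = nxt
    return paths


if __name__ == "__main__":
    fftnet = [4, 2, 1]   # largest dilation next to the input
    wavenet = [1, 2, 4]  # reversed order, as in WaveNet
    for name, dils in (("FFTNet", fftnet), ("transposed", wavenet)):
        paths = paths_to_output(dils)
        print(name, sorted(paths), set(paths.values()))
```

Both orderings cover input indices 0 to 7 with exactly one path per input sample; only the order in which the dilations are applied differs.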