Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Deepest-Project/MelNet
Implementation of "MelNet: A Generative Model for Audio in the Frequency Domain"
generative-model pytorch tts
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/Deepest-Project/MelNet
- Owner: Deepest-Project
- License: mit
- Created: 2019-08-17T05:38:54.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-07-25T10:55:31.000Z (6 months ago)
- Last Synced: 2024-08-02T15:37:45.077Z (6 months ago)
- Topics: generative-model, pytorch, tts
- Language: Python
- Homepage:
- Size: 169 KB
- Stars: 206
- Watchers: 23
- Forks: 38
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MelNet
Implementation of [MelNet: A Generative Model for Audio in the Frequency Domain](https://arxiv.org/abs/1906.01083)
## Prerequisites
- Tested with Python 3.6.8 & 3.7.4, PyTorch 1.2.0 & 1.3.0.
- `pip install -r requirements.txt` (see the setup sketch below)
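A minimal setup sketch, assuming a fresh clone; the virtual environment step is illustrative and not prescribed by the repository:

```bash
# Clone the repository and install dependencies (tested against Python 3.6/3.7).
git clone https://github.com/Deepest-Project/MelNet.git
cd MelNet
python3 -m venv venv && source venv/bin/activate   # optional, illustrative
pip install -r requirements.txt
```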
## How to train

### Datasets
- Blizzard, VoxCeleb2, and KSS have YAML files provided under `config/`. For other datasets, fill out your own YAML file according to the other provided ones.
- Unconditional training is possible for all kinds of datasets, provided that they have a consistent file extension, specified by `data.extension` in the YAML file (a quick check is sketched below).
- Conditional training is currently implemented only for KSS and a subset of the Blizzard dataset.
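A quick sanity check for the consistent-extension requirement; the dataset path and the `.wav` extension below are placeholders and should match the `data.extension` value in your YAML file:

```bash
# List any files that do not use the expected extension (assumed .wav here).
# The dataset path is a placeholder; ideally this prints nothing.
find /path/to/your/dataset -type f ! -name '*.wav'
```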
### Running the code

- `python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]` (example runs follow this list)
- Each tier can be trained separately. Since each tier (except tier 1) is larger than the one before it, adjust the batch size for each tier accordingly.
- Tier 6 of the Blizzard dataset does not fit on a 16GB P100, even with a batch size of 1.
- The `-s` flag is a boolean that determines whether to train a TTS tier. Since a TTS tier only differs at tier 1, the flag is ignored when `[tier number] != 1`. Warning: the flag is toggled `True` no matter what follows it, so omit it entirely if you do not intend to use it.
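Hedged example runs, assuming the provided Blizzard and KSS configs live at `config/blizzard.yaml` and `config/kss.yaml`; the exact file names, run names, and batch sizes below are illustrative, not taken from the repository:

```bash
# Tier 1 is comparatively small, so it tolerates a larger batch size.
python trainer.py -c config/blizzard.yaml -n blizzard-t1 -t 1 -b 16

# Higher tiers are larger; lower the batch size accordingly.
python trainer.py -c config/blizzard.yaml -n blizzard-t2 -t 2 -b 4

# TTS training only affects tier 1; the value after -s is ignored (always True).
python trainer.py -c config/kss.yaml -n kss-tts-t1 -t 1 -b 8 -s True
```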
## How to sample

### Preparing the checkpoints
- The checkpoints must be stored under `chkpt/`.
- A YAML file named `inference.yaml` must be provided under `config/`.
- `inference.yaml` must specify the number of tiers, the names of the checkpoints, and whether or not the generation is conditional (a minimal preparation sketch follows).
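A minimal preparation sketch; the checkpoint file names are placeholders, and `inference.yaml` must be filled in with the tier count, checkpoint names, and conditional/unconditional setting described above:

```bash
# Collect the trained checkpoints under chkpt/ (file names are placeholders).
mkdir -p chkpt
cp /path/to/runs/blizzard-t1.pt /path/to/runs/blizzard-t2.pt chkpt/

# Write config/inference.yaml: number of tiers, checkpoint names,
# and whether generation is conditional.
$EDITOR config/inference.yaml
```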
### Running the code

- `python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]` (concrete invocations are sketched after this list)
- Timestep refers to the length of the mel spectrogram. The ratio of timestep to seconds is roughly `[sample rate] : [hop length of FFT]`.
- The `-i` flag is optional and only needed for conditional generation. Surround the sentence with `""` and end it with `.`.
- Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).
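A hedged sampling example. The config file names are assumptions, and the timestep arithmetic assumes a 22,050 Hz sample rate with a 256-sample FFT hop; substitute the values from your own YAML file:

```bash
# Unconditional sample of roughly 6 seconds:
# timestep ≈ seconds * sample_rate / hop_length ≈ 6 * 22050 / 256 ≈ 516
python inference.py -c config/blizzard.yaml -p config/inference.yaml -t 516 -n uncond-sample

# Conditional (TTS) sample; the sentence is quoted and ends with a period.
python inference.py -c config/kss.yaml -p config/inference.yaml -t 400 -n tts-sample -i "This is a test sentence."
```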
## To-do

- [x] Implement upsampling procedure
- [x] GMM sampling + loss function
- [x] Unconditional audio generation
- [x] TTS synthesis
- [x] Tensorboard logging
- [x] Multi-GPU training
- [ ] Primed generation

## Implementation authors
- Seungwon Park, June Young Yi, Yoonhyung Lee, Joowhan Song @ Deepest Season 6
## License
MIT License