Official repository for the paper "Chunked Autoregressive GAN for Conditional Waveform Synthesis"
- Host: GitHub
- URL: https://github.com/descriptinc/cargan
- Owner: descriptinc
- License: mit
- Created: 2021-09-21T15:49:34.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T21:15:39.000Z (almost 2 years ago)
- Last Synced: 2024-09-06T10:37:20.350Z (2 months ago)
- Topics: audio, autoregression, gan, vocoder
- Language: Python
- Homepage: https://maxrmorrison.com/sites/cargan
- Size: 90.2 MB
- Stars: 187
- Watchers: 22
- Forks: 29
- Open Issues: 4
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
# Chunked Autoregressive GAN (CARGAN)
[![PyPI](https://img.shields.io/pypi/v/cargan.svg)](https://pypi.python.org/pypi/cargan)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://pepy.tech/badge/cargan)](https://pepy.tech/project/cargan)

Official implementation of the paper _Chunked Autoregressive GAN for Conditional Waveform Synthesis_ [[paper]](https://www.maxrmorrison.com/pdfs/morrison2022chunked.pdf) [[companion website]](https://www.maxrmorrison.com/sites/cargan/)
## Table of contents
- [Installation](#installation)
- [Configuration](#configuration)
- [Inference](#inference)
* [CLI](#cli)
* [API](#api)
* [`cargan.from_audio`](#carganfrom_audio)
* [`cargan.from_audio_file_to_file`](#carganfrom_audio_file_to_file)
* [`cargan.from_audio_files_to_files`](#carganfrom_audio_files_to_files)
* [`cargan.from_features`](#carganfrom_features)
* [`cargan.from_feature_file_to_file`](#carganfrom_feature_file_to_file)
* [`cargan.from_feature_files_to_files`](#carganfrom_feature_files_to_files)
- [Reproducing results](#reproducing-results)
* [Download](#download)
* [Partition](#partition)
* [Preprocess](#preprocess)
* [Train](#train)
* [Evaluate](#evaluate)
* [Objective](#objective)
* [Subjective](#subjective)
* [Receptive field](#receptive-field)
- [Running tests](#running-tests)
- [Citation](#citation)

## Installation
`pip install cargan`
## Configuration
All configuration is performed in `cargan/constants.py`. The default configuration is
CARGAN. Additional configuration files for experiments described in our paper
can be found in `config/`.
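For example, the active configuration can be inspected from Python after import. This is a minimal sketch; the descriptions in the comments are assumptions based on how these constants are used elsewhere in this README, and access via the top-level `cargan` package is assumed from how the README refers to them.

```
# Minimal sketch: inspect constants defined in cargan/constants.py
import cargan

print(cargan.NUM_FEATURES)    # conditioning features per frame
print(cargan.HOPSIZE)         # audio samples generated per frame
print(cargan.AUTOREGRESSIVE)  # whether the generator runs autoregressively
```
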
## Inference

### CLI
Infer from audio files on disk. `audio_files` and `output_files` can be
lists of files to perform batch inference.

```
python -m cargan \
--audio_files \
--output_files \
--checkpoint \
--gpu
```

Infer from files of features on disk. `feature_files` and `output_files` can
be lists of files to perform batch inference.

```
python -m cargan \
--feature_files \
--output_files \
--checkpoint \
--gpu
```

### API
#### `cargan.from_audio`
```
"""Perform vocoding from audioArguments
audio : torch.Tensor(shape=(1, samples))
The audio to vocode
sample_rate : int
The audio sample rate
gpu : int or None
The index of the gpu to useReturns
vocoded : torch.Tensor(shape=(1, samples))
The vocoded audio
"""
```
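A minimal usage sketch following the docstring above. The file paths are placeholders, and loading with `torchaudio` and saving at the input sample rate are assumptions, not part of the documented API.

```
# Sketch: vocode an utterance in memory using the arguments documented above
import torchaudio
import cargan

# Load audio; the docstring expects shape (1, samples), i.e. mono
audio, sample_rate = torchaudio.load('speech.wav')

# Vocode on CPU (pass a GPU index to run on GPU)
vocoded = cargan.from_audio(audio, sample_rate, gpu=None)

# Saving at the input sample rate is an assumption
torchaudio.save('vocoded.wav', vocoded.cpu(), sample_rate)
```
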
#### `cargan.from_audio_file_to_file`
```
"""Perform vocoding from audio file and save to fileArguments
audio_file : Path
The audio file to vocode
output_file : Path
The location to save the vocoded audio
checkpoint : Path
The generator checkpoint
gpu : int or None
The index of the gpu to use
"""
```

#### `cargan.from_audio_files_to_files`
```
"""Perform vocoding from audio files and save to filesArguments
audio_files : list(Path)
The audio files to vocode
output_files : list(Path)
The locations to save the vocoded audio
checkpoint : Path
The generator checkpoint
gpu : int or None
The index of the gpu to use
"""
```
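A hedged batch-processing sketch based on the docstring above. All paths are placeholders, and passing `checkpoint` and `gpu` by keyword is an assumption.

```
# Sketch: batch vocoding from audio files on disk; every path is a placeholder
from pathlib import Path
import cargan

audio_files = [Path('a.wav'), Path('b.wav')]
output_files = [Path('a-vocoded.wav'), Path('b-vocoded.wav')]
cargan.from_audio_files_to_files(
    audio_files,
    output_files,
    checkpoint=Path('generator.pt'),
    gpu=0)
```
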
#### `cargan.from_features`
```
"""Perform vocoding from featuresArguments
features : torch.Tensor(shape=(1, cargan.NUM_FEATURES, frames)
The features to vocode
gpu : int or None
The index of the gpu to useReturns
vocoded : torch.Tensor(shape=(1, cargan.HOPSIZE * frames))
The vocoded audio
"""
```
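A minimal sketch of feature-based vocoding following the docstring above. The feature file name is a placeholder, and the use of `torch.load` to obtain a feature tensor is an assumption.

```
# Sketch: vocode a precomputed feature tensor; 'features.pt' is a placeholder
import torch
import cargan

features = torch.load('features.pt')  # shape (1, cargan.NUM_FEATURES, frames)
vocoded = cargan.from_features(features, gpu=None)
print(vocoded.shape)  # (1, cargan.HOPSIZE * frames)
```
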
#### `cargan.from_feature_file_to_file`
```
"""Perform vocoding from feature file and save to diskArguments
feature_file : Path
The feature file to vocode
output_file : Path
The location to save the vocoded audio
checkpoint : Path
The generator checkpoint
gpu : int or None
The index of the gpu to use
"""
```

#### `cargan.from_feature_files_to_files`
```
"""Perform vocoding from feature files and save to diskArguments
feature_files : list(Path)
The feature files to vocode
output_files : list(Path)
The locations to save the vocoded audio
checkpoint : Path
The generator checkpoint
gpu : int or None
The index of the gpu to use
"""
```

## Reproducing results
For the following subsections, the arguments are as follows
- `checkpoint` - Path to an existing checkpoint on disk
- `datasets` - A list of datasets to use. Supported datasets are
`vctk`, `daps`, `cumsum`, and `musdb`.
- `gpu` - The index of the gpu to use
- `gpus` - A list of indices of gpus to use for distributed data parallelism
(DDP)
- `name` - The name to give to an experiment or evaluation
- `num` - The number of samples to evaluate

### Download
Downloads, unzips, and formats datasets. Stores datasets in `data/datasets/`.
Stores formatted datasets in `data/cache/`.

```
python -m cargan.data.download --datasets
```

`vctk` must be downloaded before `cumsum`.
### Preprocess
Prepares features for training. Features are stored in `data/cache/`.
```
python -m cargan.preprocess --datasets --gpu
```

Running this step is not required for the `cumsum` experiment.
### Partition
Partitions a dataset into training, validation, and testing partitions. You
should not need to run this, as the partitions used in our work are provided
for each dataset in `cargan/assets/partitions/`.

```
python -m cargan.partition --datasets
```

The optional `--overwrite` flag forces the existing partition to be overwritten.
### Train
Trains a model. Checkpoints and logs are stored in `runs/`.
```
python -m cargan.train \
--name \
--datasets \
--gpus
```

You can optionally specify a `--checkpoint` option pointing to the directory
of a previous run. The most recent checkpoint will automatically be loaded
and training will resume from that checkpoint. You can overwrite a previous
training run by passing the `--overwrite` flag.

You can monitor training via `tensorboard` as follows.
```
tensorboard --logdir runs/ --port
```

### Evaluate
#### Objective
Reports the pitch RMSE (in cents), periodicity RMSE, and voiced/unvoiced F1
score. Results are both printed and stored in `eval/objective/`.

```
python -m cargan.evaluate.objective \
--name \
--datasets \
--checkpoint \
--num \
--gpu
```
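For reference, pitch error in cents is a log-scale distance between two pitch values (1200 cents per octave). The sketch below illustrates the unit only; it is not necessarily the exact code used by `cargan.evaluate.objective`.

```
# Illustration of the "cents" unit used for pitch RMSE (1200 cents per octave)
import torch

def cents(predicted_hz, target_hz):
    """Pitch distance in cents between two tensors of pitch values in Hz"""
    return 1200. * torch.log2(predicted_hz / target_hz)

error = cents(torch.tensor([200.]), torch.tensor([196.]))
rmse = torch.sqrt(torch.mean(error ** 2))
```
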
#### Subjective
Generates samples for subjective evaluation. Also performs benchmarking
of inference speed. Results are stored in `eval/subjective/`.

```
python -m cargan.evaluate.subjective \
--name \
--datasets \
--checkpoint \
--num \
--gpu
```

#### Receptive field
Get the size of the (non-causal) receptive field of the generator.
`cargan.AUTOREGRESSIVE` must be `False` to use this.

```
python -m cargan.evaluate.receptive_field
```

## Running tests
```
pip install pytest
pytest
```

## Citation
### IEEE
M. Morrison, R. Kumar, K. Kumar, P. Seetharaman, A. Courville, and Y. Bengio, "Chunked Autoregressive GAN for Conditional Waveform Synthesis," Submitted to ICLR 2022, April 2022.

### BibTeX
```
@inproceedings{morrison2022chunked,
title={Chunked Autoregressive GAN for Conditional Waveform Synthesis},
author={Morrison, Max and Kumar, Rithesh and Kumar, Kundan and Seetharaman, Prem and Courville, Aaron and Bengio, Yoshua},
booktitle={Submitted to ICLR 2022},
month={April},
year={2022}
}
```