{"id":15553583,"url":"https://github.com/descriptinc/cargan","last_synced_at":"2025-04-06T22:10:46.454Z","repository":{"id":43055962,"uuid":"408881533","full_name":"descriptinc/cargan","owner":"descriptinc","description":"Official repository for the paper \"Chunked Autoregressive GAN for Conditional Waveform Synthesis\"","archived":false,"fork":false,"pushed_at":"2022-12-08T21:15:39.000Z","size":94631,"stargazers_count":187,"open_issues_count":4,"forks_count":30,"subscribers_count":21,"default_branch":"master","last_synced_at":"2025-03-30T19:11:04.989Z","etag":null,"topics":["audio","autoregression","gan","vocoder"],"latest_commit_sha":null,"homepage":"https://maxrmorrison.com/sites/cargan","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/descriptinc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-21T15:49:34.000Z","updated_at":"2025-02-21T08:55:50.000Z","dependencies_parsed_at":"2023-01-25T18:15:56.554Z","dependency_job_id":null,"html_url":"https://github.com/descriptinc/cargan","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/descriptinc%2Fcargan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/descriptinc%2Fcargan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/descriptinc%2Fcargan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/descriptinc%2Fcargan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/descriptinc","download_url":"https://codeload.github.
com/descriptinc/cargan/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247557767,"owners_count":20958047,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","autoregression","gan","vocoder"],"created_at":"2024-10-02T14:39:19.406Z","updated_at":"2025-04-06T22:10:46.425Z","avatar_url":"https://github.com/descriptinc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Chunked Autoregressive GAN (CARGAN)\n[![PyPI](https://img.shields.io/pypi/v/cargan.svg)](https://pypi.python.org/pypi/cargan)\n[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Downloads](https://pepy.tech/badge/cargan)](https://pepy.tech/project/cargan)\n\nOfficial implementation of the paper _Chunked Autoregressive GAN for Conditional Waveform Synthesis_ [[paper]](https://www.maxrmorrison.com/pdfs/morrison2022chunked.pdf) [[companion website]](https://www.maxrmorrison.com/sites/cargan/)\n\n\n## Table of contents\n\n- [Installation](#installation)\n- [Configuration](#configuration)\n- [Inference](#inference)\n    * [CLI](#cli)\n    * [API](#api)\n        * [`cargan.from_audio`](#carganfrom_audio)\n        * [`cargan.from_audio_file_to_file`](#carganfrom_audio_file_to_file)\n        * [`cargan.from_audio_files_to_files`](#carganfrom_audio_files_to_files)\n        * [`cargan.from_features`](#carganfrom_features)\n        * [`cargan.from_feature_file_to_file`](#carganfrom_feature_file_to_file)\n        * 
[`cargan.from_feature_files_to_files`](#carganfrom_feature_files_to_files)\n- [Reproducing results](#reproducing-results)\n    * [Download](#download)\n    * [Preprocess](#preprocess)\n    * [Partition](#partition)\n    * [Train](#train)\n    * [Evaluate](#evaluate)\n        * [Objective](#objective)\n        * [Subjective](#subjective)\n        * [Receptive field](#receptive-field)\n- [Running tests](#running-tests)\n- [Citation](#citation)\n\n\n## Installation\n\n`pip install cargan`\n\n\n## Configuration\n\nAll configuration is performed in `cargan/constants.py`. The default configuration is\nCARGAN. Additional configuration files for experiments described in our paper\ncan be found in `config/`.\n\n\n## Inference\n\n### CLI\n\nInfer from audio files on disk. `audio_files` and `output_files` can be\nlists of files to perform batch inference.\n\n```\npython -m cargan \\\n    --audio_files \u003caudio_files\u003e \\\n    --output_files \u003coutput_files\u003e \\\n    --checkpoint \u003ccheckpoint\u003e \\\n    --gpu \u003cgpu\u003e\n```\n\nInfer from files of features on disk. 
`feature_files` and `output_files` can\nbe lists of files to perform batch inference.\n\n```\npython -m cargan \\\n    --feature_files \u003cfeature_files\u003e \\\n    --output_files \u003coutput_files\u003e \\\n    --checkpoint \u003ccheckpoint\u003e \\\n    --gpu \u003cgpu\u003e\n```\n\n\n### API\n\n#### `cargan.from_audio`\n\n```\n\"\"\"Perform vocoding from audio\n\nArguments\n    audio : torch.Tensor(shape=(1, samples))\n        The audio to vocode\n    sample_rate : int\n        The audio sample rate\n    gpu : int or None\n        The index of the gpu to use\n\nReturns\n    vocoded : torch.Tensor(shape=(1, samples))\n        The vocoded audio\n\"\"\"\n```\n\n#### `cargan.from_audio_file_to_file`\n\n```\n\"\"\"Perform vocoding from audio file and save to file\n\nArguments\n    audio_file : Path\n        The audio file to vocode\n    output_file : Path\n        The location to save the vocoded audio\n    checkpoint : Path\n        The generator checkpoint\n    gpu : int or None\n        The index of the gpu to use\n\"\"\"\n```\n\n\n#### `cargan.from_audio_files_to_files`\n\n```\n\"\"\"Perform vocoding from audio files and save to files\n\nArguments\n    audio_files : list(Path)\n        The audio files to vocode\n    output_files : list(Path)\n        The locations to save the vocoded audio\n    checkpoint : Path\n        The generator checkpoint\n    gpu : int or None\n        The index of the gpu to use\n\"\"\"\n```\n\n\n#### `cargan.from_features`\n\n```\n\"\"\"Perform vocoding from features\n\nArguments\n    features : torch.Tensor(shape=(1, cargan.NUM_FEATURES, frames))\n        The features to vocode\n    gpu : int or None\n        The index of the gpu to use\n\nReturns\n    vocoded : torch.Tensor(shape=(1, cargan.HOPSIZE * frames))\n        The vocoded audio\n\"\"\"\n```\n\n\n#### `cargan.from_feature_file_to_file`\n\n```\n\"\"\"Perform vocoding from feature file and save to disk\n\nArguments\n    feature_file : Path\n        The feature file to 
vocode\n    output_file : Path\n        The location to save the vocoded audio\n    checkpoint : Path\n        The generator checkpoint\n    gpu : int or None\n        The index of the gpu to use\n\"\"\"\n```\n\n\n#### `cargan.from_feature_files_to_files`\n\n```\n\"\"\"Perform vocoding from feature files and save to disk\n\nArguments\n    feature_files : list(Path)\n        The feature files to vocode\n    output_files : list(Path)\n        The locations to save the vocoded audio\n    checkpoint : Path\n        The generator checkpoint\n    gpu : int or None\n        The index of the gpu to use\n\"\"\"\n```\n\n\n## Reproducing results\n\nFor the following subsections, the arguments are as follows\n- `checkpoint` - Path to an existing checkpoint on disk\n- `datasets` - A list of datasets to use. Supported datasets are\n  `vctk`, `daps`, `cumsum`, and `musdb`.\n- `gpu` - The index of the gpu to use\n- `gpus` - A list of indices of gpus to use for distributed data parallelism\n  (DDP)\n- `name` - The name to give to an experiment or evaluation\n- `num` - The number of samples to evaluate\n\n\n### Download\n\nDownloads, unzips, and formats datasets. Stores datasets in `data/datasets/`.\nStores formatted datasets in `data/cache/`.\n\n```\npython -m cargan.data.download --datasets \u003cdatasets\u003e\n```\n\n`vctk` must be downloaded before `cumsum`.\n\n\n### Preprocess\n\nPrepares features for training. Features are stored in `data/cache/`.\n\n```\npython -m cargan.preprocess --datasets \u003cdatasets\u003e --gpu \u003cgpu\u003e\n```\n\nRunning this step is not required for the `cumsum` experiment.\n\n\n### Partition\n\nPartitions a dataset into training, validation, and testing partitions. 
You\nshould not need to run this, as the partitions used in our work are provided\nfor each dataset in `cargan/assets/partitions/`.\n\n```\npython -m cargan.partition --datasets \u003cdatasets\u003e\n```\n\nThe optional `--overwrite` flag forces the existing partition to be overwritten.\n\n\n### Train\n\nTrains a model. Checkpoints and logs are stored in `runs/`.\n\n```\npython -m cargan.train \\\n    --name \u003cname\u003e \\\n    --datasets \u003cdatasets\u003e \\\n    --gpus \u003cgpus\u003e\n```\n\nYou can optionally specify a `--checkpoint` option pointing to the directory\nof a previous run. The most recent checkpoint will automatically be loaded\nand training will resume from that checkpoint. You can overwrite a previous\ntraining by passing the `--overwrite` flag.\n\nYou can monitor training via `tensorboard` as follows.\n\n```\ntensorboard --logdir runs/ --port \u003cport\u003e\n```\n\n\n### Evaluate\n\n#### Objective\n\nReports the pitch RMSE (in cents), periodicity RMSE, and voiced/unvoiced F1\nscore. Results are both printed and stored in `eval/objective/`.\n\n```\npython -m cargan.evaluate.objective \\\n    --name \u003cname\u003e \\\n    --datasets \u003cdatasets\u003e \\\n    --checkpoint \u003ccheckpoint\u003e \\\n    --num \u003cnum\u003e \\\n    --gpu \u003cgpu\u003e\n```\n\n\n#### Subjective\n\nGenerates samples for subjective evaluation. Also performs benchmarking\nof inference speed. Results are stored in `eval/subjective/`.\n\n```\npython -m cargan.evaluate.subjective \\\n    --name \u003cname\u003e \\\n    --datasets \u003cdatasets\u003e \\\n    --checkpoint \u003ccheckpoint\u003e \\\n    --num \u003cnum\u003e \\\n    --gpu \u003cgpu\u003e\n```\n\n\n#### Receptive field\n\nGet the size of the (non-causal) receptive field of the generator.\n`cargan.AUTOREGRESSIVE` must be `False` to use this.\n\n```\npython -m cargan.evaluate.receptive_field\n```\n\n\n## Running tests\n\n```\npip install pytest\npytest\n```\n\n\n## Citation\n\n### IEEE\nM. 
Morrison, R. Kumar, K. Kumar, P. Seetharaman, A. Courville, and Y. Bengio, \"Chunked Autoregressive GAN for Conditional Waveform Synthesis,\" Submitted to ICLR 2022, April 2022.\n\n\n### BibTeX\n\n```\n@inproceedings{morrison2022chunked,\n    title={Chunked Autoregressive GAN for Conditional Waveform Synthesis},\n    author={Morrison, Max and Kumar, Rithesh and Kumar, Kundan and Seetharaman, Prem and Courville, Aaron and Bengio, Yoshua},\n    booktitle={Submitted to ICLR 2022},\n    month={April},\n    year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdescriptinc%2Fcargan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdescriptinc%2Fcargan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdescriptinc%2Fcargan/lists"}