## WaveFlow: A Compact Flow-based Model for Raw Audio

#### Update: Pretrained weights are now available. See the links below.

This is an unofficial PyTorch implementation of the [WaveFlow] model (Ping et al., ICML 2020).

The aim of this repo is to provide an easy-to-use PyTorch version of WaveFlow as a drop-in alternative to the various neural vocoders used with NVIDIA's [Tacotron2] audio processing backend.

Please refer to the [official implementation], written in PaddlePaddle, for the official results.

## Setup

1. Clone this repo and install the requirements.

   ```command
   git clone https://github.com/L0SG/WaveFlow.git
   cd WaveFlow
   pip install -r requirements.txt
   ```

2. Install [Apex] for mixed-precision training.

## Train your model

1. Download the [LJ Speech Data]. In this example, the clips are in `data/`.

2. Make lists of the file names to use for training and testing.

   ```command
   ls data/*.wav | tail -n+11 > train_files.txt
   ls data/*.wav | head -n10 > test_files.txt
   ```
   `head -n10` reserves the first 10 audio clips for model testing, and `tail -n+11` starts at the 11th clip, so the training and test lists do not overlap.

3. Edit the configuration file and train the model.

   The example commands below use `configs/waveflow-h16-r64-bipartize.json`.

   ```command
   nano configs/waveflow-h16-r64-bipartize.json
   python train.py -c configs/waveflow-h16-r64-bipartize.json
   ```
   Single-node multi-GPU training is enabled automatically with [DataParallel] (used instead of [DistributedDataParallel] for simplicity).

   For mixed-precision training, set `"fp16_run": true` in the configuration file.

   To load trained weights from a saved checkpoint, set the `checkpoint_path` field in the config file.

   `checkpoint_path` accepts either an explicit checkpoint path or, when resuming from weights averaged over multiple checkpoints, their parent directory.

   ### Examples
   To resume from a single checkpoint, set `"checkpoint_path": "experiments/waveflow-h16-r64-bipartize/waveflow_5000"` in the config file, then run:
   ```command
   python train.py -c configs/waveflow-h16-r64-bipartize.json
   ```

   To load weights averaged over the 10 most recent checkpoints, set `"checkpoint_path": "experiments/waveflow-h16-r64-bipartize"` in the config file, then run:
   ```command
   python train.py -a 10 -c configs/waveflow-h16-r64-bipartize.json
   ```

   To reset the optimizer and training scheduler while keeping the weights, add `--warm_start`:
   ```command
   python train.py --warm_start -c configs/waveflow-h16-r64-bipartize.json
   ```

4. Synthesize waveforms from the trained model.

   Set `checkpoint_path` in the config file and pass `--synthesize` to `train.py`. The model generates a waveform for each file listed in `test_files.txt`.
   ```command
   python train.py --synthesize -c configs/waveflow-h16-r64-bipartize.json
   ```
   If `"fp16_run": true`, the model uses FP16 (half-precision) arithmetic, which is faster on GPUs equipped with Tensor Cores.

### Pretrained Weights

We provide pretrained weights via Google Drive. The models were trained for 5M steps, and the weights were then averaged over the last 20 checkpoints with `-a 20`. The audio quality nearly matches that of the original paper.

| Model | Download |
|:-----:|:--------:|
| waveflow-h16-r64-bipartize | [Link](https://drive.google.com/file/d/1z402Lvb3D3no469NpC_7PkIHB8V140gj/view?usp=sharing) |
| waveflow-h16-r128-bipartize | [Link](https://drive.google.com/file/d/12tKPQMu79kr29oMloNLIl0I0l86SyPdX/view?usp=sharing) |

## Reference
NVIDIA Tacotron2: https://github.com/NVIDIA/tacotron2

NVIDIA WaveGlow: https://github.com/NVIDIA/waveglow

r9y9 wavenet-vocoder: https://github.com/r9y9/wavenet_vocoder

FloWaveNet: https://github.com/ksw0306/FloWaveNet

Parakeet: https://github.com/PaddlePaddle/Parakeet

[Tacotron2]: https://github.com/NVIDIA/tacotron2
[DataParallel]: https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html
[DistributedDataParallel]: https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html
[WaveFlow]: https://arxiv.org/abs/1912.01219
[LJ Speech Data]: https://keithito.com/LJ-Speech-Dataset
[Apex]: https://github.com/nvidia/apex
[official implementation]: https://github.com/PaddlePaddle/Parakeet
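
The train/test file lists from step 2 of "Train your model" can also be produced from Python, for example on a system without `ls`, `head`, and `tail`. This is a minimal sketch, not part of the repo: `split_file_lists` is a hypothetical helper. It assumes that the code-point order from `sorted()` matches what `ls` prints. That assumption holds for LJ Speech's uniform `LJxxx-xxxx.wav` names. Because the lists come from slicing one sorted list, they cannot overlap.

```python
from pathlib import Path


def split_file_lists(data_dir, n_test=10,
                     train_list="train_files.txt", test_list="test_files.txt"):
    """Write train/test file lists, reserving the first `n_test` clips for testing.

    Hypothetical helper; it mirrors the `ls`/`head`/`tail` commands in step 2.
    """
    # sorted() gives code-point order; for LJ Speech's uniform names this matches `ls`.
    wavs = sorted(str(p) for p in Path(data_dir).glob("*.wav"))
    test, train = wavs[:n_test], wavs[n_test:]  # disjoint by construction
    # One path per line, the same format the shell redirection produces.
    Path(train_list).write_text("".join(w + "\n" for w in train))
    Path(test_list).write_text("".join(w + "\n" for w in test))
    return train, test
```

Calling `split_file_lists("data")` from the repo root writes `train_files.txt` and `test_files.txt` next to `train.py`, with paths such as `data/LJ001-0001.wav`, the same form `ls data/*.wav` prints.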
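
The `-a N` option averages weights over the N most recent checkpoints in the directory given by `checkpoint_path`. The sketch below illustrates that general idea under stated assumptions. It is not the code in `train.py`, and both helper names are hypothetical. It assumes two things. First, checkpoint files are named `waveflow_<step>`, as in the example path `experiments/waveflow-h16-r64-bipartize/waveflow_5000`. Second, each checkpoint has already been loaded into a flat mapping from parameter names to values.

```python
import re
from pathlib import Path


def latest_checkpoints(ckpt_dir, n, prefix="waveflow_"):
    """Return the `n` most recent checkpoint paths (assumes names like `waveflow_5000`)."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    found = []
    for path in Path(ckpt_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort by the numeric step, not the string, so waveflow_10000 ranks above waveflow_5000.
    found.sort(key=lambda item: item[0])
    return [path for _, path in found[-n:]]


def average_state_dicts(state_dicts):
    """Element-wise mean of parameters that share the same names.

    Works for any values supporting `+` and `/`, such as floats or torch tensors
    (integer tensors would be promoted to floating point by the division).
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")
    n = len(state_dicts)
    return {name: sum(sd[name] for sd in state_dicts) / n
            for name in state_dicts[0]}
```

With PyTorch, each entry of `state_dicts` could come from `torch.load(path, map_location="cpu")`. Which key inside the saved file holds the model weights depends on how `train.py` writes checkpoints, so check that before adapting this sketch.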
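
You can also change the `fp16_run` and `checkpoint_path` settings from a script instead of editing the config in `nano`. This README does not say where these keys sit inside the JSON config. The hypothetical helper below (not part of the repo) therefore searches nested objects for each key, and raises `KeyError` if the key is absent instead of guessing a location.

```python
import json


def set_config_value(cfg, key, value):
    """Set every occurrence of `key` in a nested dict; return the number of matches."""
    count = 0
    for k, v in cfg.items():
        if k == key:
            cfg[k] = value  # replacing an existing key's value is safe during iteration
            count += 1
        elif isinstance(v, dict):
            count += set_config_value(v, key, value)
    return count


def update_config_file(path, **updates):
    """Load a JSON config, apply `updates`, and write it back (reformatted with indent=4)."""
    with open(path) as f:
        cfg = json.load(f)
    for key, value in updates.items():
        if set_config_value(cfg, key, value) == 0:
            raise KeyError(f"{key!r} not found in {path}")
    # Nothing is written unless every key was found.
    with open(path, "w") as f:
        json.dump(cfg, f, indent=4)
```

For example, `update_config_file("configs/waveflow-h16-r64-bipartize.json", fp16_run=True, checkpoint_path="experiments/waveflow-h16-r64-bipartize/waveflow_5000")` enables mixed precision and points at a single checkpoint. Note that `json.dump` rewrites the file with its own indentation.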