![WaveGlow](waveglow_logo.png "WaveGlow")

## WaveGlow: a Flow-based Generative Network for Speech Synthesis

### Ryan Prenger, Rafael Valle, and Bryan Catanzaro

In our recent [paper], we propose WaveGlow: a flow-based network capable of
generating high-quality speech from mel-spectrograms. WaveGlow combines insights
from [Glow] and [WaveNet] to provide fast, efficient, high-quality audio
synthesis without the need for auto-regression. WaveGlow is implemented
using only a single network, trained using only a single cost function:
maximizing the likelihood of the training data, which makes the training
procedure simple and stable.

Our [PyTorch] implementation produces audio samples at a rate of 1200
kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio
quality as good as the best publicly available WaveNet implementation.

Visit our [website] for audio samples.

## Setup

1. Clone our repo and initialize the submodule

   ```command
   git clone https://github.com/NVIDIA/waveglow.git
   cd waveglow
   git submodule init
   git submodule update
   ```

2. Install requirements: `pip3 install -r requirements.txt`

3. Install [Apex]
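The flow-based design described above hinges on stacking transforms that are cheap to invert exactly, so that sampling can run the trained network backwards from Gaussian noise. As a rough illustration only (a toy in plain Python, not the repository's code; `scale_translate` is a hypothetical stand-in for the WaveNet-like conditioning network), here is an affine coupling step and its exact inverse:

```python
import math

# Toy affine coupling layer: a minimal sketch of the kind of invertible
# transform a flow-based model such as WaveGlow is built from.

def scale_translate(xa):
    # Stand-in conditioning network: any function of xa preserves
    # invertibility, because xa itself passes through the layer unchanged.
    log_s = [0.5 * v for v in xa]   # predicted log-scales
    t = [v + 1.0 for v in xa]       # predicted translations
    return log_s, t

def coupling_forward(x):
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = scale_translate(xa)
    yb = [v * math.exp(s) + b for v, s, b in zip(xb, log_s, t)]
    return xa + yb  # first half unchanged, second half transformed

def coupling_inverse(y):
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, t = scale_translate(ya)  # same inputs as forward, so same s and t
    xb = [(v - b) / math.exp(s) for v, s, b in zip(yb, log_s, t)]
    return ya + xb

x = [0.2, -1.3, 0.7, 2.0]
x_rec = coupling_inverse(coupling_forward(x))  # recovers x
```

Because the forward pass leaves the first half untouched, the inverse can recompute the same scales and translations and undo the transform exactly, which is what makes likelihood training and fast sampling coexist in one network.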
## Generate audio with our pre-existing model

1. Download our [published model]
2. Download [mel-spectrograms]
3. Generate audio: `python3 inference.py -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6`

N.B. Use `convert_model.py` to convert older models to the current model
with fused residual and skip connections.

## Train your own model

1. Download [LJ Speech Data]. In this example it's in `data/`

2. Make a list of the file names to use for training/testing

   ```command
   ls data/*.wav | tail -n+10 > train_files.txt
   ls data/*.wav | head -n10 > test_files.txt
   ```

3. Train your WaveGlow networks

   ```command
   mkdir checkpoints
   python train.py -c config.json
   ```

   For multi-GPU training, replace `train.py` with `distributed.py`. Only tested with a single node and NCCL.

   For mixed-precision training, set `"fp16_run": true` in `config.json`.

4. Make test set mel-spectrograms

   `python mel2samp.py -f test_files.txt -o . -c config.json`

5. Do inference with your network

   ```command
   ls *.pt > mel_files.txt
   python3 inference.py -f mel_files.txt -w checkpoints/waveglow_10000 -o . --is_fp16 -s 0.6
   ```
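The `-s 0.6` flag in the inference commands above sets sigma, the standard deviation of the Gaussian latent that is pushed backwards through the network; values below the 1.0 assumed during training generally trade sample diversity for cleaner audio. A minimal sketch of just that sampling step (names hypothetical, not the repository's API):

```python
import random

def sample_latent(n, sigma=0.6, seed=0):
    # Flow-based inference starts from z ~ N(0, sigma^2 I). Shrinking sigma
    # samples closer to the mode of the learned distribution.
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

z = sample_latent(8, sigma=0.6)
# The model would then compute audio = flow_inverse(z, mel), i.e. run the
# trained network backwards conditioned on the mel-spectrogram.
```

`flow_inverse` here is a placeholder for the model's inverse pass; only the Gaussian sampling and the role of sigma are being illustrated.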
[//]: # (TODO)
[//]: # (PROVIDE INSTRUCTIONS FOR DOWNLOADING LJS)
[pytorch 1.0]: https://github.com/pytorch/pytorch#installation
[website]: https://nv-adlr.github.io/WaveGlow
[paper]: https://arxiv.org/abs/1811.00002
[WaveNet implementation]: https://github.com/r9y9/wavenet_vocoder
[Glow]: https://blog.openai.com/glow/
[WaveNet]: https://deepmind.com/blog/wavenet-generative-model-raw-audio/
[PyTorch]: http://pytorch.org
[published model]: https://drive.google.com/open?id=1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF
[mel-spectrograms]: https://drive.google.com/file/d/1g_VXK2lpP9J25dQFhQwx7doWl_p20fXA/view?usp=sharing
[LJ Speech Data]: https://keithito.com/LJ-Speech-Dataset
[Apex]: https://github.com/nvidia/apex