{"id":26527921,"url":"https://github.com/lmnt-com/wavegrad","last_synced_at":"2025-04-05T14:09:27.632Z","repository":{"id":39117529,"uuid":"295936181","full_name":"lmnt-com/wavegrad","owner":"lmnt-com","description":"A fast, high-quality neural vocoder.","archived":false,"fork":false,"pushed_at":"2023-07-18T01:28:53.000Z","size":19,"stargazers_count":279,"open_issues_count":4,"forks_count":48,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-03-29T13:10:27.580Z","etag":null,"topics":["deep-learning","machine-learning","neural-network","paper","pretrained-models","pytorch","speech","speech-synthesis","text-to-speech","tts","vocoder","wavegrad"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lmnt-com.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-16T05:51:19.000Z","updated_at":"2025-01-28T10:20:54.000Z","dependencies_parsed_at":"2025-03-21T15:51:06.618Z","dependency_job_id":null,"html_url":"https://github.com/lmnt-com/wavegrad","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmnt-com%2Fwavegrad","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmnt-com%2Fwavegrad/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmnt-com%2Fwavegrad/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmnt-com%2Fwavegrad/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lmnt-com","download_url":"https://codeload.github.com/lmnt-com/wavegrad/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247345854,"owners_count":20924102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-learning","neural-network","paper","pretrained-models","pytorch","speech","speech-synthesis","text-to-speech","tts","vocoder","wavegrad"],"created_at":"2025-03-21T15:36:23.595Z","updated_at":"2025-04-05T14:09:27.611Z","avatar_url":"https://github.com/lmnt-com.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WaveGrad\n![PyPI Release](https://img.shields.io/pypi/v/wavegrad?label=release) [![License](https://img.shields.io/github/license/lmnt-com/wavegrad)](https://github.com/lmnt-com/wavegrad/blob/master/LICENSE)\n\n**We're hiring!** \nIf you like what we're building here, [come join us at LMNT](https://explore.lmnt.com).\n\nWaveGrad is a fast, high-quality neural vocoder designed by the folks at Google Brain. The architecture is described in [WaveGrad: Estimating Gradients for Waveform Generation](https://arxiv.org/pdf/2009.00713.pdf). 
In short, this model takes a log-scaled Mel spectrogram and converts it to a waveform via iterative refinement.\n\n## Status (2020-10-15)\n- [x] stable training (22 kHz, 24 kHz)\n- [x] high-quality synthesis\n- [x] mixed-precision training\n- [x] multi-GPU training\n- [x] custom noise schedule (faster inference)\n- [x] command-line inference\n- [x] programmatic inference API\n- [x] PyPI package\n- [x] audio samples\n- [x] pretrained models\n- [ ] precomputed noise schedule\n\n## Audio samples\n[24 kHz audio samples](https://lmnt.com/assets/wavegrad/24kHz)\n\n## Pretrained models\n[24 kHz pretrained model](https://lmnt.com/assets/wavegrad/wavegrad-24kHz.pt) (183 MB, SHA256: `65e9366da318d58d60d2c78416559351ad16971de906e53b415836c068e335f3`)\n\n## Install\n\nInstall using pip:\n```\npip install wavegrad\n```\n\nor from GitHub:\n```\ngit clone https://github.com/lmnt-com/wavegrad.git\ncd wavegrad\npip install .\n```\n\n### Training\nBefore you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono (e.g. [LJSpeech](https://keithito.com/LJ-Speech-Dataset/), [VCTK](https://pytorch.org/audio/_modules/torchaudio/datasets/vctk.html)). By default, this implementation assumes a sample rate of 22 kHz. 
If you need to change this value, edit [params.py](https://github.com/lmnt-com/wavegrad/blob/master/src/wavegrad/params.py).\n\n```\npython -m wavegrad.preprocess /path/to/dir/containing/wavs\npython -m wavegrad /path/to/model/dir /path/to/dir/containing/wavs\n\n# in another shell to monitor training progress:\ntensorboard --logdir /path/to/model/dir --bind_all\n```\n\nYou should expect to hear intelligible speech by ~20k steps (~1.5h on a 2080 Ti).\n\n### Inference API\nBasic usage:\n\n```python\nfrom wavegrad.inference import predict as wavegrad_predict\n\nmodel_dir = '/path/to/model/dir'\nspectrogram = ...  # placeholder: supply a spectrogram in [N,C,W] format\naudio, sample_rate = wavegrad_predict(spectrogram, model_dir)\n\n# `audio` is a GPU tensor in [N,T] format.\n```\n\nIf you have a custom noise schedule (see below):\n```python\nimport numpy as np\n\nfrom wavegrad.inference import predict as wavegrad_predict\n\nparams = { 'noise_schedule': np.load('/path/to/noise_schedule.npy') }\nmodel_dir = '/path/to/model/dir'\nspectrogram = ...  # placeholder: supply a spectrogram in [N,C,W] format\naudio, sample_rate = wavegrad_predict(spectrogram, model_dir, params=params)\n\n# `audio` is a GPU tensor in [N,T] format.\n```\n\n### Inference CLI\n```\npython -m wavegrad.inference /path/to/model /path/to/spectrogram -o output.wav\n```\n\n### Noise schedule\nThe default implementation uses 1000 iterations to refine the waveform, which runs slower than real-time. WaveGrad can achieve high-quality, faster-than-real-time synthesis with as few as 6 iterations, without re-training the model with new hyperparameters.\n\nTo achieve this speed-up, you will need to search for a `noise schedule` that works well for your dataset. 
This implementation provides a script to perform the search for you:\n\n```\npython -m wavegrad.noise_schedule /path/to/trained/model /path/to/preprocessed/validation/dataset\npython -m wavegrad.inference /path/to/trained/model /path/to/spectrogram -n noise_schedule.npy -o output.wav\n```\n\nThe default settings should give good results without spending too much time on the search. If you'd like to find a better noise schedule or use a different number of inference iterations, run the `noise_schedule` script with `--help` to see additional configuration options.\n\n\n## References\n- [WaveGrad: Estimating Gradients for Waveform Generation](https://arxiv.org/pdf/2009.00713.pdf)\n- [Denoising Diffusion Probabilistic Models](https://arxiv.org/pdf/2006.11239.pdf)\n- [Code for Denoising Diffusion Probabilistic Models](https://github.com/hojonathanho/diffusion)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flmnt-com%2Fwavegrad","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flmnt-com%2Fwavegrad","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flmnt-com%2Fwavegrad/lists"}