{"id":19556313,"url":"https://github.com/maum-ai/nuwave2","last_synced_at":"2025-04-09T15:06:03.005Z","repository":{"id":37405352,"uuid":"473047152","full_name":"maum-ai/nuwave2","owner":"maum-ai","description":"NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates @ INTERSPEECH 2022","archived":false,"fork":false,"pushed_at":"2023-09-16T16:23:22.000Z","size":47483,"stargazers_count":286,"open_issues_count":10,"forks_count":23,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-09T15:05:52.507Z","etag":null,"topics":["deep-learning","neural-audio-upsampling","pytorch","super-resolution","upsampling"],"latest_commit_sha":null,"homepage":"https://mindslab-ai.github.io/nuwave2","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maum-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-23T05:21:08.000Z","updated_at":"2025-03-24T11:21:46.000Z","dependencies_parsed_at":"2024-11-11T04:37:39.669Z","dependency_job_id":"c2683d8d-52bc-4537-84d4-9c0aad0e40da","html_url":"https://github.com/maum-ai/nuwave2","commit_stats":null,"previous_names":["maum-ai/nuwave2","mindslab-ai/nuwave2"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maum-ai%2Fnuwave2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maum-ai%2Fnuwave2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maum-ai%2Fnuwave2/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/maum-ai%2Fnuwave2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maum-ai","download_url":"https://codeload.github.com/maum-ai/nuwave2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248055284,"owners_count":21040157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","neural-audio-upsampling","pytorch","super-resolution","upsampling"],"created_at":"2024-11-11T04:37:31.572Z","updated_at":"2025-04-09T15:06:02.980Z","avatar_url":"https://github.com/maum-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NU-Wave2 \u0026mdash; Official PyTorch Implementation\n\n**NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates**\u003cbr\u003e\nSeungu Han, Junhyeok Lee @ [MINDsLab Inc.](https://github.com/mindslab-ai), SNU\n\n[![arXiv](https://img.shields.io/badge/arXiv-2206.08545-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2206.08545) [![GitHub Repo stars](https://img.shields.io/github/stars/mindslab-ai/nuwave2?color=yellow\u0026label=NU-Wave2\u0026logo=github\u0026style=flat-square)](https://github.com/mindslab-ai/nuwave2) [![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github\u0026style=flat-square)](https://mindslab-ai.github.io/nuwave2/)\n\nOfficial Pytorch+[Lightning](https://github.com/PyTorchLightning/pytorch-lightning) Implementation for NU-Wave 2.\n\n![](./docs/sampling.gif)\n\n**Official Checkpoint can be downloaded from 
[here](https://drive.google.com/file/d/11t0cQYx6ZadKQjmfGnqxUUH2UEk5Yzk7/view?usp=sharing).**  \n\n**We added samples for a non-English voice (Korean) and an ablation study without BSFT to the [demo page](https://mindslab-ai.github.io/nuwave2/). Please check it out!**\n\n**We also trained a model targeting 16 kHz (3.2 kHz ~ 16 kHz source). The Checkpoint can be downloaded from [here](https://drive.google.com/file/d/1IZihqb0LKHLtqRjyhHBGxXHJhUwskVRo/view?usp=sharing).**  \n\n## Requirements\n- [Pytorch](https://pytorch.org/) \u003e=1.7.0 for nn.SiLU(swish activation)\n- [Pytorch-Lightning](https://github.com/PyTorchLightning/pytorch-lightning)==1.2.10\n- The requirements are listed in [requirements.txt](./requirements.txt).\n- We also provide a Docker setup: [Dockerfile](./Dockerfile).\n\n## Clone our Repository\n```bash\ngit clone --recursive https://github.com/mindslab-ai/nuwave2.git\ncd nuwave2\n```\n\n## Preprocessing\nBefore running our project, you need to download the dataset and convert it to `.wav` files.\n1. Download [VCTK dataset](https://datashare.ed.ac.uk/handle/10283/3443)\n2. Remove speakers `p280` and `p315`\n3. Set the path of the downloaded dataset in `data:base_dir` in `hparameter.yaml`\n4. Run `utils/flac2wav.py`\n```shell script\npython utils/flac2wav.py\n```\n\n## Training\n1. Adjust `hparameter.yaml`, especially the `train` section.\n```yaml\ntrain:\n  batch_size: 12 # Dependent on GPU memory size\n  lr: 2e-4\n  weight_decay: 0.00\n  num_workers: 8 # Dependent on CPU cores\n  gpus: 2 # number of GPUs\n  opt_eps: 1e-9\n  beta1: 0.9\n  beta2: 0.99\n```\n- Adjust the `data` section in `hparameter.yaml`.\n```yaml\ndata:\n  timestamp_path: 'vctk-silence-labels/vctk-silences.0.92.txt'\n  base_dir: '/DATA1/VCTK-0.92/wav48_silence_trimmed/'\n  dir: '/DATA1/VCTK-0.92/wav48_silence_trimmed_wav/' #dir/spk/format\n  format: '*mic1.wav'\n  cv_ratio: (100./108., 8./108., 0.00) #train/val/test\n```\n2. 
run `trainer.py`.\n```shell script\n$ python trainer.py\n```\n- If you want to resume training from checkpoint, check parser.\n```python\n    parser = argparse.ArgumentParser()\n    parser.add_argument('-r', '--resume_from', type =int,\\\n            required = False, help = \"Resume Checkpoint epoch number\")\n    parser.add_argument('-s', '--restart', action = \"store_true\",\\\n            required = False, help = \"Significant change occured, use this\")\n    parser.add_argument('-e', '--ema', action = \"store_true\",\\\n            required = False, help = \"Start from ema checkpoint\")\n    args = parser.parse_args()\n```\n- During training, tensorboard logger is logging loss, spectrogram and audio.\n```shell script\n$ tensorboard --logdir=./tensorboard --bind_all\n```\n\n![](./docs/images/train_loss.png)\n![](./docs/images/spec.png)\n\n## Evaluation\nrun `for_test.py`\n```shell script\npython for_test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}\n```\nPlease check parser.\n```python\n    parser = argparse.ArgumentParser()\n    parser.add_argument('-r', '--resume_from', type =int,\n                required = True, help = \"Resume Checkpoint epoch number\")\n    parser.add_argument('-e', '--ema', action = \"store_true\",\n                required = False, help = \"Start from ema checkpoint\")\n    parser.add_argument('--save', action = \"store_true\",\n               required = False, help = \"Save file\")\n    parser.add_argument('--sr', type=int, \\\n               required=True, help=\"input sampling rate\")\n```\n\n## Inference\n- run `inference.py`\n```shell script\npython inference.py -c {checkpoint_path} -i {input audio} --sr {Sampling rate of input audio} {--steps:option} {--gt:option}\n```\nPlease check parser.  \n  \n**__Note:__** If your input is downsampled (12kHz, 16kHz, etc.) 
audio sample with a full valid frequency component based on the corresponding sampling rate, give the parser as '--sr {Sampling rate of input audio}' without '--gt' parser.  \nOn the other hand, if you have a 48kHz audio sample with a full valid frequency component and just want to check whether the model works well, give the parser as '--sr {Sampling rate of input which you want to check}' and add '--gt' parser.  \nPlease check [this issue](https://github.com/mindslab-ai/nuwave2/issues/5) for more information. \n```python\n    parser = argparse.ArgumentParser()\n    parser.add_argument('-c',\n                        '--checkpoint',\n                        type=str,\n                        required=True,\n                        help=\"Checkpoint path\")\n    parser.add_argument('-i',\n                        '--wav',\n                        type=str,\n                        default=None,\n                        help=\"audio\")\n    parser.add_argument('--sr',\n                        type=int,\n                        required=True,\n                        help=\"Sampling rate of input audio\")\n    parser.add_argument('--steps',\n                        type=int,\n                        required=False,\n                        help=\"Steps for sampling\")\n    parser.add_argument('--gt', action=\"store_true\",\n                        required=False, help=\"Whether the input audio is 48 kHz ground truth audio.\")\n    parser.add_argument('--device',\n                        type=str,\n                        default='cuda',\n                        required=False,\n                        help=\"Device, 'cuda' or 'cpu'\")\n```\n\n## References\nThis implementation uses code from following repositories:\n- [official NU-Wave pytorch implementation](https://github.com/mindslab-ai/nuwave)\n- [revsic's Jax/Flax implementation of Variational-DiffWave](https://github.com/revsic/jax-variational-diffwave)\n- [ivanvovk's WaveGrad pytorch 
implementation](https://github.com/ivanvovk/WaveGrad)\n- [lmnt-com's DiffWave pytorch implementation](https://github.com/lmnt-com/diffwave)\n- [NVlabs' SPADE pytorch implementation](https://github.com/NVlabs/SPADE)\n- [pkumivision's FFC pytorch implementation](https://github.com/pkumivision/FFC)\n\nThis README and the webpage for the audio samples are inspired by:\n- [Tips for Publishing Research Code](https://github.com/paperswithcode/releasing-research-code)\n- [Audio samples webpage of DCA](https://google.github.io/tacotron/publications/location_relative_attention/)\n- [Cotatron](https://github.com/mindslab-ai/cotatron/)\n- [Audio samples webpage of WaveGrad](https://wavegrad.github.io)\n\nThe audio samples on our [webpage](https://mindslab-ai.github.io/nuwave2/) are partially derived from:\n- [VCTK dataset(0.92)](https://datashare.ed.ac.uk/handle/10283/3443): 46 hours of English speech from 108 speakers.\n- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/): a single-speaker English dataset consisting of 13100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.\n\n## Repository Structure\n```\n.\n|-- Dockerfile\n|-- LICENSE\n|-- README.md\n|-- dataloader.py           # Dataloader for train/val(=test)\n|-- diffusion.py            # DPM\n|-- for_test.py             # Test with for_loop.\n|-- hparameter.yaml         # Config\n|-- inference.py            # Inference\n|-- lightning_model.py      # NU-Wave 2 implementation.\n|-- model.py                # NU-Wave 2 model based on lmnt-com's DiffWave implementation\n|-- requirements.txt        # Required libraries\n|-- trainer.py              # Lightning trainer\n|-- utils\n|   |-- flac2wav.py             # Preprocessing\n|   |-- stft.py                 # STFT layer\n|   `-- tblogger.py             # Tensorboard Logger for lightning\n|-- docs                    # For github.io\n|   |-- ...\n`-- vctk-silence-labels     # For trimming\n    |-- ...\n```\n\n## 
Citation \u0026 Contact\nIf this repository is useful for your research, please consider citing!\n```bib\n@article{han2022nu,\n  title={NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates},\n  author={Han, Seungu and Lee, Junhyeok},\n  journal={arXiv preprint arXiv:2206.08545},\n  year={2022}\n}\n```\nIf you have any questions or inquiries, please contact Seungu Han at [hansw032@snu.ac.kr](mailto:hansw0326@snu.ac.kr)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaum-ai%2Fnuwave2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaum-ai%2Fnuwave2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaum-ai%2Fnuwave2/lists"}