{"id":15650141,"url":"https://github.com/yoyolicoris/pytorch_FFTNet","last_synced_at":"2025-03-09T10:30:45.965Z","repository":{"id":140051841,"uuid":"142843192","full_name":"yoyolicoris/pytorch_FFTNet","owner":"yoyolicoris","description":"A pytorch implementation of FFTNet.","archived":false,"fork":false,"pushed_at":"2018-08-31T07:38:28.000Z","size":560,"stargazers_count":36,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-01T21:50:00.959Z","etag":null,"topics":["cnn","fftnet","vocoder"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yoyolicoris.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-30T07:51:06.000Z","updated_at":"2024-01-04T16:25:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"c3667ac1-f1e4-49d6-bd97-b34ba9e4c517","html_url":"https://github.com/yoyolicoris/pytorch_FFTNet","commit_stats":null,"previous_names":["yoyolicoris/pytorch_fftnet"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yoyolicoris%2Fpytorch_FFTNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yoyolicoris%2Fpytorch_FFTNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yoyolicoris%2Fpytorch_FFTNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yoyolicoris%2Fpytorch_FFTNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/yoyolicoris","download_url":"https://codeload.github.com/yoyolicoris/pytorch_FFTNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242679489,"owners_count":20168158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn","fftnet","vocoder"],"created_at":"2024-10-03T12:33:36.252Z","updated_at":"2025-03-09T10:30:45.537Z","avatar_url":"https://github.com/yoyolicoris.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"This is a PyTorch implementation of FFTNet, described [here](http://gfx.cs.princeton.edu/pubs/Jin_2018_FAR/).\nWork in progress.\n\n## Quick Start\n\n1. Install the requirements.\n```\npip install -r requirements.txt\n```\n\n2. Download the [CMU_ARCTIC](http://festvox.org/cmu_arctic/) dataset.\n\n3. Train the model and save it. The default parameters are pretty much the same as in the original paper.\nPass the flag _--preprocess_ when running for the first time.\n\n```\npython train.py \\\n    --preprocess \\\n    --wav_dir your_downloaded_wav_dir \\\n    --data_dir preprocessed_feature_dir \\\n    --model_file saved_model_name\n```\n\n4. 
Use the trained model to decode/reconstruct a wav file from the MCC features.\n\n```\npython decode.py \\\n    --infile wav_file \\\n    --outfile reconstruct_file_name \\\n    --data_dir preprocessed_feature_dir \\\n    --model_file saved_model_name\n```\n\n[FFTNet_generator](FFTNet_generator.py) and [FFTNet_vocoder](FFTNet_vocoder.py) are two files I used to check that the model works,\nusing the torchaudio yesno dataset.\n\n## Current result\n\nThere are some decoded files in the [samples](samples) folder.\n\n## Differences from paper\n\n* window size: 400 \u003e\u003e depends on minimum_f0 (because I use pyworld to get f0 and MCC coefficients)\n\n## TODO\n\n- [x] Zero padding.\n- [x] Injected noise.\n- [x] Voiced/unvoiced conditional sampling.\n- [x] Post-synthesis denoising.\n\n## Notes\n\n* I combine two 1x1 convolution kernels into one 1x2 dilated kernel.\nThis removes redundant bias parameters and speeds up the model.\n* The author said that in the middle layers the channel size is 128, not 256.\n* My model gets stuck at the beginning (loss around 4.x) for thousands of steps, then goes down very fast to 2.6 ~ 3.0.\nUsing a smaller learning rate can help a little bit.\n\n## Variations of FFTNet\n\n### Radix-N FFTNet\n\nUse the flag _--radixs_ to specify each layer's radix.\n\n```\n# a radix-4 FFTNet with 1024 receptive field\npython train.py --radixs 4 4 4 4 4\n```\n\nThe original FFTNet uses a radix-2 structure. In my experiments, a radix-4 network can still achieve similar results,\neven radix-8, and by reducing the number of layers, it can run faster.\n\n### Transposed FFTNet\n\nFig. 
2 in the paper can be redrawn as a dilated structure with kernel size 2 (which also means radix 2).\n\n![](images/fftnet_dilated.png)\n\nIf we draw all the lines\n\n![](images/fftnet_dilated2.png)\n\nand transpose the graph to let the arrows go backward, you'll find a WaveNet dilated structure.\n\n![](images/fftnet_wavenet.png)\n\nAdd the flag __--transpose__ to get a simplified version of WaveNet.\n```\n# a WaveNet-like model without gated/residual/skip units.\npython train.py --transpose\n```\nIn my experiments, the transposed models are easier to train and have slightly lower training loss compared to FFTNet.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyoyolicoris%2Fpytorch_FFTNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyoyolicoris%2Fpytorch_FFTNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyoyolicoris%2Fpytorch_FFTNet/lists"}