{"id":13578555,"url":"https://github.com/Deepest-Project/MelNet","last_synced_at":"2025-04-05T19:33:16.118Z","repository":{"id":38428657,"uuid":"202840577","full_name":"Deepest-Project/MelNet","owner":"Deepest-Project","description":"Implementation of \"MelNet: A Generative Model for Audio in the Frequency Domain\"","archived":false,"fork":false,"pushed_at":"2024-07-25T10:55:31.000Z","size":173,"stargazers_count":210,"open_issues_count":10,"forks_count":39,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-04-02T17:54:15.566Z","etag":null,"topics":["generative-model","pytorch","tts"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Deepest-Project.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-17T05:38:54.000Z","updated_at":"2025-04-01T06:14:53.000Z","dependencies_parsed_at":"2024-11-05T16:52:40.702Z","dependency_job_id":null,"html_url":"https://github.com/Deepest-Project/MelNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepest-Project%2FMelNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepest-Project%2FMelNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepest-Project%2FMelNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deepest-Project%2FMelNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/Deepest-Project","download_url":"https://codeload.github.com/Deepest-Project/MelNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393077,"owners_count":20931804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["generative-model","pytorch","tts"],"created_at":"2024-08-01T15:01:31.770Z","updated_at":"2025-04-05T19:33:15.681Z","avatar_url":"https://github.com/Deepest-Project.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# MelNet\n\nImplementation of [MelNet: A Generative Model for Audio in the Frequency Domain](\u003chttps://arxiv.org/abs/1906.01083\u003e)\n\n## Prerequisites\n\n- Tested with Python 3.6.8 \u0026 3.7.4, PyTorch 1.2.0 \u0026 1.3.0.\n- `pip install -r requirements.txt`\n\n## How to train\n\n### Datasets\n\n- Blizzard, VoxCeleb2, and KSS have YAML files provided under `config/`. For other datasets, fill out your own YAML file according to the other provided ones.\n- Unconditional training is possible for all kinds of datasets, provided that they have a consistent file extension specified by `data.extension` within the YAML file.\n- Conditional training is currently only implemented for KSS and a subset of the Blizzard dataset.\n\n### Running the code\n\n- `python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]`\n  - Each tier can be trained separately. 
Since each tier is larger than the one before it (with the exception of tier 1), adjust the batch size for each tier.\n    - Tier 6 of the Blizzard dataset does not fit on a 16GB P100, even with a batch size of 1.\n  - The `-s` flag is a boolean for determining whether to train a TTS tier. Since a TTS tier only differs at tier 1, this flag is ignored when `[tier number] != 0`. Warning: this flag is set to `True` whenever it is present, regardless of the value that follows it. Omit it entirely unless you are training a TTS tier.\n\n## How to sample\n\n### Preparing the checkpoints\n\n- The checkpoints must be stored under `chkpt/`.\n- A YAML file named `inference.yaml` must be provided under `config/`.\n- `inference.yaml` must specify the number of tiers, the names of the checkpoints, and whether generation is conditional.\n\n### Running the code\n\n- `python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]`\n  - Timestep refers to the length of the mel spectrogram. The ratio of timestep to seconds is roughly `[sample rate] : [hop length of FFT]`.\n  - The `-i` flag is optional and only needed for conditional generation. 
Surround the sentence with `\"\"` and end with `.`.\n  - Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).\n\n## To-do\n\n- [x] Implement upsampling procedure\n- [x] GMM sampling + loss function\n- [x] Unconditional audio generation\n- [x] TTS synthesis\n- [x] Tensorboard logging\n- [x] Multi-GPU training\n- [ ] Primed generation\n\n## Implementation authors\n\n- [Seungwon Park](\u003chttps://github.com/seungwonpark\u003e), [June Young Yi](\u003chttps://github.com/Rick-McCoy\u003e), [Yoonhyung Lee](\u003chttps://github.com/LEEYOONHYUNG\u003e), [Joowhan Song](\u003chttps://github.com/Joovvhan\u003e) @ Deepest Season 6\n\n## License\n\nMIT License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDeepest-Project%2FMelNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDeepest-Project%2FMelNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDeepest-Project%2FMelNet/lists"}