{"id":20419225,"url":"https://github.com/aaaastark/nemo-weightsbiases-tts","last_synced_at":"2025-03-05T04:18:43.012Z","repository":{"id":179677703,"uuid":"576023117","full_name":"aaaastark/NeMo-WeightsBiases-TTS","owner":"aaaastark","description":"Training and Tunning a Text to speech model with Nvidia NeMo and Weights and Biases","archived":false,"fork":false,"pushed_at":"2022-12-08T20:55:48.000Z","size":5283,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-15T14:16:48.852Z","etag":null,"topics":["fastpitch","hifigan","nemo","nvidia-nemo","text-to-speech","weights-and-biases"],"latest_commit_sha":null,"homepage":"https://github.com/aaaastark/NeMo-WeightsBiases-TTS","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aaaastark.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-08T20:44:16.000Z","updated_at":"2023-06-17T11:31:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"5c33c16c-35c3-4a99-b0ea-c5de466bf4c3","html_url":"https://github.com/aaaastark/NeMo-WeightsBiases-TTS","commit_stats":null,"previous_names":["aaaastark/nemo-weightsbiases-tts"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaaastark%2FNeMo-WeightsBiases-TTS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaaastark%2FNeMo-WeightsBiases-TTS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaaastark%2
FNeMo-WeightsBiases-TTS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaaastark%2FNeMo-WeightsBiases-TTS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aaaastark","download_url":"https://codeload.github.com/aaaastark/NeMo-WeightsBiases-TTS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241961060,"owners_count":20049381,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastpitch","hifigan","nemo","nvidia-nemo","text-to-speech","weights-and-biases"],"created_at":"2024-11-15T06:36:19.295Z","updated_at":"2025-03-05T04:18:42.997Z","avatar_url":"https://github.com/aaaastark.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NeMo and W\u0026B workshop\n\nHere is the code for the workshop \"Training and Tuning a Text to speech model with Nvidia NeMo and Weights and Biases\".\n\n## Installation\n\nIf you are running in SageMaker, you will have to choose an image with PyTorch and GPU capabilities.
\n- Choose a PyTorch image from the `Environment` drop-down menu\n\u003cimg src=\"images/sm_pt_image.png\" width=\"300\"\u003e\n\n- Choose a GPU-equipped machine\n\u003cimg src=\"images/sm_gpu.png\" width=\"300\"\u003e\n\n- Open a terminal on the machine and clone this repo using `git clone`\n\u003cimg src=\"images/sm_term.png\" width=\"300\"\u003e\n\n- Run the `sm_setup.sh` script inside the Machine Image terminal.\n```\n\u003e bash sm_setup.sh\n```\n\n\n## Running the code\n\nFollow the notebooks in order:\n\n- [01_log_datasets.ipynb](01_log_datasets.ipynb) downloads the data and creates a train/valid split. You will learn how to store your data on Weights and Biases.\n- [02_FastPitch_finetune.ipynb](02_FastPitch_finetune.ipynb) fine-tunes a `FastPitch` model on a particular speaker. We will then analyse the results using `wandb.Table`s.\n- [03_HiFIGAN_finetune.ipynb](03_HiFIGAN_finetune.ipynb) improves the previous results by fine-tuning the HiFi-GAN model with the data from the new speaker.\n- [04_final_validation.ipynb](04_final_validation.ipynb) analyses the final results after both fine-tuning runs.\n\n## Notes\n\n- You will need a machine with at least 16 GB of VRAM. You can try decreasing the batch size on smaller machines, but you will need to adjust the number of steps accordingly.\n- The checkpoints in `exp_dir` use a lot of space, and `~/.cache` also grows quickly. You can safely delete the cache, as its contents will be re-downloaded if necessary.\n- The `9017` speaker dataset is very small, so the model performance is not very good.\n\n## Bring your own dataset\n\nYou can create a dataset of your own, optionally using `NeMo` ASR features.
\n- You need pitch statistics for your data; [this script](https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/dataset_processing/tts/compute_speaker_stats.py) can help you out 😎\n- You will need to normalize your text inputs; check [this notebook](https://github.com/NVIDIA/NeMo/blob/main/tutorials/text_processing/Text_(Inverse)_Normalization.ipynb).\n- The audio files must have a sample rate of 22050 Hz. You can export your data with `from scipy.io.wavfile import write`.\n\n### Copyright reserved by [Thomas Capelle, ML Engineer @ Weights \u0026 Biases](https://github.com/tcapelle/nemo_wandb)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaaastark%2Fnemo-weightsbiases-tts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaaastark%2Fnemo-weightsbiases-tts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaaastark%2Fnemo-weightsbiases-tts/lists"}