{"id":20737440,"url":"https://github.com/candlewill/aivoice","last_synced_at":"2025-10-22T02:55:11.597Z","repository":{"id":71816380,"uuid":"110190261","full_name":"candlewill/AiVoice","owner":"candlewill","description":"Deep CNN networks for Speech Synthesis","archived":false,"fork":false,"pushed_at":"2017-11-15T03:23:42.000Z","size":30,"stargazers_count":49,"open_issues_count":3,"forks_count":15,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-30T05:41:16.481Z","etag":null,"topics":["cnn","deep-learning","tts"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/candlewill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-11-10T02:06:13.000Z","updated_at":"2023-11-11T02:36:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"52cc8c68-ec4a-4f44-86a8-0aa2d4ccbdef","html_url":"https://github.com/candlewill/AiVoice","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FAiVoice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FAiVoice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FAiVoice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/candlewill%2FAiVoice/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/candlewill","download_url":"https://codeload.github.com/candlewill/AiVoice/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://githu
b.com","kind":"github","repositories_count":250545735,"owners_count":21448247,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn","deep-learning","tts"],"created_at":"2024-11-17T06:14:30.909Z","updated_at":"2025-10-22T02:55:11.514Z","avatar_url":"https://github.com/candlewill.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deep Voice 3\n\nThis is a TensorFlow implementation of [DEEP VOICE 3: 2000-SPEAKER NEURAL TEXT-TO-SPEECH](https://arxiv.org/pdf/1710.07654.pdf). For now, we are just focusing on single speaker synthesis.\n\n\n## Requirement\n\n* TensorFlow \u003e= 1.2\n* Python \u003e= 3.0\n\n\n## Dataset\n\n[The LJ Speech Dataset](https://keithito.com/LJ-Speech-Dataset)\n\n## Pre-process\n\nDownload and unzip the LJ Speech Dataset. Run:\n\n```\npython prepro.py\n```\n\nNote: Make sure that we have unzipped the dataset into the same folder as `prepro.py`.\n\nAfter this, we get three new folders:\n\n```\n├── dones          [New]\n├── mags           [New]\n├── mels           [New]\n├── metadata.csv\n├── README\n└── wavs\n```\n\n## Training\n\nTraining data is loaded from `./LJSpeech-1.0/metadata.csv`, `./LJSpeech-1.0/mels`, `./LJSpeech-1.0/dones`, `./LJSpeech-1.0/mags` by default. If we want to change the loading path, we can change the config in `class Hyperparams`.\n\nTo train the model, we use this command:\n\n```\npython train.py\n```\n\n## Pre-trained Model\n\nCurrently, we cannot get good results yet. 
However, we still provide our pre-trained model in case someone is interested in it.\n\n[Pre-trained Model](https://cnbj1.fds.api.xiaomi.com/tts/ExternalLink/Github/pre_trained_model.tar.gz).\n\nIts attention figure is as follows:\n\n![Image of attention](https://cnbj1.fds.api.xiaomi.com/tts/ExternalLink/Github/alignment.png)\n\nAll the attention figures generated during training are included in the pre-trained model archive.\n\n## File Description\n\n  * hyperparams.py: hyperparameters\n  * prepro.py: creates inputs and targets, i.e., mel spectrogram, magnitude, and dones.\n  * data_load.py\n  * utils.py: several custom operational functions.\n  * modules.py: building blocks for the networks.\n  * networks.py: encoder, decoder, and converter\n  * train.py: training\n  * synthesize.py: inference\n  * test_sents.txt: some test sentences from the paper.\n\n## Reference\n\nMost of the code is borrowed from [Kyubyong/deepvoice3](https://github.com/Kyubyong/deepvoice3).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcandlewill%2Faivoice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcandlewill%2Faivoice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcandlewill%2Faivoice/lists"}