{"id":22682254,"url":"https://github.com/mddct/simple-tts","last_synced_at":"2025-04-12T03:42:22.485Z","repository":{"id":266475813,"uuid":"898449935","full_name":"Mddct/simple-tts","owner":"Mddct","description":"（WIP）long-form speech generation","archived":false,"fork":false,"pushed_at":"2025-04-02T04:48:07.000Z","size":44,"stargazers_count":30,"open_issues_count":0,"forks_count":3,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-02T05:27:44.545Z","etag":null,"topics":["longform"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mddct.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-04T12:18:29.000Z","updated_at":"2025-04-02T04:48:10.000Z","dependencies_parsed_at":"2025-04-02T05:34:22.592Z","dependency_job_id":null,"html_url":"https://github.com/Mddct/simple-tts","commit_stats":null,"previous_names":["mddct/simple-tts"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mddct%2Fsimple-tts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mddct%2Fsimple-tts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mddct%2Fsimple-tts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mddct%2Fsimple-tts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mddct","download_url":"https://codeload.github.com/Mddct/simple-tts/tar.gz/refs/heads/main","host":{"name":"GitHub
","url":"https://github.com","kind":"github","repositories_count":248514221,"owners_count":21116899,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["longform"],"created_at":"2024-12-09T20:27:58.999Z","updated_at":"2025-04-12T03:42:22.479Z","avatar_url":"https://github.com/Mddct.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## WIP\n\n## Training\n\n### 0 Data Preparation\n\n```bash\n# train tts llm\n{\"wav\": \"/data/BAC009S0764W0121.wav\", \"txt\": \"甚至出现交易几乎停滞的情况\"}\n{\"wav\": \"/data/BAC009S0764W0122.wav\", \"txt\": \"一二线城市虽然也处于调整中\"}\n```\n```bash\n# train tts DIT\n{\"wav\": \"/data/BAC009S0764W0121.wav\"}\n{\"wav\": \"/data/BAC009S0764W0122.wav\"}\n```\n\n\n### 1 (Optional) train ssl ctc vq (not supported yet)\nTODO\n- [x] https://github.com/xingchensong/S3Tokenizer\n- [ ] BestRQ + ctc + vq (future)\n\n### 2 train llm on speech tokens with text and spk condition\n- [ ] tokenizer: char + bpe\n- [x] generate\n``` bash\noutput_dir=s1_output\n# campplus.onnx path\nspk_emb_onnx=....\ntorchrun --standalone --nnodes=1 --nproc_per_node=8 simpletts/train_llm.py \\\n    --data_path train.jsonl \\\n    --eval_data_path eval.jsonl \\\n    --bf16 True \\\n    --output_dir $output_dir \\\n    --max_steps 100000 \\\n    --per_device_train_batch_size 8 \\\n    --per_device_eval_batch_size 1 \\\n    --save_steps 8000 \\\n    --learning_rate 3e-4 \\\n    --weight_decay 0.01 \\\n    --adam_beta2 0.95 \\\n    --warmup_steps 25000 \\\n    --lr_scheduler_type \"cosine\" \\\n    --gradient_checkpointing \\\n    --dataloader_num_workers 4 \\\n    
--dataloader_prefetch_factor 10 \\\n    --campplus_onnx_path $spk_emb_onnx \\\n    --logging_steps=500 \\\n    --deepspeed ds_config_zero1.json\n```\n\n###  2 (Optional) train streaming flow matching or DIT (one-step)\n- [x] rectified flow training\n- [ ] rectified flow generation (**Ongoing**)\n\n``` bash\noutput_dir=s2_output\n# campplus.onnx path\nspk_emb_onnx=....\ntorchrun --standalone --nnodes=1 --nproc_per_node=8 simpletts/train_dit.py \\\n    --data_path train.jsonl \\\n    --eval_data_path eval.jsonl \\\n    --bf16 True \\\n    --output_dir $output_dir \\\n    --max_steps 100000 \\\n    --per_device_train_batch_size 8 \\\n    --per_device_eval_batch_size 1 \\\n    --save_steps 8000 \\\n    --learning_rate 3e-4 \\\n    --weight_decay 0.01 \\\n    --adam_beta2 0.95 \\\n    --warmup_steps 25000 \\\n    --lr_scheduler_type \"cosine\" \\\n    --gradient_checkpointing \\\n    --dataloader_num_workers 4 \\\n    --dataloader_prefetch_factor 10 \\\n    --campplus_onnx_path $spk_emb_onnx \\\n    --logging_steps=500 \\\n    --deepspeed ds_config_zero1.json\n```\n\n\n\n### 3 (Optional) train low-latency streaming HiFi-GAN or Vocos\nTODO\n- [ ] https://github.com/Mddct/transformer-vocos\n\n## Inference\n- [ ] vllm\n- [ ] sglang\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmddct%2Fsimple-tts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmddct%2Fsimple-tts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmddct%2Fsimple-tts/lists"}