{"id":18537690,"url":"https://github.com/winking324/subtitle-to-audio-track","last_synced_at":"2025-05-15T01:34:55.823Z","repository":{"id":40690375,"uuid":"482793872","full_name":"winking324/subtitle-to-audio-track","owner":"winking324","description":"Subtitle to Audio Track","archived":false,"fork":false,"pushed_at":"2024-01-31T05:50:58.000Z","size":26,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-17T07:43:17.074Z","etag":null,"topics":["audio","subtitle","track"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/winking324.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-18T09:45:31.000Z","updated_at":"2024-01-19T06:53:16.000Z","dependencies_parsed_at":"2024-01-31T06:52:27.839Z","dependency_job_id":"32c2c126-f585-49f6-ae23-dc82fedc1e8b","html_url":"https://github.com/winking324/subtitle-to-audio-track","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/winking324%2Fsubtitle-to-audio-track","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/winking324%2Fsubtitle-to-audio-track/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/winking324%2Fsubtitle-to-audio-track/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/winking324%2Fsubtitle-to-audio-track/manifests","owner_url":"https
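://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/winking324","download_url":"https://codeload.github.com/winking324/subtitle-to-audio-track/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254256827,"owners_count":22040365,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","subtitle","track"],"created_at":"2024-11-06T19:39:37.785Z","updated_at":"2025-05-15T01:34:55.803Z","avatar_url":"https://github.com/winking324.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Subtitle to Audio Track\n\nMost of the documentaries I watch with my kids are in English, so I end up translating by hand. That is slow, and we tend to miss some of the best parts.\nOne day, while glancing at the subtitles, I had an idea: why not convert the subtitles to speech and swap that in as the audio?\n\n# Workflow\n\nGiven a video `A.mkv`, the rough workflow is:\n1. Extract the audio track of `A.mkv` as `A.mp3`;\n2. Use [Spleeter](https://github.com/deezer/spleeter) to separate `A.mp3` into background audio and voice-over;\n3. Parse the subtitles and call the Baidu speech synthesis API to convert them into audio segments;\n4. Merge all audio segments with the background audio into `B.mp3`;\n5. Merge `B.mp3` into `A.mkv` as a new audio track;\n\n# Usage\n\n1. Make sure the video and subtitle files are in the same directory and share the same base name, for example:\n   `/your/path/to/video.mkv`\n   `/your/path/to/video.ass`\n2. Edit the Baidu API settings in `helper/speech.py`;\n3. Run `python3 subtitle_to_audio_track.py /your/path/to/video.mkv`;\n4. The final video file is written to `/your/path/to/video.ch.mkv`;\n\n# TODO\n\n1. Only `.ass` subtitle files are supported for now;\n2. ~~Auto-detect the subtitle file encoding (currently UTF-16-LE);~~\n3. Provide a better way to configure the Baidu API, or add support for other APIs such as Alibaba's;\n4. Improve audio quality;\n5. Improve the audio track, e.g. add a track name;\n6. Package with Docker so users do not have to set up an environment;\n7. Support `ass` subtitles with complex formatting;\n8. Automatically add punctuation for better TTS results;\n9. If the speech runs longer than the subtitle's duration, adjust the speaking rate and regenerate the speech;\n10. Replace Baidu with Microsoft [TTS](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/);\n\n# Remaining Issues\n\nDocumentaries usually have a single narrator, so a single synthesized voice gives fairly good results.\nThey are also usually narrated in a plain, matter-of-fact style without much emotional variation, so the speech does not need to carry much emotion.\nDubbing general videos would therefore be far more demanding, which this project cannot currently handle.\n\n# Additional Idea\n\nMost media has moved to video these days, and you cannot tell what you will get out of a video until you have watched all of it, which costs time and effort. Meanwhile, AI assistants such as ChatGPT are now mature enough to condense text into a summary. One possible pipeline:\n\n1. Extract the audio;\n2. Extract the vocals and split them into sentences (this may require distinguishing speakers by voice characteristics);\n3. Transcribe to text;\n4. Use AI to produce a summary (or per-segment summaries);\n5. Read the summary to decide whether to watch the video (or which segments);\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwinking324%2Fsubtitle-to-audio-track","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwinking324%2Fsubtitle-to-audio-track","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwinking324%2Fsubtitle-to-audio-track/lists"}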