{"id":13825055,"url":"https://github.com/obgnail/video-subtitle-extractor","last_synced_at":"2025-10-04T11:14:51.706Z","repository":{"id":112904567,"uuid":"586251716","full_name":"obgnail/video-subtitle-extractor","owner":"obgnail","description":"Extracts hard-coded subtitles from videos using PaddleOCR.","archived":false,"fork":false,"pushed_at":"2023-04-28T17:29:17.000Z","size":4176,"stargazers_count":9,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-18T16:55:31.962Z","etag":null,"topics":["extractor","ocr","opencv","python","subtitles","video"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/obgnail.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-01-07T13:36:39.000Z","updated_at":"2024-11-02T16:51:35.000Z","dependencies_parsed_at":"2024-01-20T03:00:22.178Z","dependency_job_id":null,"html_url":"https://github.com/obgnail/video-subtitle-extractor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/obgnail/video-subtitle-extractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obgnail%2Fvideo-subtitle-extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obgnail%2Fvideo-subtitle-extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obgnail%2Fvideo-subtitle-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obgnail%2Fvideo-subtitle-extractor/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/obgnail","download_url":"https://codeload.github.com/obgnail/video-subtitle-extractor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obgnail%2Fvideo-subtitle-extractor/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264357294,"owners_count":23595576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extractor","ocr","opencv","python","subtitles","video"],"created_at":"2024-08-04T09:01:14.129Z","updated_at":"2025-10-04T11:14:46.677Z","avatar_url":"https://github.com/obgnail.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# video subtitle extractor\n\n![video_subtitle_extractor](assets/video_subtitle_extractor.gif)\n\n## introduction\n\nExtracts hard-coded subtitles from videos using PaddleOCR.\n\nBefore parsing, you can apply three optional steps to the video to make OCR more efficient: `fragment selection` (select_fragment), `ROI selection` (select_roi), and `thresholding` (select_threshold).\n\nEach of the three can be set up on an interactive page by enabling use_fragment, use_roi, or use_threshold. They can also be set directly through configuration parameters such as time_start, time_end, roi_array, and threshold.\n\n```bash\npython video_subtitle_extractor.py --path=\"./CyberpunkEdgerunners01.mkv\" --use_fragment=True --use_threshold=True --use_roi=True --roi_time=\"3:24\"\n```\n\n```python\nextractor = SubtitleExtractor(\n  video_path=r'./CyberpunkEdgerunners01.mkv',\n  time_start='03:01',\n  time_end='23:44',\n  roi_array=(24, 789, 1896, 191),\n  threshold=201,\n)\n# Optional interactive steps; uncomment to use them.\n# extractor.select_fragment()\n# extractor.select_roi()\n# extractor.select_threshold()\nsubtitles = extractor.extract()\nextractor.save(subtitles)\n```\n\n## 
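config\n\ncommon:\n\n- path: path to the video\n- subtitle_max_show_second: maximum time a subtitle stays on screen; default 10 s\n- text_similar_threshold: subtitle similarity threshold (values above it are treated as similar); default 70\n- output_format: format of the output subtitles; default lrc\n\nparser:\n\n- parse_time_start: time at which to start parsing the video. Format: %H:%M:%S.\n- parse_time_end: time at which to stop parsing the video. Format: %H:%M:%S.\n- parse_capture_interval: sampling interval for parsing. By default one sample is taken every 0.5 s.\n- parse_gray: whether to convert the video to grayscale while parsing. Default False.\n- parse_resize: scaling applied to the video while parsing. Default 1.\n\nprepare:\n\n- use_fragment: whether to set parse_time_start and parse_time_end on the interactive page. Default False (if neither parse_time_start nor parse_time_end is set, use_fragment is set to True).\n- fragment_reshow: whether to show the result again after a fragment is selected.\n- use_roi: whether to draw the subtitle region on the interactive page. Default False.\n- roi_time: a time at which a subtitle appears.\n- roi_reshow: whether to show the result again after the subtitle region is drawn.\n- use_threshold: whether to apply a threshold to the video.\n- threshold_time: a time at which a subtitle appears.\n- threshold_reshow: whether to show the result again after the threshold is applied.\n\nocr: see the PaddleOCR configuration for details\n\n- ocr_lang\n- ocr_use_angle_cls\n- ocr_use_gpu\n- ocr_use_mp\n- ocr_enable_mkldnn\n- ocr_gpu_mem\n- ocr_det_limit_side_len\n- ocr_rec_batch_num\n- ocr_cpu_threads\n- ocr_drop_score\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobgnail%2Fvideo-subtitle-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fobgnail%2Fvideo-subtitle-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobgnail%2Fvideo-subtitle-extractor/lists"}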