{"id":13824276,"url":"https://github.com/devmaxxing/videocr-PaddleOCR","last_synced_at":"2025-07-08T19:31:18.596Z","repository":{"id":45808706,"uuid":"386736767","full_name":"devmaxxing/videocr-PaddleOCR","owner":"devmaxxing","description":"Extract hardcoded subtitles from videos using machine learning","archived":false,"fork":true,"pushed_at":"2024-07-02T01:17:40.000Z","size":3455,"stargazers_count":127,"open_issues_count":6,"forks_count":18,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-08-05T09:13:02.102Z","etag":null,"topics":["machine-learning","ocr","paddleocr","paddlepaddle","subtitles"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"apm1467/videocr","license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devmaxxing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-16T18:59:01.000Z","updated_at":"2024-07-31T23:23:56.000Z","dependencies_parsed_at":"2023-01-29T23:45:27.218Z","dependency_job_id":null,"html_url":"https://github.com/devmaxxing/videocr-PaddleOCR","commit_stats":null,"previous_names":["devmaxxing/videocr-paddleocr"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmaxxing%2Fvideocr-PaddleOCR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmaxxing%2Fvideocr-PaddleOCR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmaxxing%2Fvideocr-PaddleOCR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmaxxing%2Fvideocr-PaddleOCR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/devmaxxing","download_url":"https://codeload.github.com/devmaxxing/videocr-PaddleOCR/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225457760,"owners_count":17477350,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","ocr","paddleocr","paddlepaddle","subtitles"],"created_at":"2024-08-04T09:00:59.958Z","updated_at":"2025-07-08T19:31:18.581Z","avatar_url":"https://github.com/devmaxxing.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"
# videocr

Extract hardcoded (burned-in) subtitles from videos using the [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) OCR engine with Python.
A Colab notebook for installing and running this library is included for convenience:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oliverfei/videocr-PaddleOCR/blob/master/videocr_PaddleOCR.ipynb)

## GUI Applications
For user-friendly applications that use this library, see:
- https://github.com/timminator/VideOCR

## Usage

```python
# example.py

from videocr import save_subtitles_to_file

if __name__ == '__main__':
    # Extract Chinese ('ch') subtitles between 7:10 and 7:34 and write them to example.srt
    save_subtitles_to_file(
        'example_cropped.mp4', 'example.srt', lang='ch',
        time_start='7:10', time_end='7:34',
        sim_threshold=80, conf_threshold=75, use_fullframe=True,
        brightness_threshold=210, similar_image_threshold=1000, frames_to_skip=1)
```

`$ python3 example.py`

example.srt:

```
1
00:07:10,000 --> 00:07:10,083
商城......现在没什么东西

2
00:07:10,416 --> 00:07:12,000
这边是战斗辅助系统

3
00:07:13,083 --> 00:07:14,500
要进去才能了解了

4
00:07:15,083 --> 00:07:15,916
没问题了吧

5
00:07:16,333 --> 00:07:17,166
我们准备登录

6
00:07:18,416 --> 00:07:21,083
啊对了， 登录没有服务器的选择么

7
00:07:21,333 --> 00:07:25,000
没有本游戏所有玩家， 都在个服务器内

8
00:07:25,833 --> 00:07:28,833
刺激了， 这么多玩家居然都不分流的么

9
00:07:29,500 --> 00:07:31,083
那......现在登录吗？

10
00:07:31,166 --> 00:07:32,416
好，登录吧！
```

## Install prerequisites
- Python 3.8 to 3.12
- `paddlepaddle` or `paddlepaddle-gpu`: see the [PaddlePaddle installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html)

## Installation

`pip install git+https://github.com/oliverfei/videocr-PaddleOCR.git`

Alternatively, for development:
1. Clone this repository.
2. From the repository's root directory, run `python -m pip install .`

## Performance

The OCR process can be very slow on CPU.
Running with `paddlepaddle-gpu` (and passing `use_gpu=True`) is recommended if you have a CUDA-capable GPU.

## Tips

To reduce the time spent on OCR for each frame, use the `crop_x`, `crop_y`, `crop_width`, and `crop_height` parameters to restrict OCR to the area of the video where the subtitles appear. When cropping, leave some buffer space above and below the text to ensure accurate readings.

### Quick Configuration Cheatsheet

| Setting | More Speed | More Accuracy | Notes |
|---------|------------|---------------|-------|
| Input video quality | Use lower quality | Use higher quality | The performance cost of higher-resolution video can be reduced with cropping. |
| `frames_to_skip` | Higher number | Lower number | |
| `brightness_threshold` | Higher threshold | N/A | A brightness threshold can speed up OCR by filtering out dark frames. In certain circumstances, such as white subtitles against a bright background, it may also improve accuracy. |


## API

1. Return the subtitles as a string in SRT format:
    ```python
    get_subtitles(
        video_path: str, lang='ch', time_start='0:00', time_end='',
        conf_threshold=75, sim_threshold=80, use_fullframe=False,
        det_model_dir=None, rec_model_dir=None, use_gpu=False,
        brightness_threshold=None, similar_image_threshold=100, similar_pixel_threshold=25, frames_to_skip=1,
        crop_x=None, crop_y=None, crop_width=None, crop_height=None)
    ```

2. Write the subtitles to `file_path`:
    ```python
    save_subtitles_to_file(
        video_path: str, file_path='subtitle.srt', lang='ch', time_start='0:00', time_end='',
        conf_threshold=75, sim_threshold=80, use_fullframe=False,
        det_model_dir=None, rec_model_dir=None, use_gpu=False,
        brightness_threshold=None, similar_image_threshold=100, similar_pixel_threshold=25, frames_to_skip=1,
        crop_x=None, crop_y=None, crop_width=None, crop_height=None)
    ```

### Parameters

- `lang`

  The language of the subtitles. See the [PaddleOCR docs](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/multi_languages_en.md#5-support-languages-and-abbreviations) for the list of supported languages and their abbreviations.

- `conf_threshold`

  Confidence threshold for word predictions. Words with a confidence below this value are discarded. The default value of `75` works for most cases.

  Lower it toward 0 if you get too few words in each line, or raise it toward 100 if there are too many spurious words in each line.

- `sim_threshold`

  Similarity threshold for subtitle lines. Subtitle lines whose [Levenshtein](https://en.wikipedia.org/wiki/Levenshtein_distance) similarity ratio exceeds this threshold are merged together. The default value of `80` works for most cases.

  Lower it toward 0 if you get too many duplicated subtitle lines, or raise it toward 100 if you get too few subtitle lines.

- `time_start` and `time_end`

  Extract subtitles from only part of the video. Subtitle timestamps are still calculated relative to the start of the full video.

- `use_fullframe`

  By default, OCR runs on the specified crop area or, if no crop is specified, on the bottom third of the frame.
  Set this value to `True` to use the entire frame.

- `crop_x`, `crop_y`, `crop_width`, `crop_height`

  The bounding area, in pixels, of the portion of the frame used for OCR. See the image below for an example:
  ![image](https://user-images.githubusercontent.com/8058852/226201081-f4ec9a23-4cc8-48d4-b15c-6ea2ac29ae93.png)

- `det_model_dir`

  The text detection inference model folder. Two kinds of values are accepted:
  1. `None`: automatically download the built-in model to `~/.paddleocr/det`.
  2. The path to a specific inference model; the model and params files must be included in that directory.

  See the [PaddleOCR repo](https://github.com/PaddlePaddle/PaddleOCR/) for a list of prebuilt models.

- `rec_model_dir`

  The text recognition inference model folder. Two kinds of values are accepted:
  1. `None`: automatically download the built-in model to `~/.paddleocr/rec`.
  2. The path to a specific inference model; the model and params files must be included in that directory.

  See the [PaddleOCR repo](https://github.com/PaddlePaddle/PaddleOCR/) for a list of prebuilt models.

- `use_gpu`

  Set to `True` to perform OCR on the GPU (requires the `paddlepaddle-gpu` Python package).

- `brightness_threshold`

  If set, pixels whose brightness is below the threshold are blacked out. Valid brightness values range from 0 (black) to 255 (white). This can improve accuracy when performing OCR on videos with white subtitles.

- `similar_image_threshold`

  The number of non-similar pixels allowed before two consecutive frames are considered different.
  If a frame is not different from the previous frame, the OCR result from the previous frame is reused, which can save a lot of time depending on how long each OCR inference takes.

- `similar_pixel_threshold`

  Brightness threshold (0 to 255) used with `similar_image_threshold` to determine whether two consecutive frames are different. If the brightness difference between two corresponding pixels exceeds this threshold, the pixels are considered non-similar.

- `frames_to_skip`

  The number of frames to skip before sampling a frame for OCR. Take the input video's frame rate into account before increasing this value.

## TODO
- [ ] parallel processing
- [ ] publish to PyPI
- [ ] command-line interface
- [ ] user-friendly application for non-devs
","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevmaxxing%2Fvideocr-PaddleOCR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevmaxxing%2Fvideocr-PaddleOCR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevmaxxing%2Fvideocr-PaddleOCR/lists"}
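The `sim_threshold` merging rule can be illustrated with a short plain-Python sketch. This is not videocr's implementation. It assumes a similarity score of `100 * (1 - distance / length_of_longer_string)` built on the Levenshtein edit distance. It also assumes each new line is compared only with the previously kept line. The names `Subtitle`, `levenshtein`, `similarity`, and `merge_similar` are hypothetical. The library's actual ratio function and merge logic may differ.

```python
from dataclasses import dataclass
from typing import List


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance: insertions, deletions, substitutions each cost 1
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    # Assumed normalization onto a 0-100 scale; 100 means identical strings
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))


@dataclass
class Subtitle:  # hypothetical container for one OCR'd subtitle line
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_similar(subs: List[Subtitle], sim_threshold: float = 80) -> List[Subtitle]:
    merged: List[Subtitle] = []
    for sub in subs:
        if merged and similarity(merged[-1].text, sub.text) > sim_threshold:
            # Same subtitle read twice: extend the kept entry instead of adding a duplicate
            merged[-1].end = sub.end
        else:
            # Copy so the caller's list is not mutated
            merged.append(Subtitle(sub.start, sub.end, sub.text))
    return merged
```

In this sketch, lowering `sim_threshold` lets more near-duplicate OCR readings collapse into one entry. Raising it keeps more lines separate, which matches the tuning advice given for that parameter.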
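The example output uses the standard SRT `HH:MM:SS,mmm` timestamp layout. Because timestamps are measured from the start of the full video, `time_start='7:10'` lines up with the first cue at `00:07:10,000`. The helpers below are hypothetical (`parse_time` and `format_srt_time` are not part of videocr's API). They assume the `M:SS` and `H:MM:SS` forms seen in the Usage example, and the library's own time parsing may accept other inputs.

```python
from typing import Optional


def parse_time(value: str) -> Optional[float]:
    # Assumption: an empty string (the time_end default) means 'until the end of the video'
    if not value:
        return None
    seconds = 0.0
    # Accepts 'SS', 'M:SS' or 'H:MM:SS'; each colon-separated field is base 60
    for part in value.split(':'):
        seconds = seconds * 60 + float(part)
    return seconds


def format_srt_time(seconds: float) -> str:
    # SRT timestamps are HH:MM:SS,mmm with a comma before the milliseconds
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f'{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}'
```

Rounding to whole milliseconds before splitting avoids floating-point artifacts such as `430.083` rendering as `,082`.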
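The region-selection rules described for `use_fullframe` and the `crop_*` parameters can be summarized as a small function. This is a hypothetical sketch (`ocr_region` is not part of videocr's API). It assumes `use_fullframe=True` overrides any crop and that a crop is used only when all four values are given. It also assumes the bottom third is measured in whole pixel rows, with the boundary row rounded down.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def ocr_region(frame_width: int, frame_height: int, use_fullframe: bool = False,
               crop_x: Optional[int] = None, crop_y: Optional[int] = None,
               crop_width: Optional[int] = None, crop_height: Optional[int] = None) -> Box:
    # Assumption: use_fullframe takes priority over any crop parameters
    if use_fullframe:
        return (0, 0, frame_width, frame_height)
    # Use the explicit crop only when all four values are provided
    if crop_x is not None and crop_y is not None and crop_width is not None and crop_height is not None:
        return (crop_x, crop_y, crop_width, crop_height)
    # No complete crop: fall back to the bottom third of the frame
    top = frame_height * 2 // 3
    return (0, top, frame_width, frame_height - top)
```

For a 1920x1080 video with no crop, this sketch selects the 360-pixel-high strip starting at row 720.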
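The `brightness_threshold` behavior (blacking out pixels below the threshold) can be sketched on a toy frame. Real video frames are color images, and the README does not say how videocr derives each pixel's brightness. This sketch therefore assumes one grayscale value (0 to 255) per pixel, stored as nested lists. The `apply_brightness_threshold` and `is_blank` helpers are hypothetical. The second one shows how a frame with nothing bright left in it could be skipped, which is one way thresholding can speed things up.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 brightness values


def apply_brightness_threshold(frame: Frame, threshold: int) -> Frame:
    # Pixels darker than the threshold become 0 (black); brighter pixels are kept as-is
    return [[px if px >= threshold else 0 for px in row] for row in frame]


def is_blank(frame: Frame) -> bool:
    # After thresholding, an all-black frame has no bright (e.g. white subtitle) pixels to read
    return all(px == 0 for row in frame for px in row)
```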
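The interaction between `similar_image_threshold` and `similar_pixel_threshold`, and the reuse of the previous OCR result, can be sketched as follows. This is an illustration under stated assumptions, not videocr's code. Frames are grayscale nested lists, and both comparisons are strict "greater than" checks. The names `frames_differ` and `ocr_with_reuse` are hypothetical. The `run_ocr` callable stands in for the actual PaddleOCR inference.

```python
from typing import Callable, List, Optional

Frame = List[List[int]]  # grayscale frame: rows of 0-255 brightness values


def frames_differ(prev: Frame, curr: Frame,
                  similar_image_threshold: int = 100,
                  similar_pixel_threshold: int = 25) -> bool:
    # A pixel pair is non-similar when its brightness difference exceeds similar_pixel_threshold
    non_similar = sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for a, b in zip(prev_row, curr_row)
        if abs(a - b) > similar_pixel_threshold
    )
    # The frames count as different once more than similar_image_threshold pixels are non-similar
    return non_similar > similar_image_threshold


def ocr_with_reuse(frames: List[Frame], run_ocr: Callable[[Frame], str],
                   similar_image_threshold: int = 100,
                   similar_pixel_threshold: int = 25) -> List[str]:
    results: List[str] = []
    prev_frame: Optional[Frame] = None
    prev_text = ''
    for frame in frames:
        # Only call the expensive OCR step when the frame differs from the previous one
        if prev_frame is None or frames_differ(prev_frame, frame,
                                               similar_image_threshold,
                                               similar_pixel_threshold):
            prev_text = run_ocr(frame)
        results.append(prev_text)
        prev_frame = frame
    return results
```

Raising `similar_image_threshold` (the Usage example uses `1000`) makes more frames count as unchanged, so fewer OCR calls are made.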