{"id":32647548,"url":"https://github.com/jacobmarks/pytesseract-ocr-plugin","last_synced_at":"2025-10-31T05:55:13.213Z","repository":{"id":195606906,"uuid":"692592697","full_name":"jacobmarks/pytesseract-ocr-plugin","owner":"jacobmarks","description":"Run optical character recognition with PyTesseract from the FiftyOne App!","archived":false,"fork":false,"pushed_at":"2024-04-05T00:00:17.000Z","size":24,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-04-16T07:21:08.007Z","etag":null,"topics":["computer-vision","document-understanding","fiftyone","nlp","ocr","plugin","python","tesseract","tesseract-ocr"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jacobmarks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-09-17T00:48:43.000Z","updated_at":"2024-02-26T10:46:10.000Z","dependencies_parsed_at":"2023-09-18T23:47:48.973Z","dependency_job_id":"7f25fc4d-f0ca-4c10-942d-3a52e33f10d2","html_url":"https://github.com/jacobmarks/pytesseract-ocr-plugin","commit_stats":{"total_commits":18,"total_committers":1,"mean_commits":18.0,"dds":0.0,"last_synced_commit":"4a159e5c8012a1a645f2dab4d4089b0bf413af45"},"previous_names":["jacobmarks/pytesseract-ocr-plugin"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jacobmarks/pytesseract-ocr-plugin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fpytesseract-ocr-plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/jacobmarks%2Fpytesseract-ocr-plugin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fpytesseract-ocr-plugin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fpytesseract-ocr-plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jacobmarks","download_url":"https://codeload.github.com/jacobmarks/pytesseract-ocr-plugin/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fpytesseract-ocr-plugin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281937758,"owners_count":26586774,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-31T02:00:07.401Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","document-understanding","fiftyone","nlp","ocr","plugin","python","tesseract","tesseract-ocr"],"created_at":"2025-10-31T05:55:09.060Z","updated_at":"2025-10-31T05:55:13.198Z","avatar_url":"https://github.com/jacobmarks.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## PyTesseract Optical Character Recognition Plugin\n\n\u003cimg width=\"1727\" alt=\"funsd_predictions\" src=\"https://github.com/jacobmarks/pytesseract-ocr-plugin/assets/12500356/1bda669c-f2f8-456f-912f-c3f6a6a0fadd\"\u003e\n\n### 
Updates\n\n- **2023-10-19**: Added support for customizing prediction fields, and an embedded field for OCR text.\n\nThis Python plugin lets you perform optical character recognition on documents\nusing PyTesseract, a Python wrapper for the Tesseract OCR engine.\n\n## Watch on YouTube\n[![Video Thumbnail](https://img.youtube.com/vi/jnNPGrM6Wr4/0.jpg)](https://www.youtube.com/watch?v=jnNPGrM6Wr4\u0026list=PLuREAXoPgT0RZrUaT0UpX_HzwKkoB-S9j\u0026index=6)\n\n## Installation\n\n```shell\nfiftyone plugins download https://github.com/jacobmarks/pytesseract-ocr-plugin\n```\n\nYou will also need to install the plugin's requirements:\n\n```shell\npip install -r requirements.txt\n```\n\n## Operators\n\n### `run_ocr_engine`\n\n- Runs the PyTesseract OCR engine on the documents in the dataset, converts the\n  results to FiftyOne labels, and stores both word-level and block-level\n  predictions on the dataset.\n\n## Usage\n\nYou can access the operator via the App's action menu, or by pressing the \"`\"\nkey on your keyboard and selecting the operator from the dropdown menu.\n\nIf you have a view loaded and/or samples selected, the operator gives you the\noption to run the OCR engine on only those samples or on the entire dataset.\n\nYou can run the operator in the foreground or delegate its execution to a\nbackground job.\n\n![ocr_queue_job](https://github.com/jacobmarks/pytesseract-ocr-plugin/assets/12500356/2ab239c1-8d37-44a7-b8d6-93285afe7f08)\n\n💡 Once you've generated OCR predictions, you can search through them using the [Keyword Search plugin](https://github.com/jacobmarks/keyword-search-plugin)!\n\n\u003cimg width=\"1727\" alt=\"funsd_block_predictions\" 
src=\"https://github.com/jacobmarks/pytesseract-ocr-plugin/assets/12500356/a7b6e81f-7a1e-4663-8ae9-c32c3266015d\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacobmarks%2Fpytesseract-ocr-plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacobmarks%2Fpytesseract-ocr-plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacobmarks%2Fpytesseract-ocr-plugin/lists"}