{"id":23542361,"url":"https://github.com/malexandersalazar/tools-python-image-to-text","last_synced_at":"2026-04-19T04:34:38.075Z","repository":{"id":144154010,"uuid":"582216026","full_name":"malexandersalazar/tools-python-image-to-text","owner":"malexandersalazar","description":" A Python tool based on OpenCV, Tesseract OCR and spaCy for reading and recognize the text in an image from Windows.","archived":false,"fork":false,"pushed_at":"2024-01-12T01:05:58.000Z","size":1245,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-15T06:11:33.653Z","etag":null,"topics":["opencv","python","spacy-nlp","tesseract-ocr"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/malexandersalazar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-12-26T05:35:24.000Z","updated_at":"2022-12-30T01:57:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"77a30a16-ac57-47c5-9688-19c5ada2282d","html_url":"https://github.com/malexandersalazar/tools-python-image-to-text","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/malexandersalazar/tools-python-image-to-text","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malexandersalazar%2Ftools-python-image-to-text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malexandersalazar%2Ftools-python-image-to-text/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malexandersalazar%2Ftools-python-image-to-text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malexandersalazar%2Ftools-python-image-to-text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/malexandersalazar","download_url":"https://codeload.github.com/malexandersalazar/tools-python-image-to-text/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malexandersalazar%2Ftools-python-image-to-text/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31995148,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T20:23:30.271Z","status":"online","status_checked_at":"2026-04-19T02:00:07.110Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["opencv","python","spacy-nlp","tesseract-ocr"],"created_at":"2024-12-26T06:11:44.578Z","updated_at":"2026-04-19T04:34:38.037Z","avatar_url":"https://github.com/malexandersalazar.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Image to text tool\n\n![alt text](/img/variants.PNG \"Image to text tool\")\n\n A Python tool based on OpenCV, Tesseract OCR and spaCy for reading and recognizing the text in an image on Windows.\n\n This script generates 30 variants of the image using OpenCV adaptiveThreshold, then uses spaCy to measure 
the relevance and number of the words that Tesseract OCR reads from each variant, and chooses the best reading.\n\n## Installation\n\n### Tesseract OCR\n\nThe latest installers for Windows can be downloaded [here](https://github.com/UB-Mannheim/tesseract/wiki).\n\nFor more information about the languages supported in different versions of Tesseract, visit the following [link](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).\n\n### spaCy\n\nTo use spaCy, install it and download the pre-trained models as indicated on its official [site](https://spacy.io/models).\n\n\u003e  pip install -U spacy\n\nInstalling English:\n\n\u003e  python -m spacy download en_core_web_md\n\nInstalling Spanish:\n\n\u003e  python -m spacy download es_core_news_md\n\n### Image to text tool\n\nJust copy the `itt.py` script located in the dist folder and update the Tesseract path if necessary.\n\n```\nimport pytesseract as pyt\n\npyt.pytesseract.tesseract_cmd = \"C:/Program Files/Tesseract-OCR/tesseract.exe\"\n```\n\n## Getting Started\n\nTo use the script, pass the path of the image that you want to read.\n\n\u003e python itt.py W:\\malexandersalazar\\tools-python-image-to-text\\raw\n\nYou can also set the language as a parameter. 
For now it only supports English (\"en\") and Spanish (\"es\").\n\n\u003e python itt.py W:\\malexandersalazar\\tools-python-image-to-text\\raw -l=en\n\nTo support more languages, install the corresponding spaCy models and make sure that Tesseract OCR supports them as well.\n\n## Dependencies\n\n* python (== 3.11.3)\n* pytesseract (== 0.3.10)\n* cv2 (== 4.7.0)\n* spacy (== 3.6.0)\n* pandas (== 2.0.2)\n\n## License\n\nThis project is licensed under the [MIT License][1].\n\n[1]: https://opensource.org/licenses/mit-license.html \"The MIT License | Open Source Initiative\"","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmalexandersalazar%2Ftools-python-image-to-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmalexandersalazar%2Ftools-python-image-to-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmalexandersalazar%2Ftools-python-image-to-text/lists"}