{"id":19196978,"url":"https://github.com/jitesoft/docker-tesseract-ocr","last_synced_at":"2025-04-15T17:24:41.762Z","repository":{"id":88326147,"uuid":"88068594","full_name":"jitesoft/docker-tesseract-ocr","owner":"jitesoft","description":"Docker image containing Tesseract OCR.","archived":false,"fork":false,"pushed_at":"2024-11-17T14:33:10.000Z","size":142,"stargazers_count":43,"open_issues_count":0,"forks_count":8,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T23:04:37.405Z","etag":null,"topics":["docker","hacktoberfest","image","jitesoft","ocr","tesseract-ocr","ubuntu"],"latest_commit_sha":null,"homepage":"https://github.com/tesseract-ocr/tesseract","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jitesoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null},"funding":{"patreon":"jitesoft","open_collective":"jitesoft-open-source","custom":["https://sponsus.org/u/jitesoft"]}},"created_at":"2017-04-12T15:37:20.000Z","updated_at":"2025-02-05T12:51:17.000Z","dependencies_parsed_at":"2024-03-13T20:29:47.111Z","dependency_job_id":"d1d3f317-aa58-43cf-95be-2354e8b7ecc9","html_url":"https://github.com/jitesoft/docker-tesseract-ocr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jitesoft%2Fdocker-tesseract-ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jitesoft%2Fdocker-tesseract-ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jitesoft%2Fdocker-tesseract-ocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jitesoft%2Fdocker-tesseract-ocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jitesoft","download_url":"https://codeload.github.com/jitesoft/docker-tesseract-ocr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249117200,"owners_count":21215349,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","hacktoberfest","image","jitesoft","ocr","tesseract-ocr","ubuntu"],"created_at":"2024-11-09T12:15:02.389Z","updated_at":"2025-04-15T17:24:41.739Z","avatar_url":"https://github.com/jitesoft.png","language":"Dockerfile","funding_links":["https://patreon.com/jitesoft","https://opencollective.com/jitesoft-open-source","https://sponsus.org/u/jitesoft","https://github.com/sponsors/jitesoft","https://www.patreon.com/jitesoft"],"categories":[],"sub_categories":[],"readme":"# Tesseract OCR\r\n\r\n[![Docker Pulls](https://img.shields.io/docker/pulls/jitesoft/tesseract-ocr.svg)](https://hub.docker.com/r/jitesoft/tesseract-ocr)\r\n[![Back project](https://img.shields.io/badge/Open%20Collective-Tip%20the%20devs!-blue.svg)](https://opencollective.com/jitesoft-open-source)\r\n\r\n[Tesseract OCR](https://github.com/tesseract-ocr/tesseract) - Ubuntu and Alpine Linux images.  
\r\n\r\nTesseract and Leptonica are both built from source for each platform and distribution. Supported platforms are amd64 (x86_64) and arm64 (aarch64).\r\n\r\n## Tags\r\n\r\nTags indicate the OS version (or the name, in the case of Alpine). Images with the `4-` prefix use\r\nTesseract version 4, while images without the prefix use version 5.  \r\n\r\nAll versions use the same training data.\r\n\r\nImages can be found at:\r\n\r\n* [Docker Hub](https://hub.docker.com/r/jitesoft/tesseract-ocr): `jitesoft/tesseract-ocr`  \r\n* [GitLab](https://gitlab.com/jitesoft/dockerfiles/tesseract): `registry.gitlab.com/jitesoft/dockerfiles/tesseract`\r\n* [GitHub](https://github.com/orgs/jitesoft/packages/container/package/tesseract): `ghcr.io/jitesoft/tesseract`\r\n* [Quay](https://quay.io/jitesoft/tesseract): `quay.io/jitesoft/tesseract`\r\n\r\n## Dockerfile\r\n\r\nThe Dockerfile can be found at [GitLab](https://gitlab.com/jitesoft/dockerfiles/tesseract) or [GitHub](https://github.com/jitesoft/docker-tesseract-ocr).\r\n\r\n## Training and languages\r\n\r\nThe default image ships with the English training data preinstalled. This is the \"fast\" training data, which is quicker to process but less accurate than the \"best\" data.  \r\nTo add another language, invoke the `train-lang` script followed by the language code (ISO 639-2, e.g. `eng` or `swe`). To use the `fast` or `best` data, pass it as an optional flag after the language code (e.g. `train-lang eng --fast`); without the flag, the standard data is used.  
\r\nThe above can easily be done in a derived image:\r\n\r\n```dockerfile\r\nFROM jitesoft/tesseract-ocr\r\n# Download the fast Bulgarian training data\r\nRUN train-lang bul --fast\r\n```\r\n\r\nThe languages are downloaded from the official Tesseract tessdata repositories.\r\n\r\nFor a full list of supported languages, see:\r\n\r\nhttps://github.com/tesseract-ocr/tessdata  \r\nhttps://github.com/tesseract-ocr/tessdata_best  \r\nhttps://github.com/tesseract-ocr/tessdata_fast  \r\n\r\nAlternatively, copy a `.traineddata` file into the `/usr/local/share/tessdata` directory of the container (`/usr/share/tessdata` on Alpine).\r\n\r\n## Example execution\r\n\r\n```bash\r\ndocker pull jitesoft/tesseract-ocr\r\n# Mount the local image file into the container and print the recognized text to stdout\r\ndocker run -v /path/to/image/img.jpg:/tmp/img.jpg jitesoft/tesseract-ocr /tmp/img.jpg stdout\r\n```\r\n\r\nUse high-DPI images for the best results, although a higher DPI also increases processing time.\r\n\r\n### Image labels\r\n\r\nThis image follows the [Jitesoft image label specification 1.0.0](https://gitlab.com/snippets/1866155).\r\n\r\n## Licenses\r\n\r\nThe images and scripts in the repository are released under the [MIT license](https://gitlab.com/jitesoft/dockerfiles/tesseract/blob/master/LICENSE).  
\r\nTesseract is released under the [Apache License v2](https://github.com/tesseract-ocr/tesseract/blob/master/LICENSE).  \r\n\r\nNotice: The Tesseract source has been modified with a patch (`alpine/tess.patch`) to allow compilation on Alpine Linux.\r\n\r\n### Sponsors\r\n\r\nJitesoft images are built via GitLab CI on runners hosted by the following wonderful organisations:\r\n\r\n\u003ca href=\"https://osuosl.org/\" target=\"_blank\" title=\"Oregon State University - Open Source Lab\"\u003e\r\n    \u003cimg src=\"https://jitesoft.com/images/oslx128.webp\" alt=\"Oregon State University - Open Source Lab\"\u003e\r\n\u003c/a\u003e\r\n\r\n_The organisations above are not directly affiliated with Jitesoft or any Jitesoft projects._\r\n\r\n---\r\n\r\nSponsorship is vital for the continued development and maintenance of open source.  \r\nQuestions and sponsorship queries can be sent by \u003ca href=\"mailto:sponsor@jitesoft.com\"\u003eemail\u003c/a\u003e.  \r\nIf you wish to sponsor our projects, reach out to the email above or visit any of the following sites:\r\n\r\n[Open Collective](https://opencollective.com/jitesoft-open-source)  \r\n[GitHub Sponsors](https://github.com/sponsors/jitesoft)  \r\n[Patreon](https://www.patreon.com/jitesoft)\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjitesoft%2Fdocker-tesseract-ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjitesoft%2Fdocker-tesseract-ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjitesoft%2Fdocker-tesseract-ocr/lists"}