https://github.com/jitesoft/docker-tesseract-ocr
Docker image containing Tesseract OCR.
- Host: GitHub
- URL: https://github.com/jitesoft/docker-tesseract-ocr
- Owner: jitesoft
- License: mit
- Created: 2017-04-12T15:37:20.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2024-11-17T14:33:10.000Z (6 months ago)
- Last Synced: 2025-03-28T23:04:37.405Z (about 1 month ago)
- Topics: docker, hacktoberfest, image, jitesoft, ocr, tesseract-ocr, ubuntu
- Language: Dockerfile
- Homepage: https://github.com/tesseract-ocr/tesseract
- Size: 139 KB
- Stars: 43
- Watchers: 3
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# Tesseract OCR.
[Tesseract OCR](https://github.com/tesseract-ocr/tesseract) - Ubuntu and Alpine Linux images.

Tesseract and Leptonica are both built from source for each platform and distro.
Supported platforms are amd64 (x86_64) and arm64 (aarch64).

## Tags
Versions indicate the OS version (or the name, in the case of Alpine). Images with the `4-` prefix use
Tesseract version 4, while images without the prefix use version 5.

All versions use the same training data.
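
The tag rule above can be sketched as a small shell helper. This is only an illustration: `image_ref` is a hypothetical function, not something shipped with the image, and the OS part you pass in (`alpine` here) must match a tag that is actually published in the registry you pull from.

```bash
# Sketch only: image_ref is a hypothetical helper, not part of the image.
# Usage: image_ref <tesseract-major> <os-part> [repository]
image_ref() {
  major="$1"
  os_part="$2"
  repo="${3:-jitesoft/tesseract-ocr}"   # defaults to the Docker Hub name
  case "$major" in
    4) printf '%s:4-%s\n' "$repo" "$os_part" ;;  # version 4 images carry the 4- prefix
    5) printf '%s:%s\n' "$repo" "$os_part" ;;    # version 5 images have no prefix
    *) printf 'unsupported tesseract major version: %s\n' "$major" >&2
       return 1 ;;
  esac
}
```

For example, `docker pull "$(image_ref 4 alpine)"` would request `jitesoft/tesseract-ocr:4-alpine`, assuming that tag exists.
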
Images can be found at:
* [Docker hub](https://hub.docker.com/r/jitesoft/tesseract-ocr): `jitesoft/tesseract-ocr`
* [GitLab](https://gitlab.com/jitesoft/dockerfiles/tesseract): `registry.gitlab.com/jitesoft/dockerfiles/tesseract`
* [GitHub](https://github.com/orgs/jitesoft/packages/container/package/tesseract): `ghcr.io/jitesoft/tesseract`
* [Quay](https://quay.io/jitesoft/tesseract): `quay.io/jitesoft/tesseract`

## Dockerfile
The Dockerfile can be found at [GitLab](https://gitlab.com/jitesoft/dockerfiles/tesseract) or [GitHub](https://github.com/jitesoft/docker-tesseract-ocr).
## Training and languages
The default image has the English training data installed from the start. The training data used is the "fast" data, which parses quicker but not at the best quality.
It's possible to train another language by invoking the `train-lang` script followed by the language code (ISO 639-2: `eng`, `swe`, etc.). To use the `fast` or `best` data, add that as an optional parameter after the language code (`train-lang eng --fast`); omit the argument to use the standard data.
The above could easily be done in a derived image:

```dockerfile
FROM jitesoft/tesseract-ocr
RUN train-lang bul --fast
```

The languages are downloaded from the official tesseract tessdata repositories.
For a full list of supported languages check the following links:

* https://github.com/tesseract-ocr/tessdata
* https://github.com/tesseract-ocr/tessdata_best
* https://github.com/tesseract-ocr/tessdata_fast

It is also possible to just copy a traineddata file to the `/usr/local/share/tessdata` (`/usr/share/tessdata` on alpine) directory of the container.
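
As a variation on copying, a traineddata file can also be bind-mounted into that directory at run time. The sketch below is a rough illustration, not part of the image: `tessdata_dir` and `traineddata_mount` are hypothetical helpers, the `ubuntu`/`alpine` flavor names are labels chosen here, and it assumes the container passes its arguments to `tesseract` (as the example execution in this README suggests), so that tesseract's `-l` option selects the language.

```bash
# Sketch only: both helpers are hypothetical and not shipped with the image.

# tessdata_dir <flavor>: print the tessdata directory named in this README.
tessdata_dir() {
  case "$1" in
    alpine) printf '%s\n' /usr/share/tessdata ;;
    *)      printf '%s\n' /usr/local/share/tessdata ;;
  esac
}

# traineddata_mount <absolute-host-file> <flavor>: print a docker -v argument
# that bind-mounts the file read-only into the tessdata directory.
traineddata_mount() {
  host_file="$1"
  flavor="$2"
  printf '%s:%s/%s:ro\n' "$host_file" "$(tessdata_dir "$flavor")" "$(basename "$host_file")"
}

# Possible usage (host paths are placeholders):
#   docker run --rm \
#     -v "$(traineddata_mount /path/to/swe.traineddata ubuntu)" \
#     -v /path/to/image/img.jpg:/tmp/img.jpg \
#     jitesoft/tesseract-ocr /tmp/img.jpg stdout -l swe
```

A mount avoids rebuilding the image when the traineddata file changes, while a derived image with the file copied in is easier to distribute.
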
## Example execution
```bash
docker pull jitesoft/tesseract-ocr
docker run -v /path/to/image/img.jpg:/tmp/img.jpg jitesoft/tesseract-ocr /tmp/img.jpg stdout
```

Use a high-DPI image for the best result. A higher DPI does increase the run time, though.
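
The single-image command above could be extended to a whole directory. The following is a minimal sketch under a few assumptions: `text_name` and `ocr_dir` are hypothetical helpers, only `.jpg` files are processed, and the input directory is given as an absolute path because `docker run -v` expects one for bind mounts.

```bash
# Sketch only: text_name and ocr_dir are hypothetical helpers.

# text_name <image-path>: print the name of the text file for an image
# by dropping the directory and the last extension.
text_name() {
  base="$(basename "$1")"
  printf '%s.txt\n' "${base%.*}"
}

# ocr_dir <absolute-input-dir> <output-dir>: run the container once per
# .jpg file and store the recognised text next to each other in <output-dir>.
ocr_dir() {
  src="$1"
  dst="$2"
  mkdir -p "$dst"
  for img in "$src"/*.jpg; do
    [ -e "$img" ] || continue   # skip when the glob matches nothing
    docker run --rm -v "$img:/tmp/img.jpg" jitesoft/tesseract-ocr /tmp/img.jpg stdout \
      > "$dst/$(text_name "$img")"
  done
}
```

Calling `ocr_dir /path/to/scans ./text` would then write one `.txt` file per image. Starting a container per file keeps the sketch simple; mounting the whole directory once would be faster for large batches.
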
### Image labels
This image follows the [Jitesoft image label specification 1.0.0](https://gitlab.com/snippets/1866155).
## Licenses
The images and scripts in the repository are released under the [MIT license](https://gitlab.com/jitesoft/dockerfiles/tesseract/blob/master/LICENSE).
Tesseract is released under the [Apache License v2](https://github.com/tesseract-ocr/tesseract/blob/master/LICENSE).

Notice: The Tesseract source has been modified with a patch (`alpine/tess.patch`) to allow for compilation on Alpine Linux.
### Sponsors
Jitesoft images are built via GitLab CI on runners hosted by the following wonderful organisations:
_The companies above are not affiliated with Jitesoft or any Jitesoft Projects directly._
---
Sponsoring is vital for the further development and maintenance of open source.
Questions and sponsoring queries can be made by email.
If you wish to sponsor our projects, reach out to the email above or visit any of the following sites:

* [Open Collective](https://opencollective.com/jitesoft-open-source)
* [GitHub Sponsors](https://github.com/sponsors/jitesoft)
* [Patreon](https://www.patreon.com/jitesoft)