https://github.com/lgdd/tess4j-rest
OCR REST API using Tesseract OCR Engine (via Tess4J)
- Host: GitHub
- URL: https://github.com/lgdd/tess4j-rest
- Owner: lgdd
- License: mit
- Created: 2023-04-29T00:20:53.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-28T09:12:33.000Z (11 months ago)
- Last Synced: 2025-03-29T11:51:46.094Z (6 months ago)
- Topics: dxp, liferay, ocr, tess4j, tesseract
- Language: Java
- Homepage:
- Size: 12 MB
- Stars: 11
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Tess4J-REST
OCR REST API using [Tesseract OCR Engine](https://github.com/tesseract-ocr/tesseract) (via [Tess4J](https://github.com/nguyenq/tess4j))
## Docker Image
Docker image available: https://hub.docker.com/r/lgdd/tess4j-rest
Try it out:
```sh
docker run -it --rm -p 8000:8000 lgdd/tess4j-rest
```
## Usage
Run `docker-compose up --build` (also available as `make dev`).
> **Note**: You can also run `./mvnw quarkus:dev` (or `quarkus dev`).
> For this method to work, the environment variable _TESSDATA_PREFIX_ must be set to the absolute path of this project resource: `src/test/resources/test-tessdata/eng.traineddata`

You can then navigate to `http://localhost:8000/q/swagger-ui` and test uploading an image.
Or you can quickly test the endpoint with `curl` (from this project root):
```sh
curl -X 'POST' \
'http://localhost:8000/detect-text' \
-H 'accept: text/plain' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@src/test/resources/test-data/eurotext.png'
```
## Environment variables
```Dockerfile
# Parent folder path for tesseract data files
ENV TESSDATA_PREFIX="/opt/tesseract/tessdata"

# Suffix for the data repository to use.
# Either "best", "fast" or "".
# See https://github.com/tesseract-ocr/tessdata#readme
ENV TESSERACT_DATA_SUFFIX="best"

# Version of the data repository.
# See https://github.com/tesseract-ocr/tessdata#readme
ENV TESSERACT_DATA_VERSION="4.1.0"

# Additional languages to download on the application startup.
# For the possible values, see https://github.com/tesseract-ocr/tessdata
ENV TESSERACT_DATA_LANGS="fra,spa,deu"
```
## Health probes
Readiness: `/q/health/ready`
Liveness: `/q/health/live`
The application is ready and live once all additional languages have been downloaded.
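Because the language downloads can delay startup, a script may want to wait for the readiness probe before sending its first request. The following is a minimal sketch, not part of this project: the `wait_for_ready` function and its defaults are hypothetical, and it assumes the standard Quarkus readiness path `/q/health/ready` on port 8000 answers with a non-2xx status until the application is ready.

```sh
# Poll a readiness URL until it answers with a 2xx status or attempts run out.
# Hypothetical helper; all defaults below are assumptions, not project settings.
#   $1 readiness URL, $2 maximum attempts, $3 seconds to sleep between attempts
wait_for_ready() {
  url="${1:-http://localhost:8000/q/health/ready}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error statuses such as 503
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    # Sleep only if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Calling `wait_for_ready "http://localhost:8000/q/health/ready" 60 5` before the first `/detect-text` request would give the downloads roughly five minutes to finish. The function returns a non-zero status if the probe never succeeds.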
## License
[MIT](LICENSE)