https://github.com/itext/itext-pdfocr-java

pdfOCR is an iText add-on that recognizes and extracts text from scanned documents and images. It can also convert them into fully ISO-compliant PDF or PDF/A-3u files that are accessible, searchable, and suitable for archiving.

Host: GitHub
URL: https://github.com/itext/itext-pdfocr-java
Owner: itext
License: other
Created: 2020-06-16T10:16:55.000Z (about 6 years ago)
Default Branch: develop
Last Pushed: 2025-09-22T16:09:39.000Z (10 months ago)
Last Synced: 2025-09-22T18:29:24.844Z (10 months ago)
Topics: archival, character, data, diacritic, extractable, glyphs, hindi, image, iso-compliant, ligatures, mandarin, ocr, optical, pdf, portuguese, recognition, scan, searchable, spanish, tesseract
Language: Java
Homepage: https://itextpdf.com/en/products/itext-7/pdfocr
Size: 536 MB
Stars: 36
Watchers: 9
Forks: 9
Open Issues: 6
Metadata Files:
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Security: SECURITY.md

Awesome Lists containing this project

awesome-java - PdfOCR
