Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ONB-RD/hOCRTools
Utilities to process and handle hOCR
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/ONB-RD/hOCRTools
- Owner: ONB-RD
- License: apache-2.0
- Created: 2015-06-17T12:35:02.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2018-07-31T06:21:33.000Z (over 6 years ago)
- Last Synced: 2024-07-31T21:54:16.013Z (5 months ago)
- Language: XSLT
- Size: 9.77 KB
- Stars: 6
- Watchers: 7
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README
- License: LICENSE
Awesome Lists containing this project
- awesome-ocr - hOCRTools - hOCR to ALTO conversion XSLT (Software / OCR file formats)
README
This is a space to collect utilities to work with hOCR as specified in
https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview?pli=1#

Right now there is a simple transformation to ALTO which guesses
s and s

When running Saxon from the command line, please configure a system
catalog.xml so that it does not request the DTD from the W3C site for
every transformation. When running from one of the IDEs,
this should generally already have been catered for.

The transformation hOCR2ALTO lives in xsl/hOCR2ALTO.xsl.
A sample call would be:
$ saxon -s: xsl/hOCR2ALTO.xsl
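
The catalog advice above can be sketched roughly as follows. This is a minimal illustration, not the project's own setup. It assumes that the hOCR input declares the XHTML 1.0 Transitional DOCTYPE, that a local copy of that DTD (with its entity files next to it) sits in a hypothetical dtd/ directory, and that the installed saxon wrapper passes the standard Saxon options -catalog:, -s:, -xsl: and -o: through unchanged. The file names page.hocr and page.alto.xml are placeholders.

```shell
#!/bin/sh
set -eu

# Hypothetical file names; adjust them to your own setup.
HOCR_FILE="page.hocr"
ALTO_FILE="page.alto.xml"
CATALOG="catalog.xml"

# Write a minimal OASIS XML catalog that maps the XHTML 1.0 Transitional
# public identifier to a local DTD copy (assumed to live in ./dtd/), so the
# parser does not have to fetch it from the W3C site.
cat > "$CATALOG" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="dtd/xhtml1-transitional.dtd"/>
</catalog>
EOF

# Run the transformation only when saxon and the input file are present;
# otherwise just report that the step was skipped.
if command -v saxon >/dev/null 2>&1 && [ -f "$HOCR_FILE" ]; then
  saxon -catalog:"$CATALOG" -s:"$HOCR_FILE" -xsl:xsl/hOCR2ALTO.xsl -o:"$ALTO_FILE"
else
  echo "skipping transformation: saxon or $HOCR_FILE not found"
fi
```

If your hOCR files declare a different DOCTYPE, the publicId in the catalog has to match that declaration instead. Whether -catalog: works out of the box depends on the Saxon version and on a catalog resolver library being on the classpath, so treat the invocation as a starting point.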