https://github.com/ryanfb/latinocr-lattestfodder
Latin page scans and ground truth text for testing OCR accuracy.
- Host: GitHub
- URL: https://github.com/ryanfb/latinocr-lattestfodder
- Owner: ryanfb
- Created: 2014-12-17T19:02:11.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2015-08-26T17:38:03.000Z (almost 11 years ago)
- Last Synced: 2023-04-11T17:41:04.167Z (about 3 years ago)
- Size: 2.64 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
A collection of Latin page scans and their corresponding ground truth text files.
These files are designed for use in testing OCR quality, using the tools from https://gitorious.org/ancient-greek-training-for-tesseract/ocr-evaluation-tools, in particular the `tessaccsummary` script.
The naming of the files is quite straightforward:
* `.png` - the page scan
* `.txt` - the correct UTF-8 encoded text corresponding to the page scan
* `.src` - a text file describing the provenance of the page scan
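
The naming scheme above can be used to pair each scan with its ground truth. The following is a minimal sketch, assuming that the three files for a page share a base name and sit in one directory; the README does not state this explicitly. The helper names `collect_pages`, `complete_pages`, and `char_similarity` are hypothetical, and `char_similarity` is only a rough stand-in built on Python's `difflib`, not the metric computed by `tessaccsummary`.

```python
from difflib import SequenceMatcher
from pathlib import Path

# Extensions described in the README: scan, ground truth, provenance.
EXTENSIONS = {".png", ".txt", ".src"}


def collect_pages(root):
    """Group files in `root` by base name (assumed to be shared per page)."""
    pages = {}
    for path in sorted(Path(root).iterdir()):
        if path.suffix in EXTENSIONS:
            pages.setdefault(path.stem, {})[path.suffix] = path
    return pages


def complete_pages(pages):
    """Keep only pages that have all three files."""
    return {
        stem: files
        for stem, files in pages.items()
        if EXTENSIONS.issubset(files)
    }


def char_similarity(ground_truth, ocr_output):
    """Rough character-level similarity in [0, 1].

    This is an illustrative stand-in, not the tessaccsummary metric.
    """
    matcher = SequenceMatcher(None, ground_truth, ocr_output, autojunk=False)
    return matcher.ratio()
```

To compare against OCR output, read the ground truth with `files[".txt"].read_text(encoding="utf-8")`, since the `.txt` files are UTF-8 encoded, and pass it to `char_similarity` together with the text produced by the OCR engine for the matching `.png`.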