https://github.com/ryanfb/latinocr-lattestfodder

Latin page scans and ground truth text for testing OCR accuracy.

A collection of Latin page scans and the corresponding ground truth text files.

These files are intended for testing OCR quality with the tools from https://gitorious.org/ancient-greek-training-for-tesseract/ocr-evaluation-tools, in particular the `tessaccsummary` script.
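This README does not describe how `tessaccsummary` scores its output. As a rough illustration of the general idea only, here is a minimal Python sketch of one common stand-in metric: character accuracy, computed as 1 minus the Levenshtein edit distance divided by the length of the ground truth text. The function names (`levenshtein`, `char_accuracy`, `accuracy_for_files`) are hypothetical, and this is not the metric used by the ocr-evaluation-tools.

```python
"""Hypothetical sketch: character-level OCR accuracy via Levenshtein distance.

NOT the algorithm used by tessaccsummary; it only illustrates scoring
OCR output against ground truth text.
"""
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, truth_text: str) -> float:
    """Accuracy = 1 - edits / len(ground truth), floored at 0."""
    # NFC normalization so precomposed and combining accents compare equal.
    ocr = unicodedata.normalize("NFC", ocr_text)
    truth = unicodedata.normalize("NFC", truth_text)
    if not truth:
        return 1.0 if not ocr else 0.0
    errors = levenshtein(ocr, truth)
    return max(0.0, 1.0 - errors / len(truth))


def accuracy_for_files(ocr_path: str, truth_path: str) -> float:
    """Score an OCR output file against a UTF-8 ground truth file."""
    with open(ocr_path, encoding="utf-8") as f:
        ocr = f.read()
    with open(truth_path, encoding="utf-8") as f:
        truth = f.read()
    return char_accuracy(ocr, truth)


if __name__ == "__main__":
    # One substitution ("u" for "v") out of 23 ground truth characters.
    print(char_accuracy("Gallia est omnis diuisa", "Gallia est omnis divisa"))
```

Normalizing to NFC before comparing is a design choice for texts with diacritics, so that visually identical characters encoded differently are not counted as errors.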

Each file's extension indicates its role:

* `.png` - the page scan
* `.txt` - the correct UTF-8 encoded text corresponding to the page scan
* `.src` - a text file describing the provenance of the page scan
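A minimal Python sketch for collecting the files into per-page sets before running an evaluation. It assumes, which the list above does not state explicitly, that a page's `.png`, `.txt` and `.src` files share the same base name. The function names are hypothetical.

```python
"""Hypothetical sketch: group test files into per-page sets by base name.

Assumes a page's .png, .txt and .src files share one base name.
"""
from pathlib import Path

EXTENSIONS = (".png", ".txt", ".src")


def group_pages(filenames):
    """Map each base name to its files, keyed by extension (e.g. ".txt")."""
    pages = {}
    for name in filenames:
        path = Path(name)
        if path.suffix in EXTENSIONS:
            pages.setdefault(path.stem, {})[path.suffix] = path
    return pages


def group_directory(directory):
    """Apply group_pages to the regular files in one directory."""
    return group_pages(p for p in Path(directory).iterdir() if p.is_file())


def incomplete_pages(pages):
    """Return base names that lack any of the three expected files."""
    return sorted(stem for stem, files in pages.items()
                  if set(files) != set(EXTENSIONS))
```

For example, `incomplete_pages(group_directory("."))` lists pages whose scan, ground truth, or provenance file is missing, which is a quick sanity check before comparing OCR output with the `.txt` files.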