https://github.com/ryanfb/ancientgreekocr-grctraining
'grctraining' repository from http://ancientgreekocr.org/. Rules and tools to deterministically generate all prerequisites for the final training process.
- Host: GitHub
- URL: https://github.com/ryanfb/ancientgreekocr-grctraining
- Owner: ryanfb
- License: other
- Created: 2014-11-19T16:39:15.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2019-01-31T20:29:48.000Z (over 7 years ago)
- Last Synced: 2025-06-18T08:07:54.777Z (12 months ago)
- Language: Makefile
- Size: 7.43 MB
- Stars: 2
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README
- License: LICENSE
README
Source files for some automatically generated parts of the Ancient
Greek (grc) training for Tesseract OCR. Specifically, this contains
the Makefile and its prerequisites to build the following files
needed for the grc training:
- training_text.txt
- grc.word.txt
- grc.freq.txt
- grc.unicharambigs
- grc.wordlist
# Dependencies
The tool tlgu is required. Download and install it from:
http://tlgu.carmen.gr/
On a Mac with Homebrew, install coreutils and gnu-sed, which provide
the GNU tools the build needs (gsed from gnu-sed; gmktemp and gshuf
from coreutils).
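Before starting a build, it can help to confirm that the required tools are on your PATH. The following is a minimal Python sketch, not part of the repository. The helper name `missing_tools` is hypothetical. The rule that the g-prefixed GNU tools are only checked on macOS is an assumption based on the Homebrew note above.

```python
import shutil
import sys

# Tools named in the Dependencies section of this README.
BASE_TOOLS = ["make", "tlgu"]
# Homebrew GNU variants mentioned for macOS. Assumption: they are only
# needed there, because Linux systems usually ship GNU sed/mktemp/shuf.
MAC_GNU_TOOLS = ["gsed", "gmktemp", "gshuf"]


def missing_tools(platform: str = sys.platform) -> list[str]:
    """Return the required tool names that shutil.which cannot find on PATH."""
    tools = list(BASE_TOOLS)
    if platform == "darwin":
        tools += MAC_GNU_TOOLS
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The check uses `shutil.which`, so it reports only whether each command is on PATH. It does not verify versions.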
# To build the training parts
Note that the build starts by downloading and unpacking a text
corpus from which to generate the wordlists.
Make all of the parts with the command:
make
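After `make` finishes, you can check that the five files listed at the top of this README were produced and are non-empty. This is a minimal Python sketch with a hypothetical helper, `missing_outputs`. It assumes the Makefile writes its outputs into the directory it is run from, which this README does not state.

```python
from pathlib import Path

# The five generated files listed in this README. Assumption: make writes
# them into the directory it is run from.
EXPECTED_OUTPUTS = [
    "training_text.txt",
    "grc.word.txt",
    "grc.freq.txt",
    "grc.unicharambigs",
    "grc.wordlist",
]


def missing_outputs(build_dir: str = ".") -> list[str]:
    """Return the expected output files that are absent or empty in build_dir."""
    root = Path(build_dir)
    missing = []
    for name in EXPECTED_OUTPUTS:
        path = root / name
        # A zero-byte file usually means a build step failed partway.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_outputs()
    print("Missing or empty:", ", ".join(absent) if absent else "none")
```

The helper only checks presence and size. It does not validate file contents.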