https://github.com/ryanfb/latinocr-lattraining

Rules and tools to deterministically generate all prerequisites for the final training process. Adapted from https://github.com/ryanfb/ancientgreekocr-grctraining/

Host: GitHub
URL: https://github.com/ryanfb/latinocr-lattraining
Owner: ryanfb
License: other
Created: 2014-11-21T20:17:52.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2016-01-13T15:57:53.000Z (over 10 years ago)
Last Synced: 2023-04-11T17:41:04.159Z (about 3 years ago)
Language: Makefile
Size: 15.3 MB
Stars: 6
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README
- License: LICENSE

README

Source files for some automatically generated parts of the
Latin (lat) training for Tesseract OCR. Specifically, this contains
the Makefile and its prerequisites to build the following files
needed for the lat training:

- training_text.txt
- lat.word.txt
- lat.freq.txt
- lat.unicharambigs
- lat.wordlist

# Dependencies

On a Mac with Homebrew, install the coreutils and gnu-sed packages.
They provide the gsed, gmktemp, and gshuf commands that the build needs.
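
As a rough sketch of the dependency step, the snippet below checks whether the three GNU commands named above are on the PATH and, if not, suggests the Homebrew install command. The helper name missing_tools is made up for this illustration and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): print each named
# command that cannot be found on the PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the three commands the README lists as build dependencies.
missing=$(missing_tools gsed gmktemp gshuf)
if [ -n "$missing" ]; then
  echo "Missing commands:"
  echo "$missing"
  # coreutils supplies gmktemp and gshuf; gnu-sed supplies gsed.
  echo "On a Mac with Homebrew, try: brew install coreutils gnu-sed"
fi
```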

# To build the training parts

Note that the build starts by downloading and unpacking a text
corpus from which to generate the wordlists.

Make all of the parts with the command:
make
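
After the build finishes, one way to confirm that it produced everything is to look for the five files listed at the top of this README. This is only a sketch: the helper name missing_outputs is invented here, and it assumes the files are written to the directory passed as its argument, which the README does not state.

```shell
# Hypothetical helper (not part of the repository): print the name of each
# expected output file that does not exist in the given directory.
missing_outputs() {
  dir=${1:-.}
  for f in training_text.txt lat.word.txt lat.freq.txt \
           lat.unicharambigs lat.wordlist; do
    [ -f "$dir/$f" ] || printf '%s\n' "$f"
  done
}

# Assumption: the build writes its outputs to the current directory.
missing_outputs .
```

No output means all five files are present.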
