https://github.com/shreeshrii/tessdata_ocrb
tesseract 4 traineddata for MRZ using OCR-B fonts
- Host: GitHub
- URL: https://github.com/shreeshrii/tessdata_ocrb
- Owner: Shreeshrii
- Created: 2018-09-05T15:25:06.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-07-18T10:18:24.000Z (almost 6 years ago)
- Last Synced: 2025-03-18T11:11:27.014Z (3 months ago)
- Language: Shell
- Size: 58.9 MB
- Stars: 78
- Watchers: 7
- Forks: 16
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# tessdata_ocrb
traineddata for MRZ using OCR-B fonts

This is a `proof of concept` traineddata
in response to [this post in the tesseract-ocr forum](https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/zi79vNsiSkg/UT3JwsNeBQAJ).

Feel free to clone the repo and rerun training with your own custom training_text and fonts.
## Update: April 15, 2019
Retrained to add missing `X`
using 3 OCRB fonts and a [larger training text](eng.MRZ.training_text) compared to previous version.
Both float/best and integer/fast versions are provided.

### Trained by plus finetuning tessdata_best/eng.traineddata
(800 iterations - char train=0.273%, word train=3.47%, word train=0%)

* [Download best version](https://github.com/Shreeshrii/tessdata_ocrb/raw/master/ocrb.traineddata) - 10.8 MB. Use with **`-l ocrb`**.
* [Download fast version](https://github.com/Shreeshrii/tessdata_ocrb/raw/master/ocrb_int.traineddata) - 1.38 MB. Use with **`-l ocrb_int`**.

### Evaluation
The `ocrb_eval` folder has synthetic MRZ samples in the same 3 fonts for evaluation. The box/tiff pairs are also saved.
* lstmeval of the files with `tessdata_best/eng` gives **Eval Char error rate=44.954738, Word error rate=89.583333**.
* lstmeval of the files with `ocrb` and `ocrb_int` gives **Eval Char error rate=0, Word error rate=0**.

### Test
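As a rough sketch of the workflow in script form, the snippet below fetches one of the two models linked above into the current directory and runs it over every `.tif` sample in a folder such as `ocrb_eval`. The helper names (`model_url`, `fetch_model`, `ocr_samples`) are hypothetical and not part of this repo. It assumes `curl` and tesseract 4 are installed, and that the model file sits in `./` so that `--tessdata-dir ./` can find it.

```shell
# Hypothetical helpers, not shipped with this repo.

# Print the raw download URL for a model name: ocrb (best) or ocrb_int (fast).
model_url() {
  printf 'https://github.com/Shreeshrii/tessdata_ocrb/raw/master/%s.traineddata\n' "$1"
}

# Download the model into the current directory unless it is already there.
fetch_model() {
  [ -f "./$1.traineddata" ] || curl -fL -o "./$1.traineddata" "$(model_url "$1")"
}

# Run OCR on every .tif in a directory and print the recognized text to stdout.
# $1 = model name, $2 = directory holding the samples
ocr_samples() {
  for img in "$2"/*.tif; do
    [ -e "$img" ] || continue   # skip when the glob matches nothing
    printf '== %s ==\n' "$img"
    tesseract "$img" - -l "$1" --tessdata-dir ./
  done
}

# Example (needs network access and tesseract 4):
# fetch_model ocrb_int && ocr_samples ocrb_int ./ocrb_eval
```

The functions only define behavior; nothing is downloaded or run until the example line at the bottom is uncommented. Swapping `ocrb_int` for `ocrb` selects the larger float/best model instead.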
```
tesseract ./ocrb_eval/eng.OCR-B_10_BT.exp0.tif - -l ocrb --tessdata-dir ./
Failed to load any lstm-specific dictionaries for lang ocrb!!
Page 1
P