https://github.com/shreeshrii/tessdata_ocrb

tesseract 4 traineddata for MRZ using OCR-B fonts

Host: GitHub
URL: https://github.com/shreeshrii/tessdata_ocrb
Owner: Shreeshrii
Created: 2018-09-05T15:25:06.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-07-18T10:18:24.000Z (almost 6 years ago)
Last Synced: 2025-03-18T11:11:27.014Z (3 months ago)
Language: Shell
Homepage:
Size: 58.9 MB
Stars: 78
Watchers: 7
Forks: 16
Open Issues: 2
Metadata Files:
- Readme: README.md

README

# tessdata_ocrb
traineddata for MRZ using OCR-B fonts

This is a proof-of-concept traineddata, created in response to [this post in the tesseract-ocr forum](https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/zi79vNsiSkg/UT3JwsNeBQAJ).

Feel free to clone the repo and rerun training with your own custom training_text and fonts.

## Update: April 15, 2019

Retrained to add the missing `X`, using 3 OCR-B fonts and a [larger training text](eng.MRZ.training_text) than the previous version.
Both float (best) and integer (fast) versions are provided.

### Trained by plus-finetuning from tessdata_best/eng.traineddata

(800 iterations: char train=0.273%, word train=3.47%, skip ratio=0%)

* [Download best version](https://github.com/Shreeshrii/tessdata_ocrb/raw/master/ocrb.traineddata) - 10.8 MB. Use with **`-l ocrb`**.
* [Download fast version](https://github.com/Shreeshrii/tessdata_ocrb/raw/master/ocrb_int.traineddata) - 1.38 MB. Use with **`-l ocrb_int`**.
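
As a rough sketch of how the two downloads above might be scripted, the helpers below build the download URL for a given model name and fetch the file with `curl`. The names `OCRB_BASE_URL`, `ocrb_url`, and `fetch_ocrb` are hypothetical and not part of this repo; only the base URL and the model names `ocrb` and `ocrb_int` come from the links above.

```shell
# Base URL copied from the download links above.
OCRB_BASE_URL="https://github.com/Shreeshrii/tessdata_ocrb/raw/master"

# Hypothetical helper: print the download URL for a model name
# (expected values: ocrb or ocrb_int).
ocrb_url() {
  printf '%s/%s.traineddata\n' "$OCRB_BASE_URL" "$1"
}

# Hypothetical helper: download a model into a directory
# (defaults to the current directory). -L follows GitHub's redirect
# to the raw file; -f makes curl fail on HTTP errors.
fetch_ocrb() {
  model="$1"
  dest="${2:-.}"
  mkdir -p "$dest" &&
    curl -fsSL -o "$dest/$model.traineddata" "$(ocrb_url "$model")"
}
```

For example, `fetch_ocrb ocrb_int ./tessdata` followed by `tesseract image.tif - -l ocrb_int --tessdata-dir ./tessdata` should run the fast model, where `image.tif` and `./tessdata` are placeholder paths.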

### Evaluation

The `ocrb_eval` folder contains synthetic MRZ samples in the same 3 fonts for evaluation. The box/tiff pairs are also saved.

* Running `lstmeval` on these files with `tessdata_best/eng` gives **Eval Char error rate=44.954738, Word error rate=89.583333**.
* Running `lstmeval` on these files with `ocrb` and `ocrb_int` gives **Eval Char error rate=0, Word error rate=0**.
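
To compare models in a script, the error rates can be pulled out of the summary line. The helper below is a minimal sketch that assumes the summary has exactly the `Eval Char error rate=..., Word error rate=...` shape quoted above; `eval_rate` is a hypothetical name, not an `lstmeval` feature.

```shell
# Hypothetical helper: read lstmeval output on stdin and print the
# requested error rate. $1 is "Char" or "Word".
# Assumes the summary line format quoted in this README.
eval_rate() {
  sed -n "s/.*$1 error rate=\([0-9.]*\).*/\1/p"
}
```

Piping a captured summary line through `eval_rate Char` or `eval_rate Word` prints just the number. If the tool writes its summary to stderr, the stream may need to be redirected with `2>&1` first.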

### Test

```
tesseract ./ocrb_eval/eng.OCR-B_10_BT.exp0.tif - -l ocrb --tessdata-dir ./

Failed to load any lstm-specific dictionaries for lang ocrb!!
Page 1
```
