https://github.com/shreeshrii/tessdata_jav_java
Tesseract 4.0.0 training data for Javanese Script (Aksara Jawa)
- Host: GitHub
- URL: https://github.com/shreeshrii/tessdata_jav_java
- Owner: Shreeshrii
- License: apache-2.0
- Created: 2018-07-19T18:01:08.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-03-23T04:04:35.000Z (about 5 years ago)
- Last Synced: 2025-01-26T05:11:26.361Z (4 months ago)
- Language: Shell
- Size: 210 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# tessdata_jav_java
Tesseract 4.0.0 training data for Javanese script (Aksara Jawa).

Built in response to [this issue](https://github.com/tesseract-ocr/langdata/issues/126).
## Traineddata files
* [jav1.traineddata](https://github.com/Shreeshrii/tessdata_jav_java/blob/master/tessdata_best/jav1.traineddata)
* [jav2.traineddata](https://github.com/Shreeshrii/tessdata_jav_java/blob/master/tessdata_best/jav2.traineddata)

## To run training for Javanese script
* Clone this repo
* To continue training from the existing data, run `./plustrain.sh`.
* To customize the training for your own data, update the training text in langdata and the font lists in `makedata.sh`.
* The training text must be UTF-8 encoded.
* Use Unicode fonts that cover the Javanese block (U+A980 to U+A9DF).
* Changes to the Tesseract source code will be needed; they will be similar to those made for Khmer/Myanmar or Thai.

## Custom bash scripts (run in the following order)
1. `./makeeval.sh`
2. `./makedata.sh`
3. `./mergedata.sh`
4. `./plustrain.sh`
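The encoding and font requirements listed for training can be checked before running any of the scripts. The following is a minimal sketch, not part of this repo: the function names `is_utf8`, `has_javanese`, and `list_javanese_fonts` are hypothetical, and it assumes `iconv`, GNU `grep`, and fontconfig's `fc-list` are installed. The byte pattern relies on the Javanese block U+A980 to U+A9DF being encoded in UTF-8 as `EA A6 80` to `EA A7 9F`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers for Javanese training text (not part of this repo).

# Succeeds if FILE is valid UTF-8: iconv exits non-zero on an invalid byte sequence.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Succeeds if FILE contains at least one character from U+A980..U+A9DF.
# In UTF-8, U+A980..U+A9BF are EA A6 80..BF and U+A9C0..U+A9DF are EA A7 80..9F;
# matching raw bytes under LC_ALL=C avoids depending on an installed UTF-8 locale.
has_javanese() {
  LC_ALL=C grep -qE $'\xea(\xa6[\x80-\xbf]|\xa7[\x80-\x9f])' "$1"
}

# Lists font families whose charset includes U+A980. This is only a rough
# proxy for Javanese coverage; it does not check every code point in the block.
list_javanese_fonts() {
  fc-list ':charset=a980' family
}
```

For example, `is_utf8 my_text.txt && has_javanese my_text.txt` (with a hypothetical file name) should succeed for usable training text, and an empty result from `list_javanese_fonts` suggests no installed font covers the script.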
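Because each script builds on the output of the previous one, a later step is of little use if an earlier one fails. A minimal sketch of a wrapper that enforces the order above, assuming it is sourced from the repository root; the function name `run_pipeline` is hypothetical, and the scripts' own options and outputs are not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): run the training scripts in the
# README's order, stopping at the first one that is missing or fails.

run_pipeline() {
  local scripts=("$@")
  # Default to the README order when no scripts are passed.
  if [ "${#scripts[@]}" -eq 0 ]; then
    scripts=(./makeeval.sh ./makedata.sh ./mergedata.sh ./plustrain.sh)
  fi

  local s
  for s in "${scripts[@]}"; do
    if [ ! -x "$s" ]; then
      echo "missing or not executable: $s" >&2
      return 1
    fi
    echo "==> running $s"
    if ! "$s"; then
      echo "failed: $s; not running the remaining scripts" >&2
      return 1
    fi
  done
}
```

Calling `run_pipeline` with no arguments runs all four steps; `run_pipeline ./plustrain.sh` runs only the continued-training step. Returning instead of calling `exit` keeps an interactive shell open when the function is sourced.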