https://github.com/wollmers/ocr-gt-austriannewspapers-scripts
Scripts for AustrianNewspapers
ground-truth ocr ocr-training
- Host: GitHub
- URL: https://github.com/wollmers/ocr-gt-austriannewspapers-scripts
- Owner: wollmers
- Created: 2020-02-28T08:32:45.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-08-06T04:03:20.000Z (almost 6 years ago)
- Last Synced: 2025-07-11T22:03:14.986Z (11 months ago)
- Topics: ground-truth, ocr, ocr-training
- Language: HTML
- Homepage:
- Size: 42.9 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Scripts for AustrianNewspapers
## Purpose
Improve the transcriptions in the AustrianNewspapers data set.
## Location of the data set
Unpacked original data (PAGE XML, TIFF) from ÖNB, with some transcription texts improved and corrected:
- [TrainingSet_ONB_Newseye_GT_M1+](https://github.com/UB-Mannheim/AustrianNewspapers/tree/master/TrainingSet_ONB_Newseye_GT_M1%2B)
- [ValidationSet_ONB_Newseye_GT_M1+](https://github.com/UB-Mannheim/AustrianNewspapers/tree/master/ValidationSet_ONB_Newseye_GT_M1%2B)
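
As a rough illustration of how the transcription text might be read out of a PAGE XML file, here is a minimal Python sketch. It is not one of the scripts in this repository. It relies only on the standard PAGE structure (`TextLine` > `TextEquiv` > `Unicode`) and ignores the XML namespace, since the exact PAGE schema version used in the data set is not stated here. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return an element tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def line_texts(root):
    """Collect (line id, text) for every TextLine below `root`.

    Only the TextEquiv that is a direct child of the TextLine is used,
    so word-level transcriptions nested inside the line are skipped.
    """
    result = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        text = ""
        for child in elem:
            if local_name(child.tag) != "TextEquiv":
                continue
            for sub in child:
                if local_name(sub.tag) == "Unicode":
                    text = sub.text or ""
        result.append((elem.get("id"), text))
    return result


def line_texts_from_file(path):
    """Parse a PAGE XML file (hypothetical path) and return its line texts."""
    return line_texts(ET.parse(path).getroot())
```

Calling `line_texts_from_file("some_page.xml")` on one of the unpacked files would return a list of `(id, text)` tuples, one per text line; the file name here is only a placeholder.
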
Extracted line pairs (image and ground truth text):
- [gt](https://github.com/UB-Mannheim/AustrianNewspapers/tree/master/gt)
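
A line pair can be thought of as an image file plus a text file sharing the same base name. The following Python sketch shows one way to collect such pairs from a local checkout. The suffixes `.png` and `.gt.txt` are assumptions borrowed from common OCR training layouts, not something this README specifies, so both are parameters; the function name is hypothetical.

```python
from pathlib import Path


def find_line_pairs(gt_dir, image_suffix=".png", text_suffix=".gt.txt"):
    """Return (image path, ground truth text) for each complete pair.

    The suffixes are assumptions; adjust them to match the files
    actually present in the directory.
    """
    pairs = []
    for text_path in sorted(Path(gt_dir).rglob("*" + text_suffix)):
        # Strip the full text suffix to get the shared base name.
        stem = text_path.name[: -len(text_suffix)]
        image_path = text_path.with_name(stem + image_suffix)
        if image_path.exists():
            text = text_path.read_text(encoding="utf-8").rstrip("\n")
            pairs.append((image_path, text))
    return pairs
```

Text files without a matching image are skipped silently, which keeps the sketch short; a real script would probably want to report them.
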