https://github.com/ub-mannheim/digitue-gt
Ground truth for digitized publications of UB Tübingen
- Host: GitHub
- URL: https://github.com/ub-mannheim/digitue-gt
- Owner: UB-Mannheim
- License: cc0-1.0
- Created: 2022-02-23T18:25:07.000Z
- Default Branch: main
- Last Pushed: 2025-01-30T21:34:36.000Z
- Last Synced: 2025-04-13T05:52:35.820Z
- Topics: escriptorium, fraktur, ground-truth, ocr
- Language: Python
- Size: 3.56 MB
- Stars: 9
- Watchers: 6
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
## DigiTue Ground Truth
This repository contains transcriptions for digitized books and journals of
the University Library of Tübingen (https://opendigi.ub.uni-tuebingen.de/digitue/).
The transcriptions were created with eScriptorium, a transcription platform
developed as part of the Scripta and RESILIENCE projects
(https://gitlab.com/scripta/escriptorium/).
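The corresponding page images are not included in the repository. The following
script downloads them in JPEG format: for every XML file in the DigiRegio, Theo,
Tue and VD18 directories it fetches the matching image into the same directory,
unless that image already exists.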
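```sh
# For every transcription (XML file), download the matching page image
# as JPEG into the same directory, skipping images that already exist.
find DigiRegio Theo Tue VD18 -name "*.xml" | while read -r xml; do
  dir=$(dirname "$xml")
  page=$(basename "$xml" .xml)
  # Strip a trailing "_<number>" from the page ID to get the base ID used in the URL.
  base=$(echo "$page" | sed 's/_[0-9]*$//')
  test -f "$dir/$page.jpg" && continue
  echo "$page"
  curl --silent -Lo "$dir/$page.jpg" \
    "https://opendigi.ub.uni-tuebingen.de/opendigi/image/$base/$page.jp2/full/full/0/default.jpg"
done
```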
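To see which image a single transcription maps to, the URL rule used by the download script can be isolated in a small function. This is a minimal sketch: the function name `image_url` and the example file names below are hypothetical, and it assumes, as the `sed` expression in the script implies, that a page ID has the form `<base ID>_<page number>`.

```sh
# Hypothetical helper: print the OpenDigi image URL for one transcription file,
# applying the same naming rule as the download script.
image_url() {
  local page base
  page=$(basename "$1" .xml)
  # Remove a trailing "_<digits>" to obtain the base ID.
  base=$(echo "$page" | sed 's/_[0-9]*$//')
  echo "https://opendigi.ub.uni-tuebingen.de/opendigi/image/$base/$page.jp2/full/full/0/default.jpg"
}

# Example with a hypothetical file name:
image_url Tue/Example/Example_0005.xml
```

Only the final `_<number>` suffix is removed, so a hypothetical page ID such as `Some_Doc_0012` yields the base ID `Some_Doc`.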