https://github.com/tsmdt/handwriting-gt
Ground Truth for HTR training
- Host: GitHub
- URL: https://github.com/tsmdt/handwriting-gt
- Owner: tsmdt
- Created: 2023-07-09T10:34:11.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-08-01T10:21:09.000Z (over 1 year ago)
- Last Synced: 2024-06-03T16:53:41.518Z (7 months ago)
- Topics: ground-truth, htr, manuscript-transcription, manuscripts
- Language: Shell
- Homepage:
- Size: 136 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Handwriting-GT ✍️
A collection of handwritten ground truth for HTR (handwritten text recognition) training.

# About
This collection draws on several digital manuscript editions from the Digital Humanities and provides their edited texts (transcriptions) as ground truth for training HTR models. All ground truth is provided as **PAGE XML**, and all transcriptions follow the [OCR-D transcription guidelines Level 2](https://ocr-d.de/en/gt-guidelines/trans/trLevels.html).
See sections below for individual data set descriptions.
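Because every data set ships as PAGE XML, the line transcriptions can be read with Python's standard library alone. The following is a minimal sketch, not a script from this repository: it assumes each `TextLine` carries its transcription in a direct `TextEquiv/Unicode` child, and it ignores the namespace URI so it works whichever PAGE schema version a file declares. The function names `local_name`, `line_texts` and `page_line_texts` are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def line_texts(root):
    """Return the text of every TextLine in a parsed PAGE document, in document order."""
    texts = []
    for el in root.iter():
        if local_name(el.tag) != "TextLine":
            continue
        # Only consider TextEquiv elements that are direct children of the line;
        # Word and Glyph elements inside the line may carry their own TextEquiv.
        for child in el:
            if local_name(child.tag) == "TextEquiv":
                unicode_el = next(
                    (c for c in child if local_name(c.tag) == "Unicode"), None
                )
                if unicode_el is not None:
                    texts.append(unicode_el.text or "")
                break  # use the first line-level TextEquiv only
    return texts


def page_line_texts(path):
    """Parse a PAGE XML file from disk and return its line transcriptions."""
    return line_texts(ET.parse(path).getroot())
```

Called on one of the XML files in a data set folder, `page_line_texts` returns that page's lines as a list of strings. Lines without a line-level `TextEquiv` are skipped rather than returned as empty strings.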
# Data Sets
## Faustedition
| Folder | Source | Pages | Lines | License |
|------|--------|-------|-------|---------|
|gsa_389889|[faustedition](https://github.com/faustedition/faust-xml)|8|230|CC BY-NC-SA 4.0|
|gsa_390028|[faustedition](https://github.com/faustedition/faust-xml)|94|2493|CC BY-NC-SA 4.0|
|gsa_390825|[faustedition](https://github.com/faustedition/faust-xml)|30|743|CC BY-NC-SA 4.0|
|gsa_391098|[faustedition](https://github.com/faustedition/faust-xml)|414|10178|CC BY-NC-SA 4.0|
|gsa_391511|[faustedition](https://github.com/faustedition/faust-xml)|6|168|CC BY-NC-SA 4.0|
|gsa_391347|[faustedition](https://github.com/faustedition/faust-xml)|35|955|CC BY-NC-SA 4.0|
|gsa_391247|[faustedition](https://github.com/faustedition/faust-xml)|68|1698|CC BY-NC-SA 4.0|
|||**671**|**16816**||

Download images using the bash script `download_imgs.sh` in each data set folder.
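The page and line figures in the tables can be recomputed from the PAGE files themselves. The sketch below is an illustration, not part of the repository: it assumes one PAGE XML file per page and counts every `TextLine` element. `count_text_lines` and `folder_stats` are hypothetical names, and `rglob("*.xml")` would also pick up any non-PAGE XML file placed in the folder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_text_lines(root):
    """Count TextLine elements in a parsed PAGE document, whatever its namespace."""
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "TextLine")


def folder_stats(folder):
    """Return (pages, lines) for all *.xml files below `folder`.

    Assumes every XML file is one PAGE document describing exactly one page.
    """
    pages = lines = 0
    for xml_path in sorted(Path(folder).rglob("*.xml")):
        pages += 1
        lines += count_text_lines(ET.parse(xml_path).getroot())
    return pages, lines
```

Running `folder_stats("gsa_389889")` from the repository root gives a `(pages, lines)` pair that can be compared with the Pages and Lines columns for that row.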
**Source:** [Johann Wolfgang Goethe: Faust. Historisch-kritische Edition. Edited by Anne Bohnenkamp, Silke Henke and Fotis Jannidis, with the collaboration of Gerrit Brüning, Katrin Henzel, Christoph Leijser, Gregor Middell, Dietmar Pravida, Thorsten Vitt and Moritz Wissenbach.](https://faustedition.net/)
**Transcription guidelines:** The following characters were converted from their form in the edition to conform to the [OCR-D transcription guidelines Level 2](https://ocr-d.de/en/gt-guidelines/trans/trLevels.html):
- Round brackets: `(` and `)` (edition) → `/:` and `:/` (ground truth)
- Hyphens: `-` (edition) → `=` (ground truth)

## Theodor Fontane Notizbücher
| Folder | Source | Pages | Lines | License |
|------|--------|-------|-------|---------|
|A01|[Fontane Edition](https://fontane-nb.dariah.eu/xml.html?id=/xml/data/16q90.xml)|67|1046|CC BY-NC-ND 4.0|
|C13|[Fontane Edition](https://fontane-nb.dariah.eu/xml.html?id=/xml/data/16q90.xml)|53|879|CC BY-NC-ND 4.0|
|||**120**|**1925**||

Download images using the bash script `download_imgs.sh` in each data set folder.
**Source:** [Theodor Fontane: Notizbücher. Digitale genetisch-kritische und kommentierte Edition. Edited by Gabriele Radecke.](https://fontane-nb.dariah.eu/index.html)
**Transcription guidelines:** The following characters were converted from their form in the edition to conform to the [OCR-D transcription guidelines Level 2](https://ocr-d.de/en/gt-guidelines/trans/trLevels.html):
- `Sammlung` (edition) → `Sam̄lung` (ground truth)
## August Wilhelm Schlegel Briefe
| Folder | Source | Pages | Lines | License |
|------|--------|-------|-------|---------|
|GT_PAGE|[Schlegel Briefe](https://august-wilhelm-schlegel.de/briefedigital/)|40|788|CC BY-NC-SA 3.0|
|||**40**|**788**||

Download images using the bash script `download_imgs.sh` in each data set folder.
**Source:** [August Wilhelm Schlegel: Digitale Edition der Korrespondenz. Edited by Jochen Strobel and Claudia Bamberg. Dresden, Marburg, Trier 2014–2020.](https://august-wilhelm-schlegel.de)
**Transcription guidelines:** The following characters were converted from their form in the edition to conform to the [OCR-D transcription guidelines Level 2](https://ocr-d.de/en/gt-guidelines/trans/trLevels.html):
- Round `s` (edition) → long `ſ` (ground truth)
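Characters such as the long `ſ` (Schlegel) and the macron in `Sam̄lung` (Fontane) are exactly what this ground truth preserves, so a training pipeline should avoid Unicode compatibility normalisation: NFKC maps `ſ` (U+017F) to plain `s`. The sketch below uses a hypothetical helper, `charset`, that is not part of the repository. It builds a character set under a chosen normalisation form and assumes the macron is encoded as U+0304 COMBINING MACRON.

```python
import unicodedata


def charset(lines, form="NFC"):
    """Return the sorted code points used in `lines` after Unicode normalisation `form`."""
    chars = set()
    for line in lines:
        chars.update(unicodedata.normalize(form, line))
    return sorted(chars)


# Illustrative ground-truth strings from the lists above; the macron is assumed
# to be U+0304 COMBINING MACRON (NFC has no precomposed "m with macron" to merge it into).
samples = ["Sam\u0304lung", "\u017f"]

nfc = charset(samples, "NFC")    # keeps ſ and the combining macron as separate symbols
nfkc = charset(samples, "NFKC")  # folds ſ into s, erasing the long-s distinction
print("\u017f" in nfc, "\u017f" in nfkc)  # prints: True False
```

NFC and NFD both leave `ſ` untouched because it has only a compatibility decomposition, so either form is safe for building the character set of an HTR model trained on this data.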