https://github.com/ocr-d/gt-labelling
- Host: GitHub
- URL: https://github.com/ocr-d/gt-labelling
- Owner: OCR-D
- Created: 2018-12-05T13:56:04.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-02-21T15:44:37.000Z (over 2 years ago)
- Last Synced: 2025-02-03T01:34:12.523Z (over 1 year ago)
- Topics: ground-truth, metadata, mets, mets-xml, ocr-d
- Language: XSLT
- Homepage:
- Size: 143 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 6
# gt-labelling: semantic labelling of OCR ground truth data
> Semantically label OCR ground truth data and store the labels with the METS metadata.
## Use the XSD Schema in METS
### Namespace and schema location
Add the namespace `http://www.ocr-d.de/GT/`. We recommend `gt` as the namespace prefix:
```xml
xmlns:gt="http://www.ocr-d.de/GT/"
```
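Set the XSD schema location to `OCR-D_GT_schema.xsd`, given either as a local file or as a URL. `xsi:schemaLocation` takes pairs of namespace URI and schema location:
```xml
xsi:schemaLocation="http://www.ocr-d.de/GT/ file:///OCR-D_GT_schema.xsd"
```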
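
To check that a METS file carries both declarations, a small script along the following lines may help. It is a minimal sketch using only the Python standard library. The `SAMPLE` document and the `gt_declaration` helper are hypothetical and not part of this repository. It also assumes the usual `xsi:schemaLocation` convention of a namespace URI followed by its schema location.

```python
import io
import xml.etree.ElementTree as ET

GT_NS = "http://www.ocr-d.de/GT/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical minimal METS root, for illustration only.
SAMPLE = b"""<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:gt="http://www.ocr-d.de/GT/"
    xsi:schemaLocation="http://www.ocr-d.de/GT/ file:///OCR-D_GT_schema.xsd"/>"""


def gt_declaration(source):
    """Return (prefix, schema_location) for the GT namespace; either may be None."""
    prefix = None
    root = None
    # "start-ns" events report namespace declarations as (prefix, uri) tuples.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns" and payload[1] == GT_NS:
            prefix = payload[0]
        elif event == "start" and root is None:
            root = payload  # the first element started is the document root
    # Split the attribute into (namespace URI, location) pairs.
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    locations = dict(zip(tokens[0::2], tokens[1::2]))
    return prefix, locations.get(GT_NS)


prefix, location = gt_declaration(io.BytesIO(SAMPLE))
print(prefix, location)
```

`gt_declaration` also accepts a file name, so it can be pointed at `example/mets_example.xml` instead of the in-memory sample.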
### METS Example
See [`mets_example.xml`](./example/mets_example.xml).
## Developer info
The ontology is defined in
[`DefaultLabelTypes_3.xml`](./test/DefaultLabelTypes_3.xml), taken from
https://github.com/PRImA-Research-Lab/semantic-labelling.
The [XSD](./xsd_schema/OCR-D_GT_schema.xsd) is generated by transforming that ontology with [an XSLT stylesheet](./xsl/OCR-D_GT_labelschema_maker.xsl), for example with Saxon-HE:
```sh
java -jar ../saxon9he.jar -xsl:OCR-D_GT_labelschema_maker.xsl -s:DefaultLabelTypes_3.xml
```
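
When the transformation is scripted, the same call can be assembled programmatically. The sketch below only builds and prints the command line. The `saxon_command` helper and the output file name are hypothetical, and the paths are assumptions based on the links in this README. `-o:` is Saxon's option for the principal output file. Whether this stylesheet writes its result there or elsewhere has not been checked.

```python
import shlex


def saxon_command(jar, stylesheet, source, output=None):
    """Build the Saxon command line as an argument list."""
    cmd = ["java", "-jar", jar, f"-xsl:{stylesheet}", f"-s:{source}"]
    if output is not None:
        cmd.append(f"-o:{output}")  # principal output file
    return cmd


# Paths are assumptions; adjust them to the local checkout.
cmd = saxon_command(
    "../saxon9he.jar",
    "xsl/OCR-D_GT_labelschema_maker.xsl",
    "test/DefaultLabelTypes_3.xml",
    output="OCR-D_GT_schema.generated.xsd",  # hypothetical name; avoids overwriting the committed XSD
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the transformation.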
## Acknowledgements
The ontology is described in:
> **Clausner, C. and Antonacopoulos, A.**: Ontology and framework for semantic labelling of document data and software methods. In: 13th IAPR International Workshop on Document Analysis Systems (DAS2018), 24-27 April 2018, Vienna, Austria. http://usir.salford.ac.uk/46896/

It is implemented as a set of Java tools in https://github.com/PRImA-Research-Lab/semantic-labelling