https://github.com/ocr-d/gt-labelling



# gt-labelling: semantic labelling of OCR ground truth data

> Semantically label OCR ground truth data and store these data with a METS metadata set.

## Use the XSD Schema in METS

### Namespace and schema location

Add the namespace `http://www.ocr-d.de/GT/`. We recommend `gt` as the namespace prefix:

```xml
xmlns:gt="http://www.ocr-d.de/GT/"
```

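Set the XSD schema location to `OCR-D_GT_schema.xsd`, given either as a local file or as a URL. `xsi:schemaLocation` expects the namespace followed by the location:

```xml
xsi:schemaLocation="http://www.ocr-d.de/GT/ file:///OCR-D_GT_schema.xsd"
```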
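To check that a METS file carries both declarations, a small script can help. The following is a minimal sketch using only the Python standard library. The function name `gt_declaration` and the `SAMPLE` document are hypothetical and not part of this repository, and the sketch assumes that `xsi:schemaLocation` is written as namespace/location pairs.

```python
import io
import xml.etree.ElementTree as ET

GT_NS = "http://www.ocr-d.de/GT/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical minimal METS root element, for illustration only.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:gt="http://www.ocr-d.de/GT/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ocr-d.de/GT/ file:///OCR-D_GT_schema.xsd"/>"""


def gt_declaration(xml_text):
    """Return (prefix bound to the GT namespace, schema location for it).

    Either value is None if the document does not declare it.
    """
    prefix = None
    root = None
    stream = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events expose the xmlns declarations, which are
    # otherwise not kept on the parsed elements.
    for event, payload in ET.iterparse(stream, events=("start-ns", "start")):
        if event == "start-ns":
            ns_prefix, uri = payload
            if uri == GT_NS:
                prefix = ns_prefix
        elif root is None:
            root = payload  # first "start" event is the root element
    # Assumption: schemaLocation holds "namespace location" pairs.
    tokens = (root.get(f"{{{XSI_NS}}}schemaLocation") or "").split()
    locations = dict(zip(tokens[0::2], tokens[1::2]))
    return prefix, locations.get(GT_NS)


print(gt_declaration(SAMPLE))
```

This only inspects the declarations. It does not validate the document against the XSD, which would need a schema-aware validator.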

### METS Example

See [`mets_example.xml`](./example/mets_example.xml).

## Developer info

The ontology is defined in
[`DefaultLabelTypes_3.xml`](./test/DefaultLabelTypes_3.xml), taken from
https://github.com/PRImA-Research-Lab/semantic-labelling.

The [XSD](./xsd_schema/OCR-D_GT_schema.xsd) is generated by transforming that ontology with [an XSLT stylesheet](./xsl/OCR-D_GT_labelschema_maker.xsl), using Saxon-HE:

```sh
java -jar ../saxon9he.jar -xsl:OCR-D_GT_labelschema_maker.xsl -s:DefaultLabelTypes_3.xml
```
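If the transformation needs to run from a script, the same call can be wrapped in Python. This is a minimal sketch: the function names `saxon_command` and `regenerate_schema` are hypothetical, the default paths simply repeat the ones in the command above, and whether the stylesheet writes the XSD to standard output or to a file is not stated here, so the sketch just returns whatever Saxon prints.

```python
import subprocess


def saxon_command(jar="../saxon9he.jar",
                  stylesheet="OCR-D_GT_labelschema_maker.xsl",
                  source="DefaultLabelTypes_3.xml"):
    """Build the argument list for the Saxon call shown above."""
    return ["java", "-jar", jar, f"-xsl:{stylesheet}", f"-s:{source}"]


def regenerate_schema(**paths):
    """Run Saxon and return its standard output as text.

    Raises subprocess.CalledProcessError if Saxon exits with an error.
    """
    result = subprocess.run(saxon_command(**paths), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Calling `regenerate_schema(jar="/path/to/saxon9he.jar")` overrides only the jar location and keeps the other defaults.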

## Acknowledgements

The ontology is described in:

> **Clausner, C. and Antonacopoulos, A.**: Ontology and framework for semantic labelling of document data and software methods. In: 13th IAPR International Workshop on Document Analysis Systems (DAS2018), 24-27 April 2018, Vienna, Austria. http://usir.salford.ac.uk/46896/

The ontology is implemented as a set of Java tools in https://github.com/PRImA-Research-Lab/semantic-labelling