https://github.com/bertsky/docstruct
Document structure detection from PAGE-XML to METS-XML
- Host: GitHub
- URL: https://github.com/bertsky/docstruct
- Owner: bertsky
- License: apache-2.0
- Created: 2022-09-06T16:57:44.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2025-04-09T18:04:38.000Z (about 1 year ago)
- Last Synced: 2025-04-09T19:22:32.749Z (about 1 year ago)
- Topics: ocr-d
- Language: Python
- Size: 28.3 KB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# docstruct
Document structure detection from PAGE to METS
Provides an [OCR-D processor](https://ocr-d.de/en/spec/cli)
that parses the input page-level structure of a document (as detected by
some [OCR-D](https://ocr-d.de/en/about) workflow including preprocessing,
layout analysis and OCR), annotated via [PAGE-XML](https://ocr-d.de/en/spec/page)
and [METS-XML](https://ocr-d.de/en/spec/mets), analyses it further
(...) and wraps it into a document-level structure in the METS, using a
logical `mets:structMap` and either …
- `mets:structLink` ([DFG profile](http://dfg-viewer.de/fileadmin/groups/dfgviewer/METS-Anwendungsprofil_2.3.1.pdf)), or
- `mets:area` ([ENMAP profile](http://www.europeana-newspapers.eu/wp-content/uploads/2015/05/D5.3_Final_release_ENMAP_1.0.pdf))
… for representation.
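
To illustrate the two representations named above, here is a minimal sketch in Python (standard library only) that builds a logical `mets:structMap` twice: once linked to physical pages via `mets:structLink`, and once pointing into a PAGE file via `mets:area`. It is not the code of this processor. All identifiers (`LOG_0001`, `PHYS_0001`, `FILE_0001_PAGE`, `region0001`), the `TYPE` values and the helper names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def logical_div(parent, div_id, div_type, label=None):
    """Append a mets:div to a logical structMap (hypothetical helper)."""
    div = ET.SubElement(parent, f"{{{METS}}}div", {"ID": div_id, "TYPE": div_type})
    if label is not None:
        div.set("LABEL", label)
    return div


def logical_skeleton():
    """Return (mets root, chapter div) with one illustrative chapter."""
    mets = ET.Element(f"{{{METS}}}mets")
    smap = ET.SubElement(mets, f"{{{METS}}}structMap", {"TYPE": "LOGICAL"})
    doc = logical_div(smap, "LOG_0000", "monograph")
    chapter = logical_div(doc, "LOG_0001", "chapter", "Introduction")
    return mets, chapter


def build_with_structlink(physical_ids):
    """structLink style: logical divs are linked to physical divs by ID."""
    mets, chapter = logical_skeleton()
    links = ET.SubElement(mets, f"{{{METS}}}structLink")
    for phys_id in physical_ids:
        ET.SubElement(links, f"{{{METS}}}smLink", {
            f"{{{XLINK}}}from": chapter.get("ID"),
            f"{{{XLINK}}}to": phys_id,
        })
    return mets


def build_with_area(file_id, region_id):
    """area style: the logical div points at an element inside a file."""
    mets, chapter = logical_skeleton()
    fptr = ET.SubElement(chapter, f"{{{METS}}}fptr")
    ET.SubElement(fptr, f"{{{METS}}}area", {
        "FILEID": file_id,    # ID of a mets:file (assumed: a PAGE-XML file)
        "BEGIN": region_id,   # ID of an element inside that file
        "BETYPE": "IDREF",    # BEGIN is interpreted as an ID reference
    })
    return mets


if __name__ == "__main__":
    print(ET.tostring(build_with_structlink(["PHYS_0001", "PHYS_0002"]), encoding="unicode"))
    print(ET.tostring(build_with_area("FILE_0001_PAGE", "region0001"), encoding="unicode"))
```

The practical difference is granularity: `mets:smLink` relates a logical unit to whole physical divisions (typically pages), whereas `mets:area` can reference an individual region inside a page file.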