Projects in Awesome Lists tagged with pagexml
A curated list of projects in awesome lists tagged with pagexml.
https://github.com/mauvilsa/tesseract-recognize
Tool that does layout analysis and/or text recognition using Tesseract and outputs the result in Page XML format
cli docker-image document-recognition ocr optical-character-recognition pagexml tesseract text-detection
Last synced: 05 May 2025
https://github.com/mauvilsa/nw-page-editor
Simple app for visual editing of Page XML files
annotation-tool desktop-app docker-image editor pagexml server-app
Last synced: 07 Jul 2025
https://github.com/andbue/nashi
Some bits of JavaScript to transcribe scanned pages using PageXML
Last synced: 11 Apr 2025
https://github.com/omni-us/pagexml
C++ library and Python wrapper for working with Page XML files
annotation-processing docker-image document-representation pagexml python
Last synced: 20 Mar 2025
https://github.com/ocr-d/gt-repo-template
A template for creating a ground truth repo with various functions and features, such as metadata creation, data analysis, and presentation.
ground-truth ocr-d pagexml repository template
Last synced: 15 Apr 2025
https://github.com/cconzen/readingorderrecalculation
Post-process PageXMLs to improve their region reading order
pagexml reading-order transkribus
Last synced: 28 Jun 2025
https://github.com/tboenig/gt_corpus_benchmark
This repo provides a collection of ground truth data, compiled according to different criteria (layout complexity and font use). Each item is also described by metadata based on the labeling scheme of OCR-D/PrimaLab.
corp ground-truth ocr-d pagexml
Last synced: 02 Feb 2026
https://github.com/bobld/publaynetsharp
Extract and convert PubLayNet data to PageXml format
csharp pagexml publaynet pubmed
Last synced: 12 Oct 2025
https://github.com/scdh/x2tei-transformations
Transformations from various formats to TEI
converters docx pagexml tei tei-xml usx
Last synced: 06 Jan 2026