Projects in Awesome Lists tagged with ocr-d
A curated list of projects in awesome lists tagged with ocr-d.
https://github.com/ub-mannheim/tesseract
Tesseract Open Source OCR Engine (main repository)
lstm ocr ocr-d ocr-d-mp tesseract-ocr windows-build
Last synced: 06 Oct 2025
https://github.com/UB-Mannheim/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
alto finereader hocr ocr ocr-d page-xml transformation validation
Last synced: 14 Mar 2025
https://github.com/ocr-d/core
Collection of OCR-related Python tools and wrappers from @OCR-D
Last synced: 08 Apr 2025
https://github.com/ocr-d/ocrd_all
Master repository which includes most other OCR-D repositories as submodules
Last synced: 08 Apr 2025
https://github.com/qurator-spk/dinglehopper
An OCR evaluation tool
alto alto-xml ocr ocr-d ocr-evaluation page page-xml qurator
Last synced: 14 Jan 2026
https://github.com/OCR-D/ocrd_tesserocr
Run tesseract with the tesserocr bindings with @OCR-D's interfaces
Last synced: 02 Apr 2025
https://github.com/hnesk/browse-ocrd
An extensible viewer for OCR-D mets.xml files
Last synced: 22 Jan 2026
https://github.com/ocr-d/spec
Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
Last synced: 10 Apr 2025
https://github.com/bertsky/ocrd_detectron2
OCR-D wrapper for detectron2 based segmentation models
Last synced: 13 Apr 2025
https://github.com/ocr-d/ocrd_calamari
Recognize text using Calamari OCR and the OCR-D framework
Last synced: 10 Apr 2025
https://github.com/ocr-d/page-to-alto
Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)
Last synced: 12 Jan 2026
https://github.com/ocr-d/format-converters
Converters for various file formats used for representing OCR
Last synced: 15 Apr 2025
https://github.com/bertsky/workflow-configuration
a makefilization for OCR-D workflows, with configuration examples
Last synced: 31 Aug 2025
https://github.com/slub/ocrd_manager
frontend for ocrd_controller and adapter towards ocrd_kitodo
Last synced: 24 Oct 2025
https://github.com/bertsky/nmalign
Forced alignment of lists of strings by fuzzy string matching
Last synced: 29 Jul 2025
https://github.com/bertsky/ocrd_publaynet
convert PubLayNet data into METS/PAGE-XML
Last synced: 13 Apr 2025
https://github.com/slub/ocrd_kitodo
Docker integration of Kitodo.Production and OCR-D
Last synced: 11 Apr 2025
https://github.com/ocr-d/gt-repo-template
A template for creating a ground truth repo with various functions and features, such as metadata creation, data analysis, and presentation.
ground-truth ocr-d pagexml repository template
Last synced: 15 Apr 2025
https://github.com/ub-mannheim/ocrd_pagetopdf
OCR-D wrapper for prima-pagetopdf
Last synced: 07 Aug 2025
https://github.com/ocr-d/ocrd_keraslm
Simple character-based language model using keras
Last synced: 10 Apr 2025
https://github.com/qurator-spk/ocrd-galley
A Dockerized test environment for OCR-D processors 🚢
Last synced: 16 Jan 2026
https://github.com/slub/ocrd_controller
Path to network implementation of OCR-D
Last synced: 27 Feb 2026
https://github.com/bertsky/docstruct
Document structure detection from PAGE-XML to METS-XML
Last synced: 15 Jul 2025
https://github.com/ocr-d/gt-guidelines
OCR-D guidelines for Ground Truth production
Last synced: 06 Jan 2026
https://github.com/ocr-d/ocr-d.github.io
Website for OCR-D specs, formats, requirements
Last synced: 30 Jan 2026
https://github.com/ocr-d/ocrd_vandalize
Demo processor to illustrate OCR-D Python API
Last synced: 10 Apr 2025
https://github.com/ocr-d/assets
Test data for testing specs and software in @OCR-D
Last synced: 06 Jan 2026
https://github.com/ocr-d/ocrmultieval
Extensible evaluation of (intermediate) results of an OCR workflow
Last synced: 13 Jul 2025
https://github.com/bertsky/ocrd_wrap
OCR-D wrapper for arbitrary coords-preserving image operations
Last synced: 13 Apr 2025
https://github.com/ocr-d/gt-mufilevelrules
OCR-D-Level-Rules can be created automatically with gt-MufiLevelRules from the encodings published by MUFI: The Medieval Unicode Font Initiative.
ground-truth guidelines ocr ocr-d transcription
Last synced: 07 Jan 2026
https://github.com/tboenig/gt_corpus_benchmark
This repo provides a collection of ground truth data. The collection was compiled according to different criteria (layout complexity and font usage). Each item is also described by metadata based on the labeling scheme of OCR-D/PrimaLab.
corp ground-truth ocr-d pagexml
Last synced: 02 Feb 2026
https://github.com/ocr-d/ocrd_framework
Docker installation for the OCR-D framework, containing all available processors, a Taverna workflow, and a local repository.
Last synced: 28 Mar 2025
https://github.com/stweil/tensorflow_gpu_to_tensorflow
Dummy Python package for tensorflow-gpu on hosts without GPU
ocr-d python tensorflow tensorflow-gpu
Last synced: 15 Apr 2026
https://github.com/tboenig/ocrd_bbaw_pilotbibliothek
Report on the OCR-D test deployment at the Berlin-Brandenburgische Akademie der Wissenschaften (BBAW)
Last synced: 20 Feb 2026
https://github.com/ocr-d/gt-labelling
ground-truth metadata mets mets-xml ocr-d
Last synced: 07 Jan 2026
https://github.com/ocr-d/gt_structure_1_4
The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
ground-truth ocr-d page-xml repository segmentation
Last synced: 07 Jan 2026
https://github.com/ub-mannheim/hkb-gt
Ground truth for a political newspaper of the Mannheim region (1931–1945)
ground-truth newspaper ocr ocr-d
Last synced: 29 Mar 2025
https://github.com/bertsky/ocrd_doxa
OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding
Last synced: 13 Apr 2025
https://github.com/ocr-d/policy
OCR-D recommendations for full-text digitisation
digitisation ground-truth guidelines ocr-d
Last synced: 06 Jan 2026
https://github.com/ocr-d/repository_metastore
Microservice to manage the data and metadata of the OCR-D data. It provides read/write/update metadata (XML), registering XSD, validate XML and indexing of metadata.
Last synced: 28 Mar 2025
https://github.com/tboenig/17_frak_simple
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 22 Jan 2026
https://github.com/tboenig/16_ant_simple
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 22 Jan 2026
https://github.com/tboenig/19_frak_simple
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 22 Jan 2026
https://github.com/ocr-d/gt_structure_1_1
The repo gt_structure_1_1 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
ground-truth ocr-d page-xml repository segmentation
Last synced: 27 Jan 2026
https://github.com/bertsky/ocrd_jdeskew
OCR-D wrapper for Document Image Skew Estimation using Adaptive Radial Projection
Last synced: 19 Jun 2025
https://github.com/bertsky/ocrd_origami
OCR-D wrapper for poke1024/origami OLR+OCR
Last synced: 25 Mar 2025
https://github.com/tboenig/17_frak_complex
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 23 Jan 2026
https://github.com/tboenig/17_fontmix_simple
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 23 Jan 2026
https://github.com/ocr-d/gt_structure_1_2
The repo gt_structure_1_2 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
ground-truth ocr-d page-xml repository segmentation
Last synced: 29 Jan 2026
https://github.com/tboenig/16_ant_complex
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 06 Feb 2026
https://github.com/tboenig/18_ant_simple
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 26 Feb 2026
https://github.com/tboenig/18_fontmix_complex
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 09 Feb 2026
https://github.com/ocr-d/gt_structure_1_3
The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
ground-truth ocr-d page-xml repository segmentation
Last synced: 07 Jan 2026
https://github.com/ocr-d/gt-repo-scripts
XSLT and shell scripts for analyzing and creating GitHub pages of a ground truth repository. These are centrally managed and can be used by all repositories created with gt-repo-template (https://github.com/OCR-D/gt-repo-template).
ground-truth ocr-d page-xml repository template
Last synced: 06 Jan 2026
https://github.com/ocr-d/bibliothecabaltica2018
Slides for the OCR-D talk at the Bibliotheca Baltica 2018 symposium in Rostock
Last synced: 07 Jan 2026
https://github.com/tboenig/19_ant_simple
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 01 Feb 2026
https://github.com/tboenig/18_frak_complex
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 24 Jan 2026
https://github.com/tboenig/17_fontmix_complex
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 24 Jan 2026
https://github.com/tboenig/16_frak_complex
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 03 Feb 2026
https://github.com/tboenig/18_frak_simple
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 05 Feb 2026
https://github.com/tboenig/16_frak_simple
This repository provides the Ground Truth data for the OCR-D Quiver back end. This data serves as a basis for benchmarking the performance and accuracy of different OCR-D workflows for different types of input data.
Last synced: 24 Jan 2026