{"id":17182438,"url":"https://github.com/bertsky/ocrd_page2tei","last_synced_at":"2026-01-04T23:50:06.228Z","repository":{"id":118132860,"uuid":"456729400","full_name":"bertsky/ocrd_page2tei","owner":"bertsky","description":"OCR-D wrapper for page2tei","archived":false,"fork":false,"pushed_at":"2022-02-08T10:11:55.000Z","size":5,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-30T03:27:30.353Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":"","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bertsky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-08T00:40:13.000Z","updated_at":"2022-02-08T14:47:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"752d4e9e-b8f0-4c05-8254-aee3d40c11da","html_url":"https://github.com/bertsky/ocrd_page2tei","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_page2tei","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_page2tei/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_page2tei/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_page2tei/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bertsky","download_url":"https://codeload.github.com/bertsky/ocrd_page2tei/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245383118,"owners_count":20606265,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2024-10-15T00:37:05.605Z","updated_at":"2026-01-04T23:50:06.188Z","avatar_url":"https://github.com/bertsky.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ocrd_page2tei\n\n    OCR-D wrapper for [page2tei](https://github.com/tboenig/page2tei)\n\n  * [Introduction](#introduction)\n  * [Installation](#installation)\n  * [Usage](#usage)\n     * [OCR-D processor interface ocrd-page2tei](#ocr-d-processor-interface-ocrd-page2tei)\n  * [Testing](#testing)\n\n\n## Introduction\n\nThis module offers an [OCR-D](https://ocr-d.de)-compliant [workspace processor](https://ocr-d.de/en/spec/cli) for\n[TEI](https://tei-c.org/) conversion.\n\nIt _wraps_ the XSL transformation [page2tei](https://github.com/tboenig/page2tei)\nfor OCR-D:\n\n * For XSL processing, it uses [Saxon](http://www.saxonica.com/).\n\n * For handling METS/PAGE and providing the OCR-D CLI, it is written as a shell script\nthat relies heavily on the [OCR-D core bashlib API](https://github.com/OCR-D/core).\n\n## Installation\n\nRequires Java\u003e=8, [Saxon](http://www.saxonica.com/) and [GNU make](http://www.gnu.org/software/make).\n\nTo install system dependencies on Ubuntu, do\n\n    sudo make deps-ubuntu\n\nThis is equivalent to:\n\n    apt install openjdk-8-jre-headless\n\nTo install local dependencies (download Saxon and page2tei), do\n\n    make deps\n\nThen, to install this module itself, do:\n\n    make install\n\n## Usage\n\n### [OCR-D processor](https://ocr-d.de/en/spec/cli) interface `ocrd-page2tei`\n\nTo be used with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) documents in an [OCR-D](https://ocr-d.de/en/about) annotation workflow.\n\n```\nUsage: ocrd-page2tei [OPTIONS]\n\nConvert PAGE-XML to TEI-C\n\nOptions:\n  -I, --input-file-grp USE        File group(s) used as input\n  -O, --output-file-grp USE       File group(s) used as output\n  -g, --page-id ID                Physical page ID(s) to process\n  --overwrite                     Remove existing output pages/images\n                                  (with --page-id, remove only those)\n  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string\n                                  or JSON file path\n  -P, --param-override KEY VAL    Override a single JSON object key-value pair,\n                                  taking precedence over --parameter\n  -m, --mets URL-PATH             URL or file path of METS to process\n  -w, --working-dir PATH          Working directory of local workspace\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Log level\n  -J, --dump-json                 Dump tool description as JSON and exit\n  -h, --help                      This help message\n  -V, --version                   Show version\n\n```\n\n## Testing\n\nNone yet.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Focrd_page2tei","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbertsky%2Focrd_page2tei","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Focrd_page2tei/lists"}