{"id":24951944,"url":"https://github.com/ocr-d/core","last_synced_at":"2025-04-08T08:14:46.172Z","repository":{"id":37735366,"uuid":"112337283","full_name":"OCR-D/core","owner":"OCR-D","description":"Collection of OCR-related python tools and wrappers from @OCR-D","archived":false,"fork":false,"pushed_at":"2025-03-31T13:53:01.000Z","size":28101,"stargazers_count":128,"open_issues_count":138,"forks_count":31,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-01T05:38:02.903Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":"https://ocr-d.de/core/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OCR-D.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-28T13:13:24.000Z","updated_at":"2025-03-28T11:33:08.000Z","dependencies_parsed_at":"2023-09-25T07:46:06.245Z","dependency_job_id":"280cea0d-377d-4681-9d46-1e97e669cbe0","html_url":"https://github.com/OCR-D/core","commit_stats":{"total_commits":2353,"total_committers":27,"mean_commits":87.14814814814815,"dds":"0.22056948576285595","last_synced_commit":"cbe83abf4b7841de543edaaa7e26865e142dda11"},"previous_names":[],"tags_count":346,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fcore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fcore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fcore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/OCR-D%2Fcore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OCR-D","download_url":"https://codeload.github.com/OCR-D/core/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247801169,"owners_count":20998339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2025-02-03T01:32:54.742Z","updated_at":"2025-04-08T08:14:46.140Z","avatar_url":"https://github.com/OCR-D.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OCR-D/core\n\n\u003e Python modules implementing [OCR-D specs](https://github.com/OCR-D/spec) and related tools\n\n[![image](https://img.shields.io/pypi/v/ocrd.svg)](https://pypi.org/project/ocrd/)\n[![Docker Image CI](https://github.com/OCR-D/core/actions/workflows/docker-image.yml/badge.svg)](https://github.com/OCR-D/core/actions/workflows/docker-image.yml)\n[![Unit Test CI](https://github.com/OCR-D/core/actions/workflows/unit-test.yml/badge.svg)](https://github.com/OCR-D/core/actions/workflows/unit-test.yml)\n[![image](https://codecov.io/gh/OCR-D/core/branch/master/graph/badge.svg)](https://codecov.io/gh/OCR-D/core)\n[![image](https://scrutinizer-ci.com/g/OCR-D/core/badges/build.png?b=master)](https://scrutinizer-ci.com/g/OCR-D/core)\n[![image](https://scrutinizer-ci.com/g/OCR-D/core/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/OCR-D/core)\n\n[![Gitter chat](https://badges.gitter.im/gitterHQ/gitter.png)](https://gitter.im/OCR-D/Lobby)\n\n\n\u003c!-- BEGIN-MARKDOWN-TOC 
--\u003e\n* [Introduction](#introduction)\n* [Installation](#installation)\n* [Command line tools](#command-line-tools)\n\t* [`ocrd` CLI](#ocrd-cli)\n\t* [`ocrd-dummy` CLI](#ocrd-dummy-cli)\n* [Configuration](#configuration)\n* [Packages](#packages)\n\t* [ocrd_utils](#ocrd_utils)\n\t* [ocrd_models](#ocrd_models)\n\t* [ocrd_modelfactory](#ocrd_modelfactory)\n\t* [ocrd_validators](#ocrd_validators)\n\t* [ocrd_network](#ocrd_network)\n\t* [ocrd](#ocrd)\n* [bash library](#bash-library)\n* [Testing](#testing)\n* [See Also](#see-also)\n\n\u003c!-- END-MARKDOWN-TOC --\u003e\n\n## Introduction\n\nThis repository contains the python packages that form the base for tools within the\n[OCR-D ecosphere](https://github.com/topics/ocr-d).\n\nAll packages are also published to [PyPI](https://pypi.org/search/?q=ocrd).\n\n## Installation\n\n**NOTE** Unless you want to contribute to OCR-D/core, we recommend installation\nas part of [ocrd_all](https://github.com/OCR-D/ocrd_all) which installs a\ncomplete stack of OCR-D-related software.\n\nThe easiest way to install is via `pip`:\n\n    pip install ocrd\n\n\nAll Python software released by [OCR-D](https://github.com/OCR-D) requires Python 3.8 or higher.\n\n\u003e **NOTE** Some OCR-D tools (or even test cases) _might_ reveal an unintended behavior if you have specific environment modifications, like:\n* using a custom build of [ImageMagick](https://github.com/ImageMagick/ImageMagick), whose format delegates are different from what OCR-D supposes\n* custom Python logging configurations in your personal account\n\n## Command line tools\n\n**NOTE:** All OCR-D CLI tools support a `--help` flag which shows usage and\nsupported flags, options and arguments.\n\n### `ocrd` CLI\n\n* [CLI usage](https://ocr-d.de/core/api/ocrd/ocrd.cli.html)\n* [Introduction to `ocrd workspace`](https://github.com/OCR-D/ocrd-website/wiki/Intro-ocrd-workspace-CLI)\n* [OCR-D user guide](https://ocr-d.de/en/use)\n\n### `ocrd-dummy` CLI\n\nA minimal [OCR-D 
processor](https://ocr-d.de/en/user_guide#using-the-ocr-d-processors) that copies from `-I/--input-file-grp` to `-O/--output-file-grp`\n\n## Configuration\n\nAlmost all behaviour of the OCR-D/core software is configured via CLI options and flags, which can be listed with the `--help` flag that all CLIs support.\n\nSome parts of the software are configured via environment variables:\n\n* `OCRD_PROFILE`: This variable configures the built-in CPU and memory profiling. If empty, no profiling is done. Otherwise expected to contain any of the following tokens:\n  * `CPU`: Enable CPU profiling of processor runs\n  * `RSS`: Enable RSS memory profiling\n  * `PSS`: Enable proportionate memory profiling\n* `OCRD_PROFILE_FILE`: If set, then the CPU profile is written to this file for later perusal with analysis tools like [snakeviz](https://jiffyclub.github.io/snakeviz/)\n\n* `PATH`: Search path for processor executables (affects `ocrd process` and `ocrd resmgr`).\n* `HOME`: Directory to look for `ocrd_logging.conf`, fallback for unset XDG variables (see below).\n\n* `XDG_CONFIG_HOME`: Directory to look for `./ocrd/resources.yml` (i.e. `ocrd resmgr` user database) – defaults to `$HOME/.config`.\n* `XDG_DATA_HOME`: Directory to look for `./ocrd-resources/*` (i.e. 
`ocrd resmgr` data location) – defaults to `$HOME/.local/share`.\n\n* `OCRD_DOWNLOAD_RETRIES`: Number of times to retry failed attempts for downloads of resources or workspace files.\n* `OCRD_DOWNLOAD_TIMEOUT`: Timeout in seconds for connecting or reading (comma-separated) when downloading.\n\n* `OCRD_MISSING_INPUT`: How to deal with missing input files (for some fileGrp/pageId) during processing:\n  * `SKIP`: ignore and proceed with next page's input\n  * `ABORT`: throw `MissingInputFile` exception\n\n* `OCRD_MISSING_OUTPUT`: How to deal with missing output files (for some fileGrp/pageId) during processing:\n  * `SKIP`: ignore and proceed processing next page\n  * `COPY`: fall back to copying input PAGE to output fileGrp for page\n  * `ABORT`: re-throw whatever caused processing to fail\n\n* `OCRD_MAX_MISSING_OUTPUTS`: Maximal rate of skipped/fallback pages among all processed pages before aborting (decimal fraction, ignored if negative).\n\n* `OCRD_EXISTING_OUTPUT`: How to deal with already existing output files (for some fileGrp/pageId) during processing:\n  * `SKIP`: ignore and proceed processing next page\n  * `OVERWRITE`: force writing result to output fileGrp for page\n  * `ABORT`: re-throw `FileExistsError` exception\n\n\n* `OCRD_METS_CACHING`: Whether to enable in-memory storage of OcrdMets data structures for speedup during processing or workspace operations.\n\n* `OCRD_MAX_PROCESSOR_CACHE`: Maximum number of processor instances (for each set of parameters) to be kept in memory (including loaded models) for processing workers or processor servers.\n\n* `OCRD_MAX_PARALLEL_PAGES`: Maximum number of processor threads for page-parallel processing (within each Processor's selected page range, independent of the number of Processing Workers or Processor Servers). If set `\u003e1`, then a METS Server must be used for METS synchronisation.\n\n* `OCRD_PROCESSING_PAGE_TIMEOUT`: Timeout in seconds for processing a single page. 
If set \u003e0 and exceeded, the page is handled as configured by `OCRD_MISSING_OUTPUT`.\n\n* `OCRD_NETWORK_SERVER_ADDR_PROCESSING`: Default address of Processing Server to connect to (for `ocrd network client processing`).\n* `OCRD_NETWORK_SERVER_ADDR_WORKFLOW`: Default address of Workflow Server to connect to (for `ocrd network client workflow`).\n* `OCRD_NETWORK_SERVER_ADDR_WORKSPACE`: Default address of Workspace Server to connect to (for `ocrd network client workspace`).\n* `OCRD_NETWORK_RABBITMQ_CLIENT_CONNECT_ATTEMPTS`: Number of attempts for a worker to create its queue. Helpful if the rabbitmq-server needs time to be fully started.\n\n* `OCRD_NETWORK_CLIENT_POLLING_SLEEP`: How many seconds to sleep before trying `ocrd network client` again.\n* `OCRD_NETWORK_CLIENT_POLLING_TIMEOUT`: Timeout for a blocking `ocrd network client` (in seconds).\n\n* `OCRD_NETWORK_SOCKETS_ROOT_DIR`: The root directory where all METS Server-related socket files are created.\n* `OCRD_NETWORK_LOGS_ROOT_DIR`: The root directory where all `ocrd_network`-related log files are stored.\n\n\n\n## Packages\n\n### ocrd_utils\n\nContains utilities and constants, e.g. 
for logging, path normalization, coordinate calculation etc.\n\nSee [README for `ocrd_utils`](./README_ocrd_utils.md) for further information.\n\n### ocrd_models\n\nContains file format wrappers for PAGE-XML, METS, EXIF metadata etc.\n\nSee [README for `ocrd_models`](./README_ocrd_models.md) for further information.\n\n### ocrd_modelfactory\n\nCode to instantiate [models](#ocrd_models) from existing data.\n\nSee [README for `ocrd_modelfactory`](./README_ocrd_modelfactory.md) for further information.\n\n### ocrd_validators\n\nSchemas and routines for validating BagIt, `ocrd-tool.json`, workspaces, METS, PAGE, CLI parameters etc.\n\nSee [README for `ocrd_validators`](./README_ocrd_validators.md) for further information.\n\n### ocrd_network\n\nComponents related to the OCR-D Web API.\n\nSee [README for `ocrd_network`](./README_ocrd_network.md) for further information.\n\n### ocrd\n\nDepends on all of the above, also contains decorators and classes for creating OCR-D processors and CLIs.\n\nAlso contains the command line tool `ocrd`.\n\nSee [README for `ocrd`](./README_ocrd.md) for further information.\n\n## bash library\n\nBuilds a bash script that can be sourced by other bash scripts to create OCR-D-compliant CLIs.\n\nSee [README for `bashlib`](./README_bashlib.md) for further information.\n\n## Testing\n\nDownload assets (`make assets`)\n\nTest with local files: `make test`\n\n- Test with remote assets:\n  - `make test OCRD_BASEURL='https://github.com/OCR-D/assets/raw/master/data/'`\n\n## See Also\n\n  - [OCR-D Specifications](https://ocr-d.de/en/spec/) ([Repo](https://github.com/ocr-d/spec))\n  - [OCR-D core API documentation](https://ocr-d.de/core) (built here via `make docs`)\n  - [OCR-D Website](https://ocr-d.de) 
([Repo](https://github.com/ocr-d/ocrd-website))\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Fcore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focr-d%2Fcore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Fcore/lists"}