{"id":17182423,"url":"https://github.com/bertsky/workflow-configuration","last_synced_at":"2025-08-31T16:41:11.060Z","repository":{"id":40588529,"uuid":"215899343","full_name":"bertsky/workflow-configuration","owner":"bertsky","description":"a makefilization for OCR-D workflows, with configuration examples","archived":false,"fork":false,"pushed_at":"2025-06-06T09:58:14.000Z","size":376,"stargazers_count":10,"open_issues_count":3,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-06T10:41:18.494Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":null,"language":"XSLT","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bertsky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-10-17T22:55:38.000Z","updated_at":"2025-06-06T09:58:16.000Z","dependencies_parsed_at":"2023-11-13T13:29:01.925Z","dependency_job_id":"16b88a96-498c-4b8a-a92a-6e33017a5177","html_url":"https://github.com/bertsky/workflow-configuration","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/bertsky/workflow-configuration","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Fworkflow-configuration","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Fworkflow-configuration/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Fworkflow-configuration/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/bertsky%2Fworkflow-configuration/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bertsky","download_url":"https://codeload.github.com/bertsky/workflow-configuration/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Fworkflow-configuration/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273010885,"owners_count":25030367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2024-10-15T00:37:03.543Z","updated_at":"2025-08-31T16:41:11.016Z","avatar_url":"https://github.com/bertsky.png","language":"XSLT","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![CircleCI](https://dl.circleci.com/status-badge/img/gh/bertsky/workflow-configuration/tree/master.svg?style=svg)](https://dl.circleci.com/status-badge/redirect/gh/bertsky/workflow-configuration/tree/master)\n\n## OCR-D workflow configurations based on makefiles\n\nThis provides an attempt at running [OCR-D](https://ocr-d.de) workflows\nconfigured and controlled via makefiles using [GNU bash](http://www.gnu.org/software/bash),\n[GNU make](http://www.gnu.org/software/make/) and [GNU parallel](http://www.gnu.org/software/parallel).\n\nMakefilization offers the following _advantages_:\n\n- 
incremental builds (steps already processed for another configuration or in a failed run need not be repeated) and automatic dependencies (new files will force all their dependents to update)\n- persistency of configuration and results\n- encapsulation and ease of use\n- sharing configurations and repeating experiments\n- less writing effort, fast templating\n- parallelization across workspaces\n\nNevertheless, there are also some _disadvantages_:\n\n- depends on directories (fileGrps) as targets, which is hard to get correct under all circumstances\n- must mediate between filesystem perspective (understood by `make`) and METS perspective\n- `make` **cannot** handle _target names with spaces_ in them ([at all](https://savannah.gnu.org/bugs/?712))  \n  (This means that fileGrp directories must not have spaces.\n   Local file paths may contain spaces though, if the respective processors support that.)\n\nContents:\n * [Dependencies](#dependencies)\n * [Installation](#installation)\n * [Docker Image](#docker-image)\n * [Usage](#usage)\n    * [ocrd-import](#ocrd-import)\n    * [PAGE transformations and ocrd-page-transform](#ocrd-page-transform)\n    * [METS transformations](#mets-transformations)\n    * [ocrd-make](#ocrd-make)\n      * [LOGLEVEL](#loglevel)\n      * [PAGES](#pages)\n      * [TIMEOUT](#timeout)\n      * [FAILRETRY](#failretry)\n      * [FAILDUMMY](#faildummy)\n      * [METSSERV](#metsserv)\n      * [PAGEWISE](#pagewise)\n      * [JOBDB](#jobdb)\n      * [Remote distribution](#remote-distribution)\n * [Customisation](#customisation)\n    * [Recommendations](#recommendations)\n    * [Example](#example)\n * [Testing](#testing)\n * [Results](#results)\n    * [OCR-D ground truth](#ocr-d-ground-truth)\n * [Implementation](#implementation)\n    * [GPU vs CPU parallelism](#gpu-vs-cpu-parallelism)\n    * [workspace vs page parallelism](#workspace-vs-page-parallelism)\n\n### Dependencies\n\nTo install system dependencies for this package, run...\n\n    make 
deps-ubuntu\n\n...in a privileged context for Ubuntu (like a Docker container).\n\nOr equivalently, install the following packages:\n- `parallel` ([GNU parallel](http://www.gnu.org/software/parallel))\n- `make` ([GNU make](http://www.gnu.org/software/make))\n- `xmlstarlet`\n- `bc` and `sed`\n\nAdditionally, you must of course install [ocrd](https://github.com/OCR-D/core) itself\nalong with its dependencies in the current Python virtual environment (venv). Moreover,\ndepending on the specific configurations you want to use (i.e. the processors it contains),\nadditional modules must be installed. See [OCR-D setup guide](https://ocr-d.de/en/setup)\nfor instructions.\n\n(Yes, `workflow-configuration` is already part of [ocrd_all](https://github.com/OCR-D/ocrd_all),\nwhich is also available on [Dockerhub](https://hub.docker.com/r/ocrd/all).)\n\n\n### Installation\n\nRun:\n\n    make install\n\n... if you are in a (Python) virtual environment, which is recommended.\n\nYou can then call:\n\n    ocrd-make [OPTIONS] -f WORKFLOW-CONFIG.mk WORKSPACE...\n\n... for processing any number of workspace directories.\n\nWhere:\n\n- _`OPTIONS`_ are the usual options controlling GNU make (e.g. 
`-j` for parallel processing).\n- _`WORKFLOW-CONFIG.mk`_ is one of the configuration makefiles you find here or have created yourself.\n- _`WORKSPACE`_ is a directory with a `mets.xml`, or `all` (the default) for all such directories that we can `find`.\n\nCalling workflows is possible from anywhere in your filesystem, but for the `WORKFLOW-CONFIG.mk` you may need to:\n\n- either provide the `*.mk` configurations in the source directory at installation time\n  (to ensure they are installed under the site prefix and can always be found by file name)\n- or provide full paths at runtime (by absolute path name, or relative to the CWD).\n\n(The previous version of `ocrd-make` tried to copy or symlink all makefiles to the runtime directory.\n You can still use those, but should remove the old `Makefile`.)\n\n### Docker Image\n\nInstead of the above native installation steps, you can use the prebuilt image from Docker Hub:\n\n    docker pull bertsky/workflow-configuration\n    docker run -v /path/to/data:/data bertsky/workflow-configuration ocrd-make ...\n\nFor general guidance on using Docker with OCR-D, see\n[User Guide](https://ocr-d.de/en/user_guide#translating-native-commands-to-docker-calls).\n\n### Usage\n\n#### ocrd-import\n\nTo create workspaces from directories which contain image files:\n\n    ocrd-import DIRECTORY\n\n\nTo get help for the import tool:\n\n    ocrd-import --help\n\n\u003cdetails\u003e\u003csummary\u003estandalone CLI\u003c/summary\u003e\n\n\n\u003cpre\u003e\nUsage: ocrd-import [OPTIONS] WORKSPACE_DIR\n\n  Create OCR-D workspace meta-data (mets.xml) in WORKSPACE_DIR (or $PWD), importing...\n  * all image files (with known file extension or convertible via ImageMagick) under fileGrp `image_group`\n  * all .xml files (if they validate as PAGE-XML) under fileGrp `pagexml_group`\n  * all .xml files (if they validate as ALTO-XML) under fileGrp `altoxml_group`\n  ...but failing otherwise (unless `ignore` is set)\n\nOptions:\n  -l, --log-level 
[OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Log level\n  -i, --ignore                    keep going after unknown file types\n  -s, --skip SUFFIX               ignore file names ending in given SUFFIX\n                                  (repeatable)\n  -R, --regex EXPR                only include paths matching given EXPR\n                                  (repeatable)\n  -C, --no-convert                do not attempt to convert image file types\n  -r, --render DPI                when converting PDFs, render at DPI pixel\n                                  density  [default: 300]\n  -P, --nonnum-ids                do not use numeric pageIds but basename\n                                  patterns\n  -B, --basename                  only use basename for IDs\n  -n, --dry-run                   only show resulting METS to stdout via pager\n  -I, --image-group TEXT          fileGrp to place detected or converted\n                                  images into  [default: OCR-D-IMG]\n  -X, --pagexml-group TEXT        fileGrp to place detected PAGE-XML into\n                                  [default: OCR-D-PAGE]\n  -A, --altoxml-group TEXT        fileGrp to place detected ALTO-XML into\n                                  [default: OCR-D-ALTO]\n  -G, --directory-groups          instead of assigning files to `image_group`\n                                  or `pagexml_group`, and trying to convert\n                                  everything else to images, create a group\n                                  for every subdirectory and auto-detect its\n                                  MIME types\n  -h, --help                      Show this message and exit.\n\u003c/pre\u003e\n\n\u003c/details\u003e\n\n#### ocrd-page-transform\n\nTo perform various tasks via XSLT on PAGE-XML files (these all share the same options, including `--help`):\n\n    page-add-nsprefix-pc # adds namespace prefix 'pc:'\n    page-rm-nsprefix-pc # removes namespace prefix 'pc:'\n    
page-set-nsversion-2019 # update the PAGE namespace schema version to 2019\n    page-fix-coords # replace negative values in coordinates by zero\n    page-flatten-regions # move recursive *Region/TextRegion to top level for editing in LAREX\n    page-flatten-tableregions # (special case for table cells, i.e. TableRegion/TextRegion)\n    page-unflatten-regions # move flattened sub-TextRegions back to hierarchy after editing in LAREX\n    page-unflatten-tableregions # (special case for table cells, i.e. TableRegion/TextRegion)\n    page-move-alternativeimage-below-page # try to push page-level AlternativeImage back to subsegments\n    page-remove-alternativeimages # remove $which [last] AlternativeImage entries at hierarchy $level [page]\n    page-remove-metadataitem # remove all MetadataItem entries\n    page-remove-dead-regionrefs # remove non-existing regionRefs\n    page-remove-empty-readingorder # remove empty ReadingOrder or groups\n    page-remove-empty-text-regions # remove empty TextRegion entries\n    page-remove-empty-lines # remove empty TextLine entries\n    page-remove-all-regions # remove all *Region (and TextLine and Word and Glyph) entries\n    page-remove-text-regions # remove all TextRegion (and TextLine and Word and Glyph) entries\n    page-remove-regions # remove all *Region (and TextLine and Word and Glyph) entries of $type\n    page-remove-lines # remove all TextLine (and Word and Glyph) entries\n    page-remove-words # remove all Word (and Glyph) entries\n    page-remove-glyphs # remove all Glyph entries\n    page-remove-textequiv # remove all TextEquiv entries for selected levels and @index\n    page-rename-id-clashes # reassign new @id of segments that clash with other existing @id\n    page-ensure-readingorder # generate ReadingOrder hierarchy from recursive document order if empty\n    page-ensure-textequiv-conf # set TextEquiv/@conf attributes if missing\n    page-ensure-textequiv-index # set TextEquiv/@index attributes from element 
order\n    page-ensure-textequiv-unicode # create empty TextEquiv/Unicode elements if empty\n    page-sort-textequiv-index # sort TextEquiv by @index\n    page-textequiv-lines-to-regions # project text from TextLines to TextRegions (concat with LF in between)\n    page-textequiv-words-to-lines # project text from Words to TextLines (concat with spaces in between)\n    page-extract-text # extract TextEquiv/Unicode from TextRegion|TextLine|Word|Glyph $level [highest] consecutively, in $order [reading-order], interspersed by $pb and $lb\n    page-extract-lines # extract TextEquiv/Unicode from TextLine consecutively, in $order [reading-order]\n    page-extract-words # extract TextEquiv/Unicode from Word consecutively\n    page-extract-glyphs # extract TextEquiv/Unicode from Glyph consecutively\n\n\n\u003cdetails\u003e\u003csummary\u003estandalone CLI\u003c/summary\u003e\n\n\n\u003cpre\u003e\nUsage: NAME [OPTIONS] [FILE]\n\n  Open PAGE file XMLFILE (or stdin) and apply the XSL transformation \"page-add-nsprefix-pc.xsl\"\n  Write the result to stdout, unless...\n  -i / --inplace is given - in which case the result is written back to the\n                            file silently, or\n  -d / --diff is given    - in which case the result will be compared to the\n                            input and a patch shown on stdout.\n\nOptions:\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Log level\n  -s, --string-param NAME=VALUE   set param NAME to string literal VALUE\n  -p, --xpath-param NAME=VALUE    set param NAME to XPath expression VALUE\n  -i, --inplace                   overwrite input file with result of\n                                  transformation\n  -P, --pretty                    pretty-print output (line breaks with\n                                  indentation\n  -d, --diff                      show diff between input and output via pager\n  -D, --dump                      just print the transformation stylesheet\n    
                              (XSL)\n  -h, --help                      Show this message and exit.\n\u003c/pre\u003e\n\n\n\u003c/details\u003e\n\nTo perform the same transformations, but as a [workspace processor](https://ocr-d.de/en/spec/cli),\nuse `ocrd-page-transform` and pass the filename of the transformation as parameter, e.g.:\n\n    ocrd-page-transform -P xsl page-extract-lines.xsl -P xslt-params \"-s order=reading-order\"\n    ocrd-page-transform -P xsl page-remove-alternativeimages.xsl -P xslt-params \"-s level=line -s which=dewarped\"\n    cat \u003c\u003c'EOF' \u003e my-transform.xsl\n    \u003cxsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" xmlns:pc=\"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15\"\u003e\n      \u003cxsl:output method=\"xml\" standalone=\"yes\" encoding=\"UTF-8\" omit-xml-declaration=\"no\"/\u003e\n      \u003cxsl:template match=\"//pc:Word\"/\u003e\n      \u003cxsl:template match=\"node()|text()|@*\"\u003e\n        \u003cxsl:copy\u003e\n          \u003cxsl:apply-templates select=\"node()|text()|@*\"/\u003e\n        \u003c/xsl:copy\u003e\n      \u003c/xsl:template\u003e\n    \u003c/xsl:stylesheet\u003e\n    EOF\n    ocrd-page-transform -P xsl my-transform.xsl\n\n\n\u003cdetails\u003e\u003csummary\u003eOCR-D CLI\u003c/summary\u003e\n\n\n\u003cpre\u003e\nUsage: ocrd-page-transform [worker|server] [OPTIONS]\n\n  apply arbitrary XSL transformation file for PAGE-XML\n\n  \u003e Transform pages with the given XSLT.\n\n  \u003e Open the input PAGE element hierarchy and process it with the XSLT\n  \u003e processor parsed from the `xsl` resource file, passing `xslt-params`\n  \u003e as XSLT parameters (if any).\n\n  \u003e Generate a new PAGE object from the resulting hierarchy, finally\n  \u003e serialise and add it as new output file.\n\nSubcommands:\n    worker      Start a processing worker rather than do local processing\n    server      Start a processor server rather than do local 
processing\n\nOptions for processing:\n  -m, --mets URL-PATH             URL or file path of METS to process [./mets.xml]\n  -w, --working-dir PATH          Working directory of local workspace [dirname(URL-PATH)]\n  -I, --input-file-grp USE        File group(s) used as input\n  -O, --output-file-grp USE       File group(s) used as output\n  -g, --page-id ID                Physical page ID(s) to process instead of full document []\n  --overwrite                     Remove existing output pages/images\n                                  (with \"--page-id\", remove only those).\n                                  Short-hand for OCRD_EXISTING_OUTPUT=OVERWRITE\n  --debug                         Abort on any errors with full stack trace.\n                                  Short-hand for OCRD_MISSING_OUTPUT=ABORT\n  --profile                       Enable profiling\n  --profile-file PROF-PATH        Write cProfile stats to PROF-PATH. Implies \"--profile\"\n  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string\n                                  or JSON file path\n  -P, --param-override KEY VAL    Override a single JSON object key-value pair,\n                                  taking precedence over --parameter\n  -U, --mets-server-url URL       URL of a METS Server for parallel incremental access to METS\n                                  If URL starts with http:// start an HTTP server there,\n                                  otherwise URL is a path to an on-demand-created unix socket\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Override log level globally [INFO]\n  --log-filename LOG-PATH         File to redirect stderr logging to (overriding ocrd_logging.conf).\n\nOptions for information:\n  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME\n  -L, --list-resources            List names of processor resources\n  -J, --dump-json                 Dump tool description as JSON\n  -D, 
--dump-module-dir           Show the 'module' resource location path for this processor\n  -h, --help                      Show this message\n  -V, --version                   Show version\n\nParameters:\n   \"xsl\" [string - REQUIRED]\n    File path of the XSL transformation script (see `ocrd resmgr` for\n    prepackaged and user-installed files available by file name)\n   \"xslt-params\" [string - \"\"]\n    Assignment of XSL transformation parameter values, given as in\n    `xmlstarlet` (which differentiates between `-s name=value` for\n    literal `value` and `-p name=value` for XPath expression `value`),\n    white-space separated.\n   \"pretty-print\" [number - 0]\n    Reformat with line breaks and this many spaces of indentation after\n    XSL transformation (unless zero).\n   \"mimetype\" [string - \"application/vnd.prima.page+xml\"]\n    MIME type to register the output files under (should correspond to\n    `xsl` result)\n\n\u003c/pre\u003e\n\n\n\u003c/details\u003e\n\n#### METS transformations\n\nBesides the transformations for PAGE-XML above, which are wrapped both as OCR-D CLI `ocrd-page-transform`\nand standalone CLIs `page-...`, this module installs some XSL transformations for METS-XML, which are\nlikewise wrapped as standalone CLIs `mets-...`:\n\n    mets-add-nsprefix-mets # add namespace prefix mets:\n    mets-alias-filegrp # zero-cost copy of fileGrp $input [FULLTEXT] as fileGrp $output [ALTO]\n    mets-copy-agents # copy all metsHdr/agent from $other-mets [mets.xml]\n\n\u003cdetails\u003e\u003csummary\u003estandalone CLI\u003c/summary\u003e\n\n\n\u003cpre\u003e\nUsage: NAME [OPTIONS] [FILE]\n\n  Open METS file XMLFILE (or stdin) and apply the XSL transformation \"mets-copy-agents.xsl\"\n  Write the result to stdout, unless...\n  -i / --inplace is given - in which case the result is written back to the\n                            file silently, or\n  -d / --diff is given    - in which case the result will be compared to the\n                      
      input and a patch shown on stdout.\n\nOptions:\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Log level\n  -s, --string-param NAME=VALUE   set param NAME to string literal VALUE\n  -p, --xpath-param NAME=VALUE    set param NAME to XPath expression VALUE\n  -i, --inplace                   overwrite input file with result of\n                                  transformation\n  -P, --pretty                    pretty-print output (line breaks with\n                                  indentation\n  -d, --diff                      show diff between input and output via pager\n  -D, --dump                      just print the transformation stylesheet\n                                  (XSL)\n  -h, --help                      Show this message and exit.\n\u003c/pre\u003e\n\n\n\u003c/details\u003e\n\n#### ocrd-make\n\nWorkflows are processed like _software builds_: File groups (depending on one another) are the targets to be built in each workspace, and all workspaces are built recursively. A build is finished when all targets exist and none are older than their respective prerequisites (e.g. image files).\n\nTo run a configuration...\n1. Activate working environment (virtualenv) and change to the target directory.\n2. Choose (or create) a workflow configuration makefile.  \n   (Yes, you may have to look inside and browse its rules!)\n3. Execute:\n\n        ocrd-make [OPTIONS] -f WORKFLOW-CONFIG.mk all\n\n    (The special target `all` (which is also the default goal) will search for all workspaces\n    in the current directory recursively.) 
You can also run on a subset of workspaces\n    by passing these as goals on the command line...\n\n        ocrd-make -f WORKFLOW-CONFIG.mk PATH/TO/WORKSPACE1 PATH/TO/WORKSPACE2 ...\n\n\u003cdetails\u003e\u003csummary\u003eFull CLI summary\u003c/summary\u003e\n\n \u003cpre\u003e\nRunning OCR-D workflow configurations on multiple workspaces:\n\n  Usage:\n  ocrd-make [OPTIONS] [-f CONFIGURATION] [TARGETS] [VARIABLE-ASSIGNMENTS]\n\n  Options (ocrd-specific):\n  -X|--transfer HOST:DIR  run workflow on remote HOST in remote DIR\n  --remote-init CMD  run CMD before the workflow on remote host\n\n  Options (make-specific):\n  -j|--jobs [N]   number of jobs to run simultaneously\n  -l|--load-average|--max-load N  system load limit for -j without N\n  -I|--include-dir DIR  extra search directory for included makefiles\n  -C|--directory DIR  change to directory before reading makefiles\n\n  Targets (general):\n  * help (this message)\n  * info (short self-description of the selected configuration)\n  * show (print command sequence that would be executed for the selected configuration)\n  * server (start workflow server for the selected configuration; control via 'ocrd workflow client')\n\n  Targets (data processing):\n  * all (recursively find all directories with a mets.xml, default goal)\n  * % (name of the workspace directory, overriding the default goal)\n\n  Variables:\n  * LOGLEVEL: override global loglevel for all OCR-D processors\n    (if unset, then default/configured logging levels apply)\n  * PAGES: override page selection (comma-separated list)\n    (if unset, then all pages will be processed)\n  * TIMEOUT: per-processor timeout (in seconds or with unit suffix)\n    (if unset, then processors may run forever)\n  * FAILRETRY: per-processor number of attempts on processing errors\n    (if unset, then the first attempt exits, passing the error on)\n  * FAILDUMMY: use ocrd-dummy (just copy -I to -O grp) on processing errors\n    (if unset, then failed processors 
terminate the workflow)\n  * METSSERV   start/use/stop METS Servers before/during/after workflows\n    (if unset, the METS file will have to be de/serialised between each call)\n  * PAGEWISE   call processors separately per page during workflows\n    (if unset, processors are called on the whole document)\n\n(This will merely delegate to `make` on the given working directories\nfrom the installation directory \"/data/ocr-d/ocrd_all/venv38/share/workflow-configuration\".\nAll options except -C and -I are allowed and passed through.\nOptions -j and -l are intercepted.)\n \u003c/pre\u003e\n\n \u003c/details\u003e\n\nTo get help:\n\n    ocrd-make help\n\n\nTo get a short description of the chosen configuration:\n\n    ocrd-make -f CONFIGURATION.mk info\n\n\nTo see the command sequence that would be executed for the chosen configuration (in the format of `ocrd process`):\n\n    ocrd-make -f CONFIGURATION.mk show\n\n\nTo run a workflow server for the command sequence that would be executed for the chosen configuration (to be controlled via `ocrd workflow client` or HTTP):\n\n    ocrd-make -f CONFIGURATION.mk server\n\nTo spawn a new configuration file, in the directory of the source repository, do:\n\n    ocrd-make NEW-CONFIGURATION.mk\n\n\nFurthermore, you can add any options that `make` understands (see `make --help` or `info make 'Options Summary'`). For example,\n- `-n` or `--dry-run` to just simulate the run\n- `-q` or `--question` to just check whether anything needs to be built at all\n- `-s` or `--silent` to suppress echoing recipes\n- `-j` or `--jobs` to run on workspaces in parallel\n- `-l` or `--max-load` to set the maximum load level in parallel mode\n- `-B` or `--always-make` to consider all targets out-of-date (i.e. unconditionally rebuild)\n- `-o` or `--old-file` to consider some target up-to-date w.r.t. its prerequisites (i.e. unconditionally keep) but older than its dependents (i.e. 
unconditionally ignore)\n- `-W` or `--new-file` to consider some target newer than its dependents (i.e. unconditionally update them)\n\nFor example, to rebuild anything _after_ the fileGrp `OCR-D-BIN`, do:\n\n    ocrd-make -f CONFIGURATION.mk -W OCR-D-BIN all\n\nYou can also use that pattern to specify any fileGrp other than the `.DEFAULT_GOAL` of your configuration as the overall target. For example, to build anything _up to_ the fileGrp `OCR-D-SEG-LINE`, do:\n\n    ocrd-make -f CONFIGURATION.mk .DEFAULT_GOAL=OCR-D-SEG-LINE all\n\nThere are 8 **special variables** and 1 **additional option**:\n\n##### LOGLEVEL\n\nTo override the default (or configured) log levels for all processors and libraries, use `LOGLEVEL`. For example, to get debugging everywhere, do:\n\n    ocrd-make -f CONFIGURATION.mk all LOGLEVEL=DEBUG\n\n##### PAGES\n\nTo process only a subset of pages in all fileGrps, set `PAGES`. For example, to only consider pages `PHYS_0005` through `PHYS_0007`, do:\n\n    ocrd-make -f CONFIGURATION.mk all PAGES=PHYS_0005..PHYS_0007\n\nThe variable gets interpreted as the usual [--page-id parameter](https://ocr-d.de/en/spec/cli#-g---page-id-id) by processors, so it supports\nrange expressions, comma-separated lists and regular expressions. \n\nIf the METS provides physical page labels (`@ORDER` or `@ORDERLABEL`), then these work as well:\n\n    ocrd-make -f CONFIGURATION.mk all PAGES=5..7\n\n##### TIMEOUT\n\nTo set an upper limit on the time each processor may take to run, use `TIMEOUT`.\nSet a numeric value in seconds, or post-fix with a temporal unit, as in `timeout(1)`.\n\nBeware that useful values may vary widely, depending on the processor and parameters\n(esp. 
whether GPUs are used), the input image size, and any `PAGES` setting.\n\nIn the case of `PAGEWISE=1`, this applies to single-page calls.\n\nIf `FAILRETRY\u003e0`, then repeated attempts will each contribute to one overall timeout.\n\nIf `FAILDUMMY=1`, then timed-out calls (as with any other cause of failure)\nwill be caught by `ocrd-dummy`, which may take up additional time.\n\n##### FAILRETRY\n\nTo try recovering from transient errors (like OOM or network disruption), set `FAILRETRY`\nto the number of attempts you want processors to make.\n\nWithout this, a failed step either falls back to `ocrd-dummy` (if `FAILDUMMY=1`) or\nstops the workflow (otherwise) for that target workspace,\nremoving output pages already processed successfully (unless `PAGEWISE=1`).\n\n##### FAILDUMMY\n\nTo handle errors gracefully, set `FAILDUMMY=1`. This will run `ocrd-dummy` on the respective file groups and pages,\nwhich effectively copies the input to the output annotation (so subsequent steps can continue on these pages).\n\nWithout this, a failed step causes the workflow to stop for that target workspace,\nremoving output pages already processed successfully (unless `PAGEWISE=1`).\n\n##### METSSERV\n\nTo use METS Servers for each workspace, set `METSSERV=1`. On each workspace, this will\n- start a METS Server in the background, using local Unix Domain Socket `mets.sock`, then\n- run the workflow, having processors communicate via the socket, and finally\n- stop the METS Server.\n\nThe METS Server avoids the cost of de/serialisation of the METS between processor calls,\nand thus increases efficiency. It also allows calling processors for pages independently\n(because the server synchronises METS updates, which the filesystem `mets.xml` cannot).\n\nSo a very useful combination is `METSSERV=1 PAGEWISE=1`. 
(In that combination, the top-level\nnumber of jobs, `-j`, and load-level, `-l`, will be distributed to the page-wise calls;\nsee below).\n\n##### PAGEWISE\n\nTo run processors on each page individually, set `PAGEWISE=1`. For each workflow step that needs an update,\nthis will call `make` recursively with `PAGES` set to each single page ID. (The top-level `PAGES`\nsetting is still respected, i.e. it only splits up the requested pages.)\n\nThis is most useful in combination with `FAILDUMMY=1` (for per-page error handling) and `METSSERV=1`\n(for parallel distribution).\n\nNote: the combination `PAGEWISE=1 METSSERV=1` will reserve all jobs (options `-j N` and `-l N`)\nfor the parallel pages instead of parallel documents.\n\n##### JOBDB\n\nTo generate an SQL database and feed it with the jobs' status, set `JOBDB` to some non-empty\nfile path. As soon as ocrd-make starts, it will create a new `jobs` table with the following schema:\n\n| *table header* | *description* |\n| --- | --- |\n| Seq | consecutive job number |\n| Host | remote host (`-X` option), if any |\n| Starttime | date started, if running |\n| JobRuntime | duration so far, if running |\n| Send | number of bytes sent, if any |\n| Receive | number of bytes received, if any |\n| Exitval | exit status (-1000 if not started) |\n| _Signal | interrupt signal, if any |\n| Command | (ocrd-)make command line |\n| V1 | workspace path |\n| Stdout | captured standard output |\n| Stderr | captured standard error |\n\nThis will use `sqlite3`, which (requires `libdbd-sqlite3-perl` to be installed and) is incapable\nof true concurrency, so you need to open the database in read-only mode, e.g.\n\n    sqlite3 \"file:$JOBDB?immutable=1\u0026mode=ro\" '.headers on' '.mode csv' 'SELECT * FROM jobs;'\n\nWithout this, only a CSV-formatted log file of finished jobs gets generated under `$CFGNAME.$$.log`\n(i.e. 
using the name of the workflow and process ID).\n\n##### Remote distribution\n\nTo run jobs on another machine (which has ocrd-make and the respective OCR-D processors installed),\ntransferring the workflow configuration file and workspace directories prior to execution, and\nthe results afterwards, use `-X` or `--transfer`.\n\nIt takes as its argument the remote host name and remote working directory, separated by a colon. In case\nthe installation on the remote side needs initialization after login, use `--remote-init` followed by\nthe respective command.\n\nExample:\n\n    ocrd-make -j 4 --remote-init \". ~/.bash_profile\" -X user@host.domain:/local -f CONFIGURATION.mk all\n\n\n### Customisation\n\nTo write new configurations, first choose a (sufficiently descriptive) makefile name, and create a new file for that: `make -C workflow-configuration NEW-CONFIGURATION.mk` (or copy from an existing configuration).\n\nNext, edit the file to your needs: Write rules using file groups as prerequisites/targets in the normal GNU make syntax. The first target defined must be the default goal that builds the very last file group for that configuration, or else a variable `.DEFAULT_GOAL` pointing to that target must be set anywhere in the makefile.\n\n#### Recommendations\n\n- Change/customize at least the `info` target, and the `INPUT` and `OUTPUT` name/rule.\n- Copy/paste rules from the existing configurations.\n- Define variables with the names of all target/prerequisite file groups, so rules and dependent targets can re-use them (and the names can be easily changed later).\n- Try to utilise the provided static pattern rule (which takes the target as output file group and the prerequisite as input file group) for all processing steps. The rule covers any OCR-D compliant processor with no more than 1 output file group. 
Use it by simply defining the target-specific variable `TOOL` (and optionally `PARAMS` or `OPTIONS`) and giving no recipe whatsoever.\n- When any of your processors use GPU resources, you must prevent races for GPU memory during parallel execution.\n  \n  You can achieve this by simply setting `GPU = 1` for that target when using the static pattern rule, or by using `sem --id OCR-D-GPUSEM` when writing your own recipes.\n  \n  Alternatively, you can either prevent using GPUs globally by (un)setting `CUDA_VISIBLE_DEVICES=`, or prevent running parallel jobs (on multiple CPUs) by not passing `-j`.\n\n#### Example\n\n```make\nINPUT = OCR-D-GT-SEG-LINE\n\n$(INPUT):\n\tocrd workspace find -G $@ --download\n\tocrd workspace find -G OCR-D-IMG --download # just in case\n\n# You can use variables for file group names to keep the rules brief:\nBIN = $(INPUT)-BINPAGE\n\n# This is how you use the pattern rule from Makefile (included below):\n# The prerequisite will become the input file group,\n# the target will become the output file group,\n# the recipe will call the executable given by TOOL,\n# also generating a JSON parameter file from PARAMS:\n$(BIN): $(INPUT)\n$(BIN): TOOL = ocrd-olena-binarize\n$(BIN): PARAMS = \"impl\": \"sauvola-ms-split\"\n# or equivalently:\n$(BIN): OPTIONS = -P impl sauvola-ms-split\n\n# You can also use the file group names directly:\nOCR-D-OCR-TESS: $(BIN)\nOCR-D-OCR-TESS: TOOL = ocrd-tesserocr-recognize\nOCR-D-OCR-TESS: PARAMS = \"textequiv_level\": \"glyph\", \"model\": \"frk+deu\"\n# or equivalently:\nOCR-D-OCR-TESS: OPTIONS = -P textequiv_level glyph -P model frk+deu\n\n# This uses more than 1 input file group and no output file group,\n# which works with the standard recipe as well (but mind the ordering):\nEVAL: $(INPUT) OCR-D-OCR-TESS\nEVAL: TOOL = ocrd-cor-asv-ann-evaluate\n\n# Because the first target in this file was $(BIN),\n# we must override the default goal to be our desired overall target:\n.DEFAULT_GOAL = EVAL\n\n```\n\n### 
Testing\n\nTo run `ocrd-import` and `ocrd-make` (in various modes) on sample data, \nin the installation directory do:\n\n    make test\n\nThis is also used by the CI.\n\n### Results\n\n#### OCR-D ground truth\n\n:construction: these results are no longer meaningful and should be updated!\n\nFor the `data_structure_text/dta` repository, which includes both layout and text annotation down to the textline level, but very coarse segmentation, the following _character error rate_ (CER) was measured:\n\n| *pipeline configuration* | *CER* |\n| ---------- | ----- |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-sauvola-CLIP-RESEG-DEWARP | .243 |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-sauvola-DESKEW-ocropy-CLIP-RESEG-DEWARP | **.241** |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-sauvola-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .255 |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-sauvola-DENOISE-ocropy-CLIP-RESEG-DEWARP | .252 |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-wolf-DENOISE-ocropy-CLIP-RESEG-DEWARP | .263 |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .248 |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .262 |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .273 |\n| OCR-D-OCR-OCRO-fraktur-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .266 |\n| | |\n| OCR-D-OCR-OCRO-frakturjze-BINPAGE-sauvola-CLIP-RESEG-DEWARP | .290 |\n| OCR-D-OCR-OCRO-frakturjze-BINPAGE-sauvola-DESKEW-ocropy-CLIP-RESEG-DEWARP | **.287** |\n| OCR-D-OCR-OCRO-frakturjze-BINPAGE-sauvola-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .301 |\n| OCR-D-OCR-OCRO-frakturjze-BINPAGE-sauvola-DENOISE-ocropy-CLIP-RESEG-DEWARP | .296 |\n| OCR-D-OCR-OCRO-frakturjze-BINPAGE-wolf-DENOISE-ocropy-CLIP-RESEG-DEWARP | .317 |\n| OCR-D-OCR-OCRO-frakturjze-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .292 |\n| 
OCR-D-OCR-OCRO-frakturjze-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .314 |\n| OCR-D-OCR-OCRO-frakturjze-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .325 |\n| OCR-D-OCR-OCRO-frakturjze-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .318 |\n| | |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-sauvola-CLIP-RESEG-DEWARP | .114 |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-sauvola-DESKEW-ocropy-CLIP-RESEG-DEWARP | **.113** |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-sauvola-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP| .127 |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-sauvola-DENOISE-ocropy-CLIP-RESEG-DEWARP | .121 |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-wolf-DENOISE-ocropy-CLIP-RESEG-DEWARP | .122 |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .118 |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .122 |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .124 |\n| OCR-D-OCR-TESS-Fraktur-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .123 |\n| | |\n| OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-sauvola-CLIP-RESEG-DEWARP | .117 |\n| OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-sauvola-DESKEW-ocropy-CLIP-RESEG-DEWARP | **.116** |\n| OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-sauvola-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP| .131 |\n| OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-sauvola-DENOISE-ocropy-CLIP-RESEG-DEWARP | .121 |\n| OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-wolf-DENOISE-ocropy-CLIP-RESEG-DEWARP | .126 |\n| OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .122 |\n| OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .124 |\n| OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .128 |\n| 
OCR-D-OCR-TESS-Fraktur+Latin-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .126 |\n| | |\n| OCR-D-OCR-TESS-frk-BINPAGE-sauvola-CLIP-RESEG-DEWARP | .110 |\n| OCR-D-OCR-TESS-frk-BINPAGE-sauvola-DESKEW-ocropy-CLIP-RESEG-DEWARP | **.109** |\n| OCR-D-OCR-TESS-frk-BINPAGE-sauvola-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .126 |\n| OCR-D-OCR-TESS-frk-BINPAGE-sauvola-DENOISE-ocropy-CLIP-RESEG-DEWARP | .119 |\n| OCR-D-OCR-TESS-frk-BINPAGE-wolf-DENOISE-ocropy-CLIP-RESEG-DEWARP | .118 |\n| OCR-D-OCR-TESS-frk-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP| .115 |\n| OCR-D-OCR-TESS-frk-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .116 |\n| OCR-D-OCR-TESS-frk-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .120 |\n| OCR-D-OCR-TESS-frk-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .119 |\n| | |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-sauvola-CLIP-RESEG-DEWARP | .106 |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-sauvola-DESKEW-ocropy-CLIP-RESEG-DEWARP | **.106** |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-sauvola-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .122 |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-sauvola-DENOISE-ocropy-CLIP-RESEG-DEWARP | .114 |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-wolf-DENOISE-ocropy-CLIP-RESEG-DEWARP | .113 |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .111 |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .112 |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .117 |\n| OCR-D-OCR-TESS-frk+deu-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .115 |\n| | |\n| OCR-D-OCR-TESS-gt4histocr-BINPAGE-sauvola-CLIP-RESEG-DEWARP | **.078** |\n| OCR-D-OCR-TESS-gt4histocr-BINPAGE-sauvola-DESKEW-ocropy-CLIP-RESEG-DEWARP | .081 |\n| 
OCR-D-OCR-TESS-gt4histocr-BINPAGE-sauvola-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .094 |\n| OCR-D-OCR-TESS-gt4histocr-BINPAGE-sauvola-DENOISE-ocropy-CLIP-RESEG-DEWARP | .085 |\n| OCR-D-OCR-TESS-gt4histocr-BINPAGE-wolf-DENOISE-ocropy-CLIP-RESEG-DEWARP | .089 |\n| OCR-D-OCR-TESS-gt4histocr-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP| .084 |\n| OCR-D-OCR-TESS-gt4histocr-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .090 |\n| OCR-D-OCR-TESS-gt4histocr-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .091 |\n| OCR-D-OCR-TESS-gt4histocr-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .094 |\n| | |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-sauvola-CLIP-RESEG-DEWARP | .081 |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-sauvola-DESKEW-ocropy-CLIP-RESEG-DEWARP | **.074** |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-sauvola-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .087 |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-sauvola-DENOISE-ocropy-CLIP-RESEG-DEWARP | .084 |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-wolf-DENOISE-ocropy-CLIP-RESEG-DEWARP | .085 |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP| .086 |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-RESEG-DEWARP | .109 |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-sauvola-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .090 |\n| OCR-D-OCR-CALA-gt4histocr-BINPAGE-wolf-DENOISE-ocropy-DESKEW-ocropy-CLIP-DESKEW-tesseract-RESEG-DEWARP | .110 |\n\nHence, it appears that consistently (across different OCRs) ...\n- denoising with Ocropy (with `noise_maxsize=3.0`) does _not_ help\n- deskewing with Ocropy on the page level usually helps\n- additional deskewing and flipping with Tesseract on the region level usually deteriorates\n- binarization with `sauvola-ms-split` is better than `wolf`\n\nHowever, this result is still _preliminary_. 
Both the processor implementations and the GT annotations keep changing over time.\n\n### Implementation\n\nTo make writing (and reading) configurations as simple as possible, they are expressed as rules operating on METS file groups (i.e. workspace-local). For convenience, the most common recipe pattern involving only 1 input and 1 output file group via some OCR-D CLI is available via a static pattern rule, which merely takes the target-specific variables `TOOL` (the CLI executable) and optionally `PARAMS` (a JSON-formatted list of parameter assignments) or `OPTIONS` (a white-space separated list of parameter assignments). Custom rules are possible as well. If the makefile does not start with the overall target, it must specify its `.DEFAULT_GOAL`, so callers can run without knowledge of the target names.\n\nRules that are not configuration-specific (like the static pattern rule) are all shared by including a common `Makefile` at the end of configuration makefiles (which gets copied from `workflow.mk` at install time).\n\n`make` always operates on the level of the workspace directory (i.e. only one at a time), where targets are fileGrps and the default goal is the maximum fileGrp.\n\nFor running entire collections of workspaces (possibly in parallel), recursive `make` has been abandoned in favour of the `parallel`-based `bash` script `ocrd-make`. Its command-line interface _looks_ like `make`, but the targets are workspaces and the default goal is `all` (which recursively `find`s all workspaces).\n\n:construction: we should explain the use of GNU `parallel` here.\n\n#### GPU vs CPU parallelism\n\nWhen executing workflows in parallel across workspaces (with `--jobs`) on multiple CPUs, it must be ensured that not too many OCR-D processors which use GPU resources are running concurrently (to prevent over-allocation of GPU memory). Thus, make needs to know:\n1. which processors (have/want to) use GPU resources, and\n2. 
how many such processors can run in parallel.\n\nIt can then synchronize these processors with a semaphore. This is achieved by expanding the static pattern rule with a synchronisation mechanism (based on GNU parallel). Workflow configurations can use that by setting the target-specific variable `GPU` to a non-empty value for the respective rules. (Custom recipes will have to use `sem --id OCR-D-GPUSEM`.)\n\nThat way, races are prevented, but also GPUs cannot become the bottleneck: When all GPUs are busy, processors will fall back to CPU.\n\n#### workspace vs page parallelism\n\nWhen executing workflows in parallel across workspaces (with `--jobs`) on multiple CPUs, it must be ensured that OCR-D processors do not use local multiprocessing facilities themselves (to prevent over-allocation of CPUs).\n\nIn the current state of affairs, OCR-D processors cannot be run in parallel across pages via multiprocessing. (At least, they are never implemented that way.) That may change in the future with a [new OCR-D API](https://github.com/OCR-D/core/issues/322). But still, many processors do already use libraries like OpenMP or OpenBLAS which use multiprocessing locally within pages. This can be controlled via _environment variables_ like `OMP_THREAD_LIMIT`.\n\nThis is achieved by exporting these variables to all recipes with a value of `1` when `-j` is used, or half the number of physical CPUs (unless `NTHREADS` is explicitly given) otherwise.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Fworkflow-configuration","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbertsky%2Fworkflow-configuration","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Fworkflow-configuration/lists"}