{"id":17182414,"url":"https://github.com/bertsky/ocrd_wrap","last_synced_at":"2025-04-13T16:31:25.871Z","repository":{"id":54223328,"uuid":"270846544","full_name":"bertsky/ocrd_wrap","owner":"bertsky","description":"OCR-D wrapper for arbitrary coords-preserving image operations","archived":false,"fork":false,"pushed_at":"2024-09-30T18:54:22.000Z","size":48,"stargazers_count":4,"open_issues_count":2,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-31T21:51:38.420Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bertsky.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-06-08T22:36:23.000Z","updated_at":"2024-09-30T18:54:25.000Z","dependencies_parsed_at":"2022-08-13T09:31:08.396Z","dependency_job_id":null,"html_url":"https://github.com/bertsky/ocrd_wrap","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_wrap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_wrap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_wrap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_wrap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bertsky","download_url":"https://codeload.github.com/bertsky/ocrd_wrap/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repo
sitories_count":223597074,"owners_count":17170872,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2024-10-15T00:37:01.251Z","updated_at":"2025-04-13T16:31:25.864Z","avatar_url":"https://github.com/bertsky.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://badge.fury.io/py/ocrd-wrap.svg)](https://badge.fury.io/py/ocrd-wrap)\n[![Pytest CI](https://github.com/bertsky/ocrd_wrap/actions/workflows/ci.yml/badge.svg)](https://github.com/bertsky/ocrd_wrap/actions/workflows/ci.yml)\n[![Docker Image CD](https://github.com/bertsky/ocrd_wrap/actions/workflows/docker-image.yml/badge.svg)](https://github.com/bertsky/ocrd_wrap/actions/workflows/docker-image.yml)\n\n# ocrd_wrap\n\n    OCR-D wrapper for arbitrary coords-preserving image operations\n\n  * [Introduction](#introduction)\n  * [Installation](#installation)\n  * [Usage](#usage)\n     * [OCR-D processor interface ocrd-preprocess-image](#ocr-d-processor-interface-ocrd-preprocess-image)\n     * [OCR-D processor interface ocrd-skimage-normalize](#ocr-d-processor-interface-ocrd-skimage-normalize)\n     * [OCR-D processor interface ocrd-skimage-denoise-raw](#ocr-d-processor-interface-ocrd-skimage-denoise-raw)\n     * [OCR-D processor interface ocrd-skimage-binarize](#ocr-d-processor-interface-ocrd-skimage-binarize)\n     * [OCR-D processor interface ocrd-skimage-denoise](#ocr-d-processor-interface-ocrd-skimage-denoise)\n  * [Testing](#testing)\n\n\n## Introduction\n\nThis offers [OCR-D](https://ocr-d.de) compliant [workspace 
processors](https://ocr-d.de/en/spec/cli) for\nany image processing tools which have some (usable) CLI\nand do not modify/invalidate image coordinates.\n\nIt thus _wraps_ them for OCR-D without the need\nto write and manage code for each of them individually\n(exposing/passing/documenting their parameters and usage,\nmanaging releases etc). It shifts all the burden to\n**workflow configuration** (i.e. defining a suitable\nparameter set on how to call what program on what data,\nand installing all the required tools).\n\nIt is itself written in Python, and relies heavily on the\n[OCR-D core API](https://github.com/OCR-D/core). This is\nresponsible for handling METS/PAGE, and providing the OCR-D\nCLI.\n\nIn addition, this aims to wrap existing Python packages\nfor preprocessing as OCR-D processors (one at a time).\n\n## Installation\n\nCreate and activate a [virtual environment](https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments) as usual.\n\nTo install Python dependencies:\n\n    make deps\n\nWhich is the equivalent of:\n\n    pip install -r requirements.txt\n\nTo install this module, then do:\n\n    make install\n\nWhich is the equivalent of:\n\n    pip install .\n\n\nAlternatively, download the prebuilt image from Dockerhub:\n\n    docker pull ocrd/wrap\n\n\n## Usage\n\n### [OCR-D processor](https://ocr-d.de/en/spec/cli) interface `ocrd-preprocess-image`\n\nTo be used with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) documents in an [OCR-D](https://ocr-d.de/en/about) annotation workflow.\n\n```\nUsage: ocrd-preprocess-image [worker|server] [OPTIONS]\n\n  Convert or enhance images\n\n  \u003e Performs coords-preserving image operations via runtime shell calls\n  \u003e anywhere.\n\n  \u003e Open and deserialize PAGE input files and their respective images,\n  \u003e then iterate over the element hierarchy down to the requested\n  \u003e ``level-of-operation`` in the element hierarchy.\n\n  \u003e For each segment 
element, retrieve a segment image according to the\n  \u003e layout annotation (from an existing AlternativeImage, or by cropping\n  \u003e via coordinates into the higher-level image, and - when applicable -\n  \u003e deskewing).\n\n  \u003e If ``input_feature_selector`` and/or ``input_feature_filter`` is\n  \u003e non-empty, then select/filter among the @imageFilename image and the\n  \u003e available AlternativeImages the last one which contains all of the\n  \u003e selected, but none of the filtered features (i.e. @comments\n  \u003e classes), or raise an error.\n\n  \u003e Then write that image into a temporary PNG file, create a new METS\n  \u003e file ID for the result image (based on the segment ID and the\n  \u003e operation to be run), along with a local path for it, and pass\n  \u003e ``command`` to the shell after replacing: - the string ``@INFILE``\n  \u003e with that input image path, and - the string ``@OUTFILE`` with that\n  \u003e output image path.\n\n  \u003e If the shell returns with a failure, skip that segment with an\n  \u003e appropriate error message. 
Otherwise, add the new image to the\n  \u003e workspace along with the output fileGrp, and using a file ID with\n  \u003e suffix ``.IMG-``, and further identification of the input element.\n\n  \u003e Reference it as AlternativeImage in the element, adding\n  \u003e ``output_feature_added`` to its @comments.\n\n  \u003e Produce a new PAGE output file by serialising the resulting\n  \u003e hierarchy.\n\nSubcommands:\n    worker      Start a processing worker rather than do local processing\n    server      Start a processor server rather than do local processing\n\nOptions for processing:\n  -m, --mets URL-PATH             URL or file path of METS to process [./mets.xml]\n  -w, --working-dir PATH          Working directory of local workspace [dirname(URL-PATH)]\n  -I, --input-file-grp USE        File group(s) used as input\n  -O, --output-file-grp USE       File group(s) used as output\n  -g, --page-id ID                Physical page ID(s) to process instead of full document []\n  --overwrite                     Remove existing output pages/images\n                                  (with \"--page-id\", remove only those).\n                                  Short-hand for OCRD_EXISTING_OUTPUT=OVERWRITE\n  --debug                         Abort on any errors with full stack trace.\n                                  Short-hand for OCRD_MISSING_OUTPUT=ABORT\n  --profile                       Enable profiling\n  --profile-file PROF-PATH        Write cProfile stats to PROF-PATH. 
Implies \"--profile\"\n  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string\n                                  or JSON file path\n  -P, --param-override KEY VAL    Override a single JSON object key-value pair,\n                                  taking precedence over --parameter\n  -U, --mets-server-url URL       URL of a METS Server for parallel incremental access to METS\n                                  If URL starts with http:// start an HTTP server there,\n                                  otherwise URL is a path to an on-demand-created unix socket\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Override log level globally [INFO]\n  --log-filename LOG-PATH         File to redirect stderr logging to (overriding ocrd_logging.conf).\n\nOptions for information:\n  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME\n  -L, --list-resources            List names of processor resources\n  -J, --dump-json                 Dump tool description as JSON\n  -D, --dump-module-dir           Show the 'module' resource location path for this processor\n  -h, --help                      Show this message\n  -V, --version                   Show version\n\nParameters:\n   \"level-of-operation\" [string - \"page\"]\n    PAGE XML hierarchy level to operate on\n    Possible values: [\"page\", \"region\", \"line\", \"word\", \"glyph\"]\n   \"input_feature_selector\" [string - \"\"]\n    comma-separated list of required image features (e.g.\n    binarized,despeckled)\n   \"input_feature_filter\" [string - \"\"]\n    comma-separated list of forbidden image features (e.g.\n    binarized,despeckled)\n   \"output_feature_added\" [string - REQUIRED]\n    image feature(s) to be added after this operation (if multiple,\n    separate by comma)\n   \"input_mimetype\" [string - \"image/png\"]\n    File format to save input images to (tool's expected input)\n    Possible values: [\"image/bmp\", 
\"application/postscript\", \"image/gif\",\n    \"image/jpeg\", \"image/jp2\", \"image/png\", \"image/x-portable-pixmap\",\n    \"image/tiff\"]\n   \"output_mimetype\" [string - \"image/png\"]\n    File format to load output images from (tool's expected output)\n    Possible values: [\"image/bmp\", \"application/postscript\", \"image/gif\",\n    \"image/jpeg\", \"image/jp2\", \"image/png\", \"image/x-portable-pixmap\",\n    \"image/tiff\"]\n   \"command\" [string - REQUIRED]\n    shell command to operate on image files, with @INFILE as place-holder\n    for the input file path, and @OUTFILE as place-holder for the output\n    file path\n```\n\n#### presets\n\nThe following example recipes are included in the distribution:\n- enhancement/conversion/denoising using\n  - [x] ImageMagick: [param_im6convert-denoise-raw](ocrd_wrap/param_im6convert-denoise-raw.json)\n  - [ ] GIMP [script-fu](https://gitlab.gnome.org/GNOME/gimp/-/tree/master/plug-ins/script-fu/scripts)\n  - [ ] ...\n- binarization using\n  - [x] Olena/Scribo: [param_scribo-cli-binarize-sauvola-ms-split](ocrd_wrap/param_scribo-cli-binarize-sauvola-ms-split.json)\n  - [ ] https://github.com/ajgallego/document-image-binarization ...\n  - [ ] https://github.com/qurator-spk/sbb_binarization ...\n  - [ ] https://github.com/masyagin1998/robin ...\n  - [ ] ...\n- text/non-text segmentation using\n  - [ ] Olena/Scribo ...\n  - [ ] ...\n- ...\n\nThese presets are distributed as package resources and resolved by their filename, e.g. 
...\n\n    ocrd-preprocess-image -p param_scribo-cli-binarize-sauvola-ms-split.json -I OCR-D-IMG -O OCR-D-BIN-OLENA\n\n\n### [OCR-D processor](https://ocr-d.de/en/spec/cli) interface `ocrd-skimage-normalize`\n\nTo be used with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) documents in an [OCR-D](https://ocr-d.de/en/about) annotation workflow.\n\n```\nUsage: ocrd-skimage-normalize [worker|server] [OPTIONS]\n\n  Equalize contrast/exposure of images with Scikit-image; stretches the color value/tone to the full dynamic range\n\n  \u003e Performs contrast-enhancing equalization of segment or page images\n  \u003e with scikit-image on the workspace.\n\n  \u003e Open and deserialize PAGE input files and their respective images,\n  \u003e then iterate over the element hierarchy down to the requested\n  \u003e ``level-of-operation`` in the element hierarchy.\n\n  \u003e For each segment element, retrieve a segment image according to the\n  \u003e layout annotation (from an existing AlternativeImage, or by cropping\n  \u003e via coordinates into the higher-level image, and - when applicable -\n  \u003e deskewing), in raw (non-binarized) form.\n\n  \u003e Next, normalize the image according to ``method`` in skimage.\n\n  \u003e Then write the new image to the workspace along with the output\n  \u003e fileGrp, and using a file ID with suffix ``.IMG-NRM`` with further\n  \u003e identification of the input element.\n\n  \u003e Produce a new PAGE output file by serialising the resulting\n  \u003e hierarchy.\n\nSubcommands:\n    worker      Start a processing worker rather than do local processing\n    server      Start a processor server rather than do local processing\n\nOptions for processing:\n  -m, --mets URL-PATH             URL or file path of METS to process [./mets.xml]\n  -w, --working-dir PATH          Working directory of local workspace [dirname(URL-PATH)]\n  -I, --input-file-grp USE        File group(s) used as input\n  -O, --output-file-grp USE       
File group(s) used as output\n  -g, --page-id ID                Physical page ID(s) to process instead of full document []\n  --overwrite                     Remove existing output pages/images\n                                  (with \"--page-id\", remove only those).\n                                  Short-hand for OCRD_EXISTING_OUTPUT=OVERWRITE\n  --debug                         Abort on any errors with full stack trace.\n                                  Short-hand for OCRD_MISSING_OUTPUT=ABORT\n  --profile                       Enable profiling\n  --profile-file PROF-PATH        Write cProfile stats to PROF-PATH. Implies \"--profile\"\n  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string\n                                  or JSON file path\n  -P, --param-override KEY VAL    Override a single JSON object key-value pair,\n                                  taking precedence over --parameter\n  -U, --mets-server-url URL       URL of a METS Server for parallel incremental access to METS\n                                  If URL starts with http:// start an HTTP server there,\n                                  otherwise URL is a path to an on-demand-created unix socket\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Override log level globally [INFO]\n  --log-filename LOG-PATH         File to redirect stderr logging to (overriding ocrd_logging.conf).\n\nOptions for information:\n  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME\n  -L, --list-resources            List names of processor resources\n  -J, --dump-json                 Dump tool description as JSON\n  -D, --dump-module-dir           Show the 'module' resource location path for this processor\n  -h, --help                      Show this message\n  -V, --version                   Show version\n\nParameters:\n   \"level-of-operation\" [string - \"page\"]\n    PAGE XML hierarchy level to operate on\n    Possible 
values: [\"page\", \"region\", \"line\", \"word\", \"glyph\"]\n   \"dpi\" [number - 0]\n    pixel density in dots per inch (overrides any meta-data in the\n    images); disabled when zero\n   \"black-point\" [number - 1.0]\n    black point in percent of luminance/value/tone histogram; up to\n    ``black-point`` darkest pixels will be clipped to black when\n    stretching\n   \"white-point\" [number - 7.0]\n    white point in percent of luminance/value/tone histogram; up to\n    ``white-point`` brightest pixels will be clipped to white when\n    stretching\n   \"method\" [string - \"stretch\"]\n    contrast-enhancing transformation to use after clipping; ``stretch``\n    uses ``skimage.exposure.rescale_intensity`` (globally linearly\n    stretching to full dynamic range) and ``adapthist`` uses\n    ``skimage.exposure.equalize_adapthist`` (applying over tiles with\n    context from 1/8th of the image's width)\n    Possible values: [\"stretch\", \"adapthist\"]\n```\n\n### [OCR-D processor](https://ocr-d.de/en/spec/cli) interface `ocrd-skimage-denoise-raw`\n\nTo be used with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) documents in an [OCR-D](https://ocr-d.de/en/about) annotation workflow.\n\n```\nUsage: ocrd-skimage-denoise-raw [worker|server] [OPTIONS]\n\n  Denoise raw images with Scikit-image\n\n  \u003e Performs raw denoising of segment or page images with scikit-image\n  \u003e on the workspace.\n\n  \u003e Open and deserialize PAGE input files and their respective images,\n  \u003e then iterate over the element hierarchy down to the requested\n  \u003e ``level-of-operation`` in the element hierarchy.\n\n  \u003e For each segment element, retrieve a segment image according to the\n  \u003e layout annotation (from an existing AlternativeImage, or by cropping\n  \u003e via coordinates into the higher-level image, and - when applicable -\n  \u003e deskewing), in raw (non-binarized) form.\n\n  \u003e Next, denoise the image with a Wavelet transform 
scheme according to\n  \u003e ``method`` in skimage.\n\n  \u003e Then write the new image to the workspace along with the output\n  \u003e fileGrp, and using a file ID with suffix ``.IMG-DEN`` with further\n  \u003e identification of the input element.\n\n  \u003e Produce a new PAGE output file by serialising the resulting\n  \u003e hierarchy.\n\nSubcommands:\n    worker      Start a processing worker rather than do local processing\n    server      Start a processor server rather than do local processing\n\nOptions for processing:\n  -m, --mets URL-PATH             URL or file path of METS to process [./mets.xml]\n  -w, --working-dir PATH          Working directory of local workspace [dirname(URL-PATH)]\n  -I, --input-file-grp USE        File group(s) used as input\n  -O, --output-file-grp USE       File group(s) used as output\n  -g, --page-id ID                Physical page ID(s) to process instead of full document []\n  --overwrite                     Remove existing output pages/images\n                                  (with \"--page-id\", remove only those).\n                                  Short-hand for OCRD_EXISTING_OUTPUT=OVERWRITE\n  --debug                         Abort on any errors with full stack trace.\n                                  Short-hand for OCRD_MISSING_OUTPUT=ABORT\n  --profile                       Enable profiling\n  --profile-file PROF-PATH        Write cProfile stats to PROF-PATH. 
Implies \"--profile\"\n  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string\n                                  or JSON file path\n  -P, --param-override KEY VAL    Override a single JSON object key-value pair,\n                                  taking precedence over --parameter\n  -U, --mets-server-url URL       URL of a METS Server for parallel incremental access to METS\n                                  If URL starts with http:// start an HTTP server there,\n                                  otherwise URL is a path to an on-demand-created unix socket\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Override log level globally [INFO]\n  --log-filename LOG-PATH         File to redirect stderr logging to (overriding ocrd_logging.conf).\n\nOptions for information:\n  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME\n  -L, --list-resources            List names of processor resources\n  -J, --dump-json                 Dump tool description as JSON\n  -D, --dump-module-dir           Show the 'module' resource location path for this processor\n  -h, --help                      Show this message\n  -V, --version                   Show version\n\nParameters:\n   \"level-of-operation\" [string - \"page\"]\n    PAGE XML hierarchy level to operate on\n    Possible values: [\"page\", \"region\", \"line\", \"word\", \"glyph\"]\n   \"dpi\" [number - 0]\n    pixel density in dots per inch (overrides any meta-data in the\n    images); disabled when zero\n   \"method\" [string - \"VisuShrink\"]\n    Wavelet filtering scheme to use\n    Possible values: [\"BayesShrink\", \"VisuShrink\"]\n```\n\n### [OCR-D processor](https://ocr-d.de/en/spec/cli) interface `ocrd-skimage-binarize`\n\nTo be used with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) documents in an [OCR-D](https://ocr-d.de/en/about) annotation workflow.\n\n```\nUsage: ocrd-skimage-binarize [worker|server] [OPTIONS]\n\n 
 Binarize images with Scikit-image\n\n  \u003e Performs binarization of segment or page images with scikit-image on\n  \u003e the workspace.\n\n  \u003e Open and deserialize PAGE input files and their respective images,\n  \u003e then iterate over the element hierarchy down to the requested\n  \u003e ``level-of-operation`` in the element hierarchy.\n\n  \u003e For each segment element, retrieve a segment image according to the\n  \u003e layout annotation (from an existing AlternativeImage, or by cropping\n  \u003e via coordinates into the higher-level image, and - when applicable -\n  \u003e deskewing).\n\n  \u003e Next, binarize the image according to ``method`` with skimage.\n\n  \u003e Then write the new image to the workspace along with the output\n  \u003e fileGrp, and using a file ID with suffix ``.IMG-BIN`` with further\n  \u003e identification of the input element.\n\n  \u003e Produce a new PAGE output file by serialising the resulting\n  \u003e hierarchy.\n\nSubcommands:\n    worker      Start a processing worker rather than do local processing\n    server      Start a processor server rather than do local processing\n\nOptions for processing:\n  -m, --mets URL-PATH             URL or file path of METS to process [./mets.xml]\n  -w, --working-dir PATH          Working directory of local workspace [dirname(URL-PATH)]\n  -I, --input-file-grp USE        File group(s) used as input\n  -O, --output-file-grp USE       File group(s) used as output\n  -g, --page-id ID                Physical page ID(s) to process instead of full document []\n  --overwrite                     Remove existing output pages/images\n                                  (with \"--page-id\", remove only those).\n                                  Short-hand for OCRD_EXISTING_OUTPUT=OVERWRITE\n  --debug                         Abort on any errors with full stack trace.\n                                  Short-hand for OCRD_MISSING_OUTPUT=ABORT\n  --profile                       Enable 
profiling\n  --profile-file PROF-PATH        Write cProfile stats to PROF-PATH. Implies \"--profile\"\n  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string\n                                  or JSON file path\n  -P, --param-override KEY VAL    Override a single JSON object key-value pair,\n                                  taking precedence over --parameter\n  -U, --mets-server-url URL       URL of a METS Server for parallel incremental access to METS\n                                  If URL starts with http:// start an HTTP server there,\n                                  otherwise URL is a path to an on-demand-created unix socket\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Override log level globally [INFO]\n  --log-filename LOG-PATH         File to redirect stderr logging to (overriding ocrd_logging.conf).\n\nOptions for information:\n  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME\n  -L, --list-resources            List names of processor resources\n  -J, --dump-json                 Dump tool description as JSON\n  -D, --dump-module-dir           Show the 'module' resource location path for this processor\n  -h, --help                      Show this message\n  -V, --version                   Show version\n\nParameters:\n   \"level-of-operation\" [string - \"page\"]\n    PAGE XML hierarchy level to operate on\n    Possible values: [\"page\", \"region\", \"line\", \"word\", \"glyph\"]\n   \"dpi\" [number - 0]\n    pixel density in dots per inch (overrides any meta-data in the\n    images); disabled when zero\n   \"method\" [string - \"sauvola\"]\n    Thresholding algorithm to use\n    Possible values: [\"sauvola\", \"niblack\", \"otsu\", \"gauss\", \"yen\", \"li\"]\n   \"window_size\" [number - 0]\n    For Sauvola/Niblack/Gauss, the (odd) window size in pixels; when zero\n    (default), set to DPI\n   \"k\" [number - 0.34]\n    For Sauvola/Niblack, formula parameter 
influencing the threshold\n    bias; larger is lighter foreground\n```\n\n### [OCR-D processor](https://ocr-d.de/en/spec/cli) interface `ocrd-skimage-denoise`\n\nTo be used with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) documents in an [OCR-D](https://ocr-d.de/en/about) annotation workflow.\n\n```\nUsage: ocrd-skimage-denoise [worker|server] [OPTIONS]\n\n  Denoise binarized images with Scikit-image\n\n  \u003e Performs binary denoising of segment or page images with scikit-\n  \u003e image on the workspace.\n\n  \u003e Open and deserialize PAGE input files and their respective images,\n  \u003e then iterate over the element hierarchy down to the requested\n  \u003e ``level-of-operation`` in the element hierarchy.\n\n  \u003e For each segment element, retrieve a segment image according to the\n  \u003e layout annotation (from an existing AlternativeImage, or by cropping\n  \u003e via coordinates into the higher-level image, and - when applicable -\n  \u003e deskewing), in binarized form.\n\n  \u003e Next, denoise the image by removing too small connected components\n  \u003e with skimage. 
(If ``protect`` is non-zero, then avoid removing\n  \u003e specks near large connected components up to that distance.)\n\n  \u003e Then write the new image to the workspace along with the output\n  \u003e fileGrp, and using a file ID with suffix ``.IMG-DEN`` with further\n  \u003e identification of the input element.\n\n  \u003e Produce a new PAGE output file by serialising the resulting\n  \u003e hierarchy.\n\nSubcommands:\n    worker      Start a processing worker rather than do local processing\n    server      Start a processor server rather than do local processing\n\nOptions for processing:\n  -m, --mets URL-PATH             URL or file path of METS to process [./mets.xml]\n  -w, --working-dir PATH          Working directory of local workspace [dirname(URL-PATH)]\n  -I, --input-file-grp USE        File group(s) used as input\n  -O, --output-file-grp USE       File group(s) used as output\n  -g, --page-id ID                Physical page ID(s) to process instead of full document []\n  --overwrite                     Remove existing output pages/images\n                                  (with \"--page-id\", remove only those).\n                                  Short-hand for OCRD_EXISTING_OUTPUT=OVERWRITE\n  --debug                         Abort on any errors with full stack trace.\n                                  Short-hand for OCRD_MISSING_OUTPUT=ABORT\n  --profile                       Enable profiling\n  --profile-file PROF-PATH        Write cProfile stats to PROF-PATH. 
Implies \"--profile\"\n  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string\n                                  or JSON file path\n  -P, --param-override KEY VAL    Override a single JSON object key-value pair,\n                                  taking precedence over --parameter\n  -U, --mets-server-url URL       URL of a METS Server for parallel incremental access to METS\n                                  If URL starts with http:// start an HTTP server there,\n                                  otherwise URL is a path to an on-demand-created unix socket\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Override log level globally [INFO]\n  --log-filename LOG-PATH         File to redirect stderr logging to (overriding ocrd_logging.conf).\n\nOptions for information:\n  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME\n  -L, --list-resources            List names of processor resources\n  -J, --dump-json                 Dump tool description as JSON\n  -D, --dump-module-dir           Show the 'module' resource location path for this processor\n  -h, --help                      Show this message\n  -V, --version                   Show version\n\nParameters:\n   \"level-of-operation\" [string - \"page\"]\n    PAGE XML hierarchy level to operate on\n    Possible values: [\"page\", \"region\", \"line\", \"word\", \"glyph\"]\n   \"dpi\" [number - 0]\n    pixel density in dots per inch (overrides any meta-data in the\n    images); disabled when zero\n   \"protect\" [number - 0.0]\n    avoid removing fg specks near larger fg components by up to this\n    distance in pt\n   \"maxsize\" [number - 1.0]\n    maximum component size of (bg holes or fg specks) noise in pt\n```\n\n## Testing\n\nTo install the Python test dependencies:\n\n    make deps-test\n\nWhich is the equivalent of:\n\n    pip install -r requirements_test.txt\n\nTo run the tests, then do:\n\n    make test\n\nWhich is the 
equivalent of:\n\n    pytest tests\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Focrd_wrap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbertsky%2Focrd_wrap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Focrd_wrap/lists"}