{"id":17182410,"url":"https://github.com/bertsky/ocrd_doxa","last_synced_at":"2025-04-13T17:52:46.161Z","repository":{"id":57447718,"uuid":"416418917","full_name":"bertsky/ocrd_doxa","owner":"bertsky","description":"OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding","archived":false,"fork":false,"pushed_at":"2024-09-30T18:39:42.000Z","size":17,"stargazers_count":1,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-11T13:24:58.772Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bertsky.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-12T16:41:59.000Z","updated_at":"2024-09-30T18:39:45.000Z","dependencies_parsed_at":"2022-09-15T22:12:58.761Z","dependency_job_id":null,"html_url":"https://github.com/bertsky/ocrd_doxa","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_doxa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_doxa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_doxa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bertsky%2Focrd_doxa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bertsky","download_url":"https://codeload.github.com/bertsky/ocrd_doxa/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"g
ithub","repositories_count":248758449,"owners_count":21156957,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2024-10-15T00:37:00.731Z","updated_at":"2025-04-13T17:52:46.141Z","avatar_url":"https://github.com/bertsky.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://badge.fury.io/py/ocrd-doxa.svg)](https://badge.fury.io/py/ocrd-doxa)\n[![Docker Image CD](https://github.com/bertsky/ocrd_doxa/actions/workflows/docker-image.yml/badge.svg)](https://github.com/bertsky/ocrd_doxa/actions/workflows/docker-image.yml)\n\n# ocrd_doxa\n\n    OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding\n\n  * [Introduction](#introduction)\n  * [Installation](#installation)\n  * [Usage](#usage)\n     * [OCR-D processor interface ocrd-doxa-binarize](#ocr-d-processor-interface-ocrd-doxa-binarize)\n  * [Testing](#testing)\n\n\n## Introduction\n\nThis offers [OCR-D](https://ocr-d.de) compliant [workspace processors](https://ocr-d.de/en/spec/cli) for\nbinarization via [Doxa](https://github.com/brandonmpetty/Doxa) (using its native [Python bindings](https://github.com/brandonmpetty/Doxa/tree/master/Bindings/Python)).\n\nIt is itself written in Python, and relies heavily on the\n[OCR-D core API](https://github.com/OCR-D/core). 
The latter is\nresponsible for handling METS/PAGE and providing the OCR-D\nCLI.\n\n## Installation\n\nCreate and activate a [virtual environment](https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments) as usual.\n\nTo install Python dependencies:\n\n    make deps\n\nThis is equivalent to:\n\n    pip install -r requirements.txt\n\nThen, to install this module itself:\n\n    make install\n\nThis is equivalent to:\n\n    pip install .\n\n## Usage\n\n### [OCR-D processor](https://ocr-d.github.io/cli) interface `ocrd-doxa-binarize`\n\nTo be used with [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) documents in an [OCR-D](https://ocr-d.github.io/) annotation workflow.\n\n```\nocrd-doxa-binarize -h\n\nUsage: ocrd-doxa-binarize [OPTIONS]\n\n  binarize via locally adaptive thresholding\n\nOptions:\n  -I, --input-file-grp USE        File group(s) used as input\n  -O, --output-file-grp USE       File group(s) used as output\n  -g, --page-id ID                Physical page ID(s) to process\n  --overwrite                     Remove existing output pages/images\n                                  (with --page-id, remove only those)\n  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string\n                                  or JSON file path\n  -P, --param-override KEY VAL    Override a single JSON object key-value pair,\n                                  taking precedence over --parameter\n  -m, --mets URL-PATH             URL or file path of METS to process\n  -w, --working-dir PATH          Working directory of local workspace\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Log level\n  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME\n  -L, --list-resources            List names of processor resources\n  -J, --dump-json                 Dump tool description as JSON and exit\n  -h, --help                      This help message\n  -V, --version    
               Show version\n\nParameters:\n   \"dpi\" [number - 0]\n    pixel density in dots per inch (overrides any meta-data in the\n    images); disabled when zero\n   \"level-of-operation\" [string - \"page\"]\n    PAGE XML hierarchy level to operate on\n    Possible values: [\"page\", \"region\", \"line\"]\n   \"algorithm\" [string - \"ISauvola\"]\n    Thresholding algorithm to use.\n    Possible values: [\"Otsu\", \"Bernsen\", \"Niblack\", \"Sauvola\", \"Wolf\",\n    \"Gatos\", \"NICK\", \"Su\", \"Singh\", \"Bataineh\", \"ISauvola\", \"WAN\"]\n   \"parameters\" [object - {}]\n    Dictionary of algorithm-specific parameters. Unless overridden here,\n    the following defaults are used:\n\tBernsen:        {'window': 75, 'threshold': 100, 'contrast-limit': 25}\n\tNICK:           {'window': 75, 'k': -0.2}\n\tNiblack:        {'window': 75, 'k': 0.2}\n\tSingh:          {'window': 75, 'k': 0.2}\n\tGatos:          {'glyph': 60}\n\tSauvola:        {'window': 75, 'k': 0.2}\n\tWolf:           {'window': 75, 'k': 0.2}\n\tWAN:            {'window': 75, 'k': 0.2}\n\tSu:             {'window': 0 (based on stroke size), \n                     'minN':  windowSize (roughly based on size of window)}\n\n   (window/glyph sizes are in px, threshold/limits in uint8 [0,255])\n```\n\n## Testing\n\nNone yet.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Focrd_doxa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbertsky%2Focrd_doxa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbertsky%2Focrd_doxa/lists"}