{"id":24951930,"url":"https://github.com/ocr-d/ocrd_kraken","last_synced_at":"2025-04-10T12:51:38.853Z","repository":{"id":38417997,"uuid":"129371698","full_name":"OCR-D/ocrd_kraken","owner":"OCR-D","description":"Wrapper for the kraken OCR engine","archived":false,"fork":false,"pushed_at":"2025-03-12T16:48:20.000Z","size":263,"stargazers_count":13,"open_issues_count":3,"forks_count":6,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-24T11:38:27.705Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OCR-D.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-13T08:19:18.000Z","updated_at":"2025-03-12T16:48:25.000Z","dependencies_parsed_at":"2024-10-28T09:54:59.414Z","dependency_job_id":"886a4ca5-49a6-478e-a2a9-c15dcd3bc064","html_url":"https://github.com/OCR-D/ocrd_kraken","commit_stats":{"total_commits":112,"total_committers":7,"mean_commits":16.0,"dds":0.2589285714285714,"last_synced_commit":"802c6b0b76a3e75070c680aa3b19d36142decf4e"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_kraken","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_kraken/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_kraken/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_kra
ken/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OCR-D","download_url":"https://codeload.github.com/OCR-D/ocrd_kraken/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248220168,"owners_count":21067247,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2025-02-03T01:32:29.738Z","updated_at":"2025-04-10T12:51:38.834Z","avatar_url":"https://github.com/OCR-D.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ocrd_kraken\n\n\u003e OCR-D wrapper for the Kraken OCR engine\n\n[![CI](https://github.com/OCR-D/ocrd_kraken/actions/workflows/ci.yml/badge.svg)](https://github.com/OCR-D/ocrd_kraken/actions/workflows/ci.yml)\n[![Docker Automated build](https://img.shields.io/docker/automated/ocrd/kraken.svg)](https://hub.docker.com/r/ocrd/kraken/tags/)\n[![image](https://circleci.com/gh/OCR-D/ocrd_kraken.svg?style=svg)](https://circleci.com/gh/OCR-D/ocrd_kraken)\n\n## Introduction\n\nThis package offers [OCR-D](https://ocr-d.de/en/spec) compliant [workspace processors](https://ocr-d.de/en/spec/cli)\nfor (some of) the functionality of [Kraken](https://kraken.re).\n\n(Each processor is a parameterizable step in a configurable [workflow](https://ocr-d.de/en/workflows)\nof the [OCR-D functional model](https://ocr-d.de/en/about).\nThere are usually various alternative processor implementations for each step.\nData is represented with [METS](https://ocr-d.de/en/spec/mets) and [PAGE](https://ocr-d.de/en/spec/page).)\n\nIt includes image 
preprocessing (binarization), layout analysis (region and line+baseline segmentation), and text recognition.\n\n## Installation\n\n### With Docker\n\nThis is the best option if you want to run the software in a container.\n\nYou need to have [Docker](https://docs.docker.com/install/linux/docker-ce/ubuntu/) installed. To pull the image:\n\n\n    docker pull ocrd/kraken\n\n\nTo run with Docker:\n\n\n    docker run --rm \\\n    -v path/to/workspaces:/data \\\n    -v path/to/models:/usr/local/share/ocrd-resources \\\n    ocrd/kraken ocrd-kraken-recognize ...\n    # or ocrd-kraken-segment or ocrd-kraken-binarize\n\n\n### Native, from PyPI\n\nThis is the best option if you want to use the stable, released version.\n\n    pip install ocrd_kraken\n\n\n### Native, from git\n\nUse this option if you want to change the source code or install the latest, unpublished changes.\n\nWe strongly recommend using [venv](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).\n\n    git clone https://github.com/OCR-D/ocrd_kraken\n    cd ocrd_kraken\n    sudo make deps-ubuntu # or manually from git or via ocrd_all\n    make deps        # or pip install -r requirements.txt\n    make install     # or pip install .\n\n## Models\n\nKraken uses data-driven (neural) models for segmentation and recognition, but comes with no pretrained \"official\" models.\nThere is a [public repository](https://zenodo.org/communities/ocr_models) of community-provided models, which can also\nbe queried and downloaded from via the standalone `kraken` CLI.\n(See [Kraken docs](https://kraken.re/master/advanced.html#repo) for details.)\n\nFor the OCR-D wrapper, since all OCR-D processors must resolve file/data resources in a [standardized way](https://ocr-d.de/en/spec/cli#processor-resources), there is a general mechanism for managing models, i.e. installing and using them by name. 
We currently manage our own list of recommended models (without delegating to the above repo).\n\nModels always use the filename suffix `.mlmodel`, but are just loaded by their basename.\n\nSee the [OCR-D model guide](https://ocr-d.de/en/models) and\n\n    ocrd resmgr --help\n\n## Usage\n\nFor details, see docstrings in the individual processors and [ocrd-tool.json](ocrd_kraken/ocrd-tool.json) descriptions,\nor simply `--help`.\n\nAvailable [OCR-D processors](https://ocr-d.de/en/spec/cli) are:\n\n- [ocrd-kraken-binarize](ocrd_kraken/binarize.py) (nlbin – not recommended)  \n  - adds `AlternativeImage` files (per page, region or line) to the output fileGrp\n- [ocrd-kraken-segment](ocrd_kraken/segment.py) (all-in-one segmentation – recommended for handwriting and prints with simple layouts, or as pure line segmentation)  \n  - adds `TextRegion`s to `Page` (if `level-of-operation=page`) or `TableRegion`s (if `table`)\n  - adds `TextLine`s (with `Baseline`) to `TextRegion`s (for all `level-of-operation`)\n  - masks existing segments during detection (unless `overwrite_segments`)\n- [ocrd-kraken-recognize](ocrd_kraken/recognize.py) (benefits from annotated `Baseline`s, falls back to center-normalized bboxes)\n  - adds `Word`s to `TextLine`s\n  - adds `Glyph`s to `Word`s\n  - adds `TextEquiv` (removing existing `TextEquiv` if `overwrite_text`)\n\n## Testing\n\n    make test\n\n\nThis downloads test data from https://github.com/OCR-D/assets under `repo/assets`, and runs some basic tests of the Python API.\n\nSet `PYTEST_ARGS=\"-s --verbose\"` to see log output (`-s`) and individual test results (`--verbose`).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Focrd_kraken","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focr-d%2Focrd_kraken","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Focrd_kraken/lists"}