{"id":21660201,"url":"https://github.com/slub/ocrd_controller","last_synced_at":"2026-02-27T03:31:07.498Z","repository":{"id":39856415,"uuid":"457987005","full_name":"slub/ocrd_controller","owner":"slub","description":"Path to network implementation of OCR-D","archived":false,"fork":false,"pushed_at":"2024-03-07T23:19:06.000Z","size":98,"stargazers_count":6,"open_issues_count":1,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-06-16T23:03:09.104Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":"","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-10T23:57:53.000Z","updated_at":"2024-08-02T02:23:17.000Z","dependencies_parsed_at":"2024-03-07T13:29:40.637Z","dependency_job_id":"dc35ec98-ae74-43ad-b949-74955fad9c35","html_url":"https://github.com/slub/ocrd_controller","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/slub/ocrd_controller","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Focrd_controller","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Focrd_controller/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Focrd_controller/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Focrd_controller/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slub","download_ur
l":"https://codeload.github.com/slub/ocrd_controller/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Focrd_controller/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29883689,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T23:51:21.483Z","status":"online","status_checked_at":"2026-02-27T02:00:06.759Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2024-11-25T09:32:29.348Z","updated_at":"2026-02-27T03:31:07.470Z","avatar_url":"https://github.com/slub.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OCR-D Controller\n\n\u003e Path to network implementation of OCR-D\n\n1. In the simplest (and current) form, the controller will be an SSH login server for a full [command-line](https://ocr-d.de/en/spec/cli) [OCR-D](https://ocr-d.de) [installation](https://github.com/OCR-D/ocrd_all). \n   Files must be mounted locally (if they are network shares, this must be done on the host side running the container).\n2. Next, the SSH server can also dynamically receive and send data.\n3. The first true network implementation will offer an HTTP interface for processing (like the [workflow server](https://github.com/OCR-D/core/pull/652)).\n4. From there, the actual processing could be further delegated to different processing servers.\n5. 
A more powerful workflow engine would then offer instantiating different workflows, and monitoring jobs.\n6. In the final form, the controller will implement (most parts of) the OCR-D Web API.\n\n * [Usage](#usage)\n   * [Building](#building)\n   * [Starting and mounting](#starting-and-mounting)\n   * [General management](#general-management)\n   * [Processing](#processing)\n   * [Data transfer](#data-transfer)\n   * [Parallel options](#parallel-options)\n   * [Logging](#logging)\n * [See also](#see-also)\n\n\n## Usage\n\n### Building\n\nBuild or pull the Docker image:\n\n    make build # or docker pull ghcr.io/slub/ocrd_controller\n\n### Starting and mounting\n\nThen run the container – providing **host-side directories** for the volumes …\n\n * `DATA`: directory for data processing (including images or existing workspaces),  \n   defaults to current working directory\n * `MODELS`: directory for persistent storage of processor [resource files](https://ocr-d.de/en/models),  \n   defaults to `~/.local/share`; models will be under `./ocrd-resources/*`\n * `CONFIG`: directory for persistent storage of processor [resource list](https://ocr-d.de/en/models),  \n   defaults to `~/.config`; file will be under `./ocrd/resources.yml`\n\n… but also a file `KEYS` with public key **credentials** for log-in to the controller, and (optionally) some **environment variables** …\n\n * `WORKERS`: number of parallel jobs (i.e. 
concurrent login sessions for `ocrd`)\n    (should be set to match the available computing resources)\n * `UID`: numerical user identifier to be used by programs in the container  \n    (will affect the files modified/created); defaults to current user\n * `GID`: numerical group identifier to be used by programs in the container  \n    (will affect the files modified/created); defaults to current group\n * `UMASK`: numerical user mask to be used by programs in the container  \n    (will affect the files modified/created); defaults to 0002\n * `PORT`: numerical TCP port to expose the SSH server on the host side,  \n    defaults to 8022 (for non-privileged access)\n * `NETWORK`: name of the Docker network to use,  \n    defaults to `bridge` (the default Docker network)\n\n… thus, for **example**:\n\n    make run DATA=/mnt/workspaces MODELS=~/.local/share KEYS=~/.ssh/id_rsa.pub PORT=8022 WORKERS=3\n\n### General management\n\nThen you can **log in** as user `ocrd` from a remote machine (the examples below use `controller` as the server's host name,\nwithout loss of generality):\n\n    ssh -p 8022 ocrd@controller bash -i\n\nUnless you already have the data in [workspaces](https://ocr-d.de/en/spec/glossary#workspace), \nyou need to [**create workspaces**](https://ocr-d.de/en/user_guide#preparing-a-workspace) prior to processing.\nFor example:\n\n    ssh -p 8022 ocrd@controller \"ocrd-import -P some-document\"\n\nFor actual processing, you will first need to [**download some models**](https://ocr-d.de/en/models)\ninto your `MODELS` volume:\n\n    ssh -p 8022 ocrd@controller \"ocrd resmgr download ocrd-tesserocr-recognize *\"\n\n### Processing\n\nSubsequently, you can use these models on your `DATA` files:\n\n    ssh -p 8022 ocrd@controller \"ocrd process -m some-document/mets.xml 'tesserocr-recognize -P segmentation_level region -P model Fraktur'\"\n    # or equivalently:\n    ssh -p 8022 ocrd@controller \"ocrd-tesserocr-recognize -m some-document/mets.xml -P segmentation_level region -P model 
Fraktur\"\n\n### Data transfer\n\nIf your data files cannot be directly mounted on the host (not even as a network share),\nthen you can use `rsync`, `scp` or `sftp` to transfer them to the server:\n\n    rsync -e 'ssh -p 8022' -av some-directory ocrd@controller:/data\n    scp -P 8022 -r some-directory ocrd@controller:/data\n    echo put some-directory /data | sftp -P 8022 ocrd@controller\n\nAnalogously, to transfer the results back:\n\n    rsync -e 'ssh -p 8022' -av ocrd@controller:/data/some-directory .\n    scp -P 8022 -r ocrd@controller:/data/some-directory .\n    echo get /data/some-directory | sftp -P 8022 ocrd@controller\n\n### Parallel options\n\nFor parallel processing, you can either\n- run multiple processes on a single controller by\n  - logging in multiple times, or\n  - issuing parallel commands –\n    * via basic shell scripting\n    * via [ocrd-make](https://bertsky.github.io/workflow-configuration) calls\n- run processes on multiple controllers.\n\nNote: internally, `WORKERS` is implemented as a (GNU parallel-based) semaphore\nwrapping the SSH sessions inside blocking `sem --fg` calls within `.ssh/rc`.\nThus, when all workers are busy, further commands are queued and only start once a worker becomes free.\n\n### Logging\n\nAll logs are accumulated on standard output, which can be inspected via Docker:\n\n    docker logs ocrd_controller\n\n## See also\n\n- [Meta-repo for integration of Kitodo.Production with OCR-D in Docker](https://github.com/slub/ocrd_kitodo)\n- [Sister component OCR-D Manager](https://github.com/slub/ocrd_manager)\n\n## Maintainer\n\nIf you have any questions or encounter any problems, please do not hesitate to contact me.\n\n- [Robert Sachunsky](https://github.com/bertsky)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslub%2Focrd_controller","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslub%2Focrd_controller","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslub%2Focrd_controller/lists"}