{"id":24951934,"url":"https://github.com/ocr-d/ocrd_all","last_synced_at":"2025-04-08T03:17:58.148Z","repository":{"id":38452073,"uuid":"216225963","full_name":"OCR-D/ocrd_all","owner":"OCR-D","description":"Master repository which includes most other OCR-D repositories as submodules","archived":false,"fork":false,"pushed_at":"2025-03-22T19:41:23.000Z","size":1353,"stargazers_count":72,"open_issues_count":29,"forks_count":18,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-28T19:44:42.767Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":null,"language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OCR-D.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":".github/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-19T15:15:29.000Z","updated_at":"2025-02-11T15:08:13.000Z","dependencies_parsed_at":"2024-02-20T17:47:35.470Z","dependency_job_id":"881985c5-984d-4b12-b307-dbddb1a5f7a2","html_url":"https://github.com/OCR-D/ocrd_all","commit_stats":null,"previous_names":[],"tags_count":78,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_all","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_all/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_all/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_all/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OCR-D","download_url":"ht
tps://codeload.github.com/OCR-D/ocrd_all/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247767237,"owners_count":20992548,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2025-02-03T01:32:33.631Z","updated_at":"2025-04-08T03:17:58.125Z","avatar_url":"https://github.com/OCR-D.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OCR-D/ocrd_all\n\n[![Built on CircleCI](https://circleci.com/gh/OCR-D/ocrd_all.svg?style=svg)](https://circleci.com/gh/OCR-D/ocrd_all)\n[![MIT licensed](https://img.shields.io/github/license/OCR-D/ocrd_all)](https://github.com/OCR-D/ocrd_all/blob/master/LICENSE)\n[![](https://images.microbadger.com/badges/image/ocrd/all:latest.svg)](https://hub.docker.com/r/ocrd/all)\n\nThis controls installation of all OCR-D modules from source (as git submodules).\n\nIt includes a Makefile for their installation into a virtual environment (venv) or Docker container.\n\n(A [venv](https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments)\nis a local user directory with shell scripts to load/unload itself\nin the current shell environment via PATH and PYTHONHOME.)\n\n\u003e **Note**: If you are going to install ocrd_all, you may want to first consult\n\u003e the [OCR-D setup guide](https://ocr-d.de/en/setup) on the [OCR-D website](https://ocr-d.de).\n\u003e If you are a non-IT user, it is especially recommended you utilize the guide.\n\n* [Prerequisites](#prerequisites)\n    * [Space](#space)\n    * 
[Locale](#locale)\n    * [System packages](#system-packages)\n    * [GPU support](#gpu-support)\n * [Usage](#usage)\n    * [Targets](#targets)\n       * [\u003cem\u003edeps-ubuntu\u003c/em\u003e](#deps-ubuntu)\n       * [\u003cem\u003edeps-cuda\u003c/em\u003e](#deps-cuda)\n       * [\u003cem\u003emodules\u003c/em\u003e](#modules)\n       * [\u003cem\u003eocrd\u003c/em\u003e](#ocrd)\n       * [\u003cem\u003eall\u003c/em\u003e](#all)\n       * [\u003cem\u003edocker\u003c/em\u003e](#docker)\n       * [\u003cem\u003edockers\u003c/em\u003e](#dockers)\n       * [\u003cem\u003eclean\u003c/em\u003e](#clean)\n       * [\u003cem\u003eshow\u003c/em\u003e](#show)\n       * [\u003cem\u003ehelp\u003c/em\u003e (default goal)](#help-default-goal)\n       * [\u003cem\u003e[any module name]\u003c/em\u003e](#any-module-name)\n       * [\u003cem\u003e[any executable name]\u003c/em\u003e](#any-executable-name)\n    * [Variables](#variables)\n       * [\u003cem\u003eOCRD_MODULES\u003c/em\u003e](#ocrd_modules)\n       * [\u003cem\u003eNO_UPDATE\u003c/em\u003e](#no_update)\n       * [\u003cem\u003ePYTHON\u003c/em\u003e](#python)\n       * [\u003cem\u003eVIRTUAL_ENV\u003c/em\u003e](#virtual_env)\n       * [\u003cem\u003eTMPDIR\u003c/em\u003e](#tmpdir)\n       * [\u003cem\u003ePIP_OPTIONS\u003c/em\u003e](#pip_options)\n       * [\u003cem\u003eGIT_RECURSIVE\u003c/em\u003e](#git_recursive)\n    * [Examples](#examples)\n    * [Results](#results)\n    * [Persistent configuration](#persistent-configuration)\n    * [Docker Hub](#docker-hub)\n * [Challenges](#challenges)\n    * [No published/recent version on PyPI](#no-publishedrecent-version-on-pypi)\n    * [Conflicting requirements](#conflicting-requirements)\n    * [System requirements](#system-requirements)\n  * [Contributing](#contributing)\n\n## Prerequisites\n\n### Space\n\nMake sure that there is enough free disk space. 
For a **full installation** including executables from all modules,\naround **22 GiB** will be needed (mostly on the same filesystem as the ocrd_all checkout). The same goes for the\n[`maximum-cuda`](#docker-hub) variant of the prebuilt Docker images (due on the filesystem harboring Docker, typically\n`/var/lib/docker`).\n\nAlso, during build, an additional 5 GiB may be needed for temporary files, typically in the `/tmp` directory.\nTo use a different location path with more free space, set the `TMPDIR` variable when calling `make`:\n\n    TMPDIR=/path/to/my/tempdir make all\n\n\n### Locale\n\nThe (shell) environment must have a Unicode-based localization.\n(Otherwise Python code based on `click` will not work, i.e. most OCR-D CLIs.)\nThis is true for most installations today, and can be verified by:\n\n    locale | fgrep .UTF-8\n\nThis should show several `LC_*` variables. Otherwise, either select another localization globally...\n\n    sudo dpkg-reconfigure locales\n\n... or use the Unicode-based POSIX locale temporarily:\n\n    export LC_ALL=C.UTF-8\n    export LANG=C.UTF-8\n\n### System packages\n\n* Install git, GNU make and GNU parallel.\n\n        # on Debian / Ubuntu:\n        sudo apt install make git parallel\n\n* Install wget or curl if you want to download Tesseract models.\n\n        # on Debian / Ubuntu:\n        sudo apt install wget\n\n* Install the packages for Python3 development and Python3 virtual environments\nfor your operating system / distribution.\n\n        # on Debian / Ubuntu:\n        sudo apt install python3-dev python3-venv\n\n* Some modules require [Tesseract](https://github.com/tesseract-ocr/tesseract).\nIf your operating system / distribution already provides Tesseract 4.1\nor newer, then just install its development package:\n\n        # on Debian / Ubuntu:\n        sudo apt install libtesseract-dev\n\n   Otherwise, recent Tesseract packages for Ubuntu are available via PPA\n   
[alex-p](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel).\n\n   If no Tesseract is installed, a recent version will be downloaded and built as part\n   of the `ocrd_tesserocr` module rules.\n\n* Other modules will have additional system dependencies.\n\n\u003e **Note**: System dependencies **for all modules** on Ubuntu 20.04 (or similar)\n\u003e can also be installed **automatically** by running:\n\u003e \n\u003e         # on Debian / Ubuntu:\n\u003e         sudo apt install make\n\u003e         make modules\n\u003e         sudo make deps-ubuntu\n\u003e \n\u003e (And you can define the scope of _all modules_ by setting the `OCRD_MODULES`\n[variable](#variables) as described below. If unsure, consider doing a dry-run\nfirst, by using `make -n`.)\n\n### GPU support\n\nMany executables can utilize an Nvidia GPU for much faster computation, _if available_ (i.e. optionally).\n\nFor that, as a further prerequisite you need an installation of\n[CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) and additional optimised\nlibraries like [cuDNN](https://developer.nvidia.com/cudnn) for your system.\n\nThe CUDA version currently supported is 11.8 (but others may work as well).\n\n\u003e **Note**: CUDA toolkit and libraries (in a development version with CUDA compiler)\n\u003e can also be installed **automatically** by running:\n\u003e \n\u003e         make ocrd\n\u003e         sudo make deps-cuda\n\u003e \n\u003e This will deploy [Micromamba](https://mamba.readthedocs.io/en/latest/index.html)\nnon-intrusively (without system packages or Conda environments), but also share some\nof the CUDA libraries installed as Python packages system-wide via ld.so.conf rules.\nIf unsure, consider doing a dry-run first, by using `make -n`.\n\n## Usage\n\nRun `make` with optional parameters for __variables__ and __targets__ like so:\n\n    make [PYTHON=python3] [VIRTUAL_ENV=./venv] [OCRD_MODULES=\"...\"] [TARGET...]\n\n### Targets\n\n#### _deps-ubuntu_\n\nInstall 
system packages for all modules. (Depends on [_modules_](#modules).)\n\nSee [system package prerequisites](#system-packages) above.\n\n#### _deps-cuda_\n\nInstall CUDA toolkit and libraries. (Depends on [_ocrd_](#ocrd).)\n\nSee (optional) [GPU support prerequisites](#gpu-support) above.\n\n#### _modules_\n\nCheckout/update all modules, but do not install anything.\n\n#### _all_\n\nInstall executables from all modules into the venv. (Depends on [_modules_](#modules) and [_ocrd_](#ocrd).)\n\n#### _ocrd_\n\nInstall only the `core` module and its CLI `ocrd` into the venv.\n\n#### _docker_\n\n(Re-)build a Docker image for all modules/executables. (Depends on [_modules_](#modules).)\n\n#### _dockers_\n\n(Re-)build Docker images for some pre-selected subsets of modules/executables. (Depends on [_modules_](#modules).)\n\n(These are the very same variants published as [prebuilt images on Docker Hub](#docker-hub),\ncf. [CI configuration](.circleci/config.yml#L27-L65).)\n\n\u003e **Note**: The image will contain all refs and branches of all checked out modules,\n\u003e which may not be actually needed. If you are planning on building and distributing\n\u003e Docker images with minimal size, consider using `GIT_DEPTH=--single-branch`\n\u003e before `modules` or running `make tidy` later-on.\n\n#### _clean_\n\nRemove the venv and the modules' build directories.\n\n#### _show_\n\nPrint the venv directory, the module directories, and the executable names – as configured by the current variables.\n\n#### _check_\n\nVerify that all executables are runnable and the venv is consistent.\n\n#### _help_ (default goal)\n\nPrint available targets and variables.\n\n---\n\nFurther targets:\n#### _[any module name]_\n\nDownload/update that module, but do not install anything.\n\n#### _[any executable name]_\n\nInstall that CLI into the venv. (Depends on that module and on [_ocrd_](#ocrd).)\n\n### Variables\n\n#### _OCRD_MODULES_\n\nOverride the list of git submodules to include. 
Targets affected by this include:\n- [deps-ubuntu](#deps-ubuntu) (reducing the list of system packages to install)\n- [modules](#modules) (reducing the list of modules to checkout/update)\n- [all](#all) (reducing the list of executables to install)\n- [docker](#docker) (reducing the list of executables and modules to install)\n- [show](#show) (reducing the list of `OCRD_MODULES` and of `OCRD_EXECUTABLES` to print)\n\n#### _NO_UPDATE_\n\nIf set to `1`, then when installing executables, `make` does not attempt to `git submodule update`\nany currently checked out modules. (Useful for development when testing different module versions\nprior to a commit.)\n\n#### _PYTHON_\n\nName of the Python binary to use (at least python3 required).\n\nIf set to just `python`, then for the target `deps-ubuntu` it is assumed that Python is already installed.\n\n#### _VIRTUAL_ENV_\n\nDirectory prefix to use for local installation. \n\n(This is set automatically when activating a virtual environment on the shell.\nThe build system will re-use the venv if one already exists here, or create one otherwise.)\n\n#### _TMPDIR_\n\nOverride the default path (`/tmp` on Unix) where temporary files during build are stored.\n\n#### _PIP_OPTIONS_\n\nAdd extra options to the `pip install` command like `-q` or `-v` or `-e`.\n\n\u003e **Note**: The last option (`-e`) will install Python modules in __editable mode__,\n\u003e i.e. any update to the source would directly affect the executables.\n\n#### _GIT_RECURSIVE_\n\nSet to `--recursive` to checkout/update all modules recursively. 
(This usually installs additional tests and models.)\n\n### Examples\n\nTo build the latest Tesseract locally, run this command first:\n\n    # Get code, build and install Tesseract with the default English model.\n    make install-tesseract\n    make ocrd-tesserocr-recognize\n\nOptionally install additional Tesseract models.\n\n    # Download models from tessdata_fast into the venv's tessdata directory.\n    ocrd resmgr download ocrd-tesserocr-recognize deu_latf.traineddata\n    ocrd resmgr download ocrd-tesserocr-recognize Latin.traineddata\n    ocrd resmgr download ocrd-tesserocr-recognize Fraktur.traineddata\n\nOptionally install Tesseract training tools.\n\n    make install-tesseract-training\n\nRunning `make ocrd` or just `make` downloads/updates and installs the `core` module,\nincluding the `ocrd` CLI in a virtual Python 3 environment under `./venv`.\n\nRunning `make ocrd-tesserocr-recognize` downloads/updates the `ocrd_tesserocr` module\nand installs its CLIs, including `ocrd-tesserocr-recognize` in the venv.\n\nRunning `make modules` downloads/updates all modules.\n\nRunning `make all` additionally installs the executables from all modules.\n\nRunning `make all OCRD_MODULES=\"core ocrd_tesserocr ocrd_cis\"` installs only the executables from these modules.\n\n### Results\n\nTo use the built executables, simply activate the virtual environment:\n\n    . ${VIRTUAL_ENV:-venv}/bin/activate\n    ocrd --help\n    ocrd-...\n\nFor the Docker image, run it with your data path mounted as a user,\nand the processor resources as named volume (for model persistency):\n\n    docker run -it -u $(id -u):$(id -g) -v $PWD:/data -v ocrd-models:/models ocrd/all\n    ocrd --help\n    ocrd-...\n\n### Persistent configuration\n\nIn order to make choices permanent, you can put your variable preferences\n(or any custom rules) into `local.mk`. 
This file is always included if it exists.\nSo you don't have to type (and memorise) them on the command line or shell environment.\n\nFor example, its content could be:\n```make\n# restrict everything to a subset of modules\nOCRD_MODULES = core ocrd_im6convert ocrd_cis ocrd_tesserocr\n\n# use a non-default path for the virtual environment\nVIRTUAL_ENV = $(CURDIR)/.venv\n\n# install in editable mode (i.e. referencing the git sources)\nPIP_OPTIONS = -e\n\n# use non-default temporary storage\nTMPDIR = $(CURDIR)/.tmp\n\n# avoid automatic submodule updates\nNO_UPDATE = 1\n```\n\n\u003e **Note**: When `local.mk` exists, variables can still be overridden on the command line,\n\u003e (i.e. `make all OCRD_MODULES=` will build all executables for all modules again),\n\u003e but not from the shell environment\n\u003e (i.e. `OCRD_MODULES= make all` will still use the value from local.mk).\n\n### Docker Hub\n\nBesides native installation, `ocrd_all` is also available as **prebuilt** Docker images\nfrom [Docker Hub as `ocrd/all`](https://hub.docker.com/r/ocrd/all), backed by CI/CD.\nYou can choose from three tags, `minimum`, `medium` and `maximum`. These differ w.r.t.\nwhich modules are included, with `maximum` being the equivalent of doing `make all`\nwith the default (unset) value for `OCRD_MODULES`.\n\nTo download the images on the command line:\n\n    docker pull ocrd/all:minimum\n    # or\n    docker pull ocrd/all:medium\n    # or\n    docker pull ocrd/all:maximum\n\nIn addition to these base variants, there are `minimum-cuda`, `medium-cuda` and `maximum-cuda` with GPU support.\n(These also need [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) runtime, which will add the\n`docker --gpus` option.)\n\nThe `maximum-cuda` variant will be aliased to `latest` as well.\n\nThese tags will be _overwritten_ with every new release of ocrd_all (i.e. 
rolling release).\n(You can still differentiate and reference them by their sha256 digest if necessary.)\n\nHowever, the `maximum-cuda` variant of each release will also be aliased to a _permanent_ tag by ISO **date**, e.g. `2023-04-02`.\n\nUsage of the prebuilt Docker image is the same [as if you had built the image yourself](#results).\n\nThis table lists which tag contains which module:\n| Module                      | `minimum` | `medium` | `maximum` |\n| -----                       | ----      | ----     | ----      |\n| core                        | ☑         | ☑        | ☑         |\n| ocrd_cis                    | ☑         | ☑        | ☑         |\n| ocrd_fileformat             | ☑         | ☑        | ☑         |\n| ocrd_olahd_client           | ☑         | ☑        | ☑         |\n| ocrd_im6convert             | ☑         | ☑        | ☑         |\n| ocrd_pagetopdf              | ☑         | ☑        | ☑         |\n| ocrd_repair_inconsistencies | ☑         | ☑        | ☑         |\n| ocrd_tesserocr              | ☑         | ☑        | ☑         |\n| ocrd_wrap                   | ☑         | ☑        | ☑         |\n| workflow-configuration      | ☑         | ☑        | ☑         |\n| cor-asv-ann                 | -         | ☑        | ☑         |\n| dinglehopper                | -         | ☑        | ☑         |\n| docstruct                   | -         | ☑        | ☑         |\n| format-converters           | -         | ☑        | ☑         |\n| nmalign                     | -         | ☑        | ☑         |\n| ocrd_calamari               | -         | ☑        | ☑         |\n| ocrd_keraslm                | -         | ☑        | ☑         |\n| ocrd_neat                   | -         | ☑        | ☑         |\n| ocrd_olena                  | -         | ☑        | ☑         |\n| ocrd_segment                | -         | ☑        | ☑         |\n| ocrd_anybaseocr             | -         | -        | ☑         |\n| ocrd_detectron2             | -         
| -        | ☑         |\n| ocrd_doxa                   | -         | -        | ☑         |\n| ocrd_kraken                 | -         | -        | ☑         |\n| ocrd_froc                   | -         | -        | ☑         |\n| sbb_binarization            | -         | -        | ☑         |\n| cor-asv-fst                 | -         | -        | -         |\n| ocrd_ocropy                 | -         | -        | -         |\n| ocrd_pc_segmentation        | -         | -        | -         |\n\n\u003e **Note**: The following modules have been disabled by default and can only be\n\u003e enabled by explicitly setting `OCRD_MODULES` or `DISABLED_MODULES`:\n\u003e \n\u003e * `cor-asv-fst` (runtime issues)\n\u003e * `ocrd_ocropy` (better implementation in ocrd_cis available)\n\u003e * `ocrd_pc_segmentation` (dependency and quality issues)\n\n### Uninstall\n\nIf you have installed ocrd_all natively and wish to uninstall, first `deactivate` the virtual environment  and remove the `ocrd_all` directory:\n```\nrm -rf ocrd_all\n```\n\nNext, remove all contents under ~/.parallel/semaphores:\n```\nrm -rf ~/.parallel/semaphores\n```\n\n## Challenges\n\nThis repo offers solutions to the following problems with OCR-D integration.\n\n### No published/recent version on PyPI\n\nPython modules which are not available in PyPI:\n\n_(Solved by installation from source.)_\n\n### Conflicting requirements\n\nMerging all packages into one venv does not always work.\nModules may require mutually exclusive sets of dependent packages.\n\n`pip` does not even stop or resolve conflicts – it merely warns!\n\n- Tensorflow:\n   * version 2 (required by `ocrd_calamari`, `ocrd_anybaseocr` and `eynollah`)\n   * version 1 (required by `cor-asv-ann`, `ocrd_segment` and  `ocrd_keraslm`)\n   \n   The temporary solution is to require different package names:\n   - `tensorflow\u003e=2`\n   - `tensorflow-gpu==1.15.*`\n   \n   Both cannot be installed in parallel in different versions, and usually also 
depend on different versions of CUDA toolkit.\n   \n- OpenCV:\n   * `opencv-python-headless` (required by core and others, avoids pulling in X11 libraries)\n   * `opencv-python` (probably dragged in by third party packages)\n   \n   As long as we keep reinstalling the headless variant and no such package attempts GUI, we should be fine. \n   Custom build (as needed for ARM) under the _module_ `opencv-python` already creates the headless variant.\n\n- PyTorch:\n   * `torch\u003c1.0`\n   * `torch\u003e=1.0`\n   \n- ...\n\n_(Solved by managing and delegating to different subsets of venvs.)_\n\n### System requirements\n\nModules which do not advertise their system package requirements via `make deps-ubuntu`:\n\n_(Solved by maintaining these requirements under `deps-ubuntu` here.)_\n\n## Contributing\n\nPlease see our [contributing\nguide](https://github.com/OCR-D/ocrd_all/blob/master/.github/contributing.md)\nto learn how you can support the project.\n\n## Acknowledgments\n\nThis software uses GNU parallel.\nGNU Parallel is a general parallelizer to run multiple serial command line\nprograms in parallel without changing them.\n\n### Reference\n\nTange, Ole. (2020). _GNU Parallel 20200722 ('Privacy Shield')_. Zenodo. https://doi.org/10.5281/zenodo.3956817\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Focrd_all","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focr-d%2Focrd_all","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Focrd_all/lists"}