{"id":24951961,"url":"https://github.com/ocr-d/ocrd_framework","last_synced_at":"2025-03-28T19:40:29.279Z","repository":{"id":93167873,"uuid":"227784841","full_name":"OCR-D/ocrd_framework","owner":"OCR-D","description":"Docker installation for the OCR-D framework containing all available processors, taverna workflow and local repository.","archived":false,"fork":false,"pushed_at":"2020-01-08T07:51:00.000Z","size":17,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-03T01:34:25.538Z","etag":null,"topics":["ocr-d"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OCR-D.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-13T07:46:28.000Z","updated_at":"2020-09-22T11:49:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"b19ef39d-589c-4ee5-a188-f5fc1e7822ac","html_url":"https://github.com/OCR-D/ocrd_framework","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_framework","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_framework/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_framework/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Focrd_framework/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/OCR-D","download_url":"https://codeload.github.com/OCR-D/ocrd_framework/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246091206,"owners_count":20722171,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-d"],"created_at":"2025-02-03T01:33:57.236Z","updated_at":"2025-03-28T19:40:29.266Z","avatar_url":"https://github.com/OCR-D.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OCR-D Framework\nInstallation of the OCR-D framework, containing all available processors, the Taverna workflow, and a local research data repository.\n\n## Installation with Docker\n### Hardware Requirements\nMore than 8 GB of RAM and 20 GB of hard disk space.\n\n### Requirements\n- Docker (see installation for [Ubuntu](https://github.com/OCR-D/repository_metastore/blob/master/installDocker/installationDocker.md))\n  - docker\n  - docker-compose \n- git\n- sed\n- unzip\n- wget\n\n### Installation\nChoose a directory on a disk with at least 10 GB of free space.\n(In our example we use ocrd_framework inside the home directory.)\nDownload and run the [installation script](install_OCR-D_framework.sh).\n```bash=bash\nuser@localhost:/home/user/$ bash install_OCR-D_framework.sh /home/user/ocrd_framework\n[...]\nSUCCESS\nNow you can start an OCR-D workflow with the following commands:\ncd \"/home/user/ocrd_framework/taverna\"\ndocker run --network=\\\"host\\\" -v `pwd`:/data ocrd/taverna process\"\n```\nSeveral folders now exist:\n- repository - Contains all files of the repository and the databases\n- taverna - Contains 
all files, workspaces, and configuration of workflows\n\n### Prepare hosts for accessing files in repo via browser\n```bash=bash\nuser@localhost:/home/user/$ echo '127.0.0.1   kitdm20' | sudo tee -a /etc/hosts \n127.0.0.1   kitdm20\n```\n\n### First Test\nTo check that the installation works, run a first test.\n```bash=bash\nuser@localhost:~/ocrd_framework/taverna$ docker run --network=\"host\" -v `pwd`:/data ocrd/taverna testWorkflow\n[...]\nOutputs will be saved to the directory: /taverna/git/Execute_OCR_D_workfl_output\n# The processed workspace should look like this:\nuser@localhost:~/ocrd_framework/taverna$ ls -1 workspace/example/data/\nmetadata\nmets.xml\nOCR-D-GT-SEG-BLOCK\nOCR-D-GT-SEG-PAGE\nOCR-D-IMG\nOCR-D-IMG-BIN\nOCR-D-IMG-BIN-OCROPY\nOCR-D-OCR-CALAMARI_GT4HIST\nOCR-D-OCR-TESSEROCR-BOTH\nOCR-D-OCR-TESSEROCR-FRAKTUR\nOCR-D-OCR-TESSEROCR-GT4HISTOCR\nOCR-D-SEG-LINE\nOCR-D-SEG-REGION\n```\nEach subfolder starting with 'OCR-D-OCR' should now\ncontain 4 files with the detected full text.\n\n\n#### The metadata subdirectory\nThe subdirectory 'metadata' contains the provenance of the workflow, all\nintermediate METS files, and the stdout and stderr output of all executed processors.\n\n#### Check results in browser\nAfter the workflow finishes, all results are ingested into the research data repository.\nThe repository is available at http://localhost:8080/api/v1/metastore/bagit\n\n### Create your own workflow\nTo configure the workflow, see the instructions in [README.md](https://github.com/OCR-D/taverna_workflow/blob/master/README.MD).\n\n:information_source: All provided paths inside the parameter and workflow configuration files have to be 'dockerized'. For executing scripts, relative paths are also possible. \n\nThe commands should look like this:\n### Test Processors\nTo quickly test whether a processor is available, try the following command:\n```bash=bash\n# Test if processor is installed e.g. 
ocrd-cis-ocropy-binarize\nuser@localhost:~/ocrd_framework/taverna$ docker run -v `pwd`:/data ocrd/taverna dump ocrd-cis-ocropy-binarize\n{\n \"executable\": \"ocrd-cis-ocropy-binarize\",\n \"categories\": [\n  \"Image preprocessing\"\n ],\n \"steps\": [\n  \"preprocessing/optimization/binarization\",\n  \"preprocessing/optimization/grayscale_normalization\",\n  \"preprocessing/optimization/deskewing\"\n ],\n \"input_file_grp\": [\n  \"OCR-D-IMG\",\n  \"OCR-D-SEG-BLOCK\",\n  \"OCR-D-SEG-LINE\"\n ],\n \"output_file_grp\": [\n  \"OCR-D-IMG-BIN\",\n  \"OCR-D-SEG-BLOCK\",\n  \"OCR-D-SEG-LINE\"\n ],\n \"description\": \"Binarize (and optionally deskew/despeckle) pages / regions / lines with ocropy\",\n \"parameters\": {\n  \"method\": {\n   \"type\": \"string\",\n   \"enum\": [\n    \"none\",\n    \"global\",\n    \"otsu\",\n    \"gauss-otsu\",\n    \"ocropy\"\n   ],\n   \"description\": \"binarization method to use (only ocropy will include deskewing)\",\n   \"default\": \"ocropy\"\n  },\n  \"grayscale\": {\n   \"type\": \"boolean\",\n   \"description\": \"for the ocropy method, produce grayscale-normalized instead of thresholded image\",\n   \"default\": false\n  },\n  \"maxskew\": {\n   \"type\": \"number\",\n   \"description\": \"modulus of maximum skewing angle to detect (larger will be slower, 0 will deactivate deskewing)\",\n   \"default\": 0.0\n  },\n  \"noise_maxsize\": {\n   \"type\": \"number\",\n   \"description\": \"maximum pixel number for connected components to regard as noise (0 will deactivate denoising)\",\n   \"default\": 0\n  },\n  \"level-of-operation\": {\n   \"type\": \"string\",\n   \"enum\": [\n    \"page\",\n    \"region\",\n    \"line\"\n   ],\n   \"description\": \"PAGE XML hierarchy level granularity to annotate images for\",\n   \"default\": \"page\"\n  }\n }\n}\nuser@localhost:~/ocrd_framework/taverna$\n```\n\n### Execute your own Workflow\nOnce the workflow is configured, it can be 
started.\n```bash=bash\nuser@localhost:~/ocrd_framework/taverna$ docker run --network=\"host\" -v `pwd`:/data ocrd/taverna process my_parameters.txt relative/path/to/workspace/containing/mets\n```\n\n\n\n## More Information\n\n* [Docker](https://www.docker.com/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Focrd_framework","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focr-d%2Focrd_framework","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Focrd_framework/lists"}