{"id":24951976,"url":"https://github.com/ocr-d/gt-repo-scripts","last_synced_at":"2026-01-06T19:51:00.643Z","repository":{"id":50508923,"uuid":"483675189","full_name":"OCR-D/gt-repo-scripts","owner":"OCR-D","description":"XSLT and shell scripts for analyzing and creating GitHub pages of a ground truth repository. These are centrally managed and can be used by all repositories created with gt-repo-template (https://github.com/OCR-D/gt-repo-template).","archived":false,"fork":false,"pushed_at":"2024-04-17T13:35:22.000Z","size":1314,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-17T14:35:59.357Z","etag":null,"topics":["ground-truth","ocr-d","page-xml","repository","template"],"latest_commit_sha":null,"homepage":"","language":"XSLT","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-sa-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OCR-D.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-04-20T13:53:25.000Z","updated_at":"2024-04-17T14:36:09.066Z","dependencies_parsed_at":"2024-04-17T14:47:58.588Z","dependency_job_id":null,"html_url":"https://github.com/OCR-D/gt-repo-scripts","commit_stats":null,"previous_names":["ocr-d/gt-repo-scripts"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fgt-repo-scripts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fgt-repo-scripts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fgt-repo-scripts/releases","manifests_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fgt-repo-scripts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OCR-D","download_url":"https://codeload.github.com/OCR-D/gt-repo-scripts/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246091229,"owners_count":20722175,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ground-truth","ocr-d","page-xml","repository","template"],"created_at":"2025-02-03T01:34:07.069Z","updated_at":"2026-01-06T19:51:00.599Z","avatar_url":"https://github.com/OCR-D.png","language":"XSLT","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"./img/gt-LevelParser2.jpg\" width=\"200\" align=\"right\"\u003e\n\n\n# gt-repo-scripts\n\n\n## Description\nXSLT and shell scripts for analyzing and creating GitHub pages of a ground truth repository. These are centrally managed and can be used by all repositories created with gt-repo-template (https://github.com/OCR-D/gt-repo-template).\n\nThe format of the output files:\n- Markdown,\n- ruleset (JSON)\n- METS (XML) \n- Shell scripts\n\n## Overview of scripts or programs\n\n### **🚀 gt-overview_unitTest.xsl**\n\n- It lists all files in the Ground Truth (GT) directory. In a second step, the xsl checks whether the specified GT directory   structure with the data and GT-PAGE directories is present. If other directories or a different directory structure are present, an error is output (pathtest.md). 
\n  - It is part of the gtrepo GitHub Actions workflow.\n  - **:wrench: general program call**\n      -  ```shell    \n         java -jar saxon-XX.jar -xsl:scripts/gt-overview_unitTest.xsl \\\n         output=unitTest1 \\\n         -s:scripts/gt-overview_unitTest.xsl -o:ghout/pathtest.md\n         ```\n\n\n### **🚀 gt-overview_metadata.xsl**\n\n   - **Environment parameters group:**\n        - For the analysis of the ground truth and the creation of the GitHub pages, set the following parameters. Use the GitHub Actions environment variables, see https://docs.github.com/en/actions/learn-github-actions/environment-variables\n            - repoBase=$GITHUB_REF_NAME\n            - repoName=$GITHUB_REPOSITORY\n            - bagitDumpNum=$GITHUB_RUN_NUMBER    \n        \n   - **Output parameter group:**\n        - Specifies the type of analysis and the form in which it is displayed.\n            - output=METADATA -\u003e transform METADATA and create GT overview \n            - output=TABLE -\u003e compressed table view\n            - output=OVERVIEW -\u003e detailed table view\n\n   - **Metadata parameter group:**\n        - Specifies that a metadata set is created for the GT corpus and that the README file is adapted.\n            - output=METS -\u003e generate metadata for the METS ingest in the OCR-D workflow; mets.sh is generated\n            - output=METSvolume -\u003e generate METS metadata for the whole corpus\n            - output=METSdefault -\u003e generate METS metadata file without the DEFAULT fileGrp (file group); the METS file(s) contain only the release files\n            - output=README -\u003e creation of a customized README file\n     - **:wrench: general program call**\n        - ```shell\n          java -jar saxon-XX.jar -xsl:scripts/gt-overview_metadata.xsl \\\n          output=XX repoBase=$GITHUB_REF_NAME repoName=$GITHUB_REPOSITORY bagitDumpNum=$GITHUB_RUN_NUMBER \\\n          -s:scripts/gt-overview_metadata.xsl -o:XX\n          ```  \n\n### **🚀 gt-level_parser.xsl**\n   - It is a rule-based parser for determining the transcription and structure level of a page file and of the corpus of page files.\n     Three transcription levels and two structure levels are distinguished.\n   - The parser determines the frequencies of characters and structures (regions) that are defined in the rules. Based on this analysis, a specific level is determined for the page and for the corpus.\n   - gt-level_parser.xsl includes gt-level_structure.xsl. **gt-level_structure.xsl** specialises in determining the regions used in the Page-XML files. This stylesheet is not intended to be called on its own.\n   - The output file is overview-level.md; it contains the level matrix, i.e. the analysis result.\n     - **:wrench: general program call**\n       - ```shell\n         java -jar saxon-XX.jar -xsl:scripts/gt-level_parser.xsl \\\n         repoName=$GITHUB_REPOSITORY \\\n         -s:scripts/gt-level_parser.xsl -o:ghout/overview-level.md\n         ```\n\n### **🚀 gt-coll_metadata.xsl**\n  - gt-coll_metadata.xsl automatically creates a README file for a collection/corpus of Ground Truth repositories. \n    - **:wrench: general program call**\n       - ```shell\n         java -jar saxon-XX.jar -xsl:scripts/gt-coll_metadata.xsl \\\n         -s:scripts/gt-coll_metadata.xsl -o:README.md\n         ```\n\n\n### **🚀 data_structure.sh**\n   - Analyses the data structure, determines the METS metadata file, and then creates the Bagit files. For Bagit see: https://ocr-d.de/en/spec/ocrd_zip\n     - **:wrench: general program call**\n       - ```shell\n           sh scripts/data_structure.sh\n         ``` \n### **🚀 data_mets.sh**\n   - During the GitHub Actions workflow, METS files that do not contain `OCR-D-IMG fileGrp` are deleted. 
\n\n\n### **🚀 readmefolder.sh**\n   - Archives the original README file in the `readme_old` folder.\n     - **:wrench: general program call**\n       - ```shell\n           sh scripts/readmefolder.sh\n         ```\n\n### **🚀 xreadme.sh**\n   - Locates the README file and changes its filename extension from Markdown to XML.\n     - **:wrench: general program call**\n       - ```shell\n           sh scripts/xreadme.sh\n         ```\n\n**🌻 lang.js**\n   - JavaScript for the automated language switching (German/English) of the level description and of the links to the OCR-D-GT Guidelines.\n     \n**🌻 table_hide.css**\n   - CSS stylesheet for customising the formatting of GH pages. The GH pages use the dinky template (https://pages-themes.github.io/dinky/).\n\n**🌻 levelparser.css**\n   - CSS stylesheet for customising the formatting of GH pages, in particular for the presentation of the transcription and structure levels.\n\n## Overview of additional files\n\n**🖹 megalevelrules.xml**\n  -  The megalevelrules.xml file contains all OCR-D Ground-Truth Transcription Level Rules. These rules are based on the encodings published by the Medieval Unicode Font Initiative (MUFI). \n  - These rules are used for so-called level parsing. \n  - The megalevelrules are generated automatically. See also: https://github.com/OCR-D/gt-MufiLevelRules  \n  - The file available here is a copy of: https://raw.githubusercontent.com/OCR-D/gt-MufiLevelRules/gh-pages/rules/megalevelrules.xml\n\n\n**🌻 metadata.xsl**\n  - The metadata.xsl file updates the metadata file CITATION.cff of the repo **gt-repo-scripts**. The update is performed by a GitHub Actions workflow.\n\n## GitHub Action Template\n\nThe programs and stylesheets can also be used, individually or in combination, in a GitHub Actions workflow.\n- To run the XSLT stylesheets, an XSLT processor (for example Saxon) must also be installed. 
\n- OCR-D is used for the creation of Bagit data containers.\n\n### Example Github Action Workflow with the programs\n#### Example 1\nsee application: https://github.com/OCR-D/gt-repo-template\n- gt-overview_unitTest.xsl\n- gt-overview_metadata.xsl\n- gt-level_parser.xsl\n- data_structure.sh\n- data_mets.sh \n- readmefolder.sh\n- xreadme.sh\n \n```yml\nname: gtrepo\non:\n  push:\n    tags:\n      - 'v[0-9]+.[0-9]+.[0-9]+'\n      \n  workflow_dispatch:\n      inputs:\n        tag-name:\n          description: Name of the release tag\n          \njobs:\n    job1:\n        name: uniTest\n        runs-on: ubuntu-latest\n        permissions:\n            checks: write\n            contents: write\n        # Map a step output to a job output\n        outputs:\n          output1: ${{ steps.step4.outputs.test }}\n          output2: ${{ steps.step4.outputs.test2 }}\n          \n        steps:      \n          - name: Git checkout\n            id: step1\n            uses: actions/checkout@v4\n\n           # Installation Styles and Saxon\n      \n          - name: install analyse xsl-styles\n            id: step2\n            run: | \n                git clone https://github.com/tboenig/gt-repo-scripts.git\n                mv gt-repo-scripts/scripts scripts/\n                rm -r gt-repo-scripts\n          \n          - name: Download and install saxon\n            id: step3\n            run: |\n              wget https://github.com/Saxonica/Saxon-HE/releases/download/SaxonHE12-3/SaxonHE12-3J.zip \n              unzip SaxonHE12-3J.zip      \n          \n\n           # Installation and Directories   \n          \n          - name: make gh-pages_out\n            run: mkdir ghout\n\n\n          - name: Get SDK Version from config\n            id: lookupSdkVersion\n            uses: mikefarah/yq@master\n            with:\n             cmd: yq -o=json METADATA.yml \u003e METADATA.json  \n\n          - name: PathTest\n            run: |\n                java -jar saxon-he-12.3.jar 
-xsl:scripts/gt-overview_unitTest.xsl \\\n                output=unitTest1 \\\n                -s:scripts/gt-overview_unitTest.xsl -o:ghout/pathtest.md\n            shell: bash\n\n          # Test GT-Page Folder Repo Structure\n          \n          - name: Empty\n            id: step4\n            run: |\n                [ -s ghout/pathtest.md ] || echo \"test=empty\" \u003e\u003e $GITHUB_OUTPUT\n                [ ! -s ghout/pathtest.md ] || echo \"test2=full\" \u003e\u003e $GITHUB_OUTPUT\n          \n          # Error Logview     \n          \n          - name: uniTestError\n            id: step5\n            if: ${{steps.step4.outputs.test2 == 'full'}}  \n            run: |\n              less ghout/pathtest.md          \n   \n    \n    job2:\n        name: analyse_and_makebagit\n        needs: job1\n        if: ${{needs.job1.outputs.output1 == 'empty'}}        \n        runs-on: ubuntu-latest\n        permissions:\n            checks: write\n            contents: write\n              \n        \n        steps:\n          - name: Using tag name from ref name\n            if: github.event.inputs.tag-name == ''\n            run: echo \"TAG_NAME=$GITHUB_REF_NAME\" \u003e\u003e $GITHUB_ENV\n\n          - name: Using tag name from input param\n            if: github.event.inputs.tag-name != ''\n            run: echo \"TAG_NAME=${{ github.event.inputs.tag-name}}\" \u003e\u003e $GITHUB_ENV  \n  \n          - name: Git checkout\n            uses: actions/checkout@v4\n      \n            # Installation Styles\n            \n          - name: install analyse xsl-styles\n            run: | \n              git clone https://github.com/tboenig/gt-repo-scripts.git\n              mv gt-repo-scripts/scripts scripts/\n              rm -r gt-repo-scripts\n      \n            # Installation GT-Labelling Documentation\n      \n            \n          - name: install labeling\n            run: |\n              git clone https://github.com/tboenig/gt-guidelines.git\n      \n          
  \n          # Installation and Directories\n            \n          - name: install jq\n            run: sudo apt-get install jq\n          \n                      \n          - name: Download and install saxon\n            run: |\n              wget https://github.com/Saxonica/Saxon-HE/releases/download/SaxonHE12-3/SaxonHE12-3J.zip \n              unzip SaxonHE12-3J.zip\n                            \n          - name: make metadata_out\n            run: mkdir metadata_out\n      \n          - name: make ocrdzip_out\n            run: mkdir ocrdzip_out\n            \n          - name: make gh-pages_out\n            run: mkdir ghout\n            \n          - name: make readme_out \n            run:  sh scripts/readmefolder.sh\n      \n      \n          - name: readme.xml file\n            run: sh scripts/xreadme.sh  \n      \n                \n          \n          # Transformation and analyzing\n          \n          - name: Get SDK Version from config\n            id: lookupSdkVersion\n            uses: mikefarah/yq@master\n            with:\n              cmd: yq -o=json METADATA.yml \u003e METADATA.json\n                  \n          - name: transform METADATA and make GT-Overview\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=METADATA repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY bagitDumpNum=$GITHUB_RUN_NUMBER releaseTag=${{ env.TAG_NAME }} \\\n              -s:scripts/gt-overview_metadata.xsl -o:ghout/metadata.md\n            shell: bash\n      \n          - name: make Compressed table view\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=TABLE repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY \\\n              -s:scripts/gt-overview_metadata.xsl -o:ghout/table.md\n            shell: bash\n      \n          - name: detailed table view \n            run: |\n              java -jar 
saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=OVERVIEW repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY \\\n              -s:scripts/gt-overview_metadata.xsl -o:ghout/overview.md\n            shell: bash\n\n          - name: leveling the volume and documents \n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-level_parser.xsl \\\n              repoName=$GITHUB_REPOSITORY \\\n              -s:scripts/gt-level_parser.xsl -o:ghout/overview-level.md\n            shell: bash  \n      \n          - name: generate mets.sh\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=METS repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY \\\n              -s:scripts/gt-overview_metadata.xsl -o:scripts/mets.sh\n            shell: bash\n            \n          - name: generate Metadata JSON file\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=METAJSON repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY bagitDumpNum=$GITHUB_RUN_NUMBER releaseTag=${{ env.TAG_NAME }} \\\n              -s:scripts/gt-overview_metadata.xsl -o:metadata_out/metadata_l.json\n            shell: bash\n            \n            \n          - name: format json file and copy to gh branch\n            run: |\n              jq '.' 
metadata_out/metadata_l.json \u003e metadata_out/metadata.json\n              cp metadata_out/metadata.json ghout/\n              rm metadata_out/metadata_l.json\n            \n            \n          - name: generate README\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=README repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY \\\n              -s:scripts/gt-overview_metadata.xsl -o:README.md\n            shell: bash\n            \n          - name: generate METS Volume File\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=METSvolume repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY bagitDumpNum=$GITHUB_RUN_NUMBER releaseTag=${{ env.TAG_NAME }} \\\n              -s:scripts/gt-overview_metadata.xsl -o:metadata_out/mets.xml\n            shell: bash\n      \n          - name: generate release download List\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=download repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY bagitDumpNum=$GITHUB_RUN_NUMBER releaseTag=${{ env.TAG_NAME }} \\\n              -s:scripts/gt-overview_metadata.xsl -o:ghout/download.txt\n            shell: bash  \n            \n          - name: delete fileGrp DEFAULT\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=METSdefault repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY bagitDumpNum=$GITHUB_RUN_NUMBER releaseTag=${{ env.TAG_NAME }} \\\n              -s:scripts/gt-overview_metadata.xsl\n            shell: bash\n\n          - name: generate CITATION.cff\n            run: |\n              java -jar saxon-he-12.3.jar -xsl:scripts/gt-overview_metadata.xsl \\\n              output=CITATION repoBase=${{ env.TAG_NAME }} repoName=$GITHUB_REPOSITORY 
bagitDumpNum=$GITHUB_RUN_NUMBER releaseTag=${{ env.TAG_NAME }} \\\n              -s:scripts/gt-overview_metadata.xsl -o:rawCITATION.cff\n            shell: bash\n\n          - name: formating CITATION.cff\n            id: lookupSdkVersion2\n            uses: mikefarah/yq@master\n            with:\n              cmd: |\n                yq -I4 rawCITATION.cff \u003e CITATION.cff\n                rm rawCITATION.cff\n            \n        \n          - name: Index-link\n            run: |\n                cd ghout\n                ln -s metadata.md index.md\n    \n      \n          # Mets handling, Install OCR-D and Bagit \n      \n          - name: del invalidMets\n            run: sh -ex scripts/data_mets.sh\n            shell: bash    \n              \n\n          - name: install ocrd, make validMets and bagit\n            run: |\n              sudo apt-get install -y python3 imagemagick libgeos-dev\n              python3 -m venv venv         \n              source venv/bin/activate     \n              pip install -U pip 'setuptools\u003e=61'\n              pip install ocrd\n              ocrd --version\n              \n\n          - name: make validMets  \n            run: |\n              source venv/bin/activate\n              sh -ex scripts/mets.sh\n                  \n\n          - name: make bagit\n            run: |\n              source venv/bin/activate\n              sh scripts/data_structure.sh\n\n```\n#### Example 2\n\nsee application: https://github.com/tboenig/gt_corpus_benchmark\n- gt-coll_metadata.xsl\n- xreadme.sh\n\n```yml\nname: gtrepo\non:\n  push:\n    tags:\n      - 'v[0-9]+.[0-9]+.[0-9]+'\n      \n      \n      \n  workflow_dispatch:\n\n\n\njobs:\n  cli:\n    name: makeDescription\n    runs-on: ubuntu-latest\n    permissions:\n     checks: write\n     contents: write\n     \n    steps:\n     \n    - name: Git checkout\n      uses: actions/checkout@v3\n\n     # Create Directories\n\n    - name: create directories\n      run: |\n        mkdir 
frak\n        mkdir ant\n        mkdir fontmix\n        mkdir frak/frak_simple\n        mkdir frak/frak_complex\n        mkdir ant/ant_simple\n        mkdir ant/ant_complex\n        mkdir fontmix/fontmix_simple\n        mkdir fontmix/fontmix_complex\n\n\n     # Clone Repos\n    - name: clone repos and delete files\n      run: |\n        cd frak\n        cd frak_simple\n        git clone https://github.com/tboenig/16_frak_simple.git --branch gh-pages\n        cd 16_frak_simple\n        rm -rf _config.yml index.md metadata.md overview.md table.md table_hide.css\n        cd ..\n        git clone https://github.com/tboenig/17_frak_simple.git --branch gh-pages\n        cd 17_frak_simple\n        rm -rf _config.yml index.md metadata.md overview.md table.md table_hide.css\n        \n\n     # Installation Styles\n      \n    - name: install analyse xsl-styles\n      run: | \n        git clone https://github.com/tboenig/gt-repo-scripts.git\n        mv gt-repo-scripts/scripts scripts/\n        rm -r gt-repo-scripts\n     \n    # Installation GT-Labelling Documentation\n\n    - name: install labeling\n      run: |\n        git clone https://github.com/tboenig/gt-guidelines.git\n\n          \n    # Installation Transformer\n\n    - name: Download and install saxon\n      run: |\n        wget https://sourceforge.net/projects/saxon/files/Saxon-HE/11/Java/SaxonHE11-4J.zip/download\n        unzip download\n    \n    \n    # Transform Readme\n\n    - name: readme.xml file\n      run: sh scripts/xreadme.sh  \n    \n      \n    # Transformation and analyzing\n\n    - name: generate README\n      run: |\n        java -jar saxon-he-11.4.jar -xsl:scripts/gt-coll_metadata.xsl \\\n        -s:scripts/gt-coll_metadata.xsl -o:README.md\n      shell: 
bash\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Fgt-repo-scripts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focr-d%2Fgt-repo-scripts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Fgt-repo-scripts/lists"}