{"id":24951943,"url":"https://github.com/ocr-d/gt-mufilevelrules","last_synced_at":"2026-01-07T20:04:25.182Z","repository":{"id":61639797,"uuid":"526122167","full_name":"OCR-D/gt-MufiLevelRules","owner":"OCR-D","description":"OCR-D-Level-Rules can be created automatically with gt-MufiLevelRules from the encodings published by MUFI: The Medieval Unicode Font Initiative.","archived":false,"fork":false,"pushed_at":"2024-04-18T17:44:27.000Z","size":1272,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-21T19:33:11.957Z","etag":null,"topics":["ground-truth","guidelines","ocr","ocr-d","transcription"],"latest_commit_sha":null,"homepage":"https://tboenig.github.io/gt-MufiLevelRules/","language":"XSLT","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OCR-D.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-18T08:24:48.000Z","updated_at":"2024-04-17T11:12:11.000Z","dependencies_parsed_at":"2024-01-03T12:28:42.150Z","dependency_job_id":"70fc720c-cafa-4477-b830-490cbbfdcfcb","html_url":"https://github.com/OCR-D/gt-MufiLevelRules","commit_stats":{"total_commits":187,"total_committers":3,"mean_commits":"62.333333333333336","dds":0.06417112299465244,"last_synced_commit":"7d62f29cfdd6a3405e7f3be15172b1f127639e95"},"previous_names":["ocr-d/gt-mufilevelrules","tboenig/gt-mufilevelrules"],"tags_count":159,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fgt-MufiLevelRules","t
ags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fgt-MufiLevelRules/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fgt-MufiLevelRules/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCR-D%2Fgt-MufiLevelRules/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OCR-D","download_url":"https://codeload.github.com/OCR-D/gt-MufiLevelRules/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246091180,"owners_count":20722168,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ground-truth","guidelines","ocr","ocr-d","transcription"],"created_at":"2025-02-03T01:32:51.141Z","updated_at":"2026-01-07T20:04:25.120Z","avatar_url":"https://github.com/OCR-D.png","language":"XSLT","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003clink href=\"table_hide.css\" rel=\"stylesheet\"/\u003e\n\n# gt-MufiLevelRules\n\nCreates OCR-D Ground-Truth Transcription Level Rules automatically from the encodings published by [MUFI: The Medieval Unicode Font Initiative](https://mufi.info).\n\nThe resulting OCR-D level rules conform to the [OCR-D specification](https://ocr-d.de/en/gt-guidelines/trans/transkription.html). \nThese rules can be used for substitutions or level checks, among other things. \n\nNote:\n- There may not always be a definition for every level, esp. on level 1.\n- OCR-D will try to fill in these gaps manually or automatically. 
The automated completion is based on the [unicruft](https://github.com/tboenig/gt-MufiLevelRules/tree/main/unicruft) program.\n- For this reason, using the rules for automatic character normalization from level 3 or level 2 to level 1\n  is currently not recommended without first manually checking and correcting the corresponding rules.\n\n## Download the Rules\n\n**🚦 You can download the set of rules here. 🚦**\n- select the corresponding rule file: [rules directory](https://github.com/tboenig/gt-MufiLevelRules/tree/gh-pages/rules/characters)\n- as zip release file: [latest Releases](https://github.com/tboenig/gt-MufiLevelRules/releases/latest)\n\n## Recreation of the rules\n\n1. Copy or clone the repository.\n\n    `git clone https://github.com/tboenig/gt-MufiLevelRules.git`\n2. Install [Saxon](https://www.saxonica.com/download/download_page.xml) for XSL Transformations (XSLT) 3.0. Then run:\n\n    `java -jar saxon-he-XX.jar -xsl:scripts/MufiGTLevelRules2.xsl -s:scripts/MufiGTLevelRules.xsl output=characters merge=yes`\n\nParameters:\n- **output** ``characters`` -\u003e creates the individual rules; all rules are saved in the directory ``[directory]/rules/characters``\n- **merge** ``yes`` -\u003e creates the megarules, i.e. all rules in one file; the megarules are saved in the directory ``[directory]/rules``\n\nThe result of the conversion can be found in the directory ``[directory]/rules/characters``.\n- Output formats:\n  - xml\n  - json\n\nThe script uses:\n\n1. the [MUFI rules](https://gefin.ku.dk/q.php?q=mufiexport) (new version) and the [MUFI rules (old version)](https://raw.githubusercontent.com/tboenig/keyboardGT/main/metadata/mufi.json)\n\n2. a summary of the following [**additional rules**](https://github.com/tboenig/gt-MufiLevelRules/blob/main/metadata/megarules.json) from the [OCR-D Ground-Truth Transcription Guide](https://ocr-d.de/en/gt-guidelines/trans/trBeispiele.html), which take precedence over the MUFI rules where applicable:\n   - [ruleset_character.json](https://github.com/tboenig/gt-guidelines/blob/gh-pages/rules/ruleset_character.json)\n   - [ruleset_hyphenation.json](https://github.com/tboenig/gt-guidelines/blob/gh-pages/rules/ruleset_hyphenation.json)\n   - [ruleset_ligature.json](https://github.com/tboenig/gt-guidelines/blob/gh-pages/rules/ruleset_ligature.json)\n   - [ruleset_roman_digits.json](https://github.com/tboenig/gt-guidelines/blob/gh-pages/rules/ruleset_roman_digits.json)\n\n## Description of the rules\n\n### JSON Format\n\nAll JSON files (both the pure MUFI rules and the final result) follow the same schema.\n\n**Example:**\n\n```JSON\n {\"ruleset\":[\n       ...\n       {\"rule\": [\"ä\", \"aͤ\", \"\"], \"type\": \"level\"}\n       ...\n]}\n```\n\n- Each rule has the key `rule` with a list of values.\n- The values define the character representation on each of the 3 transcription levels:\n  - Level 1 is in the first position\n  - Level 2 is in the second position\n  - Level 3 is in the third position\n- Additional key-value pairs (such as `type` in the example above): ...\n- A character value can be empty (`\"\"`) to indicate that there is no definition (representation) at that level.\n\n### XML Format\n\n```XML\n\u003clevelrules\u003e\n    \u003cruleset\u003e\n        \u003crange\u003eAlphPresForm\u003c/range\u003e\n        \u003cdesc\u003eLATIN SMALL LIGATURE FF\u003c/desc\u003e\n        \u003crule\u003eff\u003c/rule\u003e\n        \u003crule\u003eff\u003c/rule\u003e\n        \u003crule\u003eﬀ\u003c/rule\u003e\n        \u003ctype\u003elevel\u003c/type\u003e\n    \u003c/ruleset\u003e\n\u003c/levelrules\u003e\n```\n- **Elements**\n  - `\u003clevelrules\u003e` = root element of a gt-MufiLevelRules dataset\n    - `\u003cruleset\u003e` = root element of a ruleset\n        - `\u003crange\u003e` = category of characters\n        - `\u003cdesc\u003e` = general description of the sign or symbol\n        - `\u003crule\u003e`\n          - Level 1: rule[position() = 1]\n          - Level 2: rule[position() = 2]\n          - Level 3: rule[position() = 3]\n        - `\u003ctype\u003e` = rule type, e.g. `level`\n\nThe category of characters `\u003crange\u003e` and the general description of the sign or symbol `\u003cdesc\u003e` are imported from the MUFI dataset.\n\nThe JSONPaths are:\n - range : `$['..']['range']`\n - desc  : `$['..']['description']`\n\n## See Also\n\n- MUFI: The Medieval Unicode Font Initiative https://mufi.info/\n- MUFI's data as JSON export https://gefin.ku.dk/q.php?q=mufiexport\n- OCR-D Ground Truth Transcription Guidelines https://ocr-d.de/en/gt-guidelines/trans/\n- Ground Truth level overview https://ocr-d.de/en/gt-guidelines/trans/trLevels.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Fgt-mufilevelrules","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focr-d%2Fgt-mufilevelrules","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focr-d%2Fgt-mufilevelrules/lists"}