{"id":21659839,"url":"https://github.com/slub/mets-mods2tei","last_synced_at":"2025-04-11T22:40:29.814Z","repository":{"id":41279304,"uuid":"185565368","full_name":"slub/mets-mods2tei","owner":"slub","description":"Convert bibliographic meta data in MODS format to TEI headers","archived":false,"fork":false,"pushed_at":"2025-02-12T10:41:02.000Z","size":1083,"stargazers_count":8,"open_issues_count":18,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-25T18:41:01.111Z","etag":null,"topics":["conversion","mets","mods","tei"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slub.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-08T08:32:51.000Z","updated_at":"2025-02-03T12:23:57.000Z","dependencies_parsed_at":"2025-01-29T15:35:26.181Z","dependency_job_id":null,"html_url":"https://github.com/slub/mets-mods2tei","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fmets-mods2tei","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fmets-mods2tei/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fmets-mods2tei/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fmets-mods2tei/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slub","download_url":"https://codeload.git
hub.com/slub/mets-mods2tei/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248125636,"owners_count":21051766,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conversion","mets","mods","tei"],"created_at":"2024-11-25T09:31:42.719Z","updated_at":"2025-04-11T22:40:29.789Z","avatar_url":"https://github.com/slub.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mets-mods2tei\n\n[![CircleCI](https://circleci.com/gh/slub/mets-mods2tei.svg?style=svg)](https://circleci.com/gh/slub/mets-mods2tei)\n[![codecov](https://codecov.io/gh/slub/mets-mods2tei/branch/master/graph/badge.svg)](https://codecov.io/gh/slub/mets-mods2tei)\n[![PyPI version](https://badge.fury.io/py/mets-mods2tei.svg)](https://badge.fury.io/py/mets-mods2tei)\n\nConvert bibliographic meta data in METS/MODS format to TEI headers and optionally serialize linked ALTO-encoded OCR to TEI text.\n\n## Background\n\n[MODS](http://www.loc.gov/standards/mods/) is the de-facto standard for encoding bibliographic\nmeta data in libraries. It is usually included as a separate section into\n[METS](http://www.loc.gov/standards/mets/) XML files. Physical and logical structure of a document\nare expressed in terms of structural mappings (`structMap` elements).\n\n[TEI](https://tei-c.org/) is the de-facto standard for representing digital text for research\npurposes. 
It usually includes detailed bibliographic meta data in its\n[header](https://tei-c.org/release/doc/tei-p5-doc/de/html/ref-teiHeader.html).\n\nSince these standards allow considerable degrees of freedom, the conversion uses\nwell-defined subsets. For MODS, this is the\n[*MODS Anwendungsprofil für digitalisierte Medien*](https://dfg-viewer.de/fileadmin/groups/dfgviewer/MODS-Anwendungsprofil_2.3.1.pdf).\nFor METS, the [METS Anwendungsprofil für digitalisierte Medien 2.1](https://www.zvdd.de/fileadmin/AGSDD-Redaktion/METS_Anwendungsprofil_2.1.pdf) is consulted.\nFor the TEI Header, the conversion is roughly based on the [*DTA base format*](https://github.com/deutschestextarchiv/dtabf).\n\n`mets-mods2tei` is developed at the [Saxon State and University Library in Dresden](https://www.slub-dresden.de).\n\n## Installation\n\n`mets-mods2tei` is implemented in Python 3. In the following, we assume a working Python 3\n(tested versions 3.5, 3.6 and 3.7) installation.\n\n### Setup Python\n\nUsing [virtual environments](https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments) is highly recommended,\nalthough not strictly necessary for installing `mets-mods2tei`.\n\nTo create a virtual environment in a subdirectory of your choice (e.g. `env`), run\n\n    python3 -m venv env\n\n(once) and then activate it (each time you open the shell) via\n\n    . 
env/bin/activate\n\nDepending on how old the packages provided by your base system are,\nyou might have to update pip first:\n\n    pip install -U pip setuptools\n\n### Get Python package\n\n`mets-mods2tei` can be installed via `pip3` directly.\nYou can install from either the repository sources or the\nprebuilt distribution on PyPI:\n\n#### From PyPI\n\nIf you have an active virtual environment, do\n\n    pip install mets-mods2tei\n\nOtherwise, try\n\n    pip3 install --user mets-mods2tei\n\n#### From source\n\nGet the repository:\n\n    git clone https://github.com/slub/mets-mods2tei.git\n    cd mets-mods2tei\n\nIf you have an active virtual environment, do\n\n    pip install .\n\nOtherwise, try\n\n    pip3 install --user .\n\n## Testing\n\n`mets-mods2tei` uses `pytest`-based testing.\n\nTo install the prerequisites for testing (in your venv), do\n\n    pip install -r requirements-test.txt\n\n(once) and then run the tests via:\n\n    pytest\n\n## Code coverage\n\nDetermine code coverage by running\n\n    make coverage\n\n## Usage\n\n### mm2tei\n\nInstalling `mets-mods2tei` makes the command-line tool `mm2tei` available:\n\n\u003cdetails\u003e\u003csummary\u003emm2tei --help\u003c/summary\u003e\n\u003cp\u003e\n\n```\nUsage: mm2tei [OPTIONS] METS\n\n  METS: File containing or URL pointing to the METS/MODS XML to be converted\n\n  Parse given METS and its meta-data, and convert it to TEI.\n\n  If `--ocr` is given, then also read the ALTO full-text files from the\n  fileGrp in `--text-group`, and convert page contents accordingly (in\n  physical order).\n\n  Decorate page boundaries with image and page numbers. Moreover, if `--add-\n  refs` contains `page`, then reference the corresponding base image files (by\n  file name) from `--img-group`. 
Likewise, if `--add-refs` contains `line`,\n  then reference the corresponding textline segments (by XML ID) from `--text-\n  group`.\n\n  Output XML to `--output (use '-' for stdout), log to stderr.`\n\nOptions:\n  -O, --output FILENAME           File path to write TEI output to\n  -o, --ocr                       Serialize OCR into resulting TEI\n  -T, --text-group TEXT           File group which contains the full-text\n  -I, --img-group TEXT            File group which contains the images\n  -r, --add-refs [page|line]\n  -l, --log-level [DEBUG|INFO|WARN|ERROR|OFF]\n  -h, --help                      Show this message and exit.\n```\n\n\u003c/p\u003e\u003c/details\u003e\n\nIt reads METS XML via URL or file argument and prints the resulting TEI,\nincluding the extracted information from the MODS part of the METS.\n\n\nExample:\n\n    mm2tei -O tei.xml \"https://digital.slub-dresden.de/oai/?verb=GetRecord\u0026metadataPrefix=mets\u0026identifier=oai:de:slub-dresden:db:id-453779263\"\n\n\n### mm-update\n\nInstalling `mets-mods2tei` also provides the command-line multi-cmd tool `mm-update`:\n\n\u003cdetails\u003e\u003csummary\u003emm-update --help\u003c/summary\u003e\n\u003cp\u003e\n\n```\nUsage: mm-update [OPTIONS] COMMAND [ARGS]...\n\n  Entry-point of multi-purpose CLI for DFG Viewer compliant METS updates\n\nOptions:\n  --version                       Show the version and exit.\n  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]\n                                  Log level\n  -d, --directory WORKSPACE_DIR   Changes the workspace folder location\n                                  [default: METS_URL directory or .]\"\n  -m, --mets METS_URL             The path/URL of the METS file [default:\n                                  WORKSPACE_DIR/mets.xml]\n  --backup                        Backup METS whenever it is saved.\n  --help                          Show this message and exit.\n\nCommands:\n  add-agent     add agent headers, optionally from external METS\n  add-file 
     add a file reference, optionally as URL\n  download      download files into subdirectories, as path or URL\n  remove-file   remove all file references for a specific location,...\n  remove-files  remove all file references for a specific fileGrp / MIME...\n  validate      custom OcrdWorkspaceValidator\n```\n\n\u003c/p\u003e\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003emm-update add-agent --help\u003c/summary\u003e\n\u003cp\u003e\n\n```\nUsage: mm-update add-agent [OPTIONS]\n\n  add agent headers, optionally from external METS\n\nOptions:\n  -m, --mets TEXT  copy metsHdr/agent from this file, too\n  --help           Show this message and exit.\n```\n\n\u003c/p\u003e\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003emm-update add-file --help\u003c/summary\u003e\n\u003cp\u003e\n\n```\nUsage: mm-update add-file [OPTIONS] PATH\n\n  add a file reference, optionally as URL\n\nOptions:\n  -G, --file-grp FILE_GRP  fileGrp to add to  [required]\n  -m, --mimetype TYPE      Media type of the file. 
Guessed from extension if\n                           not provided\n  -g, --page-id PAGE_ID    ID of the physical page (or empty if document-\n                           global)\n  -u, --url-prefix TEXT    URL prefix to add to path before storing references\n                           (or else keep local file refs)\n  --help                   Show this message and exit.\n\n\n```\n\n\u003c/p\u003e\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003emm-update remove-file --help\u003c/summary\u003e\n\u003cp\u003e\n\n```\nUsage: mm-update remove-file [OPTIONS] PATH\n\n  remove all file references for a specific location, optionally as URL\n\nOptions:\n  -u, --url-prefix TEXT  URL prefix to add to path before removing references\n                         (or else search verbatim file refs)\n  --help                 Show this message and exit.\n```\n\n\u003c/p\u003e\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003emm-update remove-files --help\u003c/summary\u003e\n\u003cp\u003e\n\n```\nUsage: mm-update remove-files [OPTIONS]\n\n  remove all file references for a specific fileGrp / MIME type / page ID\n  combination\n\nOptions:\n  -G, --file-grp FILE_GRP  fileGrp to add to  [required]\n  -m, --mimetype TYPE      Media type of the file. 
Guessed from extension if\n                           not provided\n  -g, --page-id PAGE_ID    ID of the physical page (or empty if document-\n                           global)\n  --help                   Show this message and exit.\n```\n\n\u003c/p\u003e\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003emm-update validate --help\u003c/summary\u003e\n\u003cp\u003e\n\n```\nUsage: mm-update validate [OPTIONS]\n\n  custom OcrdWorkspaceValidator\n\nOptions:\n  -u, --url-prefix TEXT  validate each file has this URL prefix\n  --help                 Show this message and exit.\n```\n\n\u003c/p\u003e\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003emm-update download --help\u003c/summary\u003e\n\u003cp\u003e\n\n```\nUsage: mm-update download [OPTIONS]\n\n  download files into subdirectories, as path or URL\n\nOptions:\n  -G, --file-grp FILE_GRP         fileGrp USE (or empty if all fileGrps)\n  -g, --page-id PAGE_ID           ID of the physical page (or empty if all\n                                  pages)\n  -p, --path-names [URL|GRP/ID.SUF]\n                                  how to generate local path names (from URL\n                                  or from fileGrp, file ID and suffix)\n                                  [default: URL]\n  -u, --url-prefix TEXT           URL prefix to remove from path before\n                                  storing downloaded files (to avoid creating\n                                  host directories)\n  -r, --reference [no-change|replace-by-local|insert-local|append-local]\n                                  whether and how to update the FLocat\n                                  reference in METS  [default: no-change]\n  --help                          Show this message and exit.\n```\n\n\u003c/p\u003e\u003c/details\u003e\n\nExample:\n\n    # dump files (without changing METS):\n    mm-update download -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/\n    ...\n    # add TEI\n    mm-update 
add-file -G TEI -m application/tei+xml -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/ tei.xml\n    ...\n    # remove old PDF:\n    mm-update remove-files -G DOWNLOAD\n    # add new PDF:\n    mm-update add-file -G DOWNLOAD -m application/pdf -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/ -g PHYS_0001 pdf/file_0001.pdf\n    mm-update add-file -G DOWNLOAD -m application/pdf -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/ -g PHYS_0002 pdf/file_0002.pdf\n    mm-update add-file -G DOWNLOAD -m application/pdf -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/ -g PHYS_0003 pdf/file_0003.pdf\n    mm-update add-file -G DOWNLOAD -m application/pdf -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/ pdf/all.pdf\n    ...\n    # remove old ALTO:\n    mm-update remove-files -G FULLTEXT -g PHYS_0001\n    mm-update remove-files -G FULLTEXT -g PHYS_0002\n    mm-update remove-files -G FULLTEXT -g PHYS_0003\n    # add new ALTO:\n    mm-update add-file -G FULLTEXT -m text/xml -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/ -g PHYS_0001 ocr/alto_0001.xml\n    mm-update add-file -G FULLTEXT -m text/xml -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/ -g PHYS_0002 ocr/alto_0002.xml\n    mm-update add-file -G FULLTEXT -m text/xml -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/ -g PHYS_0003 ocr/alto_0003.xml\n    ...\n    # validate:\n    mm-update validate -u https://digital.slub-dresden.de/data/kitodo/GottDie_453779263/\n\n## Maintainers\n\nIf you have any questions or encounter any problems, please do not hesitate to contact us.\n\n- [Kay-Michael Würzner](https://github.com/wrznr)\n- [Robert 
Sachunsky](https://github.com/bertsky)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslub%2Fmets-mods2tei","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslub%2Fmets-mods2tei","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslub%2Fmets-mods2tei/lists"}