# Convert iRODS Metadata into a Python dictionary

The `md2dict` module of `mango_mdconverter` creates Python dictionaries
from iRODS metadata items (attribute-value-unit triples, or AVUs) whose
names are namespaced with dots. This can be done:

- naively with regard to the semantics, simply turning each
  dot-separated name into nested keys, either
  - ignoring units, or
  - returning (value, units) tuples where units exist
- in a ManGO-aware way, reorganizing the dictionary to bring ManGO
  schemas together and “analysis” metadata together

You can install this package with `pip`:

``` shell
pip install mango-mdconverter
```

This also installs its dependencies, including `python-irodsclient`,
which provides the `irods.meta` module used below, and `mango-mdschema`.

The module can be imported like so:

``` python
from mango_mdconverter import md2dict

# from mango_mdconverter.md2dict import convert_metadata_to_dict # to import a specific function
```

## Example

To understand this better, let’s look at some examples. We’ll simulate
the metadata of an iRODS item as a list of `iRODSMeta` objects, each
with a name, a value and, optionally, units:

``` python
from irods.meta import iRODSMeta

metadata_items = [
    iRODSMeta("mgs.book.author.name", "Fulano De Tal", "1"),
    iRODSMeta("mgs.book.author.age", "50", "1"),
    iRODSMeta("mgs.book.author.pet", "cat", "1"),
    iRODSMeta("mgs.book.author.name", "Jane Doe", "2"),
    iRODSMeta("mgs.book.author.age", "29", "2"),
    iRODSMeta("mgs.book.author.pet", "cat", "2"),
    iRODSMeta("mgs.book.author.pet", "parrot", "2"),
    iRODSMeta("mgs.book.title", "A random book title"),
    iRODSMeta("mg.mime_type", "text/plain"),
    iRODSMeta("page_n", "567", "analysis/reading"),
    iRODSMeta("chapter_n", "15", "analysis/reading"),
]
```

## Naive conversion

The `unflatten_namespace_into_dict()` function updates a dictionary in
place with the name and value of a single AVU, and optionally with its
units as well. Given a dictionary `metadict` and an AVU name and value,
it adds the corresponding key and value to the dictionary or, if the key
already exists, appends the value to the list of values for that key.

``` python
metadict = {}
md2dict.unflatten_namespace_into_dict(metadict, "AVU_name", "AVU_value")
metadict
```

    {'AVU_name': 'AVU_value'}

Metadata names with dots are assumed to be namespaced: they are split,
and each namespace level becomes a nested dictionary.

``` python
metadict = {}
md2dict.unflatten_namespace_into_dict(metadict, "level1.level2.level3", "AVU_value")
metadict
```

    {'level1': {'level2': {'level3': 'AVU_value'}}}

For a full list of metadata items, such as the output of the
`.metadata.items()` method of an iRODS data object or collection, we can
loop over the iterable:

``` python
metadict = {}
for avu in metadata_items:
    md2dict.unflatten_namespace_into_dict(metadict, avu.name, avu.value)
metadict
```

    {'mgs': {'book': {'author': {'name': ['Fulano De Tal', 'Jane Doe'],
        'age': ['50', '29'],
        'pet': ['cat', 'cat', 'parrot']},
       'title': 'A random book title'}},
     'mg': {'mime_type': 'text/plain'},
     'page_n': '567',
     'chapter_n': '15'}

As the example shows, by default the function ignores units. This is
sufficient for OpenSearch indexing.

For ManGO schemas, however, we want to use the units to keep track of
repeatable composite fields. To achieve that, we also have to provide
the units and set the `use_units` argument to `True`.
<!-- TODO: Probably the argument is unnecessary? -->

The `unpack_metadata_into_dict()` function is a wrapper around
`unflatten_namespace_into_dict()` that always uses units and takes a
whole `irods.meta.iRODSMeta` object as an argument instead of the name,
value and units separately.

``` python
metadict = {}
for avu in metadata_items:
    md2dict.unpack_metadata_into_dict(metadict, avu)
metadict
```

    {'mgs': {'book': {'author': {'name': [('Fulano De Tal', '1'),
         ('Jane Doe', '2')],
        'age': [('50', '1'), ('29', '2')],
        'pet': [('cat', '1'), ('cat', '2'), ('parrot', '2')]},
       'title': 'A random book title'}},
     'mg': {'mime_type': 'text/plain'},
     'page_n': ('567', 'analysis/reading'),
     'chapter_n': ('15', 'analysis/reading')}

Items with units are now rendered as (value, units) tuples, but the
units are not interpreted in the context of ManGO. This is why this
approach is the “naïve” one: to reorganize this dictionary into
something that reflects how ManGO uses schemas and units, we need
another function.

## ManGO-specific conversion

The `convert_metadata_to_dict()` function takes an iterable of
`irods.meta.iRODSMeta` instances and returns a nested dictionary based
on both the namespacing of the metadata names and the units. It builds
on the result of `unpack_metadata_into_dict()` and then reformats the
dictionary to group all metadata schemas under the “schema” key
(instead of “mgs”) and all items with units starting with “analysis/”
under the “analysis” key, with the rest of the units as a further level
of nesting (e.g. “reading”). In addition, the repeatable composite
fields of schemas are reorganized into lists of dictionaries, one per
unit value.

``` python
reorganized_dict = md2dict.convert_metadata_to_dict(metadata_items)
reorganized_dict
```

    {'schema': {'book': {'author': [{'age': '50',
         'name': 'Fulano De Tal',
         'pet': 'cat'},
        {'age': '29', 'name': 'Jane Doe', 'pet': ['cat', 'parrot']}],
       'title': 'A random book title'}},
     'mg': {'mime_type': 'text/plain'},
     'analysis': {'reading': {'page_n': '567', 'chapter_n': '15'}}}

Use this function when converting ManGO metadata into a dictionary, for
example to export it to a sidecar file, to make it available for
download, or in the context of cold storage.

## Dictionary filtering

The `filter_metadata_dict()` function filters a metadata dictionary
(naive or ManGO-specific) by the (nested) keys of interest.

Let’s say that from the naive metadata dictionary `metadict` we only
want the metadata fields from the “mgs” and “mg” namespaces. In that
case, the second argument is a list of the desired keys:

``` python
md2dict.filter_metadata_dict(metadict, ["mgs", "mg"])
```

    {'mgs': {'book': {'author': {'name': [('Fulano De Tal', '1'),
         ('Jane Doe', '2')],
        'age': [('50', '1'), ('29', '2')],
        'pet': [('cat', '1'), ('cat', '2'), ('parrot', '2')]},
       'title': 'A random book title'}},
     'mg': {'mime_type': 'text/plain'}}

We can do the same with the ManGO-specific dictionary, for example to
select the ManGO schemas and the analysis fields:

``` python
md2dict.filter_metadata_dict(reorganized_dict, ["schema", "analysis"])
```

    {'schema': {'book': {'author': [{'age': '50',
         'name': 'Fulano De Tal',
         'pet': 'cat'},
        {'age': '29', 'name': 'Jane Doe', 'pet': ['cat', 'parrot']}],
       'title': 'A random book title'}},
     'analysis': {'reading': {'page_n': '567', 'chapter_n': '15'}}}

At this level, filtering is equivalent to the following dictionary
comprehension:

``` python
{k: v for k, v in reorganized_dict.items() if k in ["schema", "analysis"]}
```

    {'schema': {'book': {'author': [{'age': '50',
         'name': 'Fulano De Tal',
         'pet': 'cat'},
        {'age': '29', 'name': 'Jane Doe', 'pet': ['cat', 'parrot']}],
       'title': 'A random book title'}},
     'analysis': {'reading': {'page_n': '567', 'chapter_n': '15'}}}

The function is particularly handy for filtering nested fields, for
example to retrieve only specific schemas and/or specific analysis
fields. Although our example has only one schema, we can illustrate this
by selecting only the “title” of the “book” schema, discarding the
“author”:

``` python
md2dict.filter_metadata_dict(reorganized_dict, {"schema": {"book": ["title"]}})
```

    {'schema': {'book': {'title': 'A random book title'}}}

We can combine such partial selections with full ones (e.g. all of
“analysis”) by providing an empty dictionary for keys that should not be
filtered further:

``` python
md2dict.filter_metadata_dict(
    reorganized_dict, {"schema": {"book": ["title"]}, "analysis": {}}
)
```

    {'schema': {'book': {'title': 'A random book title'}},
     'analysis': {'reading': {'page_n': '567', 'chapter_n': '15'}}}

This also works with repeatable composite fields. For example, by
selecting only the “name” and “pet” of the “author” composite field, we
get a list of dictionaries with only the “name” and “pet” keys:

``` python
md2dict.filter_metadata_dict(
    reorganized_dict, {"schema": {"book": {"author": ["name", "pet"]}}}
)
```

    {'schema': {'book': {'author': [{'name': 'Fulano De Tal', 'pet': 'cat'},
        {'name': 'Jane Doe', 'pet': ['cat', 'parrot']}]}}}
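The filtering examples suggest a small set of rules. A list selects keys. A dictionary recurses into the keys it names. An empty dictionary keeps a subtree whole. A list of dictionaries (a repeatable composite field) is filtered element by element. The sketch below restates those rules in plain Python as an illustration only, and it assumes those rules are the whole story. `filter_sketch` is a hypothetical helper, not the implementation of `md2dict.filter_metadata_dict()`, which may treat other cases differently.

```python
from typing import Any, Union

# A filter specification, as used in the examples above (assumed semantics):
# - a list of keys keeps exactly those keys, with their values unchanged;
# - a dict maps keys to a nested specification;
# - an empty dict keeps the whole subtree.
Spec = Union[list, dict]


def filter_sketch(data: Any, spec: Spec) -> Any:
    """Hypothetical re-implementation of the filtering rules shown above."""
    # Lists (e.g. repeatable composite fields, which are lists of dicts):
    # apply the same specification to each element.
    if isinstance(data, list):
        return [filter_sketch(item, spec) for item in data]
    # Leaf values (strings, (value, units) tuples) have nothing to filter.
    if not isinstance(data, dict):
        return data
    # A list of keys selects those keys and keeps their values as-is.
    if isinstance(spec, list):
        return {k: v for k, v in data.items() if k in spec}
    # An empty dict means "no further filtering" for this subtree.
    if not spec:
        return data
    # Otherwise recurse into each requested key that is present.
    return {
        k: filter_sketch(data[k], sub_spec)
        for k, sub_spec in spec.items()
        if k in data
    }


# Sample data: the output of convert_metadata_to_dict() shown above.
reorganized = {
    'schema': {'book': {
        'author': [
            {'age': '50', 'name': 'Fulano De Tal', 'pet': 'cat'},
            {'age': '29', 'name': 'Jane Doe', 'pet': ['cat', 'parrot']},
        ],
        'title': 'A random book title',
    }},
    'mg': {'mime_type': 'text/plain'},
    'analysis': {'reading': {'page_n': '567', 'chapter_n': '15'}},
}

print(filter_sketch(reorganized, {'schema': {'book': {'author': ['name', 'pet']}}}))
# {'schema': {'book': {'author': [{'name': 'Fulano De Tal', 'pet': 'cat'}, {'name': 'Jane Doe', 'pet': ['cat', 'parrot']}]}}}
```

The list branch is checked first. As a result, a list-of-keys specification such as `['name', 'pet']` is applied to each author entry, which yields the list of trimmed dictionaries in the last example.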