{"id":25895504,"url":"https://github.com/diamondlightsource/hdfmap","last_synced_at":"2025-03-02T22:31:46.383Z","repository":{"id":247237537,"uuid":"819476379","full_name":"DiamondLightSource/hdfmap","owner":"DiamondLightSource","description":"Map objects within an HDF file and create a dataset namespace","archived":false,"fork":false,"pushed_at":"2025-02-05T15:32:52.000Z","size":25702,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-05T16:36:39.783Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DiamondLightSource.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-24T15:23:27.000Z","updated_at":"2025-02-05T15:32:56.000Z","dependencies_parsed_at":"2024-08-05T14:17:54.094Z","dependency_job_id":"8368e309-692f-419f-85f4-f44bffd8fa59","html_url":"https://github.com/DiamondLightSource/hdfmap","commit_stats":null,"previous_names":["danporter/hdfmap"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DiamondLightSource%2Fhdfmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DiamondLightSource%2Fhdfmap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DiamondLightSource%2Fhdfmap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DiamondLightSource%2Fhdfmap/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DiamondLightSource","download_url":"https://codeload.github.com/DiamondLightSource/hdfmap/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241582515,"owners_count":19985845,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-02T22:31:41.736Z","updated_at":"2025-03-02T22:31:46.364Z","avatar_url":"https://github.com/DiamondLightSource.png","language":"Python","readme":"# hdfmap\nMap objects within an HDF file and create a dataset namespace.\n\n[![PyPI](https://img.shields.io/pypi/v/hdfmap.svg)](https://pypi.org/project/hdfmap)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![](https://img.shields.io/github/forks/DiamondLightSource/hdfmap?label=GitHub%20Repo\u0026style=social)](https://github.com/DiamondLightSource/hdfmap)\n\n**Version 0.8**\n\n| By Dan Porter        | \n|----------------------|\n| Diamond Light Source |\n| 2024-2025            |\n\n### Documentation\n[![Docs](https://img.shields.io/badge/Material_for_MkDocs-526CFE?style=for-the-badge\u0026logo=MaterialForMkDocs\u0026logoColor=white)](https://diamondlightsource.github.io/hdfmap/)\n[diamondlightsource.github.io/hdfmap](https://diamondlightsource.github.io/hdfmap/)\n\n### TL;DR - Usage\n\n```python\nfrom hdfmap import create_nexus_map, load_hdf\n\n# HdfMap from NeXus file - get dataset paths:\nm = create_nexus_map('file.nxs')\nm['energy']  # \u003e\u003e '/entry/instrument/monochromator/energy'\nm['signal']  
# \u003e\u003e '/entry/measurement/sum'\nm['axes0']  # \u003e\u003e '/entry/measurement/theta'\nm.get_image_path()  # \u003e\u003e '/entry/instrument/pil3_100k/data'\n\n# load dataset data\nwith load_hdf('file.nxs') as nxs:\n    path = m.get_path('scan_command')\n    cmd = nxs[path][()]  # returns bytes data direct from file\n    cmd = m.get_data(nxs, 'scan_command')  # returns converted str output\n    string = m.format_hdf(nxs, \"the energy is {energy:.2f} keV\")\n    d = m.get_dataholder(nxs)  # classic data table, d.scannable, d.metadata\n\n# Shortcuts - single file reloader class\nfrom hdfmap import NexusLoader\n\nscan = NexusLoader('file.hdf')\n[data1, data2] = scan.get_data(['dataset_name_1', 'dataset_name_2'])\ndata = scan.eval('dataset_name_1 * 100 + 2')\nstring = scan.format('my data is {dataset_name_1:.2f}')\n\n# Shortcuts - multifile load data (generate map from first file)\nfrom hdfmap import hdf_data, hdf_eval, hdf_format, hdf_image\n\nfilenames = [f'file{n}.nxs' for n in range(100)]\nall_data = hdf_data(filenames, 'dataset_name')\nnormalised_data = hdf_eval(filenames, 'total / Transmission / (rc / 300.)')\ndescriptions = hdf_format(filenames, 'Energy: {en:5.3f} keV')\nimage_stack = hdf_image(filenames, index=31)\n```\n\n### Installation\n*Requires:* Python \u003e=3.10, NumPy, h5py\n\n#### from PyPI\n```bash\npython -m pip install hdfmap\n```\n\n#### from GitHub\n```bash\npython -m pip install --upgrade git+https://github.com/DiamondLightSource/hdfmap.git\n```\n\n### Description\nAnother generic HDF reader, but the idea here is to build up a namespace dict of `{'name': 'path'}` \nfor every dataset, then group them in a hopefully useful way. \n\nObjects within the HDF file are separated into Groups and Datasets. 
Each object has a\ndefined 'path' and 'name' parameter, as well as other attributes.\n\n - path -\u003e '/entry/measurement/data' -\u003e the location of an object within the file\n - name -\u003e 'data' -\u003e a path expressed as a simple variable name\n\nPaths are unique locations within the file but can be used to identify similar objects in other files.\nNames may not be unique within a file and are generated from the path.\n\n|               | **name**                     | **path**                             |\n|---------------|------------------------------|--------------------------------------|\n| *Description* | simple identifier of dataset | hdf path built from position in file |\n| *Example*     | `'scan_command'`             | `'/entry/scan_command'`              |\n\nNames of different types of datasets are stored for arrays (size \u003e 0) and values (size 0).\nNames for scannables relate to all arrays of a particular size.\nA combined list of names is provided, with priority scannables \u003e arrays \u003e values.\n\n### HdfMap Attributes\n|                |                                                        |\n|----------------|--------------------------------------------------------|\n| map.groups     | stores attributes of each group by path                |\n| map.classes    | stores list of group paths by nx_class                 |\n| map.datasets   | stores attributes of each dataset by path              |\n| map.arrays     | stores array dataset paths by name                     |\n| map.values     | stores value dataset paths by name                     |\n| map.scannables | stores array dataset paths with given size, by name    |\n| map.combined   | stores array and value paths (arrays overwrite values) |\n| map.image_data | stores dataset paths of image data                     |\n\n#### E.G.\n```python\nmap.groups = {'/hdf/group': ('class', 'name', {attrs}, [datasets])}\nmap.classes = {'class_name': ['/hdf/group1', '/hdf/group2']}\nmap.datasets 
= {'/hdf/group/dataset': ('name', size, shape, {attrs})}\nmap.arrays = {'name': '/hdf/group/dataset'}\nmap.values = {'name': '/hdf/group/dataset'}\nmap.scannables = {'name': '/hdf/group/dataset'}\nmap.image_data = {'name': '/hdf/group/dataset'}\n```\n\n\n### HdfMap Methods\n|                                                   |                                                                             |\n|---------------------------------------------------|-----------------------------------------------------------------------------|\n| `map.populate(h5py.File)`                         | populates the dictionaries using the  given file                            |\n| `map.generate_scannables(array_size)`             | populates scannables namespace with arrays of same size                     |\n| `map.most_common_size()`                          | returns the most common dataset size \u003e 1                                    |\n| `map.get_attr('name_or_path', 'attr')`            | return value of dataset attribute                                           |\n| `map.get_path('name_or_group_or_class')`          | returns path of object with name                                            |\n| `map.get_image_path()`                            | returns default path of detector dataset (or largest dataset)               |\n| `map.get_group_path('name_or_path_or_class')`     | return path of group with class                                             |\n| `map.get_group_datasets('name_or_path_or_class')` | return list of dataset paths in class                                       |\n| `map.find_groups(*names_or_classes)`              | return list of group paths matching given group names or classes            |\n| `map.find_datasets(*names_or_classes)`            | return list of dataset paths matching given names, classes or attributes    |\n| `map.find_paths('string')`                        | return list of dataset paths containing string                        
      |\n| `map.find_names('string')`                        | return list of dataset names containing string                              |\n| `map.find_attr('attr_name')`                      | return list of paths of groups or datasets containing attribute 'attr_name' |\n\n\n### HdfMap File Methods\n|                                          |                                                       |\n|------------------------------------------|-------------------------------------------------------|\n| `map.get_metadata(h5py.File)`            | returns dict of value datasets                        |\n| `map.get_scannables(h5py.File)`          | returns dict of scannable datasets                    |\n| `map.get_scannables_array(h5py.File)`    | returns numpy array of scannable datasets             |\n| `map.get_dataholder(h5py.File)`          | returns dict like object with metadata and scannables |\n| `map.get_image(h5py.File, index)`        | returns image data                                    |\n| `map.get_data(h5py.File, 'name')`        | returns data from dataset                             |\n| `map.eval(h5py.File, 'expression')`      | returns output of expression using dataset names      |\n| `map.format(h5py.File, 'string {name}')` | returns output of str expression                      |\n\n\n### NeXus Files\nFiles using the [NeXus Format](https://www.nexusformat.org/) can generate special NexusMap objects.\nThese work in the same way as the general HdfMaps but contain additional special names in the namespace:\n\n|                |                                    |\n|----------------|------------------------------------|\n| `'axes'`       | returns path of default NXaxes     |\n| `'signal'`     | returns path of default NXsignal   |\n\nIn addition, the `map.scannables` dict will be populated automatically by the names given in the \"scan_fields\" dataset\nor by datasets from the first *NXdata* group. 
The default *image* data will be taken from the first \n*NXdetector* dataset.\n\n\n## Examples\n### scan data \u0026 metadata\nSeparate datasets in a NeXus file into Diamond's classic scannables and metadata, similar to what was in the old\n'*.dat' files.\n\n```python\nfrom hdfmap import create_nexus_map, load_hdf\n\n# HdfMap from NeXus file:\nhmap = create_nexus_map('file.nxs')\nwith load_hdf('file.nxs') as nxs:\n    scannables = hmap.get_scannables_array(nxs)  # creates 2D numpy array\n    labels = scannables.dtype.names\n    metadata = hmap.get_metadata(nxs)  # {'name': value}\n    d = hmap.get_dataholder(nxs)  # classic data table, d.scannable, d.metadata\nd.theta == d['theta']  # scannable array 'theta'\nd.metadata.scan_command == d.metadata['scan_command']  # single value 'scan_command'\n\n# OR, use the shortcut:\nfrom hdfmap import nexus_data_block\n\nd = nexus_data_block('file.nxs')\n\n# The data loader class removes the need to open the files:\nfrom hdfmap import NexusLoader\n\nscan = NexusLoader('file.nxs')\nmetadata = scan.get_metadata()\nscannables = scan.get_scannables()\n```\n\n### automatic default plot axes\nIf defined in the nexus file, 'axes' and 'signal' will be populated automatically\n\n```python\nimport matplotlib.pyplot as plt\nfrom hdfmap import create_nexus_map, load_hdf\n\n# HdfMap from NeXus file:\nhmap = create_nexus_map('file.nxs')\nwith load_hdf('file.nxs') as nxs:\n    axes = hmap.get_data(nxs, 'axes')\n    signal = hmap.get_data(nxs, 'signal')\n    title = hmap.format_hdf(nxs, \"{entry_identifier}\\n{scan_command}\")\naxes_label = hmap.get_path('axes')\nsignal_label = hmap.get_path('signal')\n# plot the data (e.g. 
using matplotlib)\nplt.figure()\nplt.plot(axes, signal)\nplt.xlabel(axes_label)\nplt.ylabel(signal_label)\nplt.title(title)\n\n# Or, using NexusLoader:\nfrom hdfmap import NexusLoader\n\nscan = NexusLoader('file.nxs')\naxes, signal = scan('axes, signal')\naxes_label, signal_label = scan('_axes, _signal')\ntitle = scan.format(\"{entry_identifier}\\n{scan_command}\")\n```\n\n### Automatic image data\nGet images from the first detector in a NeXus file\n\n```python\nfrom hdfmap import create_nexus_map, load_hdf\n\n# HdfMap from NeXus file:\nhmap = create_nexus_map('file.nxs')\nimage_location = hmap.get_image_path()  # returns the hdf path chosen for the default detector\nwith load_hdf('file.nxs') as nxs:\n    middle_image = hmap.get_image(nxs)  # returns single image from index len(dataset)//2\n    first_image = hmap.get_image(nxs, 0)  # returns single image from dataset[0, :, :]\n    volume = hmap.get_image(nxs, ())  # returns whole volume as array\n    roi = hmap.get_image(nxs, (0, slice(5, 10, 1), slice(5, 10, 1)))  # returns part of dataset\n\n# Or, using NexusLoader:\nfrom hdfmap import NexusLoader\n\nscan = NexusLoader('file.nxs')\nimage = scan.get_image(index=0)  # using index as defined above\n```\n\n### Multi-scan metadata string\nGenerate a metadata string from every file in a directory very quickly. 
The HdfMap is only created for the first file,\nthe remaining files are treated as having identical structure.\n```python\nfrom hdfmap import list_files, hdf_format\n\nformat_string = \"#{entry_identifier}: {start_time} : E={incident_energy:.3f} keV : {scan_command}\"\nfiles = list_files('/directory/path', extension='.nxs')\nstrings_list = hdf_format(files, format_string)\nprint('\\n'.join(strings_list))\n\n# other multi-file readers:\nfrom hdfmap import hdf_data, hdf_image, hdf_eval\n\ndata_list = hdf_data(files, 'incident_energy')\nimage_list = hdf_image(files, index=0)\ndata_list = hdf_eval(files, 'signal / Transmission')\n```\n\n### Metadata Evaluation \nNamespace evaluation of the hdf file follows a small set of rules that make it easy to extract\nformatted metadata. The evaluation functions are:\n\n - `HdfMap.eval(hdfobj, 'name')` -\u003e value\n - `HdfMap.format_hdf(hdfobj, '{name}')` -\u003e string\n - `HdfLoader('eval')` -\u003e value\n - `HdfLoader.eval('eval')` -\u003e value\n - `HdfLoader.format('{name}')` -\u003e string\n - `hdf_eval([files], 'name')` -\u003e list[values]\n - `hdf_format([files], '{name}')` -\u003e list[string]\n\n#### eval vs format\nEvaluation functions evaluate the expression as given, replacing names in the hdfmap namespace with their associated\nvalues, or using the rules below. The format functions accept Python \n[f-strings](https://docs.python.org/3/tutorial/inputoutput.html#fancier-output-formatting),\nallowing precise formatting to be applied and returning a string.\n\n#### Rules\nThe following patterns are allowed in any expression:\n - 'filename': str, name of hdf_file\n - 'filepath': str, full path of hdf_file\n - '_*name*': str hdf path of *name*\n - '__*name*': str internal name of *name* (e.g. 
for 'axes')\n - 's_*name*': string representation of dataset (includes units if available)\n - '*name*@attr': returns attribute of dataset *name*\n - '*name*?(default)': returns default if *name* doesn't exist\n - '(name1|name2|name3)': returns the first available of the names\n - '(name1|name2?(default))': returns the first available name or default\n\n#### Examples\n```python\nfrom hdfmap import create_nexus_map, load_hdf\n\n# HdfMap from NeXus file:\nhmap = create_nexus_map('file.nxs')\nwith load_hdf('file.nxs') as nxs:\n    # mathematical array expressions (using np as Numpy)\n    data = hmap.eval(nxs, 'int(np.max(total / Transmission / count_time))')\n    # return the path of a name\n    path = hmap.eval(nxs, '_axes')  # -\u003e '/entry/measurement/h'\n    # return the real name of a variable\n    name = hmap.eval(nxs, '__axes')  # -\u003e 'h'\n    # return label, using dataset attributes\n    label = hmap.eval(nxs, 's_ppy')  # example uses @decimals and @units\n    # return dataset attributes\n    attr = hmap.eval(nxs, 'idgap@units')  # -\u003e 'mm'\n    # return first available dataset\n    cmd = hmap.eval(nxs, '(cmd|title|scan_command)')  # -\u003e 'scan hkl ...'\n    # return first available or default value\n    atten = hmap.eval(nxs, '(gains_atten|atten?(0))')  # -\u003e 0\n    # python expression using multiple parameters\n    pol = hmap.eval(nxs, '\"pol in\" if abs(delta_offset) \u003c 0.1 and abs(thp) \u003e 20 else \"pol out\"')\n    # formatted strings\n    title = hmap.format_hdf(nxs, '{filename}: {scan_command}')\n    hkl = hmap.format_hdf(nxs, '({np.mean(h):.3g},{np.mean(k):.3g},{np.mean(l):.3g})')\n\n# Or, using NexusLoader:\nfrom hdfmap import NexusLoader\n\nscan = NexusLoader('file.nxs')\n# normalised default-signal\nprint(scan('signal / count_time / Transmission / (rc / 300.)'))\n# axes label\nprint(scan.format('{__axes} [{axes@units}]'))\n\n# Or, for multiple-files:\nfrom hdfmap import hdf_eval, hdf_format, list_files\n\nfiles = 
[f\"file{n}.nxs\" for n in range(10)]\n\nenergy_values = hdf_eval(files, '(en|energy?(8))')\nlist_scans = hdf_format(files, '{filename}: ({np.mean(h):.3g},{np.mean(k):.3g},{np.mean(l):.3g}) : {scan_command}')\nprint('\\n'.join(list_scans))\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiamondlightsource%2Fhdfmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdiamondlightsource%2Fhdfmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiamondlightsource%2Fhdfmap/lists"}