{"id":25019704,"url":"https://github.com/djeada/hdf5-examples","last_synced_at":"2025-03-30T10:11:52.226Z","repository":{"id":114371816,"uuid":"327933434","full_name":"djeada/Hdf5-Examples","owner":"djeada","description":"This repository contains a collection of code examples demonstrating various techniques and methods for working with HDF5 (Hierarchical Data Format version 5) files. These examples are designed to help developers and data scientists efficiently manage, process, and analyze large datasets stored in HDF5 format.","archived":false,"fork":false,"pushed_at":"2022-05-08T23:19:17.000Z","size":6351,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-05T11:51:56.306Z","etag":null,"topics":["compression","hdf5-format","serialization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/djeada.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-08T15:01:10.000Z","updated_at":"2024-06-04T21:13:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"38482452-e0d6-4d54-80d8-1ad949afca56","html_url":"https://github.com/djeada/Hdf5-Examples","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djeada%2FHdf5-Examples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djeada%2FHdf5-Examples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/djeada%2FHdf5-Examples/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djeada%2FHdf5-Examples/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/djeada","download_url":"https://codeload.github.com/djeada/Hdf5-Examples/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246301997,"owners_count":20755514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","hdf5-format","serialization"],"created_at":"2025-02-05T11:51:27.201Z","updated_at":"2025-03-30T10:11:52.221Z","avatar_url":"https://github.com/djeada.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HDF5\nCode examples for processing HDF5 files.\n\n\u003ch1\u003eAbout HDF5 \u003c/h1\u003e\nHDF5 is a file format for storing data that is highly extensible and flexible.\nFor example, you can store a large number of images in a single HDF5 file.\n\n* Stands for \"Hierarchical Data Format\".\n* Current version is 5.\n* Open-source and free.\n* The core implementation can be used directly from C, C++, and Java. There are wrappers for several other languages, including Python. 
\n\n\u003ch1\u003eUsing HDF5 \u003c/h1\u003e\nTo use HDF5 from Python, install the h5py module.\nThen you can use it to read and write HDF5 files.\nFor example, to read a file called \"myfile.h5\":\n\n```Python\nimport h5py\n\n# Open the file read-only; the context manager closes it on exit.\nwith h5py.File('myfile.h5', 'r') as f:\n    print(list(f.keys()))   # names of the top-level groups and datasets\n    print(f['data'].shape)  # shape of the dataset named 'data'\n    print(f['data'][:])     # read the whole dataset into a NumPy array\n```\n\nTo write data, open a file object in write mode.\nFor example, to create a new file called \"myfile.h5\":\n\n```Python\nimport h5py\nimport numpy as np\n\n# Mode 'w' creates the file and overwrites any existing file of that name.\nwith h5py.File('myfile.h5', 'w') as f:\n    data_set = f.create_dataset('data', (100,), dtype='i')\n    data_set[:] = np.arange(100)  # fill the dataset with 0..99\n```\n\n\u003ch1\u003eStructure \u003c/h1\u003e\n\n* Groups (a concept similar to directories)\n  - Groups can contain datasets and other groups.\n  \n* Datasets (a concept similar to files)\n  - Shape (ex. 1D, 2D, 5D)\n  - Datatype (ex. float, int32)\n  - Attributes (ex. compression, chunking)\n  - Data (ex. data[:])\n  - Subdatasets (ex. subdataset[:])\n\n\u003ch1\u003eLinear vs Chunked \u003c/h1\u003e\nThis concept differentiates HDF5 from other data formats. 
\nA chunked dataset is split into fixed-size chunks that are stored and read independently.\nThis allows faster access to subsets of the data and is required for filters such as compression.\n\nLinear:\n  - Data is stored in a single contiguous block within the file.\n\nChunked:\n  - Data is split into multiple fixed-size chunks.\n  - Each chunk is stored as a separate block within the same file.\n\nChunk size must strike a balance between:\n - maximizing I/O speed.\n - minimizing I/O of unused data.\n - minimizing the I/O overhead of chunking.\n\n\u003ch1\u003eFilter\u003c/h1\u003e\n\nA filter transforms data as it is written and read, most often to compress it.\n  - Can be applied to datasets.\n  - It is a layer between the program and the data on disk.\n\nProgram \u003c- Filter (CPU) \u003c- data (Disk).\n\nExamples:\n* Gzip (compression filter)\n* ScaleOffset (stores values as offsets from the minimum using fewer bits; the minimum is added back on reading)\n* Szip (compression filter)\n* Shuffle (reorders bytes so that compression works better)\n* Fletcher32 (checksum)\n\n\u003ch1\u003eCode Samples\u003c/h1\u003e\n\n* \u003ca href=\"https://github.com/djeada/Hdf5/blob/main/src/basic_io.py\"\u003eBasic IO\u003c/a\u003e\n* \u003ca href=\"https://github.com/djeada/Hdf5/blob/main/src/groups.py\"\u003eGroups\u003c/a\u003e\n* \u003ca href=\"https://github.com/djeada/Hdf5/blob/main/src/compression.py\"\u003eCompression\u003c/a\u003e\n* \u003ca href=\"https://github.com/djeada/Hdf5/blob/main/src/attributes.py\"\u003eAttributes\u003c/a\u003e\n* \u003ca href=\"https://github.com/djeada/Hdf5/blob/main/src/custom_class.py\"\u003eCustom Class\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdjeada%2Fhdf5-examples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdjeada%2Fhdf5-examples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdjeada%2Fhdf5-examples/lists"}