{"id":21962520,"url":"https://github.com/biocpy/hdf5array","last_synced_at":"2025-09-05T22:32:52.815Z","repository":{"id":182987783,"uuid":"656768533","full_name":"BiocPy/HDF5Array","owner":"BiocPy","description":"HDF5 File-backed arrays for Python","archived":false,"fork":false,"pushed_at":"2025-04-21T16:28:35.000Z","size":1385,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-21T17:35:36.869Z","etag":null,"topics":["delayedarray","hdf5"],"latest_commit_sha":null,"homepage":"https://biocpy.github.io/HDF5Array/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BiocPy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-06-21T15:43:10.000Z","updated_at":"2025-03-05T01:03:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"d5cd57bf-e670-4a7a-9945-db08f832d226","html_url":"https://github.com/BiocPy/HDF5Array","commit_stats":null,"previous_names":["biocpy/filebackedarray","biocpy/hdf5array"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FHDF5Array","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FHDF5Array/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FHDF5Array/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FHDF5Array/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/BiocPy","download_url":"https://codeload.github.com/BiocPy/HDF5Array/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250515783,"owners_count":21443485,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["delayedarray","hdf5"],"created_at":"2024-11-29T10:42:51.153Z","updated_at":"2025-04-23T21:24:26.607Z","avatar_url":"https://github.com/BiocPy.png","language":"Python","readme":"\u003c!-- These are examples of badges you might want to add to your README:\n     please update the URLs accordingly\n\n[![Built Status](https://api.cirrus-ci.com/github/\u003cUSER\u003e/hdf5array.svg?branch=main)](https://cirrus-ci.com/github/\u003cUSER\u003e/hdf5array)\n[![ReadTheDocs](https://readthedocs.org/projects/hdf5array/badge/?version=latest)](https://hdf5array.readthedocs.io/en/stable/)\n[![Coveralls](https://img.shields.io/coveralls/github/\u003cUSER\u003e/hdf5array/main.svg)](https://coveralls.io/r/\u003cUSER\u003e/hdf5array)\n[![Conda-Forge](https://img.shields.io/conda/vn/conda-forge/hdf5array.svg)](https://anaconda.org/conda-forge/hdf5array)\n[![Twitter](https://img.shields.io/twitter/url/http/shields.io.svg?style=social\u0026label=Twitter)](https://twitter.com/hdf5array)\n--\u003e\n\n[![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)\n[![PyPI-Server](https://img.shields.io/pypi/v/hdf5array.svg)](https://pypi.org/project/hdf5array/)\n[![Monthly 
Downloads](https://pepy.tech/badge/hdf5array/month)](https://pepy.tech/project/hdf5array)\n![Unit tests](https://github.com/BiocPy/hdf5array/actions/workflows/pypi-test.yml/badge.svg)\n\n\n# hdf5array\n\n## Introduction\n\nThis is the Python equivalent of Bioconductor's [**HDF5Array**](https://bioconductor.org/packages/HDF5Array) package,\nproviding a representation of HDF5-backed arrays within the [**delayedarray**](https://github.com/BiocPy/delayedarray) framework.\nThe idea is to allow users to store, manipulate and operate on large datasets without loading them into memory,\nin a manner that is trivially compatible with other data structures in the [**BiocPy**](https://github.com/BiocPy) ecosystem.\n\n## Installation\n\nThis package can be installed from [PyPI](https://pypi.org/project/hdf5array/) with the usual commands:\n\n```shell\npip install hdf5array\n```\n\n## Quick start\n\nLet's mock up a dense array:\n\n```python\nimport numpy\ndata = numpy.random.rand(40, 50, 100)\n\nimport h5py\nwith h5py.File(\"whee.h5\", \"w\") as handle:\n    handle.create_dataset(\"yay\", data=data)\n```\n\nWe can now represent it as a `Hdf5DenseArray`:\n\n```python\nimport hdf5array\narr = hdf5array.Hdf5DenseArray(\"whee.h5\", \"yay\", native_order=True)\n## \u003c40 x 50 x 100\u003e Hdf5DenseArray object of type 'float64'\n## [[[0.63008796, 0.34849183, 0.75621679, ..., 0.07343495, 0.63095765,\n##    0.625732  ],\n##   [0.68123095, 0.91403054, 0.74737122, ..., 0.17344344, 0.82254404,\n##    0.58158815],\n##   [0.83287116, 0.40738123, 0.89887551, ..., 0.34936481, 0.76600276,\n##    0.91991967],\n##   ...,\n```\n\nThis is just a subclass of a `DelayedArray` and can be used anywhere in the BiocPy framework.\nParts of the NumPy API are also supported - for example, we could apply a variety of delayed operations:\n\n```python\nscaling = numpy.random.rand(100)\ntransformed = numpy.log1p(arr / scaling)\n## \u003c40 x 50 x 100\u003e DelayedArray object of type 'float64'\n## 
[[[0.58803887, 0.3458478 , 0.82700531, ..., 0.08224734, 0.65678967,\n##    0.56893312],\n##   [0.62348907, 0.7341526 , 0.82040225, ..., 0.18437718, 0.7932422 ,\n##    0.53784637],\n##   [0.72176703, 0.39407341, 0.92788307, ..., 0.34205035, 0.75487196,\n##    0.75456938],\n##   ...,\n```\n\nCheck out the [documentation](https://biocpy.github.io/hdf5array/) for more details.\n\n## Handling sparse matrices\n\nWe support a variety of compressed sparse formats where the non-zero elements are held inside three separate datasets -\nusually `data`, `indices` and `indptr`, based on the [10X Genomics sparse HDF5 format](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices).\nTo demonstrate, let's mock up some sparse data using **scipy**:\n\n```python\nimport scipy.sparse\nmock = scipy.sparse.random(100, 200, 0.1).tocsc()\n\nwith h5py.File(\"sparse_whee.h5\", \"w\") as handle:\n    handle.create_dataset(\"sparse_blah/data\", data=mock.data, compression=\"gzip\")\n    handle.create_dataset(\"sparse_blah/indices\", data=mock.indices, compression=\"gzip\")\n    handle.create_dataset(\"sparse_blah/indptr\", data=mock.indptr, compression=\"gzip\")\n```\n\nWe can then create a sparse HDF5-backed matrix.\nNote that there is some variation in this HDF5 compressed sparse format, notably where the dimensions are stored and whether it is column/row-major.\nThe constructor will not do any auto-detection so we need to provide this information explicitly:\n\n```python\nimport hdf5array\narr = hdf5array.Hdf5CompressedSparseMatrix(\n    \"sparse_whee.h5\",\n    \"sparse_blah\",\n    shape=(100, 200),\n    by_column=True\n)\n## \u003c100 x 200\u003e sparse Hdf5CompressedSparseMatrix object of type 'float64'\n## [[0.        , 0.        , 0.26563417, ..., 0.        , 0.        ,\n##   0.        ],\n##  [0.        , 0.        , 0.        , ..., 0.23896924, 0.        ,\n##   0.        ],\n##  [0.        , 0.        , 0.        
, ..., 0.42236848, 0.3585153 ,\n##   0.        ],\n##  ...,\n##  [0.        , 0.        , 0.3363087 , ..., 0.        , 0.        ,\n##   0.        ],\n##  [0.        , 0.        , 0.        , ..., 0.        , 0.        ,\n##   0.        ],\n##  [0.        , 0.        , 0.        , ..., 0.        , 0.        ,\n##   0.        ]]\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocpy%2Fhdf5array","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbiocpy%2Fhdf5array","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocpy%2Fhdf5array/lists"}