{"id":21962531,"url":"https://github.com/biocpy/delayedarray","last_synced_at":"2026-02-28T05:53:24.794Z","repository":{"id":188814638,"uuid":"679428055","full_name":"BiocPy/DelayedArray","owner":"BiocPy","description":"DelayedArrays, in Python","archived":false,"fork":false,"pushed_at":"2024-10-23T17:08:57.000Z","size":1152,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-23T19:37:55.975Z","etag":null,"topics":["delayedarray"],"latest_commit_sha":null,"homepage":"https://biocpy.github.io/DelayedArray/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BiocPy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-16T20:26:10.000Z","updated_at":"2024-10-23T17:06:28.000Z","dependencies_parsed_at":"2023-08-17T02:06:46.042Z","dependency_job_id":"633003b3-76d9-4e19-af4f-b73fd2fbe24e","html_url":"https://github.com/BiocPy/DelayedArray","commit_stats":null,"previous_names":["biocpy/delayedarray"],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FDelayedArray","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FDelayedArray/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FDelayedArray/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FDelayedArray/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/BiocPy","download_url":"https://codeload.github.com/BiocPy/DelayedArray/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227101946,"owners_count":17731223,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["delayedarray"],"created_at":"2024-11-29T10:43:17.047Z","updated_at":"2026-02-28T05:53:19.759Z","avatar_url":"https://github.com/BiocPy.png","language":"Python","readme":"\u003c!-- These are examples of badges you might want to add to your README:\n     please update the URLs accordingly\n\n[![Built Status](https://api.cirrus-ci.com/github/\u003cUSER\u003e/DelayedArray.svg?branch=main)](https://cirrus-ci.com/github/\u003cUSER\u003e/DelayedArray)\n[![ReadTheDocs](https://readthedocs.org/projects/DelayedArray/badge/?version=latest)](https://DelayedArray.readthedocs.io/en/stable/)\n[![Coveralls](https://img.shields.io/coveralls/github/\u003cUSER\u003e/DelayedArray/main.svg)](https://coveralls.io/r/\u003cUSER\u003e/DelayedArray)\n[![Conda-Forge](https://img.shields.io/conda/vn/conda-forge/DelayedArray.svg)](https://anaconda.org/conda-forge/DelayedArray)\n[![Twitter](https://img.shields.io/twitter/url/http/shields.io.svg?style=social\u0026label=Twitter)](https://twitter.com/DelayedArray)\n--\u003e\n\n[![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)\n[![PyPI-Server](https://img.shields.io/pypi/v/DelayedArray.svg)](https://pypi.org/project/DelayedArray/)\n[![Monthly 
Downloads](https://pepy.tech/badge/DelayedArray/month)](https://pepy.tech/project/DelayedArray)\n![Unit tests](https://github.com/BiocPy/DelayedArray/actions/workflows/pypi-test.yml/badge.svg)\n\n# DelayedArrays, in Python\n\n## Introduction\n\nThis package implements classes for delayed array operations, mirroring the [Bioconductor package](https://bioconductor.org/packages/DelayedArray) of the same name.\nIt allows BiocPy-based packages to easily interoperate with delayed arrays from the Bioconductor ecosystem,\nwith a focus on serialization to/from file with [**chihaya**](https://github.com/ArtifactDB/chihaya)/[**rds2py**](https://github.com/BiocPy/rds2py)\nand entry into [**tatami**](https://github.com/tatami-inc/tatami)-compatible C++ libraries via [**mattress**](https://github.com/BiocPy/mattress).\n\n## Quick start\n\nThis package is published to [PyPI](https://pypi.org/project/delayedarray/) and can be installed via the usual methods:\n\n```shell\npip install delayedarray\n```\n\nWe can create a `DelayedArray` from any object that respects the seed contract,\ni.e., has the `shape`/`dtype` properties and supports NumPy slicing.\nFor example, a typical NumPy array qualifies:\n\n```python\nimport numpy\nx = numpy.random.rand(100, 20)\n```\n\nWe can wrap this in a `DelayedArray` class:\n\n```python\nimport delayedarray\nd = delayedarray.wrap(x)\n## \u003c100 x 20\u003e DelayedArray object of type 'float64'\n## [[0.58969193, 0.36342181, 0.03111773, ..., 0.72036247, 0.40297173,\n##   0.48654955],\n##  [0.96346008, 0.57956493, 0.24247029, ..., 0.49717933, 0.589535  ,\n##   0.22806832],\n##  [0.61699438, 0.02493104, 0.87487081, ..., 0.44039656, 0.13967301,\n##   0.57966883],\n##  ...,\n##  [0.91583856, 0.94079754, 0.47546576, ..., 0.46866948, 0.87952439,\n##   0.81316896],\n##  [0.68721591, 0.22789395, 0.51079888, ..., 0.86483248, 0.43933065,\n##   0.84304794],\n##  [0.47763457, 0.54973367, 0.01159327, ..., 0.47338943, 0.86443755,\n##   0.2047926 ]]\n```\n\nAnd then 
we can use it in a variety of operations.\nFor example, in genomics, a typical quality control task is to slice the matrix to remove uninteresting features (rows) or samples (columns):\n\n```python\nfiltered = d[1:100:2,1:8]\nfiltered.shape\n## (50, 7)\n```\n\nWe then divide by the total sum of each column to compute normalized values between samples.\n\n```python\ntotal = filtered.sum(axis=0)\nnormalized = filtered / total\nnormalized.dtype\n## dtype('float64')\n```\n\nAnd finally we compute a log-transformation to get some log-normalized values for visualization.\n\n```python\ntransformed = numpy.log1p(normalized)\ntransformed[1:5,:]\n## \u003c4 x 7\u003e DelayedArray object of type 'float64'\n## [[0.03202309, 0.03256592, 0.02281872, ..., 0.03193778, 0.01735653,\n##   0.02323571],\n##  [0.02668759, 0.0152978 , 0.03818753, ..., 0.00280113, 0.00737041,\n##   0.00852137],\n##  [0.02125275, 0.01473594, 0.01299548, ..., 0.03092256, 0.01225808,\n##   0.0030042 ],\n##  [0.02334768, 0.00499055, 0.01804982, ..., 0.00467121, 0.02921965,\n##   0.02118322]]\n```\n\nEach operation just returns a `DelayedArray` with an increasing stack of delayed operations, without evaluating anything or making any copies.\nCheck out the [documentation](https://biocpy.github.io/DelayedArray/) for more information.\n\n## Extracting data\n\n### Block processing\n\nA `DelayedArray` is typically used by iteratively extracting blocks into memory for further calculations.\nThis \"block processing\" strategy improves memory efficiency by only realizing the delayed operations for a subset of the data.\nFor example, to iterate over the rows with 100 MB blocks:\n\n```python\nblock_size = delayedarray.choose_block_size_for_1d_iteration(d, dimension=0, memory=1e8)\nblock_coords = [ None, range(d.shape[1]) ]\n\nfor start in range(0, d.shape[0], block_size):\n    end = min(d.shape[0], start + block_size)\n    block_coords[0] = range(start, end)\n    current = delayedarray.extract_dense_array(d, 
(*block_coords,))\n```\n\nEach call to `extract_dense_array()` yields a NumPy array containing the specified rows and columns.\nIf the `DelayedArray` might contain masked values, a NumPy `MaskedArray` is returned instead;\nthis can be determined by checking whether `is_masked(d)` returns `True`.\n\nThe above iteration can be simplified with the `apply_over_dimension()` function, which handles the block coordinate calculations for us.\nWe could also use the `apply_over_blocks()` function to iterate over arbitrary block shapes, which may be more efficient if the best dimension for iteration is not known.\n\n```python\n# To iterate over a single dimension:\ndelayedarray.apply_over_dimension(\n    d,\n    dimension=0,\n    fun=some_user_supplied_function,\n    block_size=block_size,\n)\n\n# To iterate over arbitrary blocks.\ndelayedarray.apply_over_blocks(\n    d,\n    fun=another_user_supplied_function,\n    block_shape=(20, 100),\n)\n```\n\n### Handling sparse data\n\nIf the `DelayedArray` contains sparse data, `is_sparse(d)` will return `True`.\nThis allows callers to instead use the `extract_sparse_array()` function for block processing:\n\n```python\nif delayedarray.is_sparse(d):\n    current = delayedarray.extract_sparse_array(d, (*block_coords,))\n```\n\nThis returns a `SparseNdarray` consisting of a tree of sparse vectors for the specified block.\nUsers can retrieve the sparse vectors by inspecting the `contents` property of the `SparseNdarray`:\n\n- In the one-dimensional case, this is a tuple of two 1-dimensional NumPy arrays storing data about the non-zero elements.\n  The first array contains sorted indices while the second array contains the associated values.\n  If `is_masked(d)` returns `True`, the values will be represented as NumPy `MaskedArray` objects.\n- For the two-dimensional case, this is a list of such tuples, with one tuple per column.\n  This is roughly analogous to a compressed sparse column matrix.\n  An entry of the list may also be 
`None`, indicating that no non-zero elements are present in that column.\n- For higher dimensions, the tree is a nested list of lists of tuples.\n  Each nesting level corresponds to a dimension; the outermost level contains elements of the last dimension,\n  the next nesting level contains elements of the second-last dimension, and so on,\n  with the indices in the tuple referring to the first dimension.\n  Any list element may be `None`, indicating that the corresponding element of the dimension has no non-zero elements.\n- In all cases, it is possible for `contents` to be `None`, indicating that there are no non-zero elements in the entire array.\n\nThe `apply_over_*` functions can also be instructed to iteratively extract blocks as `SparseNdarray` objects.\nThis only occurs if the input array is sparse (as specified by `is_sparse`).\n\n```python\n# To iterate over a single dimension:\ndelayedarray.apply_over_dimension(\n    d,\n    dimension=0,\n    fun=some_user_supplied_function,\n    block_size=block_size,\n    allow_sparse=True,\n)\n```\n\n### Other coercions\n\nA `DelayedArray` can be converted to a (possibly masked) NumPy array with the `to_dense_array()` function.\nSimilarly, sparse `DelayedArray`s can be converted to `SparseNdarray`s with the `to_sparse_array()` function.\n\n```python\ndelayedarray.to_dense_array(d)\ndelayedarray.to_sparse_array(d)\n```\n\nUsers can easily convert a 2-dimensional `SparseNdarray` to some of the common SciPy sparse matrix classes for downstream calculations.\n\n```python\ndelayedarray.to_scipy_sparse_matrix(current, \"csc\")\n```\n\nMore simply, users can just call `numpy.array()` to realize the delayed operations into a standard NumPy array for consumption.\nNote that this discards any masking information, so it should not be called if `is_masked()` returns `True`.\n\n```python\nsimple = numpy.array(n)\ntype(simple)\n## \u003cclass 'numpy.ndarray'\u003e\n```\n\nUsers can also call `delayedarray.create_dask_array()` to obtain a 
**dask** array that contains the delayed operations:\n\n```python\n# Note: requires installation as 'delayedarray[dask]'.\ndasky = delayedarray.create_dask_array(n)\ntype(dasky)\n## \u003cclass 'dask.array.core.Array'\u003e\n```\n\n## Interoperability with other packages \n\nThe general idea is that `DelayedArray`s should be a drop-in replacement for NumPy arrays, at least for [BiocPy](https://github.com/BiocPy) applications.\nSo, for example, we can stuff the `DelayedArray` inside a `SummarizedExperiment`:\n\n```python\nimport summarizedexperiment as SE\nse = SE.SummarizedExperiment({ \"counts\": filtered, \"lognorm\": transformed })\nprint(se)\n## Class SummarizedExperiment with 50 features and 7 samples\n##   assays: ['counts', 'lognorm']\n##   features: []\n##   sample data: []\n```\n\nOne of the main goals of the **DelayedArray** package is to make it easier for Bioconductor developers to inspect the delayed operations.\n(See the [developer notes](https://biocpy.github.io/DelayedArray/developers.html) for some comments on **dask**.)\nFor example, we can pull out the \"seed\" object underlying our `DelayedArray` instance:\n\n```python\nn.seed\n## \u003cdelayedarray.Subset.Subset object at 0x11cfbe690\u003e\n```\n\nEach layer has its own specific attributes that define the operation, e.g.,\n\n```python\nn.seed.subset\n## (range(1, 5), range(0, 20))\n```\n\nRecursively drilling through the object will eventually reach the underlying array(s):\n\n```python\nn.seed.seed.seed.seed.seed\n## array([[0.78811524, 0.87684408, 0.56980128, ..., 0.92659988, 0.8716243 ,\n##         0.8855508 ],\n##        [0.96611119, 0.36928726, 0.30364589, ..., 0.14349135, 0.92921468,\n##         0.85097595],\n##        [0.98374144, 0.98197003, 0.18126507, ..., 0.5854122 , 0.48733974,\n##         0.90127042],\n##        ...,\n##        [0.05566008, 0.24581195, 0.4092705 , ..., 0.79169303, 0.36982844,\n##         0.59997214],\n##        [0.81744194, 0.78499666, 0.80940409, ..., 0.65706498, 
0.16220355,\n##         0.46912681],\n##        [0.41896894, 0.58066043, 0.57069833, ..., 0.61640286, 0.47174326,\n##         0.7149704 ]])\n```\n\nAll attributes required to reconstruct a delayed operation are public and considered part of the stable `DelayedArray` interface.\n\n## Developing seeds\n\nAny array-like object can be used as a \"seed\" in a `DelayedArray` provided it has the following:\n\n- `dtype` and `shape` properties, like those in NumPy arrays.\n- a method for the `extract_dense_array()` generic.\n- a method for the `is_masked()` generic.\n- a method for the `chunk_grid()` generic.\n\nIf the object may contain sparse data, it should also implement:\n\n- a method for the `is_sparse()` generic.\n- a method for the `extract_sparse_array()` generic.\n\nIt may also be desirable to implement:\n\n- a method for the `create_dask_array()` generic.\n- a method for the `wrap()` generic.\n\nDevelopers are referred to the [documentation for each generic](https://biocpy.github.io/DelayedArray/api/delayedarray.html) for more details.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocpy%2Fdelayedarray","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbiocpy%2Fdelayedarray","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocpy%2Fdelayedarray/lists"}