{"id":37502479,"url":"https://github.com/wayscience/ome-arrow","last_synced_at":"2026-04-08T22:01:51.061Z","repository":{"id":322714881,"uuid":"1088979629","full_name":"WayScience/ome-arrow","owner":"WayScience","description":"Using OME specifications with Apache Arrow for fast, queryable, and language agnostic bioimage data.","archived":false,"fork":false,"pushed_at":"2026-04-08T16:36:20.000Z","size":39155,"stargazers_count":5,"open_issues_count":5,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-08T18:30:39.575Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://wayscience.github.io/ome-arrow/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WayScience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-03T18:07:19.000Z","updated_at":"2026-04-08T16:35:28.000Z","dependencies_parsed_at":"2026-01-16T12:00:33.309Z","dependency_job_id":null,"html_url":"https://github.com/WayScience/ome-arrow","commit_stats":null,"previous_names":["d33bs/ome-arrow","wayscience/ome-arrow"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/WayScience/ome-arrow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WayScience%2Fome-arrow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WayScience%2Fome-arrow/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/WayScience%2Fome-arrow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WayScience%2Fome-arrow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WayScience","download_url":"https://codeload.github.com/WayScience/ome-arrow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WayScience%2Fome-arrow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31575755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-16T07:51:18.954Z","updated_at":"2026-04-08T22:01:51.056Z","avatar_url":"https://github.com/WayScience.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg width=\"600\" src=\"https://raw.githubusercontent.com/wayscience/ome-arrow/main/docs/src/_static/logo.png?raw=true\"\u003e\n\n![PyPI - Version](https://img.shields.io/pypi/v/ome-arrow)\n[![Build 
Status](https://github.com/wayscience/ome-arrow/actions/workflows/run-tests.yml/badge.svg?branch=main)](https://github.com/wayscience/ome-arrow/actions/workflows/run-tests.yml?query=branch%3Amain)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)\n[![Software DOI badge](https://zenodo.org/badge/DOI/10.5281/zenodo.17664969.svg)](https://doi.org/10.5281/zenodo.17664969)\n\n# Open, interoperable, and queryable microscopy images with OME Arrow\n\nOME-Arrow uses [Open Microscopy Environment (OME)](https://github.com/ome) specifications through [Apache Arrow](https://arrow.apache.org/) for fast, queryable, and language-agnostic bioimage data.\n\n\u003cimg height=\"200\" src=\"https://raw.githubusercontent.com/wayscience/ome-arrow/main/docs/src/_static/references_to_files.png\"\u003e\n\n__Images are often left behind from the data model, referenced but excluded from databases.__\n\n\u003cimg height=\"200\" src=\"https://raw.githubusercontent.com/wayscience/ome-arrow/main/docs/src/_static/various_ome_arrow_schema.png\"\u003e\n\n__OME-Arrow brings images back into the story.__\n\nOME Arrow enables image data to be stored alongside metadata or derived data such as single-cell morphology features.\nImages in OME Arrow are composed of multilayer [structs](https://arrow.apache.org/docs/python/generated/pyarrow.struct.html) so they may be stored as values within tables.\nThis means you can store, query, and build relationships on data from the same location using any system that is compatible with Apache Arrow (including Parquet) through common data interfaces (such as SQL and DuckDB).\n\n## Project focus\n\nThis package is intentionally dedicated to working at a per-image level rather than large-batch handling (though it may be 
used for those purposes by users or in other projects).\n\n- For visualizing OME Arrow and OME Parquet data in Napari, please see the [`napari-ome-arrow`](https://github.com/WayScience/napari-ome-arrow) Napari plugin.\n- For more comprehensive handling of many images and features in the context of the OME Parquet format please see the [`CytoDataFrame`](https://github.com/cytomining/CytoDataFrame) project (and relevant [example notebook](https://github.com/cytomining/CytoDataFrame/blob/main/docs/src/examples/cytodataframe_at_a_glance.ipynb)).\n\n## Installation\n\nInstall OME Arrow from PyPI or from source:\n\n```sh\n# install from pypi\npip install ome-arrow\n\n# install directly from source\npip install git+https://github.com/wayscience/ome-arrow.git\n```\n\n## Quick start\n\nSee below for a quick start guide.\nPlease also reference an example notebook: [Learning to fly with OME-Arrow](https://github.com/wayscience/ome-arrow/tree/main/docs/src/examples/learning_to_fly_with_ome-arrow.ipynb).\n\n```python\nfrom ome_arrow import OMEArrow\n\n# Ingest a tif image through a convenient OME Arrow class\n# We can also ingest OME-Zarr or NumPy arrays.\noa_image = OMEArrow(\n    data=\"your_image.tif\"\n)\n\n# Access the OME Arrow struct itself\n# (compatible with Arrow-compliant data storage).\noa_image.data\n\n# Show information about the image.\noa_image.info()\n\n# Display the image with matplotlib.\noa_image.view(how=\"matplotlib\")\n\n# Display the image with pyvista\n# (great for ZYX 3D images; install extras: `pip install 'ome-arrow[viz]'`).\noa_image.view(how=\"pyvista\")\n\n# Export to OME-Parquet.\n# We can also export OME-TIFF, OME-Zarr or NumPy arrays.\noa_image.export(how=\"ome-parquet\", out=\"your_image.ome.parquet\")\n\n# Export to Vortex (install extras: `pip install 'ome-arrow[vortex]'`).\noa_image.export(how=\"vortex\", out=\"your_image.vortex\")\n```\n\n## Tensor view (DLPack)\n\nFor tensor-focused workflows (PyTorch/JAX), use `tensor_view` and DLPack 
export.\n\n```python\nfrom ome_arrow import OMEArrow\n\noa = OMEArrow(\"your_image.ome.parquet\")\n\n# Spatial ROI per plane (YX convention)\nview = oa.tensor_view(t=0, z=0, roi=(32, 32, 128, 128), layout=\"CYX\")\n\n# Convenience 3D ROI (x, y, z, w, h, d)\nview3d = oa.tensor_view(roi3d=(32, 32, 2, 128, 128, 4), layout=\"TZCYX\")\n\n# 3D tiled iteration over (z, y, x)\nfor cap in view3d.iter_tiles_3d(tile_size=(2, 64, 64), mode=\"numpy\"):\n    pass\n```\n\nLazy scan-style convention (Polars-like):\n\n```python\nfrom ome_arrow import OMEArrow\n\noa = OMEArrow.scan(\"your_image.ome.parquet\")  # deferred load\n# First: queue lazy spatial/index slicing\nlazy_crop = oa.slice_lazy(0, 512, 0, 512).slice_lazy(64, 256, 64, 256)\ncropped = lazy_crop.collect()\n\n# slice_lazy returns a new OMEArrow plan; collect does not mutate `oa`.\n# Build tensor_view from the returned sliced object to reuse that plan.\ntensor_view_result = cropped.tensor_view(t=0, z=slice(0, 4), roi=(0, 0, 192, 192))\narr = tensor_view_result.to_numpy()\n```\n\nAdvanced options:\n\n- `chunk_policy=\"auto\" | \"combine\" | \"keep\"` controls ChunkedArray handling.\n- `channel_policy=\"error\" | \"first\"` controls behavior when dropping `C` from layout.\n\nSee full docs: [`docs/src/dlpack.md`](docs/src/dlpack.md)\n\n## Tensor ingest (PyTorch/JAX)\n\nYou can ingest torch or JAX arrays directly with `OMEArrow(...)`.\nYou can also use explicit helper functions from `ome_arrow.ingest`.\n\nWhy this is useful:\n\n- It reduces compute overhead by removing conversion boilerplate in separate model/data pipelines that already use torch or JAX tensors (i.e., it provides a direct port of OME-Arrow into popular deep learning libraries).\n- However, this is more about clean interoperability than dramatic end-to-end speedups (although we expect fewer handoffs to result in speedups). 
Specifically:\n- It makes it easier for a user to update dimension ordering input in the same place without requiring separate functionality (see argument `dim_order`).\n- This smooths handoffs and reduces mistakes when moving between tensor layouts and OME-Arrow records. For example, CPU torch tensors often expose a NumPy view without an extra copy.\n- Ingest still materializes OME-Arrow planes/chunks.\n\n```python\nfrom ome_arrow import OMEArrow\n\n# Direct constructor support:\n# inferred defaults are rank-based:\n# 2D -\u003e \"YX\", 3D -\u003e \"ZYX\", 4D -\u003e \"TCYX\", 5D -\u003e \"TCZYX\"\noa_torch = OMEArrow(torch_tensor)\noa_jax = OMEArrow(jax_array)\n\n# Optional: override dim order when shape is ambiguous\noa_zyx = OMEArrow(torch_volume, dim_order=\"ZYX\")\n```\n\n```python\nfrom ome_arrow.ingest import from_torch_array, from_jax_array\n\nscalar_torch = from_torch_array(torch_tensor, dim_order=\"TCYX\")\nscalar_jax = from_jax_array(jax_array, dim_order=\"TCYX\")\n```\n\nNotes:\n\n- Torch/JAX support is optional.\n- Install extras as needed:\n  `pip install \"ome-arrow[dlpack-torch]\"` or `pip install \"ome-arrow[dlpack-jax]\"`.\n- Torch tensors are detached and converted on CPU for ingest.\n- `dim_order` is accepted only for NumPy/torch/JAX array inputs.\n- Ingest now passes flattened NumPy pixel buffers directly to Arrow.\n- This avoids materializing Python `list` payloads per plane/chunk.\n\n## Benchmarking lazy reads\n\nUse the lightweight benchmark utility in `benchmarks/` to compare lazy tensor\nread paths (TIFF source-backed, Parquet planes, Parquet chunks):\n\n```bash\nuv run python benchmarks/benchmark_lazy_tensor.py --repeats 5 --warmup 1\n```\n\nNotes:\n\n- This benchmark is for local iteration and relative comparisons.\n- It is not part of CI pass/fail checks.\n- CI also runs this benchmark in a dedicated `benchmark_canary` job and\n  uploads `benchmark-results.json` as a workflow artifact.\n\nRecalibrating 
`benchmarks/ci-baseline.json`:\n\n1. Run the benchmark on `main` a few times (for example 3-5 runs):\n   `uv run python benchmarks/benchmark_lazy_tensor.py --repeats 7 --warmup 2 --json-out benchmark-results.json`\n1. For each case, collect the observed `median_ms` values.\n1. Update `benchmarks/ci-baseline.json` with stable medians from those runs\n   (prefer a conservative value near the slower side, not the fastest sample).\n1. Keep CI canary tolerance (`regression_factor` + `absolute_slack_ms`) unchanged\n   unless you have repeated false positives.\n\n## Contributing, Development, and Testing\n\nPlease see our [contributing documentation](https://github.com/wayscience/ome-arrow/tree/main/CONTRIBUTING.md) for more details on contributions, development, and testing.\n\n## Related projects\n\nOME Arrow is used or inspired by the following projects, check them out!\n\n- [`napari-ome-arrow`](https://github.com/WayScience/napari-ome-arrow): enables you to view OME Arrow and related images.\n- [`nViz`](https://github.com/WayScience/nViz): focuses on ingesting and visualizing various 3D image data.\n- [`CytoDataFrame`](https://github.com/cytomining/CytoDataFrame): provides a DataFrame-like experience for viewing feature and microscopy image data within Jupyter notebook interfaces and creating OME Parquet files.\n- [`coSMicQC`](https://github.com/cytomining/coSMicQC): performs quality control on microscopy feature datasets, visualized using CytoDataFrames.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwayscience%2Fome-arrow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwayscience%2Fome-arrow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwayscience%2Fome-arrow/lists"}