https://github.com/cubed-dev/cubed-xarray

Interface for using cubed with xarray

Host: GitHub
URL: https://github.com/cubed-dev/cubed-xarray
Owner: cubed-dev
License: apache-2.0
Created: 2023-03-24T15:38:04.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-04-30T08:25:23.000Z (about 2 years ago)
Last Synced: 2024-06-11T16:54:04.215Z (about 2 years ago)
Language: Python
Size: 32.2 KB
Stars: 18
Watchers: 9
Forks: 1
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Note: this is a proof-of-concept, and many things are incomplete, untested, or don't work.

# cubed-xarray

Interface for using [cubed](https://github.com/cubed-dev/cubed) with [xarray](https://github.com/pydata/xarray).

## Requirements

- Cubed version >=0.17.0

- Xarray version >=2024.09.0

## Installation

Install via pip

`pip install cubed-xarray`

or conda

`conda install -c conda-forge cubed-xarray`

## Importing

You don't need to import this package in user code. Once it is properly installed, xarray should automatically become aware of this package via the magic of entrypoints.

## Usage

Xarray objects backed by cubed arrays can be created either by:

1. Passing existing `cubed.Array` objects to the `data` argument of xarray constructors,

2. Calling `.chunk` on xarray objects,

3. Passing a `chunks` argument to `xarray.open_dataset`.

In (2) and (3) the choice to use `cubed.Array` instead of `dask.array.Array` is made by passing the keyword argument `chunked_array_type='cubed'`.

To pass arguments to the constructor of `cubed.Array` you should pass them via the dictionary `from_array_kwargs`, e.g. `from_array_kwargs={'spec': cubed.Spec(allowed_mem='2GB')}`.

If cubed and cubed-xarray are installed but dask is not, then specifying `chunked_array_type` is not necessary, as the entrypoints system will then default to the only chunked parallel backend available (i.e. cubed).

## Sharp Edges 🔪

Some things almost certainly won't work yet:

- Certain operations called in xarray but not implemented in cubed, for instance `pad` (see https://github.com/tomwhite/cubed/issues/193)

- Array operations involving NaNs - for now use `skipna=False` to avoid eager loading (see https://github.com/pydata/xarray/issues/7243)

- Using `parallel=True` with `xr.open_mfdataset` won't work because cubed doesn't implement a version of `dask.Delayed` (see https://github.com/pydata/xarray/issues/7810)

- Groupby (see https://github.com/tomwhite/cubed/issues/223 and https://github.com/xarray-contrib/flox/issues/224)

- `xarray.map_blocks` does not actually dispatch to `cubed.map_blocks` yet, and will always use Dask.

- Certain operations using `cumreduction` (e.g. `ffill` and `bfill`) are [not hooked up to the `ChunkManager` yet](https://github.com/tomwhite/cubed/issues/277#issuecomment-1648567431), so will attempt to call dask.

and some other things _might_ work but have not yet been tried:

- Saving to formats other than zarr

In general a bug could take the form of an error, or of a silent attempt to coerce the array type to numpy by immediately computing the underlying array.

## Tests

Integration tests for wrapping cubed with xarray also live in this repository.
