https://github.com/cubed-dev/cubed-xarray
Interface for using cubed with xarray
- Host: GitHub
- URL: https://github.com/cubed-dev/cubed-xarray
- Owner: cubed-dev
- License: apache-2.0
- Created: 2023-03-24T15:38:04.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-30T08:25:23.000Z (about 2 years ago)
- Last Synced: 2024-06-11T16:54:04.215Z (about 2 years ago)
- Language: Python
- Size: 32.2 KB
- Stars: 18
- Watchers: 9
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Note: this is a proof-of-concept, and many things are incomplete, untested, or don't work.
# cubed-xarray
Interface for using [cubed](https://github.com/cubed-dev/cubed) with [xarray](https://github.com/pydata/xarray).
## Requirements
- Cubed version >=0.17.0
- Xarray version >=2024.09.0
## Installation
Install via pip
`pip install cubed-xarray`
or conda
`conda install -c conda-forge cubed-xarray`
## Importing
You don't need to import this package in user code. Once properly installed, xarray should automatically become aware of this package via the magic of entrypoints.
## Usage
Xarray objects backed by cubed arrays can be created either by:
1. Passing existing `cubed.Array` objects to the `data` argument of xarray constructors,
2. Calling `.chunk` on xarray objects,
3. Passing a `chunks` argument to `xarray.open_dataset`.
In (2) and (3) the choice to use `cubed.Array` instead of `dask.array.Array` is made by passing the keyword argument `chunked_array_type='cubed'`.
To pass arguments to the constructor of `cubed.Array` you should pass them via the dictionary `from_array_kwargs`, e.g. `from_array_kwargs={'spec': cubed.Spec(allowed_mem='2GB')}`.
If cubed and cubed-xarray are installed but dask is not, then specifying `chunked_array_type` is not necessary,
as the entrypoints system will then default to the only chunked parallel backend available (i.e. cubed).
## Sharp Edges 🔪
Some things almost certainly won't work yet:
- Certain operations called in xarray but not implemented in cubed, for instance `pad` (see https://github.com/tomwhite/cubed/issues/193)
- Array operations involving NaNs - for now use `skipna=True` to avoid eager loading (see https://github.com/pydata/xarray/issues/7243)
- Using `parallel=True` with `xr.open_mfdataset` won't work because cubed doesn't implement a version of `dask.Delayed` (see https://github.com/pydata/xarray/issues/7810)
- Groupby (see https://github.com/tomwhite/cubed/issues/223 and https://github.com/xarray-contrib/flox/issues/224)
- `xarray.map_blocks` does not actually dispatch to `cubed.map_blocks` yet, and will always use Dask.
- Certain operations using `cumreduction` (e.g. `ffill` and `bfill`) are [not hooked up to the `ChunkManager` yet](https://github.com/tomwhite/cubed/issues/277#issuecomment-1648567431), so will attempt to call dask.
and some other things _might_ work but have not yet been tried:
- Saving to formats other than zarr
In general a bug could take the form of an error, or of a silent attempt to coerce the array type to numpy by immediately computing the underlying array.
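One way to spot the silent-coercion case is to inspect the type of the wrapped array after each step. The helpers below are a hypothetical sketch, not part of cubed-xarray: they rely only on the `.data` attribute that `xarray.DataArray` and `xarray.Variable` expose, so they do not apply to a whole `Dataset`.

```python
def backing_array_type(obj) -> str:
    """Return the fully qualified type name of the array wrapped by obj.

    obj is expected to expose a .data attribute, as xarray.DataArray
    and xarray.Variable do.
    """
    cls = type(obj.data)
    return f"{cls.__module__}.{cls.__qualname__}"


def assert_not_numpy(obj) -> None:
    """Raise if obj has been coerced to an in-memory numpy array."""
    name = backing_array_type(obj)
    if name == "numpy.ndarray":
        raise AssertionError(
            "array was computed eagerly and is now numpy-backed"
        )
```

Calling `assert_not_numpy(result)` after an xarray operation on a cubed-backed object should raise if that operation computed the array along the way.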
## Tests
Integration tests for wrapping cubed with xarray also live in this repository.