https://github.com/thewtex/shardedstore
Provides a sharded Zarr store
- Host: GitHub
- URL: https://github.com/thewtex/shardedstore
- Owner: thewtex
- License: apache-2.0
- Created: 2022-05-07T02:44:05.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-08-31T22:14:46.000Z (about 3 years ago)
- Last Synced: 2025-04-12T04:57:45.772Z (6 months ago)
- Topics: python, sharding, zarr
- Language: Python
- Homepage:
- Size: 34.2 KB
- Stars: 4
- Watchers: 4
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
# shardedstore
[PyPI](https://pypi.python.org/pypi/shardedstore/)
[Tests](https://github.com/thewtex/shardedstore/actions/workflows/test.yml)
[DOI](https://zenodo.org/badge/latestdoi/489549406)

Provides a sharded Zarr store.

## Features
- For large Zarr stores, avoid both an excessive number of objects and extremely large objects, which sidesteps filesystem inode exhaustion and object store limits.
- Performance-sensitive implementation.
- Use existing Zarr v2 stores.
- Mix and match shard store types.
- Serialize and deserialize the ShardedStore in JSON.
- Shard groups or array chunks.
- Easily run transformations on store shards.

## Installation

```sh
pip install shardedstore
```

## Example

```python
from shardedstore import ShardedStore, array_shard_directory_store, to_zip_store_with_prefix

from zarr.storage import DirectoryStore
# xarray example, but works with zarr in general
import xarray as xr
from datatree import DataTree, open_datatree
import json
import numpy as np
import os
```

### Create component shard stores

```python
base_store = DirectoryStore("base.zarr")
shard1 = DirectoryStore("shard1.zarr")
shard2 = DirectoryStore("shard2.zarr")
array_shards1 = array_shard_directory_store("array_shards1")
array_shards2 = array_shard_directory_store("array_shards2")
```

### Generate data for the example

```python
# xarray-datatree Quick Overview
data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
# Sharded array dimensions must have a chunk shape of 1.
data = data.chunk([1,2])
ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi))
ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]})
ds2 = ds2.chunk({'x':1, 'y':2})
ds3 = xr.Dataset(
    dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])),
    coords={"species": "human"},
)
dt = DataTree.from_dict({"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3})
```

### A monolithic store

```python
single_store = DirectoryStore("single.zarr")
dt.to_zarr(single_store)
```
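
To see why a monolithic store can become unwieldy, recall that a Zarr v2 `DirectoryStore` writes each chunk as its own file. The sketch below counts the chunks an array layout can produce. `count_chunks` is a hypothetical helper written for this illustration; it is not part of `shardedstore` or `zarr`, and the large volume at the end is an invented example.

```python
import math


def count_chunks(shape, chunks):
    """Upper bound on the number of chunk objects for a Zarr v2 array.

    Hypothetical helper for illustration only.
    """
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# The `foo` arrays from the example data, both with chunk shape (1, 2).
coarse = count_chunks((2, 3), (1, 2))  # 2 * 2 chunks
fine = count_chunks((6, 3), (1, 2))    # 6 * 2 chunks
print(coarse, fine)

# An invented large volume: 4096 * 16 * 16 chunks in a single directory tree.
large = count_chunks((4096, 4096, 4096), (1, 256, 256))
print(large)
```

The toy arrays stay small, but the invented volume already yields over a million chunk files, which is the situation sharding is meant to address.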
### A sharded store demonstrating sharding on groups and arrays

Arrays are sharded over 1 dimension.

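
As a rough mental model, sharding can be pictured as routing each Zarr v2 key to a component store. The sketch below is a conceptual illustration only, not the `shardedstore` implementation: `route_key`, the `"base"` label, and the `#` shard naming are hypothetical. It also assumes that the integer paired with each array path is the number of leading dimensions to shard over, and that array metadata stays in the base store.

```python
def route_key(key, group_shards, array_shards):
    """Return (shard_label, key_within_shard) for a Zarr v2 style key.

    Hypothetical routing rule for illustration only.
    """
    # Array shards: a chunk key such as "path/to/array/3.0" is routed by its
    # leading chunk indices along the sharded dimensions.
    for array_path, n_dims in array_shards.items():
        prefix = array_path + "/"
        if key.startswith(prefix):
            name = key[len(prefix):]
            if name.startswith("."):
                # Metadata keys (.zarray, .zattrs): assumed to stay in the base store.
                return "base", key
            indices = name.split(".")
            shard_id = ".".join(indices[:n_dims])
            remainder = ".".join(indices[n_dims:])
            return f"{array_path}#{shard_id}", remainder
    # Group shards: every key under the group path goes to that group's store.
    for group_path in group_shards:
        if key == group_path or key.startswith(group_path + "/"):
            return group_path, key[len(group_path):].lstrip("/")
    # Everything else lands in the base store.
    return "base", key


groups = ["people", "species"]
arrays = {"simulation/coarse/foo": 1, "simulation/fine/foo": 1}

print(route_key("simulation/coarse/foo/1.0", groups, arrays))
print(route_key("people/.zarray", groups, arrays))
print(route_key("simulation/coarse/bar/0", groups, arrays))
```

Under these assumptions, a chunk shape of 1 along the sharded dimension means each index along that dimension maps to exactly one shard.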
```python
sharded_store = ShardedStore(
    base_store,
    {'people': shard1, 'species': shard2},
    {'simulation/coarse/foo': (1, array_shards1), 'simulation/fine/foo': (1, array_shards2)},
)
dt.to_zarr(sharded_store)
```

### Serialize / deserialize

```python
config = sharded_store.get_config()
config_str = json.dumps(config)
config = json.loads(config_str)
sharded_store = ShardedStore.from_config(config)
```

### Validate

```python
from_single = open_datatree(single_store, engine='zarr').compute()
from_sharded = open_datatree(sharded_store, engine='zarr').compute()
assert from_single.identical(from_sharded)
```

### Run transformations over component shards with `map_shards`

```python
to_zip_stores = to_zip_store_with_prefix("zip_stores")
zip_sharded_stores = sharded_store.map_shards(to_zip_stores)
```

## Development

Contributions are welcome and appreciated.
```sh
git clone https://github.com/thewtex/shardedstore
cd shardedstore
pip install -e ".[test]"
pytest
```