https://github.com/kalisio/kazarr

Lightweight service for reading and transforming Zarr data

# kazarr

A lightweight **FastAPI** service that exposes endpoints to interact with **Zarr datasets** stored in an **S3 (Simple Storage Service)** bucket:

- a **datasets** endpoint to explore available multi-dimensional arrays,
- an **extraction** endpoint to slice and dice data,
- a **probe** endpoint to query specific values at given coordinates,
- an **isoline** endpoint to compute contour lines dynamically.

## API

> [!TIP]
> Auto-generated API documentation is available at the `/docs` and `/redoc` endpoints.

### /health (GET)

Checks the service's health. Returns a JSON object with a single member, `status`.

### /datasets (GET)

Returns the list of available Zarr datasets, each with its id and description.

### /datasets/{dataset} (GET)

Returns metadata (dimensions, variables, attributes) for a specific Zarr dataset.
The `dataset` path parameter is the dataset id, as returned by the `/datasets` endpoint.
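
As an illustration of calling the read-only endpoints above, here is a minimal client sketch using only the Python standard library. The base URL and the helper names `endpoint_url` and `get_json` are assumptions made for this example and are not part of kazarr. The shape of the JSON responses is not assumed: the helper simply returns whatever the service sends.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the service runs locally on the default port (8000).
BASE_URL = "http://localhost:8000"


def endpoint_url(base_url, *segments):
    """Join path segments onto the base URL, percent-encoding each segment."""
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return f"{base_url.rstrip('/')}/{path}"


def get_json(url, opener=urlopen):
    """GET a URL and decode its JSON body.

    `opener` defaults to urllib's urlopen and can be replaced in tests.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running service, `get_json(endpoint_url(BASE_URL, "health"))` queries `/health`, `get_json(endpoint_url(BASE_URL, "datasets"))` lists the datasets, and `get_json(endpoint_url(BASE_URL, "datasets", dataset_id))` fetches the metadata of one dataset, where `dataset_id` is an id taken from the `/datasets` response.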

### /datasets/{dataset}/extract (GET)

Extracts a subset of the data based on a bounding box and a specific variable.

> [!WARNING]
> Large extractions may impact performance. Be mindful of the bounding box size for high-resolution datasets.

The `extract` endpoint accepts the following query parameters:

| Name | Description | Optional |
|------------|----------------------------------------------|:--------:|
| `variable` | The variable to extract. | ✗ |
| `lon_min` | Minimum longitude of the bounding box. | ✓ |
| `lat_min` | Minimum latitude of the bounding box. | ✓ |
| `lon_max` | Maximum longitude of the bounding box. | ✓ |
| `lat_max` | Maximum latitude of the bounding box. | ✓ |
| `time` | The time value/slice to extract. | ✓ |

> [!IMPORTANT]
> You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with `&my_additional_variable={VALUE}`
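
The query string for an extraction can be assembled as in the following sketch. The function name `build_extract_url`, the dataset id, the variable name, the time format and the extra `level` parameter are all hypothetical; only the parameter names `variable`, `lon_min`, `lat_min`, `lon_max`, `lat_max` and `time` come from the table above.

```python
from urllib.parse import quote, urlencode


def build_extract_url(base_url, dataset, variable, bbox=None, time=None, **extra):
    """Build a GET URL for /datasets/{dataset}/extract.

    bbox is an optional (lon_min, lat_min, lon_max, lat_max) tuple.
    Any extra keyword argument is appended as an additional query parameter,
    which covers dataset-specific variables or dimensions.
    """
    params = {"variable": variable}
    if bbox is not None:
        lon_min, lat_min, lon_max, lat_max = bbox
        params.update(lon_min=lon_min, lat_min=lat_min,
                      lon_max=lon_max, lat_max=lat_max)
    if time is not None:
        params["time"] = time
    params.update(extra)
    path = f"/datasets/{quote(str(dataset), safe='')}/extract"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Hypothetical dataset id, variable name, time format and extra dimension.
url = build_extract_url(
    "http://localhost:8000", "my-dataset", "temperature",
    bbox=(-5.0, 41.0, 10.0, 51.5), time="2024-01-01T00:00:00", level=850,
)
print(url)
```

Keeping the bounding box small, as the warning above suggests, only changes the `bbox` tuple passed to the helper.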

### /datasets/{dataset}/probe (GET)

Retrieves the values of specified variables at a specific geographical location (point query).

The `probe` endpoint accepts the following query parameters:

| Name | Description | Optional |
|-------------|----------------------------------------------|:--------:|
| `variables` | The list of variables to probe. | ✗ |
| `lon` | The longitude coordinate to probe. | ✗ |
| `lat` | The latitude coordinate to probe. | ✗ |
| `height` | The height coordinate to probe (if 3D data). | ✓ |

> [!IMPORTANT]
> You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with `&my_additional_variable={VALUE}`

> [!TIP]
> You can request multiple variables at once by repeating the `variables` parameter in the query string (e.g., `?variables=temp&variables=wind`).
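
Repeating the `variables` parameter is what `urllib.parse.urlencode` produces when given a list and `doseq=True`. The sketch below relies on that; the function name `build_probe_url`, the dataset id and the coordinates are assumptions chosen for illustration.

```python
from urllib.parse import quote, urlencode


def build_probe_url(base_url, dataset, variables, lon, lat, height=None, **extra):
    """Build a GET URL for /datasets/{dataset}/probe.

    `variables` is a list of names; each one becomes its own
    `variables=...` entry in the query string.
    """
    params = {"variables": list(variables), "lon": lon, "lat": lat}
    if height is not None:
        params["height"] = height
    params.update(extra)
    path = f"/datasets/{quote(str(dataset), safe='')}/probe"
    # doseq=True expands list values into repeated query parameters
    return f"{base_url.rstrip('/')}{path}?{urlencode(params, doseq=True)}"


# Hypothetical dataset id and coordinates.
url = build_probe_url("http://localhost:8000", "my-dataset",
                      ["temp", "wind"], lon=1.44, lat=43.6)
print(url)
```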

### /datasets/{dataset}/isoline (GET)

Computes isolines (contour lines) for a given variable and specific levels.

The `isoline` endpoint accepts the following query parameters:

| Name | Description | Optional |
|------------|---------------------------------------------------------------|:--------:|
| `variable` | The variable to generate isolines for. | ✗ |
| `levels` | Comma-separated list of levels for isoline generation. | ✗ |
| `time` | The time value to use for isoline generation. | ✓ |

> [!IMPORTANT]
> You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with `&my_additional_variable={VALUE}`
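
Since `levels` is a single comma-separated value, a client has to join its numeric levels before sending them. A minimal sketch, assuming a hypothetical helper `build_isoline_url` and hypothetical dataset, variable and level values:

```python
from urllib.parse import quote, urlencode


def build_isoline_url(base_url, dataset, variable, levels, time=None, **extra):
    """Build a GET URL for /datasets/{dataset}/isoline.

    `levels` is an iterable of numbers, sent as one comma-separated value.
    """
    params = {
        "variable": variable,
        "levels": ",".join(str(level) for level in levels),
    }
    if time is not None:
        params["time"] = time
    params.update(extra)
    path = f"/datasets/{quote(str(dataset), safe='')}/isoline"
    # safe="," keeps the commas readable instead of encoding them as %2C
    return f"{base_url.rstrip('/')}{path}?{urlencode(params, safe=',')}"


# Hypothetical dataset id, variable name and levels.
url = build_isoline_url("http://localhost:8000", "my-dataset",
                        "pressure", [1000, 1010, 1020])
print(url)
```

Leaving the commas unencoded is only a readability choice; a percent-encoded `%2C` decodes to the same value on the server side.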

## Configuring

### Environment variables

| Variable | Description | Default value |
|-----------------------|----------------------------------------------------------|---------------|
| PORT | Port on which the service is exposed | 8000 |
| HOSTNAME | Hostname on which the service is exposed | localhost |
| AWS_ACCESS_KEY_ID | Access key ID for the S3 storage that holds the Zarr data | |
| AWS_SECRET_ACCESS_KEY | Secret access key for the S3 storage that holds the Zarr data | |
| AWS_REGION | Region of the S3 storage that holds the Zarr data | |
| AWS_ENDPOINT_URL | Endpoint URL of the S3 storage that holds the Zarr data | |
| BUCKET_NAME | Name of the bucket that holds the Zarr data | |
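
To make the defaults in the table concrete, the sketch below shows one way such variables could be read in Python. It is not kazarr's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, and only the variable names and the two default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the table."""
    port: int
    hostname: str
    aws_access_key_id: Optional[str]
    aws_secret_access_key: Optional[str]
    aws_region: Optional[str]
    aws_endpoint_url: Optional[str]
    bucket_name: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, applying the documented defaults."""
    return Settings(
        port=int(env.get("PORT", "8000")),
        hostname=env.get("HOSTNAME", "localhost"),
        # Variables without a documented default stay None when unset
        aws_access_key_id=env.get("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=env.get("AWS_SECRET_ACCESS_KEY"),
        aws_region=env.get("AWS_REGION"),
        aws_endpoint_url=env.get("AWS_ENDPOINT_URL"),
        bucket_name=env.get("BUCKET_NAME"),
    )
```

Calling `load_settings({})` yields port `8000` and hostname `localhost`, with every S3-related field left as `None`.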

## Usage

### Manual build

You can build the image with the following command:

```bash
docker build -t <your-image-name> .
```

And then start the service with:

```bash
docker run -p 8000:8000 <your-image-name>
```

### Run locally

Running the app locally requires several Python packages.
The simplest way to get them is to install Anaconda and run the following commands:

```bash
conda create -y -n kazarr_env python=3.11
```

```bash
conda install -y -n kazarr_env -c conda-forge \
fastapi \
uvicorn \
xarray \
zarr \
cfgrib \
numpy \
pyproj \
dask \
s3fs \
matplotlib \
pyvista
```

```bash
conda activate kazarr_env
```

```bash
python main.py start-api
```

## Contributing

Please read the [Contributing file](https://github.com/kalisio/k2/blob/master/.github/CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us.

## Versioning

We use [SemVer](https://semver.org/) for versioning. For the versions available, see the tags on this repository.

## Authors

This project is sponsored by

![Kalisio](https://s3.eu-central-1.amazonaws.com/kalisioscope/kalisio/kalisio-logo-black-256x84.png)

## License

This project is licensed under the MIT License - see the [license file](./LICENSE.md) for details.