https://github.com/dhi/datacatalogue
Leveraging Zarr to store and retrieve data (e.g. from simulations) together with metadata
- Host: GitHub
- URL: https://github.com/dhi/datacatalogue
- Owner: DHI
- Created: 2024-12-10T15:48:30.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-03-28T14:26:03.000Z (2 months ago)
- Last Synced: 2025-03-28T15:29:57.447Z (2 months ago)
- Language: Python
- Size: 72.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# zarrcatalogue
A Python package for converting and managing model results (e.g. flexible mesh results from MIKE) using the Zarr storage format.
## Repository structure
```
zarrcatalogue/
├── src/
│   ├── zarrcatalogue/
│   │   ├── __init__.py
│   │   ├── converters/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   └── mike.py
│   │   ├── catalog.py
│   │   └── manager.py
│   ├── tests/
│   │   ├── __init__.py
│   │   └── test_mike_converter.py
│   └── setup.py
├── notebooks/
│   └── 01_basic_conversion.ipynb
└── README.md

data/
data_zarr/   # only temporary for testing of conversion
catalog/     # zarr data along with json catalogue
```

## Features
### MIKE Model Converter
Currently supports conversion of MIKE flexible mesh files (dfsu) to Zarr format, handling:
- 2D and 3D flexible mesh geometries
- Mixed element types (triangular and quadrilateral elements)
- Multiple variables and time steps
- Mesh topology and element data
- Comprehensive metadata storage

#### Data Structure
The converted Zarr store follows this structure:
```
simulation.zarr/
├── data/
│   ├── variable1            # (n_timesteps, n_elements) array
│   ├── variable2            # (n_timesteps, n_elements) array
│   └── time                 # (n_timesteps,) array of timestamps
└── topology/
    ├── nodes                # (n_nodes, 3) node coordinates
    ├── elements             # (n_elements, max_nodes) connectivity
    └── element_coordinates  # (n_elements, 3) element centers
```

#### Metadata
The converter stores metadata covering:
- Model information (type, version)
- Geometry details (nodes, elements, projection)
- Mesh characteristics (element types, counts)
- Time information (start, end, timestep)
- Variable attributes (units, descriptions)
- Conversion details (timestamp, software versions)

#### Performance Features
- Configurable chunking for efficient data access
- Compression options for reduced storage
- Optimized for both temporal and spatial queries

## Usage
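
Choosing chunk sizes (a rough sketch, not part of the package): the `chunks` argument used in the conversion call in this section determines how many chunks a query has to read. The helper names `chunk_grid` and `chunks_touched` and the array shape are hypothetical. Only the chunk sizes (`time`: 100, `elements`: 1000) are taken from the conversion example.

```python
import math


def chunk_grid(shape, chunks):
    """Return the number of chunks along each axis for a given array shape."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunks))


def chunks_touched(chunks, selection):
    """Count the chunks intersected by a selection.

    `selection` holds one half-open (start, stop) index range per axis.
    """
    count = 1
    for (start, stop), size in zip(selection, chunks):
        first = start // size
        last = (stop - 1) // size
        count *= last - first + 1
    return count


# Assumed array: 1000 time steps x 25000 elements (illustrative only),
# chunked as (time=100, elements=1000).
shape = (1000, 25000)
chunks = (100, 1000)

grid = chunk_grid(shape, chunks)

# Temporal query: every time step at a single element (index 42).
series = chunks_touched(chunks, [(0, 1000), (42, 43)])

# Spatial query: every element at a single time step (index 0).
snapshot = chunks_touched(chunks, [(0, 1), (0, 25000)])

print(grid, series, snapshot)
```

For this assumed shape the chunk grid is 10 x 25, a single-element time series reads 10 chunks and a full spatial snapshot reads 25. Larger `time` chunks would favour time series at the cost of snapshots, and vice versa.
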
Basic conversion:
```python
from zarrcatalogue.converters.mike import MIKEConverter
from pathlib import Path

# Initialize converter
converter = MIKEConverter()

# Convert MIKE dfsu file to Zarr
metadata = converter.to_zarr(
    input_file=Path("simulation.dfsu"),
    zarr_path=Path("output.zarr"),
    chunks={'time': 100, 'elements': 1000},
    compression_level=5
)

# Validate conversion
validation = converter.validate_conversion(
    original_ds="simulation.dfsu",
    zarr_path=Path("output.zarr")
)
```

## Requirements
* Python 3.x
* mikeio >= 2.2.0
* zarr
* numpy

## Current Limitations
* Only MIKE dfsu files are supported
* Element types are limited to triangles and quadrilaterals

## Future Development
### Planned features:
* Conversion back from Zarr to MIKE (zarr2mike; already works for 2D)
* Embedded data storage as an alternative to raw data. Should this be in Zarr or in a new file format?
* Advanced querying and filtering
* Statistics from Zarr
* Leverage mikeio plotting (e.g. https://holoviews.org/user_guide/Geometry_Data.html) after conversion back
* Export capabilities to other formats
* Support for additional MIKE file formats (MIKE SHE, FEFLOW)