# fv3dataset
[![tests](https://github.com/spencerkclark/fv3dataset/actions/workflows/tests.yaml/badge.svg)](https://github.com/spencerkclark/fv3dataset/actions/workflows/tests.yaml)
This is a library meant for interacting with data produced by running SHiELD.
SHiELD outputs can sometimes be complicated to deal with, because outputs for
specific variables are distributed across multiple tile and/or subtile files,
and across multiple time segments. Therefore, accessing the full spatial time
series of a variable involves accessing many separate files.

Xarray is capable of lazily opening all of these files -- for their metadata
only -- and combining them into a single coherent data structure that makes
things look like you opened them from a single netCDF file. Here we make this
especially convenient for outputs from SHiELD.
## Example

The basic usage of this package is the following. All you need to provide to
create an `fv3dataset.FV3Dataset` object is a path to the root directory of the
output of a simulation:

```
>>> import fv3dataset
>>> root = "/lustre/f2/scratch/Spencer.Clark/SHiELD/20200120.00Z.C48.L79z12.2021-04-04"
>>> fv3ds = fv3dataset.FV3Dataset(root)
```

If there are any sets of history files that have been `mppnccombine`'d for all
segments of the run, `fv3dataset` will recognize them and combine each set into
a lazy dataset:

```
>>> datasets = fv3ds.datasets
>>> list(datasets.keys())
['demo_coarse_inst', 'grid_spec_coarse', 'demo_ave', 'demo_inst', 'demo_coarse_ave']
>>> datasets["demo_ave"]
<xarray.Dataset>
Dimensions:     (grid_xt: 48, grid_yt: 48, nv: 2, pfull: 79, phalf: 80, tile: 6, time: 4)
Coordinates:
* time (time) object 2020-01-20 06:00:00 ... 2020-01-21 18:00:00
* tile (tile) int64 0 1 2 3 4 5
* grid_xt (grid_xt) float64 1.0 2.0 3.0 4.0 5.0 ... 45.0 46.0 47.0 48.0
* grid_yt (grid_yt) float64 1.0 2.0 3.0 4.0 5.0 ... 45.0 46.0 47.0 48.0
* pfull (pfull) float64 4.514 8.301 12.45 16.74 ... 989.5 994.3 998.3
* nv (nv) float64 1.0 2.0
* phalf (phalf) float64 3.0 6.467 10.45 14.69 ... 992.2 996.5 1e+03
Data variables:
z200 (tile, time, grid_yt, grid_xt) float32 dask.array
ucomp (tile, time, pfull, grid_yt, grid_xt) float32 dask.array
average_T1 (time) datetime64[ns] 2020-01-20 ... 2020-01-21T12:00:00
average_T2 (time) datetime64[ns] 2020-01-20T12:00:00 ... 2020-01-22
average_DT (time) timedelta64[ns] 12:00:00 12:00:00 12:00:00 12:00:00
time_bnds (time, nv) timedelta64[ns] 0 days 00:00:00 ... 2 days 00:00:00
```

Note that the first call to `fv3ds.datasets` may take a couple of seconds -- it
takes some time to go through and open all the files -- but the result will be
cached, so future accesses will be fast.
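Because the variables come back as lazy dask arrays, reductions can be
expressed up front and only computed when needed. A small illustration using
the `demo_ave` dataset from the listing above; the computation itself is just
standard xarray:

```
# Build a lazy time-and-tile mean of 200-hPa height; no data is read yet.
z200_mean = datasets["demo_ave"].z200.mean(["time", "tile"])

# Trigger the actual file reads and the computation.
result = z200_mean.compute()
```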
For analysis on Gaea or PP/AN, this view of the data may be enough; however, if
you would like to convert a dataset to a zarr store, one way to do so would be
to simply use xarray's built-in `to_zarr` method:

```
>>> datasets["demo_ave"].to_zarr("/path/to/store/demo_ave.zarr")
```
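Once written, the store can be opened back up lazily with xarray. This is plain
xarray usage, with the store path carried over from the example above:

```
import xarray as xr

# Open the zarr store written above; variables come back as lazy dask arrays.
ds = xr.open_zarr("/path/to/store/demo_ave.zarr")
```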
Note that for large datasets there are more efficient ways to write such a
store in a distributed fashion. For that, see
[`xpartition`](https://github.com/spencerkclark/xpartition).

On PP/AN, with the tape archive, you may not want to access every tape in a
dataset at once. Instead you might want to read in the data from a single tape.
For this you can use the `FV3Dataset.tape_to_dask` method:

```
>>> fv3ds.tape_to_dask("demo_ave")
<xarray.Dataset>
Dimensions:     (grid_xt: 48, grid_yt: 48, nv: 2, pfull: 79, phalf: 80, tile: 6, time: 4)
Coordinates:
* time (time) object 2020-01-20 06:00:00 ... 2020-01-21 18:00:00
* tile (tile) int64 0 1 2 3 4 5
* grid_xt (grid_xt) float64 1.0 2.0 3.0 4.0 5.0 ... 45.0 46.0 47.0 48.0
* grid_yt (grid_yt) float64 1.0 2.0 3.0 4.0 5.0 ... 45.0 46.0 47.0 48.0
* pfull (pfull) float64 4.514 8.301 12.45 16.74 ... 989.5 994.3 998.3
* nv (nv) float64 1.0 2.0
* phalf (phalf) float64 3.0 6.467 10.45 14.69 ... 992.2 996.5 1e+03
Data variables:
z200 (tile, time, grid_yt, grid_xt) float32 dask.array
ucomp (tile, time, pfull, grid_yt, grid_xt) float32 dask.array
average_T1 (time) datetime64[ns] 2020-01-20 ... 2020-01-21T12:00:00
average_T2 (time) datetime64[ns] 2020-01-20T12:00:00 ... 2020-01-22
average_DT (time) timedelta64[ns] 12:00:00 12:00:00 12:00:00 12:00:00
time_bnds (time, nv) timedelta64[ns] 0 days 00:00:00 ... 2 days 00:00:00
```
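Since `tape_to_dask` returns an ordinary lazy xarray dataset, the same
`to_zarr` workflow shown above applies to a single tape as well; the store path
here is just the placeholder from earlier:

```
# Write just the "demo_ave" tape out as its own zarr store.
fv3ds.tape_to_dask("demo_ave").to_zarr("/path/to/store/demo_ave.zarr")
```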
## Chunk sizes

By default, datasets will be chunked with a target chunk size of 128 MB per
chunk. This can be configured in the `FV3Dataset` constructor using the
`target_chunk_size` argument:

```
>>> fv3ds = fv3dataset.FV3Dataset(root, target_chunk_size="10Mi")
>>> fv3ds.tape_to_dask("demo_ave")
<xarray.Dataset>
Dimensions:     (grid_xt: 48, grid_yt: 48, nv: 2, pfull: 79, phalf: 80, tile: 6, time: 4)
Coordinates:
* time (time) object 2020-01-20 06:00:00 ... 2020-01-21 18:00:00
* tile (tile) int64 0 1 2 3 4 5
* grid_xt (grid_xt) float64 1.0 2.0 3.0 4.0 5.0 ... 45.0 46.0 47.0 48.0
* grid_yt (grid_yt) float64 1.0 2.0 3.0 4.0 5.0 ... 45.0 46.0 47.0 48.0
* pfull (pfull) float64 4.514 8.301 12.45 16.74 ... 989.5 994.3 998.3
* nv (nv) float64 1.0 2.0
* phalf (phalf) float64 3.0 6.467 10.45 14.69 ... 992.2 996.5 1e+03
Data variables:
z200 (tile, time, grid_yt, grid_xt) float32 dask.array
ucomp (tile, time, pfull, grid_yt, grid_xt) float32 dask.array
average_T1 (time) datetime64[ns] 2020-01-20 ... 2020-01-21T12:00:00
average_T2 (time) datetime64[ns] 2020-01-20T12:00:00 ... 2020-01-22
average_DT (time) timedelta64[ns] 12:00:00 12:00:00 12:00:00 12:00:00
time_bnds (time, nv) timedelta64[ns] 0 days 00:00:00 ... 2 days 00:00:00
```

Here we can see that with a target chunk size of 10 MiB, the chunk size of
`ucomp` along the tile dimension was cut in half.
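To check chunking directly, you can inspect a variable's chunk layout with
standard xarray/dask attributes (nothing `fv3dataset`-specific here):

```
# Inspect the per-dimension chunk layout of the lazy ucomp variable;
# each entry is a tuple of chunk lengths along that dimension.
ds = fv3ds.tape_to_dask("demo_ave")
print(ds.ucomp.chunks)
```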
## Installation

To install `fv3dataset`, check out the source code from GitHub and use `pip`:
```
$ git clone https://github.com/spencerkclark/fv3dataset.git
$ cd fv3dataset
$ pip install -e .
```
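Alternatively, `pip` can install straight from the repository URL without a
manual clone; this is standard `pip` behavior rather than anything specific to
`fv3dataset`:

```
$ pip install git+https://github.com/spencerkclark/fv3dataset.git
```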