Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/dimfalk/kostra2020

R interface for KOSTRA-DWD-2020 dataset
https://github.com/dimfalk/kostra2020

dwd kostra precipitation rainfall

Last synced: 3 months ago
JSON representation

R interface for KOSTRA-DWD-2020 dataset

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# kostra2020

[![R-CMD-check](https://github.com/dimfalk/kostra2020/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/dimfalk/kostra2020/actions/workflows/R-CMD-check.yaml) [![Codecov test coverage](https://codecov.io/gh/dimfalk/kostra2020/branch/main/graph/badge.svg)](https://app.codecov.io/gh/dimfalk/kostra2020?branch=main)

--- As of 01.01.2023, kostra2020 officially replaces [kostra2010R](https://github.com/dimfalk/kostra2010R). ---

The main goal of kostra2020 is to provide access to KOSTRA-DWD-2020 dataset from within R.

Abstract (slightly modified) from the [official dataset description](https://opendata.dwd.de/climate_environment/CDC/grids_germany/return_periods/precipitation/KOSTRA/KOSTRA_DWD_2020/Datensatzbeschreibung_KOSTRA-DWD-2020_en.pdf):

> This vector dataset contains statistical precipitation values as a function of duration and return period. The scope of the data is the engineering dimensioning of water management structures. These include, sewerage networks, sewage treatment plants, pumping stations and retention basins. They are also often used for the dimensioning of drainage and infiltration systems. With the help of the data, however, it is also possible to estimate the precipitation level of severe heavy precipitation events with regard to their return periods. This estimation is often used to assess damage events.

> The dataset encompasses values of statistical precipitation (HN) and uncertainty range (UC) for 22 duration levels D (5 min - 7 days) and 9 return periods Tn (1-100 a). Here, the dataset has been filtered for grid cells with actual information available, reducing the full grid of 300 x 300 cells at a spatial resolution of 5 km to a relevant subset of 15,989 cells. Also, explicit rain donation (RN) information was removed since this data can be reproduced on the fly. INDEX_RC describes the unique identifier of a grid cell.

Further information can be found in the support documents at https://www.dwd.de/kostra

## Installation

You can install the development version of kostra2020 with:

``` r
# install.packages("devtools")
devtools::install_github("dimfalk/kostra2020")
```

and load the package via

```{r}
library(kostra2020)
```

## Getting started

### Get "INDEX_RC" based on row and column information

Sometimes identification of grid cells is not accomplished using "INDEX_RC" directly but rather using a combination of X and Y information (e.g. row 49, column 125). This information can easily be used to generate the necessary "INDEX_RC" field.

```{r}
# Generate "INDEX_RC" based on row and column information.
idx_build(row = 117, col = 111)
```

If you wanted to check whether this constructed "INDEX_RC" field is really present in the dataset (or you found an ID in some report and are not sure, if it is still being used), make use of the following function.

```{r}
# Is the following "INDEX_RC" entry present in the dataset?
idx_exists("117111")
```

### Get "INDEX_RC" based on spatial information

The most common use case will be to get the relevant "INDEX_RC" based on coordinates provided, e.g. for the location of a precipitation station in order to be able to classify duration-specific precipitation depths in terms of return periods.

```{r}
# Sf objects created based on specified coordinates. Don't forget to pass the CRS.
p1 <- get_centroid(c(6.19, 50.46), crs = "epsg:4326")
p1

p2 <- get_centroid(c(367773, 5703579), crs = "epsg:25832")
p2
```

For convenience, it is also possible to provide municipality names or postal codes to derive coordinates.

```{r}
# Sf objects created based on Nominatim API response. Internet access required!
p3 <- get_centroid("40477")
p3

p4 <- get_centroid("Freiburg im Breisgau")
p4

p5 <- get_centroid("Kronprinzenstr. 24, 45128 Essen")
p5
```

These coordinates can be used subsequently to spatially query the relevant grid index.

```{r}
# Get indices by topological intersection between location point and grid cells.
get_idx(p1)
get_idx(p2)
get_idx(p3)
get_idx(p4)
get_idx(p5)
```

### Construct cell-specific statistics from KOSTRA-DWD-2020 grid

Now that we have messed a little with the grid cell identifiers, let's get a sneak peek into the dataset itself based on the "INDEX_RC" specified.

```{r}
# Build a tibble containing statistical precipitation depths as a function of
# duration and return periods for the grid cell specified.
stats <- get_stats("117111")

stats
```

Some describing attributes have been assigned to the tibble.

```{r}
attr(stats, "id")
attr(stats, "period")
attr(stats, "returnperiods_a")
attr(stats, "source")
```

### Get precipitation depths and uncertainties, calculate precipitation yield

If we now wanted to know the statistical precipitation depth e.g. for an event of 4 hours duration corresponding to a recurrence interval in 1:100 years, it's just a matter of indexing. However, there is a function helping you out.

```{r}
# So we are interested in the rainfall amount [mm] for an event lasting 240 min
# with a return period of 100 a.
get_depth(stats, d = 240, tn = 100)
```

In order to respect estimated grid cell specific uncertainties now additionally included in kostra2020, make use of `uc = TRUE` to get an interval centered around the single value above.

```{r}
# Same data, but with uncertainties considered.
get_depth(stats, d = 240, tn = 100, uc = TRUE)
```

Uncertainties are caused on the one hand by the statistical procedures themselves, but on the other hand also by the regionalisation procedure and by erroneous or missing observations. The uncertainties vary regionally and depend on the duration level and return period. For example, the rarer an event occurs statistically, the greater the uncertainties.

Specific uncertainty data is available for each grid cell and for each combination of duration level D and return period Tn. Please note that uncertainty ranges are given in ± %.

```{r}
# Inspect raw uncertainties used above.
get_uncertainties("117111")
```

If you need precipitation yield values [l/(s\*ha)] instead of precipitation depth [mm] or vice versa, make use of the following helper function.

```{r}
as_yield(55.6, d = 240)

as_depth(38.6, d = 240)
```

### Get return periods

Finally, we want to determine the return period according to the dataset for a precipitation depth and duration given.

```{r}
# Let's assume we measured 86.2 mm in 24 h.
get_returnp(stats, hn = 86.2, d = 1440)
```

Accordingly, the approximate corresponding recurrence interval resp. annuality of this event amounts to something between 30 and 50 years as per KOSTRA-DWD-2020.

The following edge cases are to be mentioned:

```{r}
# 1) In case a class boundary is hit, the return period is replicated.
get_returnp(stats, hn = 41.8, d = 1440)
```

```{r}
# 2) In case the return period tn is smaller than 1, interval opens with 0.
get_returnp(stats, hn = 32.1, d = 1440)
```

```{r}
# 3) In case the return period tn is larger than 100, interval closes with Inf.
get_returnp(stats, hn = 110.8, d = 1440)
```

### Return period interpolation

Although it may be somewhat questionable from a scientific perspective, you might nevertheless be interested in the return period estimated using linear interpolation between adjacent nodes:

```{r}
# Using the same example as above, previously resulting in 30 a < tn < 50 a.
get_returnp(stats, hn = 86.2, d = 1440, interpolate = TRUE)
```

### Further utilization

Data can now additionally be visualized as intensity-duration-frequency curves using `plot_idf()`, underpinned by `{ggplot2}` ...

```{r, fig.width = 9}
plot_idf(stats, log10 = TRUE)
```

... or exported to disk using `write_stats()` based on `write.table()`.

## Contributing

See [here](https://github.com/dimfalk/kostra2020/blob/main/.github/CONTRIBUTING.md) if you'd like to contribute.