https://github.com/hypertidy/gdalio

consider mdsumner/whatarelief instead WIP (helpers for gdal warper, and wrappers for various packages)

---
output: github_document
editor_options:
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  fig.width = 12, fig.height = 8
)

```

# gdalio

![r-universe](https://hypertidy.r-universe.dev/badges/gdalio)
[![R-CMD-check](https://github.com/hypertidy/gdalio/workflows/R-CMD-check/badge.svg)](https://github.com/hypertidy/gdalio/actions)

The goal of gdalio is to read data directly with the GDAL warper, using a simple configuration: specify the *target grid* once. This saves us from a lot of complication, juggling of formats and objects, and extra code.

We have these functions to easily read raster data via the GDAL warp library:

* `gdalio_data()` read data directly from a *data source name*, i.e. a file path, URL, or database connection

* `gdalio_set_default_grid()` specify a grid (extent, dimension, projection) to use for all subsequent operations
* `gdalio_get_default_grid()` get the grid currently in use
* `vrt()` simple function (not actually VRT, but doing similar in a limited way) to *augment* data sources that have missing or incorrect *extent* or *projection* metadata
* `gdalio_matrix()`, `gdalio_array()`, and `gdalio_graphics()` which reformat the data into commonly used R types for images
* `gdalio_data_hex()`, `gdalio_data_rgb()` special cases of `gdalio_data()` to read 3 or 4 bands and convert to text hex codes
* `gdalio_local_grid()` a helper to create a local projected region around a longlat (optional width extent, dimension, projection family)
* `gdalio_format_source()` a helper to *return the file path* to source code, to define formatters for spatial package types (raster, stars, terra, spatstat).
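The idea behind `gdalio_local_grid()` is easy to sketch in plain R. `local_laea_grid()` below is a hypothetical helper, not the package's implementation or signature: it assumes a Lambert Azimuthal Equal Area projection centred on the point, and a square extent `width` metres across. With the arguments shown it reproduces the grid used for the imagery zoom-in further down.

```r
## Hypothetical helper (NOT the gdalio implementation): build a grid
## specification centred on a longitude/latitude, using a local Lambert
## Azimuthal Equal Area projection and a square extent in metres.
local_laea_grid <- function(lon, lat, width = 1e5, dimension = c(512, 512)) {
  half <- width / 2
  list(extent = c(-half, half, -half, half),
       dimension = dimension,
       projection = sprintf("+proj=laea +lon_0=%s +lat_0=%s", lon, lat))
}

g <- local_laea_grid(147.325, -42.880556, width = 4e3)
g$projection
g$extent
```

The result is a plain list, so it could be handed to `gdalio_set_default_grid()` like any other grid.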

In this README we illustrate the use of these with some online and local raster data sources, and provide helpers for reading into particular formats used in R (base matrix, raster package, stars package, spatstat package, terra package).

## Installation

You can install gdalio with:

```r
#install.packages("remotes")
## currently need this branch to avoid dev problems in vapour
remotes::install_github("hypertidy/vapour@stable-2022")
remotes::install_github("hypertidy/gdalio")
```

```{r, include = FALSE, eval = FALSE}
# Enable this universe
options(repos = c(
  hypertidy = 'https://hypertidy.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))

# Install some packages
install.packages(c('vapour', 'gdalio'))
```

## Target grid specification

The key is having a *target grid*: we nominate it upfront, and then any data we request from GDAL will *fill that grid* by GDAL's warp magic.

A *grid* is an abstract description of an *image* data set:

* extent `(xmin, xmax, ymin, ymax)` in some coordinate system
* dimension (number of columns, number of rows)
* projection - the actual coordinate system

Often we have an actual object in some format that records this information, but it can be much simpler to work with just 6 numbers (four for the extent, two for the dimension) and one character string (the projection).
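To show that those 6 numbers and one string are enough, here is a plain R sketch (no gdalio calls) that derives the pixel size and the cell-centre coordinates from the same grid used in the example below. The y centres are listed from the top row down, which is the order in which GDAL returns pixels for a north-up grid.

```r
## extent (xmin, xmax, ymin, ymax), dimension (ncol, nrow), projection
grid0 <- list(extent = c(-1e6, 1e6, -5e5, 5e5),
              dimension = c(512, 256),
              projection = "+proj=laea +lon_0=147 +lat_0=-42")

## pixel size in x and y: extent width and height divided by the dimension
res <- c(diff(grid0$extent[1:2]), diff(grid0$extent[3:4])) / grid0$dimension

## cell centres: half a pixel in from the edges
xc <- seq(grid0$extent[1] + res[1] / 2, by = res[1], length.out = grid0$dimension[1])
## rows start at the top (ymax) and step downwards
yc <- seq(grid0$extent[4] - res[2] / 2, by = -res[2], length.out = grid0$dimension[2])

res
range(xc)
```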

## Example

This works best for data you have access to locally. In the simplest case you could use gdalio like this, but more nuanced use requires some effort to define the structure of the output data (which we explore below).

```{r simple, eval=FALSE}
library(gdalio)
vals <- gdalio_data("myhugefile.tif")
```

For a real example we use a file that's on the internet and requires a little extra prep.

This is a sea surface temperature data set: we need GDAL's subdataset syntax for a file at a URL, and we *augment* our file address with what we know is the projection of the data.

```{r oisst-source}
library(gdalio)

## online data, daily ocean temperature surface product, one layer in longlat 0.25 degree res
f <- vrt('NETCDF:"/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/201809/oisst-avhrr-v02r01.20180929.nc":sst',
         projection = "+proj=longlat +datum=WGS84")
```

```{r grid-spec}
## we set up a grid (this is a *raster* in abstraction)
grid0 <- list(extent = c(-1e6, 1e6, -5e5, 5e5),
              dimension = c(512, 256),
              projection = "+proj=laea +lon_0=147 +lat_0=-42")
gdalio_set_default_grid(grid0)
```

Now we can start reading data.

```{r warp}

## then we get GDAL to get a value for every pixel in our grid
pix <- gdalio_data(f)
## we get a list with a vector for each band (just one here)
plot(pix[[1]], pch = ".")

```

Those values look a little blocky, because we are asking for quite a high resolution from a low-resolution source. So let's use a different *resampling* algorithm (the default is nearest neighbour, i.e. no interpolation).

```{r bilinear}
pix_interp <- gdalio_data(f, resample = "bilinear")
## with bilinear resampling we get a smoother result
plot(pix_interp[[1]], pch = ".")

```
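The difference between the two resampling methods can be seen in a toy one-dimensional sketch in base R. It is only an analogy: GDAL's warper works in two dimensions and handles pixel geometry itself, and breaking ties towards the lower cell is a choice made for this sketch.

```r
## 4 source cells with centres at x = 1, 2, 3, 4
src_x <- 1:4
src_v <- c(10, 20, 40, 80)

## 7 target positions, finer than the source spacing
tgt_x <- seq(1, 4, by = 0.5)

## nearest neighbour: copy the value of the closest source cell
## (which.min() takes the first, i.e. lower, cell when two are equally close)
nearest <- vapply(tgt_x, function(x) src_v[which.min(abs(src_x - x))], numeric(1))

## linear interpolation between neighbouring source cells
linear <- approx(src_x, src_v, xout = tgt_x)$y

rbind(tgt_x, nearest, linear)
```

Nearest neighbour repeats source values (the stepped look), while interpolation produces new intermediate values.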

Normally, of course, we want a bit more convenience, and to actually fill a format used in R or by some package that has spatial types. So we define those helpers here.

## Raster formats

This function works the same way as several others defined just below, which format the data into objects used by various packages.

```{r formats}
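## simple list format used by graphics::image() - we can only handle one band/layer
gdalio_base <- function(dsn, ...) {
  v <- gdalio_data(dsn, ...)
  g <- gdalio_get_default_grid()
  ## x and y are cell boundaries (one more than the number of cells),
  ## so image() draws the data exactly over the grid extent
  list(x = seq(g$extent[1], g$extent[2], length.out = g$dimension[1] + 1),
       y = seq(g$extent[3], g$extent[4], length.out = g$dimension[2] + 1),
       ## GDAL returns rows from the top down: fill one column per image row,
       ## then reverse the columns so that y increases
       z = matrix(v[[1]], g$dimension[1])[, g$dimension[2]:1])
}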

```

R has long had this powerful `list(x, y, z)` format for `image()`:

```{r image}
xyz <- gdalio_base(f)
image(xyz)
```

There are now several different formats, used by various packages, that are
equivalent. At root they specify *extent*, *dimension*, and *projection*, the core
concepts of our *target grid* (the less specialized ones ignore the projection).
Some of these functions require the format-specific packages, but they are easy
enough to write, so we list them here as examples.

There are functions `gdalio_matrix()`, `gdalio_array()`, and `gdalio_graphics()` in this package that
put the data into the native R image form.

```{r raster-format, echo=FALSE}
knitr::read_chunk("inst/raster_format/raster_format.codeR")
```

```{r raster-format-chunks}
<>
<>
<>
<>
```

To obtain all of those functions you can `source()` the file returned by `gdalio_format_source()`, but note that together they depend on at least raster, stars, spatstat.geom, and terra, so you may prefer to copy just the definitions you need.

```{r source_format-chunks, eval=TRUE}
source(gdalio_format_source())
```

Note that there is nothing of consequence that differs between the formats.
From the perspective of gdalio they all take the same set of pixel values; there
are just tiny differences in how the extent and projection metadata are handled,
and in the storage orientation of the data itself.
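The orientation difference can be made concrete with six made-up pixel values for a grid of 3 columns and 2 rows, in base R only. The same vector becomes either a matrix laid out as the image looks (the layout `grDevices::as.raster()` expects) or the `image()` layout used by `gdalio_base()` above; the two are a flip and a transpose of each other.

```r
## six pixel values in GDAL order: row by row, starting at the top-left
v <- 1:6
nx <- 3  # columns
ny <- 2  # rows

## "as seen" layout: matrix rows are image rows, top row first
m_display <- matrix(v, nrow = ny, byrow = TRUE)

## image() layout, as in gdalio_base(): x varies down the matrix rows and the
## columns run from the bottom (ymin) to the top (ymax)
m_image <- matrix(v, nrow = nx)[, ny:1]

m_display
m_image
## flip the displayed rows upside down and transpose to get the image() layout
all(m_image == t(m_display[ny:1, ]))
```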

To prove the point, we now read the same data into each format of choice. We
are *re-reading* the data here (it all exists in `pix` above, but you get the idea).

```{r example1}
op <- par(mfrow = c(2, 2))
image(gdalio_stars(f), main = "stars")
raster::plot(gdalio_raster(f), col = hcl.colors(26), main = "raster")
terra::plot(gdalio_terra(f), main = "terra")
plot(gdalio_im(f), main = "\nspatstat")
par(op)
```

### Resampling algorithm

In the same way, we can also use different methods of resampling and easily see the effect.

```{r example-resample}
op <- par(mfrow = c(2, 2), mar = par("mar")/3)
image(cs <- gdalio_stars(f, resample = "cubicspline"), col = hcl.colors(26))
image(lz <- gdalio_stars(f, resample = "near"), col = hcl.colors(26))
image(cs - lz)
par(op)
```

## Imagery

This works just as well for online image sources (such as aerial photos or street maps).

```{r imagery}
virtualearth_imagery <- tempfile(fileext = ".xml")
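writeLines('
<GDAL_WMS>
  <Service name="VirtualEarth">
    <ServerUrl>http://a${server_num}.ortho.tiles.virtualearth.net/tiles/a${quadkey}.jpeg?g=90</ServerUrl>
  </Service>
  <MaxConnections>4</MaxConnections>
</GDAL_WMS>
', virtualearth_imagery)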

img <- gdalio_raster(virtualearth_imagery, bands = 1:3)
raster::plotRGB(img)

## let's really zoom in on somewhere cool
grid1 <- list(extent = c(-1, 1, -1, 1) * 2e3,
              dimension = c(512, 512),
              projection = "+proj=laea +lon_0=147.325 +lat_0=-42.880556")
gdalio_set_default_grid(grid1)
img <- gdalio_raster(virtualearth_imagery, bands = 1:3)
raster::plotRGB(img)

```

To obtain the raw data values we can use `gdalio_data_rgb()` or
`gdalio_data_hex()` without specifying the number of bands, but these are a
little experimental for now.

```{r rgb}
rgbvals <- gdalio_data_rgb(virtualearth_imagery)
hexvals <- gdalio_data_hex(virtualearth_imagery)

```
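Conceptually, turning three 8-bit bands into text hex codes is what base R's `grDevices::rgb()` does. The band values below are made up, and this is not necessarily how `gdalio_data_hex()` is implemented internally.

```r
## hypothetical red, green, blue band values (0-255) for four pixels
red   <- c(255,   0,   0, 128)
green <- c(  0, 255,   0, 128)
blue  <- c(  0,   0, 255, 128)

## one "#RRGGBB" string per pixel
hex <- grDevices::rgb(red, green, blue, maxColorValue = 255)
hex
## "#FF0000" "#00FF00" "#0000FF" "#808080"
```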

We can use this to drive R's own raster type, the `grDevices::as.raster()` array.

```{r grDevices-raster}
arr <- gdalio_graphics(virtualearth_imagery)
grid1 <- gdalio_get_default_grid()
plot(matrix(grid1$extent, 2), type = "n", asp = 1)
graphics::rasterImage(arr, grid1$extent[1], grid1$extent[3], grid1$extent[2], grid1$extent[4])
```
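`gdalio_graphics()` already returns such an array, but the layout it needs is easy to show by hand. In this sketch the hex colours are hypothetical; the point is that pixels arriving row by row from the top-left must fill the matrix by row before `grDevices::as.raster()`, because a raster's first row is drawn at the top.

```r
## hypothetical hex colours for a tiny image of 3 columns and 2 rows,
## in GDAL order: row by row, starting from the top-left pixel
hex_px <- c("#FF0000", "#00FF00", "#0000FF",
            "#000000", "#808080", "#FFFFFF")
nrow_img <- 2

## fill by row so that matrix row 1 is the top row of the image
m <- matrix(hex_px, nrow = nrow_img, byrow = TRUE)
ras <- grDevices::as.raster(m)
dim(ras)  # 2 rows, 3 columns

## it can then be drawn over an extent as in the chunk above, e.g.
## plot(matrix(grid1$extent, 2), type = "n", asp = 1)
## rasterImage(ras, grid1$extent[1], grid1$extent[3], grid1$extent[2], grid1$extent[4])
```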

Using base R alone we can also `plot()` the array, but we must use a different
idiom for the extent.

*Don't use `plot(arr, xlim = , ylim = )`: it does not work as claimed. As of R
version 4.1.0 we can only get sensible extents with `rasterImage()`.*

```{r dum-raster}
plot(arr)
par("usr")
```

Another great example, from Twitter:

```{r twitter}
u <- "WMS:https://tiles.maps.eox.at/?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=s2cloudless-2019&SRS=EPSG:4326&BBOX=-180.000000,-90.000000,180.000000,90.000000&FORMAT=image/png&TILESIZE=256&OVERVIEWCOUNT=17&MINRESOLUTION=0.0000053644180298&TILED=true"

library(gdalio) ## https://github.com/hypertidy/gdalio
gdalio_set_default_grid(list(extent = c(-1, 1, -1, 1) * 1e6,
                             dimension = c(1024, 1024),
                             projection = "+proj=laea +lon_0=139 +lat_0=39"))
x <- gdalio_graphics(u)
plot(x)
```

## Default grid (there is one)

Say we don't set a grid at all and just use the default. Currently gdalio's default is a grid covering the entire world in longlat.
This means we can read from any source and we'll get something (though we might not see anything if the source is a tiny region).

```{r default}
gdalio_set_default_grid()
terra::plot(gdalio_terra(f))

```

## Miscellaneous

Some other sources: files in spDataLarge, image servers, etc.

```{r example-data}
elevation.tiles.prod <- tempfile(fileext = ".xml")
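writeLines('
<GDAL_WMS>
  <Service name="TMS">
    <ServerUrl>https://s3.amazonaws.com/elevation-tiles-prod/geotiff/${z}/${x}/${y}.tif</ServerUrl>
  </Service>
  <DataWindow>
    <UpperLeftX>-20037508.34</UpperLeftX>
    <UpperLeftY>20037508.34</UpperLeftY>
    <LowerRightX>20037508.34</LowerRightX>
    <LowerRightY>-20037508.34</LowerRightY>
    <TileLevel>14</TileLevel>
    <TileCountX>1</TileCountX>
    <TileCountY>1</TileCountY>
    <YOrigin>top</YOrigin>
  </DataWindow>
  <Projection>EPSG:3857</Projection>
  <BlockSizeX>512</BlockSizeX>
  <BlockSizeY>512</BlockSizeY>
  <BandsCount>1</BandsCount>
  <DataType>Int16</DataType>
  <ZeroBlockHttpCodes>403,404</ZeroBlockHttpCodes>
  <DataValues>
    <NoData>-32768</NoData>
  </DataValues>
</GDAL_WMS>
', elevation.tiles.prod)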

## we'll use this as a grid specification, not the actual data for anything
sfiles <- list.files(system.file("raster", package = "spDataLarge", mustWork = TRUE), full.names = TRUE)

## we don't take raster objects, just the spec: extent, dim, projection
ri <- vapour::vapour_raster_info(sfiles[1])

gdalio_set_default_grid(list(extent = ri$extent,
                             dimension = ri$dimXY,
                             projection = ri$projection))
s <- gdalio_stars(elevation.tiles.prod)
library(stars); plot(s)

## we can do this anywhere, in any projection but it depends on what our source *has* of course
## but, it's pretty general and powerful
gdalio_set_default_grid(list(extent = c(-1, 1, -1, 1) * 3e6,
                             dimension = c(768, 813),
                             projection = "+proj=stere +lat_0=-65 +lon_0=147"))
p <- gdalio_im(elevation.tiles.prod)
plot(p)

```

My favourite projection family (I think) is Oblique Mercator. For a long time I've wanted this kind of freedom and convenience for working with spatial data, rather than constantly juggling objects, formats, and plumbing. More to come. :)

```{r omerc}
omerc <- "+proj=omerc +lonc=147 +gamma=9 +alpha=9 +lat_0=-10 +ellps=WGS84"

gdalio_set_default_grid(list(extent = c(-1, 1, -1, 1) * 7e6,
                             dimension = c(768, 813),
                             projection = omerc))

library(raster)
o <- gdalio_raster(elevation.tiles.prod)
raster::plot(o, col = hcl.colors(52))
## longitude and latitude of every cell centre, to draw a graticule with contour()
xy <- reproj::reproj(raster::coordinates(o), "+proj=longlat", source = raster::projection(o))
## shift negative longitudes by 360 so they run continuously across the antimeridian
xy[xy[, 1] < 0, 1] <- xy[xy[, 1] < 0, 1] + 360
contour(setValues(o, xy[, 1]), add = TRUE, col = "white")
contour(setValues(o, xy[, 2]), add = TRUE, col = "white")

```
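The longitude fix-up in the chunk above matters because this map crosses the antimeridian: without it, longitudes jump from 180 to -180 there, and a contour of those values would presumably bunch up along that line. Adding 360 to the negative values is the same as taking longitudes modulo 360, as this small base R check with made-up values shows.

```r
## made-up longitudes in [-180, 180]
lon <- c(-179.5, -10, 0, 147, 180)

## modulo 360 maps them into [0, 360)
lon_wrapped <- lon %% 360

## same result as the explicit "+ 360 for negatives" approach
lon_shifted <- ifelse(lon < 0, lon + 360, lon)

lon_wrapped
## 180.5 350.0   0.0 147.0 180.0
```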

The fun part is that I can now change the source: I've already set up the map I want, so I simply ask for a new set of pixels. The topography data happens to be in Mercator, from a tiled, multi-level-of-detail image service, while the SST is from a model output format (NetCDF) at a single resolution. GDAL doesn't care! Let's put them on the same grid:

```{r netcdf}
sst <- gdalio_raster(f)
raster::plot(sst, col = palr::sst_pal(26))
contour(setValues(o, xy[,1]), add = TRUE, col = "white")
contour(setValues(o, xy[,2]), add = TRUE, col = "white")
contour(o, add = TRUE, labels = "", col = "black", levels = seq(0, 4500, by = 500))
```

## Code of Conduct

Please note that the gdalio project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.