An open API service indexing awesome lists of open source software.

https://github.com/brownag/rogrsql

Experimental DBI Backend for Geospatial Data Abstraction Library (GDAL) 'OGRSQL' Dialect
https://github.com/brownag/rogrsql

Last synced: 3 months ago
JSON representation

Experimental DBI Backend for Geospatial Data Abstraction Library (GDAL) 'OGRSQL' Dialect

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# {ROGRSQL}

[![R-CMD-check](https://github.com/brownag/ROGRSQL/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/brownag/ROGRSQL/actions/workflows/R-CMD-check.yaml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/MIT/)
[![ROGRSQL Manual](https://img.shields.io/badge/docs-HTML-informational)](https://humus.rocks/ROGRSQL/)

The goal of {ROGRSQL} is to provide a basic proof-of-concept for limited DBI-compatible queries with the [Geospatial Data Abstraction Library (GDAL)](https://gdal.org/) ['OGRSQL'](https://gdal.org/user/ogr_sql_dialect.html) dialect.

## Installation

You can install the development version of {ROGRSQL} like so:

``` r
remotes::install_github("brownag/ROGRSQL")
```

## Example

Here is a basic demonstration using a sample dataset from the {terra} package.

The shapefile is written to a new GeoPackage, and the GeoPackage is used as the data source which is queried using GDAL OGRSQL.

```{r example-r}
library(ROGRSQL)

path <- sample_gpkg_path()

db <- dbConnect(ROGRSQL::OGRSQL(), path)
dbGetQuery(db, "SELECT ST_Centroid(geom) FROM lux LIMIT 2")
```

We can make more complex queries, but there are limitations in OGRSQL with the way that geometries versus attributes are interpreted.

```{r}
db <- dbConnect(ROGRSQL::OGRSQL(), path)
dbGetQuery(db, "SELECT ST_Centroid(geom), ST_ConvexHull(geom) FROM lux LIMIT 2")
```

Note that in above case two geometry columns are calculated in the result, but only one is selected by GDAL as the _main_ geometry column for the layer/query result. The result of `ST_ConvexHull(geom)` is then read as if it were a non-spatial attribute.

# RStudio Interface and RMarkdown

You can use the `OGRSQL()` DBI driver to dynamically preview your results in the RStudio SQL editor. This simply requires using a special comment at the top of your SQL script, like so:

```{sql example-sql, eval=FALSE}
-- !preview conn=DBI::dbConnect(ROGRSQL::OGRSQL(), ROGRSQL::sample_gpkg_path())

SELECT ST_Centroid(geom) FROM lux LIMIT 1;
```

```{sql example-sql2, eval=TRUE, echo=FALSE}
--| connection=DBI::dbConnect(ROGRSQL::OGRSQL(), ROGRSQL::sample_gpkg_path())

SELECT ST_Centroid(geom) FROM lux LIMIT 1;
```

In an {rmarkdown} code chunk, you can similarly do:
`--| connection=DBI::dbConnect(ROGRSQL::OGRSQL(), ROGRSQL::sample_gpkg_path())`.

# Constructing OGRSQL Queries with {dbplyr}

With `ROGRSQL::OGRSQL()` we are able to make full use of the OGRSQL dialect. This is unlike using the {RSQLite} `SQLite()` driver you might use for querying a GeoPackage, or using `sf::st_read(..., query=)` for other OGR source. For example, we can use spatial functions such as `ST_Centroid()` within our query to the data source. There are several other OGRSQL specific conveniences which are described in greater detail in the GDAL documentation:

Since the _GDALOGRSQLConnection_ object `db` encapsulates a _DBIConnection_, we can utilize {dbplyr} to construct SQL queries "lazily" with R code. Once the query has been built, we can execute it with `dbplyr::collect()`. Note that the same functionality does not work with the standard {RSQLite} driver.

```{r, error=TRUE}
library(ROGRSQL)
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

db1 <- dbConnect(RSQLite::SQLite(), system.file("extdata", "lux.gpkg", package = "ROGRSQL"))
db2 <- dbConnect(ROGRSQL::OGRSQL(), system.file("extdata", "lux.gpkg", package = "ROGRSQL"))

# error with ST_Centroid
tbl(db1, "lux") |>
group_by(NAME_1) |>
filter(ID_2 %% ID_1 == 0) |>
summarize(ST_Centroid(geom))

# works
tbl(db2, "lux") |>
group_by(NAME_1) |>
filter(ID_2 %% ID_1 == 0) |>
ungroup() |>
summarize(ST_Centroid(geom)) -> res

res |>
collect()

# inspect generated query
show_query(res)
```

# Similar work

I whipped this up as a proof-of-concept to better understand the workings of the GDAL API (here accessed via the excellent [{vapour}](https://github.com/hypertidy/vapour/) package by @mdsumner), as well as to learn more about developing {DBI}/{dbplyr} backends.

I was exploring this topic in the context of adding some new OGRSQL-specific functionality to [{gpkg}](https://github.com/brownag/gpkg/) which would utilize {vapour} rather than {terra}. It does seem as if there may not currently be a package on CRAN that provides a modern (dbplyr >= 2.0.0), fully DBI-compatible interface tailored to the GDAL OGRSQL dialect. So, perhaps there is a space for a package like this, but it is also not entirely clear it is "needed". I will point out that much of the basic SQL evaluation/spatial queries etc. can be readily achieved with {sf}+{DBI} or {terra}.

After getting the basics minimally working, and thinking about pushing the repo up to GitHub, I realized that folks have tread in this space many years prior to me, namely: @mdsumner (https://github.com/mdsumner/RGDALSQL); and @etiennebr (https://github.com/r-spatial/sfdbi). These packages both appear to be excellent implementations, and mostly working, though they have not been updated recently, and there have been some further developments in the {dbplyr} space in the last several years.

It is not terribly likely I will spend much more time developing this package, but if you are interested in seeing this functionality made more available get in contact with me. Feel free to ask questions/post issues in the [issue tracker](https://github.com/brownag/ROGRSQL/issues/).