An open API service indexing awesome lists of open source software.

https://github.com/geoarrow/geoarrow-r

Extension types for geospatial data for use with 'Arrow'
https://github.com/geoarrow/geoarrow-r

Last synced: 4 months ago
JSON representation

Extension types for geospatial data for use with 'Arrow'

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# geoarrow

[![Codecov test coverage](https://codecov.io/gh/geoarrow/geoarrow-r/branch/main/graph/badge.svg)](https://app.codecov.io/gh/geoarrow/geoarrow-r?branch=main)

The goal of geoarrow is to leverage the features of the [arrow](https://arrow.apache.org/docs/r/) package and larger [Apache Arrow](https://arrow.apache.org/) ecosystem for geospatial data. The geoarrow package provides an R implementation of the [GeoParquet](https://github.com/opengeospatial/geoparquet) file format of and the draft [geoarrow data specification](https://geoarrow.org), defining extension array types for vector geospatial data.

## Installation

You can install the released version of geoarrow from [CRAN](https://cran.r-project.org/) with:

``` r
install.packages("geoarrow")
```

You can install the development version of geoarrow from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("geoarrow/geoarrow-r")
```

## Example

The geoarrow package implements conversions to/from various geospatial types (e.g., sf, sfc, s2, wk) with various Arrow representations (e.g., arrow, nanoarrow). The most useful conversions are between the **arrow** and **sf** packages, which in most cases allow sf objects to be passed to **arrow** functions directly after `library(geoarrow)` or `requireNamespace("geoarrow")` has been called.

```{r example}
library(geoarrow)
library(arrow, warn.conflicts = FALSE)
library(sf)

nc <- read_sf(system.file("gpkg/nc.gpkg", package = "sf"))
tf <- tempfile(fileext = ".parquet")

nc |>
tibble::as_tibble() |>
write_parquet(tf)

open_dataset(tf) |>
dplyr::filter(startsWith(NAME, "A")) |>
dplyr::select(NAME, geom) |>
st_as_sf()
```

By default, arrow objects are converted to a neutral wrapper around chunked Arrow memory, which in turn implements conversions to most spatial types:

```{r}
df <- read_parquet(tf)
df$geom
st_as_sfc(df$geom)
```

The entry point to creating arrays is `as_geoarrow_vctr()`:

```{r}
as_geoarrow_vctr(c("POINT (0 1)", "POINT (2 3)"))
```

By default these do not attempt to create a new storage type; however, you can request a storage type or infer one from the data:

```{r}
as_geoarrow_vctr(c("POINT (0 1)", "POINT (2 3)"), schema = geoarrow_native("POINT"))

vctr <- as_geoarrow_vctr(c("POINT (0 1)", "POINT (2 3)"))
as_geoarrow_vctr(vctr, schema = infer_geoarrow_schema(vctr))
```

There are a number of files to use as examples at that can be read with `arrow::read_ipc_file()`:

```{r}
url <- "https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_point.arrow"
tab <- read_ipc_file(url, as_data_frame = FALSE)
tab$geometry$type
```