Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/wcjochem/sfarrow

R package for reading/writing `sf` objects from/to parquet files with `arrow`.
https://github.com/wcjochem/sfarrow

Last synced: about 2 months ago
JSON representation

R package for reading/writing `sf` objects from/to parquet files with `arrow`.

Awesome Lists containing this project

README

        

---
output:
github_document:
html_preview: false
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/REAsDME-",
out.width = "100%"
)
```

# sfarrow: Read/Write Simple Feature Objects (`sf`) with 'Apache' 'Arrow'

`sfarrow` is a package for reading and writing Parquet and Feather files with
`sf` objects using `arrow` in `R`.

Simple features are a popular format for representing spatial vector data using
`data.frames` and a list-like geometry column, implemented in the `R` package
[`sf`](https://r-spatial.github.io/sf/). Apache Parquet files are an
open-source, column-oriented data storage format
([https://parquet.apache.org/](https://parquet.apache.org/)) which enable
efficient read/writing for large files. Parquet files are becoming popular
across programming languages and can be used in `R` using the package
[`arrow`](https://github.com/apache/arrow/).

The `sfarrow` implementation translates simple feature data objects using
well-known binary (WKB) format for geometries and reads/writes Parquet/Feather
files. A key goal of the package is for interoperability of the files
(particularly with Python `GeoPandas`), so coordinate reference system
information is maintained in a standard metadata format
([https://github.com/geopandas/geo-arrow-spec](https://github.com/geopandas/geo-arrow-spec)).
Note to users: this metadata format is not yet stable for production uses and
may change in the future.

## Installation

`sfarrow` is available through CRAN with:

```{r, eval=FALSE}
install.packages('sfarrow')
```

or it can be installed from Github with:

```{r eval=FALSE}
devtools::install_github("wcjochem/sfarrow@main")
```

Load the library to begin using it.

```{r}
library(sfarrow)
```

### `arrow` package

The installation requires the Arrow library which should be installed with the
`R` package `arrow` dependency. However, some systems may need to follow
additional steps to enable full support of that library. Please refer to the
`arrow`
[documentation](https://CRAN.R-project.org/package=arrow/vignettes/install.html).

## Basic usage

Reading Parquet data of spatial files created with Python `GeoPandas`.
```{r}
# load Natural Earth low-res dataset.
# Created in Python with geopandas.to_parquet()
path <- system.file("extdata", "world.parquet", package = "sfarrow")

world <- st_read_parquet(path)

world
plot(sf::st_geometry(world))
```

Writing `sf` objects to Parquet format files. These Parquet files created with
`sfarrow` can be read within Python using `GeoPandas`.
```{r}
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet=TRUE)

st_write_parquet(obj=nc, dsn=file.path(tempdir(), "nc.parquet"))

# read back into R
nc_p <- st_read_parquet(file.path(tempdir(), "nc.parquet"))

nc_p
plot(sf::st_geometry(nc_p))
```

For additional examples please see the vignettes.

## Contributions
Contributions, questions, ideas, and issue reports are welcome. Please raise an
issue to discuss or submit a pull request.

## Acknowledgements
This work benefited from the work by developers in the GeoPandas, Arrow, and
r-spatial teams. Thank you to the teams for their excellent, open-source work.