Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/brownag/labtaxa

Analysis-ready Rocker RStudio Server-based Container for the USDA-NRCS-NCSS Kellogg Soil Survey Lab Data Mart 'SQLite' Snapshot
https://github.com/brownag/labtaxa

database docker geopackage kssl lab ncss r rstudio rstudio-server soil soil-survey

Last synced: 8 days ago
JSON representation

Analysis-ready Rocker RStudio Server-based Container for the USDA-NRCS-NCSS Kellogg Soil Survey Lab Data Mart 'SQLite' Snapshot

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# labtaxa

[![R-CMD-check](https://github.com/brownag/labtaxa/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/brownag/labtaxa/actions/workflows/R-CMD-check.yaml)
[![docker-publish](https://github.com/brownag/labtaxa/actions/workflows/docker-publish.yml/badge.svg)](https://github.com/brownag/labtaxa/actions/workflows/docker-publish.yml)
[![Docker Pulls](https://badgen.net/docker/pulls/brownag/labtaxa?icon=docker&label=pulls)](https://hub.docker.com/r/brownag/labtaxa/)
[![ghcn.io](https://ghcr-badge.egpl.dev/brownag/labtaxa/size)](https://github.com/users/brownag/packages/container/package/labtaxa)

The goal of {labtaxa} is to provide 'Lab Data Mart' () database snapshots for use in R. This repository is an R package used to download and cache the contents of the Lab Data Mart GeoPackage database.

## Docker

To get up and running quickly you can use the Docker container. The `labtaxa` container is based on [`"rocker/rstudio:latest"`](https://hub.docker.com/r/rocker/rstudio). In addition to the standard RStudio tools, the container has the cached Lab Data Mart GeoPackage (containing lab and spatial data) and the companion morphologic database (derived from NASIS pedon descriptions) pre-downloaded. Also, the results of `soilDB::fetchLDM()` called on the LDM GeoPackage data source are cached as an R object file (.rds). Finally, a variety of useful R packages are pre-installed. All can be accessed via an RStudio Server web browser interface.

From Docker Hub:

``` sh
docker pull brownag/labtaxa:latest
```

Or from GitHub:

``` sh
docker pull ghcr.io/brownag/labtaxa:latest
```

Once you have a local copy of the `labtaxa` container you can run:

``` sh
docker run -d -p 8787:8787 -e PASSWORD=mypassword -v ~/Documents:/home/rstudio/Documents -e ROOT=TRUE brownag/labtaxa
```

Then open your web browser and navigate to `http://localhost:8787`. The default username is `rstudio` and the default password is `mypassword`.

## R Package Installation

You can install the development version of {labtaxa} from GitHub:

``` r
if (!require("labtaxa"))
remotes::install_github("brownag/labtaxa")
```

## Example

Download (and cache) the latest Lab Data Mart SQLite snapshot from like so:

```{r example, eval = FALSE}
library(labtaxa)
ldm <- get_LDM_snapshot()
```

Downloaded and derived files will be cached in platform-specific directory specified by `ldm_data_dir()` using `cache_labtaxa()`

In the Docker container the snapshot has already been created and cached from the latest data (as of the last time the container was built). Updates to the method used to create the cache, as well as scheduled (monthly) updates occur.

The cached data help to get off and running quickly analyzing the entire KSSL database using the [{aqp}](https://cran.r-project.org/package=aqp) R package toolchain.

The lab data are pre-loaded in a large SoilProfileCollection object (over 65,000 profiles). In only a few seconds from when you have the Docker container loaded, you can be filtering and processing the lab data object. Downloading archives of the complete databases can take 10s of minutes to a couple hours (depending on internet connection). Only in cases where the absolute most recent data are needed would require doing a cache update.

The downloaded databases (GeoPackage, SQLite) are queried locally using {soilDB} functions `fetchLDM()` and `fetchNASIS()`. The {soilDB} functions can take a couple minutes to process on larger databases like this, so the container building process front loads these more costly processing steps. Querying the data using a method like this essentially precedes all analyses. soilDB provides standard aggregation methods that produce {aqp} SoilProfileCollections, which provide a convenient data structure for working with horizon and site level data associated with specific soil profiles.

When you start up {labtaxa} in the Docker container you hwill ave the latest database and the first-step data object (as if you ran the {soilDB} functions) readily available for post-processing for answering specific questions.

```{r example2, eval=FALSE}
ldm <- load_labtaxa()

length(ldm)
#> [1] 65403
```

If you are running on your own machine you will have to run `get_LDM_snapshot()` at least once (as above) before the `load_labtaxa()` command works. In future runs you will not need to re-download or prepare the data unless you need to update the cache.