https://github.com/AustralianAntarcticDivision/blueant

Environmental data for Antarctic and Southern Ocean science

---
output: github_document
---

```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, fig.path = "man/figures/README-")

```

[![R-CMD-check](https://github.com/AustralianAntarcticDivision/blueant/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AustralianAntarcticDivision/blueant/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/AustralianAntarcticDivision/blueant/branch/master/graph/badge.svg)](https://codecov.io/gh/AustralianAntarcticDivision/blueant?branch=master)
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)

# Blueant

Blueant provides a set of data source configurations to use with the [bowerbird](https://github.com/AustralianAntarcticDivision/bowerbird) package. These data sources are themed around Antarctic and Southern Ocean data, and include a range of oceanographic, meteorological, topographic, and other environmental data sets. Blueant will allow you to download data from these external data providers to your local file system, and to keep that data collection up to date.

## Installing

```{r eval=FALSE}
## Download and install blueant in R
install.packages("blueant", repos = c(SCAR = "https://scar.r-universe.dev",
                                      CRAN = "https://cloud.r-project.org"))

## or
## install.packages("remotes") ## if needed
remotes::install_github("AustralianAntarcticDivision/blueant")

```

## Usage overview

### Configuration

Build up a configuration by first defining global options such as the destination on your local file system. Usually you would choose this destination data directory to be a persistent location, suitable for a data library. For demonstration purposes here we'll just use a temporary directory:

```{r include = FALSE}
## code not shown in the README: (1) use load_all() instead of library() for convenience, (2) use anonymous "temporary" dir, and (3) hide the progress bar
##library(blueant)
devtools::load_all()
my_data_dir <- "/tmp/data"
if (!dir.exists(my_data_dir)) dir.create(my_data_dir, recursive = TRUE)
if (dir.exists(file.path(my_data_dir, "data.aad.gov.au/eds/file/4494"))) unlink(file.path(my_data_dir, "data.aad.gov.au/eds/file/4494"), recursive = TRUE)
cf <- bb_config(local_file_root = my_data_dir)
mysrc <- sources("George V bathymetry") %>% bb_modify_source(method = list(show_progress = FALSE))
cf <- cf %>% bb_add(mysrc)
```

```{r eval = FALSE}
library(blueant)
my_data_dir <- tempdir()
cf <- bb_config(local_file_root = my_data_dir)
```

Add data sources from those provided by blueant. A summary of these sources is given at the end of this document. Here we'll use the "George V bathymetry" data source as an example:

```{r eval = FALSE}
mysrc <- sources("George V bathymetry")
cf <- cf %>% bb_add(mysrc)
```

This data source is fairly small (around 200MB; see `mysrc$collection_size`, which is expressed in GB). Be sure to check the `collection_size` parameter of your chosen data source before running the synchronization, because some of these collections are quite large (see the summary table at the bottom of this document).
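
One way to check sizes before syncing is to filter the source table on its `collection_size` column. The following is a minimal sketch: the helper name `sources_larger_than` is hypothetical (it is not part of blueant), and it assumes that `collection_size` is numeric and expressed in GB, as in the summary table:

```r
## Hypothetical helper (not part of blueant): list the data sources whose
## approximate size exceeds a threshold, largest first.
## Assumes `src` has `name` and `collection_size` columns, with sizes in GB.
sources_larger_than <- function(src, size_gb) {
    keep <- !is.na(src$collection_size) & src$collection_size > size_gb
    out <- as.data.frame(src[keep, c("name", "collection_size")])
    out[order(out$collection_size, decreasing = TRUE), , drop = FALSE]
}

## usage, if blueant is installed: show everything larger than 100 GB
if (requireNamespace("blueant", quietly = TRUE)) {
    print(sources_larger_than(blueant::sources(), 100))
}
```

Sources with no recorded size are skipped here, so a source missing from this listing is not guaranteed to be small.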

### Synchronization

Once the configuration has been defined and the data source added to it, we can run the sync process. We set `verbose = TRUE` here so that we see additional progress output:

```{r eval = FALSE}
status <- bb_sync(cf, verbose = TRUE)
```
```{r echo = FALSE}
## actually run this one so we don't get asked for confirmation
status <- bb_sync(cf, verbose = TRUE, confirm_downloads_larger_than = -1)
```

Congratulations! You now have your own local copy of this data set. The files in this data set have been stored in a data-source-specific subdirectory of our local file root, with details given by the returned `status` object:

```{r}
myfiles <- status$files[[1]]
myfiles
```
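
If you want a quick check of what actually landed on disk, the file paths can be passed to base R's file utilities. This is only a sketch: `file_report` is a hypothetical helper (not part of blueant or bowerbird), demonstrated here on a temporary file so that it runs on its own:

```r
## Hypothetical helper: report existence and size on disk for a vector of
## file paths
file_report <- function(paths) {
    data.frame(file = basename(paths),
               exists = file.exists(paths),
               size_bytes = file.size(paths), ## NA for missing files
               stringsAsFactors = FALSE)
}

## self-contained demonstration using a temporary 2048-byte file
tmp <- tempfile(fileext = ".nc")
writeBin(raw(2048), tmp)
file_report(tmp)

## with the sync results above, this would be: file_report(myfiles$file)
```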

The data sources provided by blueant can be read, manipulated, and plotted using a range of other R packages, including [raadtools](https://github.com/AustralianAntarcticDivision/raadtools) and [raster](https://cran.r-project.org/package=raster). In this case the data files are netCDF, which can be read by `raster`:

```{r rasterplot, message = FALSE, warning = FALSE}
library(raster)
x <- raster(myfiles$file[grepl("gvdem500m_v3", myfiles$file)])
plot(x)
```

## Nuances

### Choosing a data directory

Where you keep your data collection is up to you: you just need to provide that location to bowerbird. A common use case for bowerbird is maintaining a central data collection for multiple users, in which case that location is likely to be some sort of networked file share. However, if you are keeping a collection for your own use, you might like to look at [rappdirs](https://github.com/r-lib/rappdirs) to help find a suitable directory location.
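
As a rough sketch of the single-user case, `rappdirs::user_data_dir()` returns a platform-appropriate per-user data directory. The directory name `"blueant_data"` is an arbitrary choice made for this example, and the fallback to a temporary directory is only there so that the snippet runs without rappdirs installed:

```r
## Choose a per-user data directory. "blueant_data" is an arbitrary,
## hypothetical name. Falls back to a temporary directory if rappdirs is
## not installed.
my_data_dir <- if (requireNamespace("rappdirs", quietly = TRUE)) {
    rappdirs::user_data_dir("blueant_data")
} else {
    file.path(tempdir(), "blueant_data")
}
if (!dir.exists(my_data_dir)) dir.create(my_data_dir, recursive = TRUE)

## then pass it to the configuration as before:
## cf <- bb_config(local_file_root = my_data_dir)
```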

### Authentication

Some data providers require users to log in. These are indicated by the `authentication_note` column in the configuration table. For these sources, you will need to provide your user name and password, e.g.:

```{r eval=FALSE}
src <- sources(name="CMEMS global gridded SSH reprocessed (1993-ongoing)")
src$user <- "yourusername"
src$password <- "yourpassword"
cf <- bb_add(cf, src)

## or, using the pipe operator
mysrc <- sources("CMEMS global gridded SSH reprocessed (1993-ongoing)") %>%
    bb_modify_source(user = "yourusername", password = "yourpassword")
cf <- cf %>% bb_add(mysrc)
```
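
To keep credentials out of your scripts, one option is to read them from environment variables. This is a sketch only: `get_credential` is a hypothetical helper, and the variable names `CMEMS_USER` and `CMEMS_PASSWORD` are arbitrary names chosen for this example rather than anything blueant or bowerbird looks for:

```r
## Hypothetical helper: read a credential from an environment variable,
## stopping with an informative error if it has not been set
get_credential <- function(var) {
    val <- Sys.getenv(var, unset = NA_character_)
    if (is.na(val) || !nzchar(val)) stop("environment variable ", var, " is not set")
    val
}

## usage (variable names are arbitrary; set them in e.g. ~/.Renviron):
## mysrc <- sources("CMEMS global gridded SSH reprocessed (1993-ongoing)") %>%
##     bb_modify_source(user = get_credential("CMEMS_USER"),
##                      password = get_credential("CMEMS_PASSWORD"))
## cf <- cf %>% bb_add(mysrc)
```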

### Writing and modifying data sources

The [bowerbird](https://github.com/AustralianAntarcticDivision/bowerbird) documentation is a good starting point for learning how to write your own data sources or modify existing ones.

#### Reducing download sizes

Sometimes you might only want part of a data collection. Perhaps you only want a few years from a long-term collection, or perhaps the data are provided in multiple formats and you only need one. If the data source uses the `bb_handler_rget` method, you can restrict what is downloaded by modifying the arguments passed through the data source's `method` parameter, particularly the `accept_follow`, `reject_follow`, `accept_download`, and `reject_download` options.

For example, the CERSAT SSM/I sea ice concentration data are arranged in yearly directories, and so it is fairly easy to restrict ourselves to, say, only the 2017 data:

```{r eval=FALSE}
mysrc <- sources("CERSAT SSM/I sea ice concentration")

## first make sure that the data source doesn't already have an accept_follow parameter defined
"accept_follow" %in% names(mysrc$method[[1]])

## nope, so we can safely go ahead and impose our own
mysrc$method[[1]]$accept_follow <- "/2017"
cf <- cf %>% bb_add(mysrc)
```
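
Before syncing, it can help to try a pattern on a few candidate URLs to see what it would select. This sketch assumes that `accept_follow` entries are treated as regular expressions applied to each URL (check the bowerbird documentation to confirm). The candidate URLs are illustrative, built from the CERSAT directory URL on the assumption that there is one subdirectory per year:

```r
base_url <- "ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data/antarctic/daily/netcdf/"
## illustrative directory URLs, assuming one subdirectory per year
candidate_urls <- paste0(base_url, 2015:2018, "/")

## which of these would the pattern "/2017" match?
grepl("/2017", candidate_urls)

## a pattern can also cover several years at once
grepl("/(2016|2017)/", candidate_urls)
```

Note that an unanchored pattern such as `"/2017"` would also match any other path component that begins with `2017`, so a stricter pattern may be needed for some directory layouts.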

Alternatively, for data sources that are divided into subdirectories, one could replace the whole-data-source `source_url` with one or more that point to specific yearly (or other) subdirectories. For example, the default `source_url` for the CERSAT sea ice data above is `ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data/antarctic/daily/netcdf/` (which has yearly subdirectories). So e.g. for 2016 and 2017 data we could do:

```{r eval=FALSE}
mysrc <- sources("CERSAT SSM/I sea ice concentration")
mysrc$source_url[[1]] <- c(
    "ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data/antarctic/daily/netcdf/2016/",
    "ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data/antarctic/daily/netcdf/2017/")
cf <- cf %>% bb_add(mysrc)
```

#### Defining new data sources

If the blueant data sources don't cover your needs, you can define your own using the `bb_source` function. See the bowerbird documentation.
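
As a rough illustration, a source definition might look something like the following. Every value here is a hypothetical placeholder (the `example.org` URLs do not point to a real data set), and the `level = 1` handler argument is an assumption about a suitable recursion depth. Consult the bowerbird documentation for the authoritative list of `bb_source` arguments:

```r
library(blueant) ## assumed to also make the bowerbird functions available, as in the examples above

## All values below are hypothetical placeholders, not a real data set
my_source <- bb_source(
    name = "Example data set",
    id = "example-data-set-v1",
    description = "A hypothetical data set used to illustrate bb_source",
    doc_url = "https://example.org/docs",
    source_url = "https://example.org/data/",
    citation = "Example citation text",
    license = "CC-BY 4.0",
    method = list("bb_handler_rget", level = 1),
    collection_size = 0.1, ## approximate size in GB
    data_group = "Topography")

## add it to a configuration in the same way as the blueant sources
cf <- bb_config(local_file_root = tempdir()) %>% bb_add(my_source)
```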

## Data source summary

These are the data source definitions that are provided as part of the blueant package.

```{r sources_summary, echo = FALSE, message = FALSE, warning = FALSE, results = "asis"}
library(blueant) ##devtools::load_all("../bowerbird"); devtools::load_all()
##library(DT)
cf <- bb_config("/your/data/root/") %>% bb_add(sources())
##cfsum <- do.call(rbind, lapply(seq_len(nrow(cf$data_sources)), function(z) cf$data_sources[z, c("data_group", "name", "description", "authentication_note", "collection_size", "doc_url")]))
##cfsum <- setNames(cfsum[order(cfsum$data_group), ], c("Data group", "Data source name", "Description", "Authentication note", "Approximate size (GB)", "Documentation link"))
##datatable(cfsum, rownames = FALSE, options = list(
## autoWidth = TRUE, columnDefs = list(list(className = "dt-head-left", targets = "_all")))) %>% formatStyle(1:6, "vertical-align" = "top") %>% formatStyle(1:6, "text-align" = "left")
sf <- bb_summary(cf, format = "rmd", inc_license = FALSE, inc_access_function = FALSE, inc_path = FALSE)
stxt <- readLines(sf)
stxt <- stxt[(grep("Last updated:", stxt) + 1):length(stxt)]
stxt <- gsub("^#", "##", stxt) ## push each header level down one
stxt <- gsub("^\\-", "\n-", stxt)
for (k in stxt) cat(k, "\n")
```