An open API service indexing awesome lists of open source software.

https://github.com/atlasoflivingaustralia/corella

Check and prepare data to comply with Darwin Core standards in R
https://github.com/atlasoflivingaustralia/corella

Last synced: 3 months ago
JSON representation

Check and prepare data to comply with Darwin Core standards in R

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# corella corella website

[![CRAN status](https://www.r-pkg.org/badges/version/corella)](https://CRAN.R-project.org/package=corella)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)

## Overview

`corella` is an R package that helps users standardize their data using the
[*Darwin Core*](https://dwc.tdwg.org) data standard, used for biodiversity data like species occurrences. `corella` provides tools to prepare, manipulate and validate data against the standard's criteria. Once standardized, data can be subsequently shared as a [*Darwin Core Archive*](https://ipt.gbif.org/manual/en/ipt/latest/dwca-guide#what-is-darwin-core-archive-dwc-a) and published to open data infrastructures like the [Atlas of Living Australia](https://www.ala.org.au) and [GBIF](https://www.gbif.org/).

`corella` was built, and is maintained, by the [Science & Decision Support Team](https://labs.ala.org.au) at the [Atlas of Living Australia](https://www.ala.org.au) (ALA). It is named for the Little Corella ([_Cacatua sanguinea_](https://bie.ala.org.au/species/https%3A//biodiversity.org.au/afd/taxa/34b31e86-7ade-4cba-960f-82a6ae586206)). The logo was designed by [Dax Kellie](https://daxkellie.com/).

If you have any comments, questions or suggestions, please [contact us](mailto:[email protected]).

## Installation

You can install the development version of `corella` from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("AtlasOfLivingAustralia/corella")
```

## Usage

Here we have a small sample of example data containing observations of cockatoos. Using corella we can convert our data to use Darwin Core Standard.

```{r}
library(corella)
library(tibble)

# A simple example of species occurrence data
df <- tibble(
species = c("Callocephalon fimbriatum", "Eolophus roseicapilla"),
latitude = c(-35.310, "-35.273"), # deliberate error for demonstration purposes
longitude = c(149.125, 149.133),
eventDate = c("14-01-2023", "15-01-2023"),
status = c("present", "present")
)

df
```

One of the most important aspects of Darwin Core Standard is using standard column names (Darwin Core *terms*). We can update column names in our data to match Darwin Core terms with `set_` functions.

Each `set_` function name corresponds to the type of data, and argument names correspond to the available Darwin Core terms to use as column names. `set_` functions support data wrangling operations & `dplyr::mutate()` functionality, meaning columns can be changed or fixed in your pipe. `set_` functions will indicate if anything needs fixing because they also automatically run checks on each column data to make sure each column is in the correct format.

```{r}
suppressMessages( # for readability

df |>
set_coordinates(
decimalLatitude = as.numeric(latitude), # fix latitude
decimalLongitude = longitude
) |>
set_scientific_name(
scientificName = species
) |>
set_datetime(
eventDate = lubridate::dmy(eventDate) # specify date format
) |>
set_occurrences(occurrenceStatus = status)

)
```

Not sure where to start? Use `suggest_workflow()` to know what steps you need to make to make your data Darwin Core compliant.

```{r}
df |>
suggest_workflow()
```

Or, if your data is nearly ready and you want to run checks over all columns that match Darwin Core terms, run `check_dataset()`.

```{r}
df |>
check_dataset()
```

## Citing corella

To generate a citation for the package version you are using, you can
run:

``` r
citation(package = "corella")
```

The current recommended citation is:

> Kellie D, Balasubramaniam S & Westgate MJ (2024) corella:
> Tools to standardize biodiversity data to Darwin Core. R Package version
> 0.1.0.