https://github.com/ropensci-archive/finch

:warning: ARCHIVED :warning: Read Darwin Core Archive files

finch
=====

[![R-check](https://github.com/ropensci/finch/workflows/R-check/badge.svg)](https://github.com/ropensci/finch/actions?query=workflow%3AR-check)
[![cran checks](https://cranchecks.info/badges/worst/finch)](https://cranchecks.info/pkgs/finch)
[![codecov](https://codecov.io/gh/ropensci/finch/branch/master/graph/badge.svg)](https://codecov.io/gh/ropensci/finch)
[![cran version](https://www.r-pkg.org/badges/version/finch)](https://cran.r-project.org/package=finch)

`finch` parses Darwin Core simple files and Darwin Core Archive files.

Docs:

* Darwin Core description at the Biodiversity Information Standards (TDWG) site
* Darwin Core at Wikipedia

## Install

Stable version

```r
install.packages("finch")
```

Development version, from GitHub

```r
remotes::install_github("ropensci/finch")
```

```r
library("finch")
```

## Parse

To parse a simple Darwin Core file, such as this fossil specimen record:

```
urn:catalog:YPM:VP.057488
PhysicalObject
2009-02-12T12:43:31
en
FossilSpecimen
YPM
VP
VP.057488
1

North America
United States
US
Montana
Garfield
Tyrannosourus rex
Tyrannosourus
rex
Creataceous
Creataceous
Late Cretaceous
Late Cretaceous
```

This file ships with the package as an example. Get its path with `system.file()`, then pass it to `simple_read()`:

```r
file <- system.file("examples", "example_simple_fossil.xml", package = "finch")
out <- simple_read(file)
```

Index into `meta`, `dc`, or `dwc`:

```r
out$dc
#> [[1]]
#> [[1]]$type
#> [1] "PhysicalObject"
#>
#>
#> [[2]]
#> [[2]]$modified
#> [1] "2009-02-12T12:43:31"
#>
#>
#> [[3]]
#> [[3]]$language
#> [1] "en"
```
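Each element of `dc` printed above is a one-item named list. The following is a minimal sketch of flattening such a list into a named character vector with base R's `unlist()`. The list is hand-built to mirror the printed output, so its shape is an assumption rather than something taken from the `finch` docs.

```r
# Hand-built stand-in mirroring the structure printed for `out$dc` above
# (assumption: each element is a one-item named list)
dc <- list(
  list(type = "PhysicalObject"),
  list(modified = "2009-02-12T12:43:31"),
  list(language = "en")
)

# unlist() keeps the inner names, giving one named value per term
dc_vec <- unlist(dc)
dc_vec

# Look up a single term by name
dc_vec[["modified"]]
```

A flat named vector is convenient for quick lookups by term name. Keeping the nested list is preferable if a term can carry more than one value.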

## Parse Darwin Core Archive

To parse a Darwin Core Archive, such as one downloaded from GBIF, use `dwca_read()`.

The package includes an example Darwin Core Archive:

```r
file <- system.file("examples", "0000154-150116162929234.zip", package = "finch")
(out <- dwca_read(file, read = TRUE))
#>
#> Package ID: 6cfaaf9c-d518-4ca3-8dc5-f5aadddc0390
#> No. data sources: 10
#> No. datasets: 3
#> Dataset occurrence.txt: [225 X 443]
#> Dataset multimedia.txt: [15 X 1]
#> Dataset verbatim.txt: [209 X 443]
```

List the files in the archive:

```r
out$files
#> $xml_files
#> [1] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/meta.xml"
#> [2] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/metadata.xml"
#>
#> $txt_files
#> [1] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/citations.txt"
#> [2] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/multimedia.txt"
#> [3] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/occurrence.txt"
#> [4] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/rights.txt"
#> [5] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/verbatim.txt"
...
```
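The paths above point into a local cache directory. The sketch below uses base R's `basename()` to strip that directory and select a single file by name, for example to read it yourself. The vector is hand-built from a few of the printed paths and stands in for `out$files$txt_files`.

```r
# Hand-built stand-in for some of the `out$files$txt_files` paths shown above
txt_files <- c(
  "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/citations.txt",
  "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/occurrence.txt",
  "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/verbatim.txt"
)

# Drop the cache directory so only the file names remain
file_names <- basename(txt_files)
file_names

# Pick out one file's full path by its name
occ_path <- txt_files[file_names == "occurrence.txt"]
occ_path
```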

High-level metadata for the whole archive:

```r
out$emlmeta
#> additionalMetadata:
#> metadata:
#> gbif:
#> citation:
#> identifier: 0000154-150116162929234
#> citation: GBIF Occurrence Download 0000154-150116162929234
#> physical:
#> objectName: []
#> characterEncoding: UTF-8
#> dataFormat:
#> externallyDefinedFormat:
#> formatName: Darwin Core Archive
#> distribution:
#> online:
#> url:
#> function: download
#> url: http://api.gbif.org/v1/occurrence/download/request/0000154-150116162929234.zip
#> dataset:
#> title: GBIF Occurrence Download 0000154-150116162929234
#> creator:
...
```

High-level metadata for each data file. There are many files, so we'll look at just one:

```r
hm <- out$highmeta
head(hm$occurrence.txt)
#> index term delimitedBy
#> 1 0 http://rs.gbif.org/terms/1.0/gbifID
#> 2 1 http://purl.org/dc/terms/abstract
#> 3 2 http://purl.org/dc/terms/accessRights
#> 4 3 http://purl.org/dc/terms/accrualMethod
#> 5 4 http://purl.org/dc/terms/accrualPeriodicity
#> 6 5 http://purl.org/dc/terms/accrualPolicy
```
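Each `term` is a full URI, and its last path segment appears to match the column names in the data file (compare `gbifID`, `abstract`, `accessRights` in the data preview further down). Here is a minimal sketch that derives those short names with `basename()`. The data frame is hand-built from the six rows printed above, and the name matching is an assumption inferred from the outputs shown here.

```r
# Hand-built copy of the rows of `hm$occurrence.txt` printed above
hm_occ <- data.frame(
  index = 0:5,
  term = c(
    "http://rs.gbif.org/terms/1.0/gbifID",
    "http://purl.org/dc/terms/abstract",
    "http://purl.org/dc/terms/accessRights",
    "http://purl.org/dc/terms/accrualMethod",
    "http://purl.org/dc/terms/accrualPeriodicity",
    "http://purl.org/dc/terms/accrualPolicy"
  ),
  stringsAsFactors = FALSE
)

# basename() splits on "/" and keeps the last segment: the short term name
hm_occ$name <- basename(hm_occ$term)

# `index` looks 0-based (gbifID is index 0 and the first column), so the
# R column position is assumed to be index + 1
hm_occ$column <- hm_occ$index + 1
hm_occ[, c("column", "name")]
```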

You can also get the same metadata for each of the datasets that make up the downloaded tabular data:

```r
out$dataset_meta[[1]]
```

Take a brief look at one of the datasets:

```r
head(out$data[[1]][, 1:5])
#> gbifID abstract accessRights accrualMethod accrualPeriodicity
#> 1 50280003 NA NA NA
#> 2 477550574 NA NA NA
#> 3 239703844 NA NA NA
#> 4 239703843 NA NA NA
#> 5 239703833 NA NA NA
#> 6 477550692 NA NA NA
```
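With hundreds of columns, many of them empty, selecting columns by name is usually more robust than selecting by position. Counting non-missing values is one way to find the columns that actually carry data. The sketch below uses a small hand-built stand-in for `out$data[[1]]` with a few of the `gbifID` values shown above. The real table is far wider.

```r
# Hand-built stand-in for `out$data[[1]]`: a few gbifID values from the
# output above, with the other columns all NA as in the printed preview
occ <- data.frame(
  gbifID = c(50280003, 477550574, 239703844),
  abstract = NA,
  accessRights = NA,
  accrualMethod = NA
)

# Select columns by name instead of by position
occ[, c("gbifID", "accessRights")]

# Count non-missing values per column, then keep names of non-empty columns
filled <- colSums(!is.na(occ))
names(filled)[filled > 0]
```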

You can also pass `dwca_read()` a local directory or a URL that contains a Darwin Core Archive.

## Meta

* Please [report any issues or bugs](https://github.com/ropensci/finch/issues).
* License: MIT
* Get citation information for `finch` in R with `citation(package = 'finch')`.
* Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/). By contributing to this project, you agree to abide by its terms.

[![rofooter](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org)