https://github.com/ropensci-archive/finch
:warning: ARCHIVED :warning: Read Darwin Core Archive files
biodiversity darwin-core darwincore gbif r r-package rstats
- Host: GitHub
- URL: https://github.com/ropensci-archive/finch
- Owner: ropensci-archive
- License: other
- Archived: true
- Created: 2015-01-13T23:25:24.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2022-09-09T09:06:08.000Z (over 2 years ago)
- Last Synced: 2024-12-24T16:22:02.291Z (about 1 month ago)
- Topics: biodiversity, darwin-core, darwincore, gbif, r, r-package, rstats
- Language: R
- Homepage:
- Size: 358 KB
- Stars: 17
- Watchers: 9
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README-not.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
finch
=====

[![R-check](https://github.com/ropensci/finch/workflows/R-check/badge.svg)](https://github.com/ropensci/finch/actions?query=workflow%3AR-check)
[![cran checks](https://cranchecks.info/badges/worst/finch)](https://cranchecks.info/pkgs/finch)
[![codecov](https://codecov.io/gh/ropensci/finch/branch/master/graph/badge.svg)](https://codecov.io/gh/ropensci/finch)
[![cran version](https://www.r-pkg.org/badges/version/finch)](https://cran.r-project.org/package=finch)

`finch` parses Darwin Core simple and archive files.
Docs:
* Darwin Core description at Biodiversity Information Standards site
* Darwin Core at Wikipedia

## Install
Stable version
```r
install.packages("finch")
```

Development version, from GitHub
```r
remotes::install_github("ropensci/finch")
```

```r
library("finch")
```

## Parse

To parse a simple Darwin Core file like
```
urn:catalog:YPM:VP.057488
PhysicalObject
2009-02-12T12:43:31
en
FossilSpecimen
YPM
VP
VP.057488
1
North America
United States
US
Montana
Garfield
Tyrannosourus rex
Tyrannosourus
rex
Creataceous
Creataceous
Late Cretaceous
Late Cretaceous
```
This file ships with the package as an example. Get its path, then read it with `simple_read()`
```r
file <- system.file("examples", "example_simple_fossil.xml", package = "finch")
out <- simple_read(file)
```

Index into the result with `meta`, `dc`, or `dwc`
```r
out$dc
#> [[1]]
#> [[1]]$type
#> [1] "PhysicalObject"
#>
#>
#> [[2]]
#> [[2]]$modified
#> [1] "2009-02-12T12:43:31"
#>
#>
#> [[3]]
#> [[3]]$language
#> [1] "en"
```

## Parse Darwin Core Archive

To parse a Darwin Core Archive, such as one downloaded from GBIF, use `dwca_read()`.

The package includes an example Darwin Core Archive:
```r
file <- system.file("examples", "0000154-150116162929234.zip", package = "finch")
(out <- dwca_read(file, read = TRUE))
#>
#> Package ID: 6cfaaf9c-d518-4ca3-8dc5-f5aadddc0390
#> No. data sources: 10
#> No. datasets: 3
#> Dataset occurrence.txt: [225 X 443]
#> Dataset multimedia.txt: [15 X 1]
#> Dataset verbatim.txt: [209 X 443]
```

List files in the archive
```r
out$files
#> $xml_files
#> [1] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/meta.xml"
#> [2] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/metadata.xml"
#>
#> $txt_files
#> [1] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/citations.txt"
#> [2] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/multimedia.txt"
#> [3] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/occurrence.txt"
#> [4] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/rights.txt"
#> [5] "/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/verbatim.txt"
...
```

High-level metadata for the whole archive
```r
out$emlmeta
#> additionalMetadata:
#> metadata:
#> gbif:
#> citation:
#> identifier: 0000154-150116162929234
#> citation: GBIF Occurrence Download 0000154-150116162929234
#> physical:
#> objectName: []
#> characterEncoding: UTF-8
#> dataFormat:
#> externallyDefinedFormat:
#> formatName: Darwin Core Archive
#> distribution:
#> online:
#> url:
#> function: download
#> url: http://api.gbif.org/v1/occurrence/download/request/0000154-150116162929234.zip
#> dataset:
#> title: GBIF Occurrence Download 0000154-150116162929234
#> creator:
...
```

High-level metadata for each data file. There are many files, but we'll look at just one
```r
hm <- out$highmeta
head( hm$occurrence.txt )
#> index term delimitedBy
#> 1 0 http://rs.gbif.org/terms/1.0/gbifID
#> 2 1 http://purl.org/dc/terms/abstract
#> 3 2 http://purl.org/dc/terms/accessRights
#> 4 3 http://purl.org/dc/terms/accrualMethod
#> 5 4 http://purl.org/dc/terms/accrualPeriodicity
#> 6 5 http://purl.org/dc/terms/accrualPolicy
```

You can get the same metadata as above for each dataset that went into the downloaded tabular dataset
```r
out$dataset_meta[[1]]
```

View a brief overview of one of the datasets.
```r
head( out$data[[1]][,c(1:5)] )
#> gbifID abstract accessRights accrualMethod accrualPeriodicity
#> 1 50280003 NA NA NA
#> 2 477550574 NA NA NA
#> 3 239703844 NA NA NA
#> 4 239703843 NA NA NA
#> 5 239703833 NA NA NA
#> 6 477550692 NA NA NA
```

You can also give `dwca_read()` a local directory, or a URL, that contains a Darwin Core Archive.
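
As a rough illustration of the directory case, the sketch below unpacks the bundled example archive into a temporary folder and reads it from there. It assumes `finch` is installed and that a directory path can be passed in the same position as the zip path; the folder name `dwca_example` and the variable names are hypothetical, chosen only for this example.

```r
# Sketch only: assumes finch is installed and that dwca_read() accepts a
# directory path, as the sentence above states.
if (requireNamespace("finch", quietly = TRUE)) {
  zip_path <- system.file("examples", "0000154-150116162929234.zip",
                          package = "finch")

  # Unpack the bundled archive into a scratch directory (hypothetical name)
  dwca_dir <- file.path(tempdir(), "dwca_example")
  dir.create(dwca_dir, showWarnings = FALSE)
  utils::unzip(zip_path, exdir = dwca_dir)

  # Read from the directory instead of the zip file
  out_dir <- finch::dwca_read(dwca_dir, read = TRUE)

  # List the datasets that were read
  names(out_dir$data)
}
```

If this works as assumed, `out_dir` should expose the same slots (`files`, `emlmeta`, `highmeta`, `data`) as the zip-based example above.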
## Meta
* Please [report any issues or bugs](https://github.com/ropensci/finch/issues).
* License: MIT
* Get citation information for `finch` in R by running `citation(package = 'finch')`
* Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/). By contributing to this project, you agree to abide by its terms.

[![rofooter](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org)