{"id":15165910,"url":"https://github.com/ropensci-archive/finch","last_synced_at":"2025-12-12T00:34:38.593Z","repository":{"id":25778483,"uuid":"29216860","full_name":"ropensci-archive/finch","owner":"ropensci-archive","description":":warning: ARCHIVED :warning: Read Darwin Core Archive files","archived":true,"fork":false,"pushed_at":"2022-09-09T09:06:08.000Z","size":367,"stargazers_count":17,"open_issues_count":0,"forks_count":4,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-08-28T19:30:51.850Z","etag":null,"topics":["biodiversity","darwin-core","darwincore","gbif","r","r-package","rstats"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ropensci-archive.png","metadata":{"files":{"readme":"README-not.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-01-13T23:25:24.000Z","updated_at":"2023-01-27T21:27:19.000Z","dependencies_parsed_at":"2022-08-24T14:14:22.873Z","dependency_job_id":null,"html_url":"https://github.com/ropensci-archive/finch","commit_stats":null,"previous_names":["ropensci/finch"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/ropensci-archive/finch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci-archive%2Ffinch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci-archive%2Ffinch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci-archive%2Ffinch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci-archive%2Ffinch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/ropensci-archive","download_url":"https://codeload.github.com/ropensci-archive/finch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci-archive%2Ffinch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274188915,"owners_count":25237853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biodiversity","darwin-core","darwincore","gbif","r","r-package","rstats"],"created_at":"2024-09-27T04:06:10.388Z","updated_at":"2025-09-30T20:31:51.129Z","avatar_url":"https://github.com/ropensci-archive.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"finch\n=====\n\n\n\n[![R-check](https://github.com/ropensci/finch/workflows/R-check/badge.svg)](https://github.com/ropensci/finch/actions?query=workflow%3AR-check)\n[![cran checks](https://cranchecks.info/badges/worst/finch)](https://cranchecks.info/pkgs/finch)\n[![codecov](https://codecov.io/gh/ropensci/finch/branch/master/graph/badge.svg)](https://codecov.io/gh/ropensci/finch)\n[![cran version](https://www.r-pkg.org/badges/version/finch)](https://cran.r-project.org/package=finch)\n\n`finch` parses Darwin Core simple and archive files\n\nDocs: \u003chttps://docs.ropensci.org/finch/\u003e\n\n* Darwin Core description at 
Biodiversity Information Standards site \u003chttp://rs.tdwg.org/dwc.htm\u003e\n* Darwin Core at Wikipedia \u003chttps://en.wikipedia.org/wiki/Darwin_Core\u003e\n\n## Install\n\nStable version\n\n\n```r\ninstall.packages(\"finch\")\n```\n\nDevelopment version, from GitHub\n\n\n```r\nremotes::install_github(\"ropensci/finch\")\n```\n\n\n```r\nlibrary(\"finch\")\n```\n\n## Parse\n\nTo parse a simple darwin core file like\n\n```\n\u003c?xml version=\"1.0\" encoding=\"UTF-8\"?\u003e\n\u003cSimpleDarwinRecordSet\n xmlns=\"http://rs.tdwg.org/dwc/xsd/simpledarwincore/\"\n xmlns:dc=\"http://purl.org/dc/terms/\"\n xmlns:dwc=\"http://rs.tdwg.org/dwc/terms/\"\n xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n xsi:schemaLocation=\"http://rs.tdwg.org/dwc/xsd/simpledarwincore/ ../../xsd/tdwg_dwc_simple.xsd\"\u003e\n \u003cSimpleDarwinRecord\u003e\n  \u003cdwc:occurrenceID\u003eurn:catalog:YPM:VP.057488\u003c/dwc:occurrenceID\u003e\n  \u003cdc:type\u003ePhysicalObject\u003c/dc:type\u003e\n  \u003cdc:modified\u003e2009-02-12T12:43:31\u003c/dc:modified\u003e\n  \u003cdc:language\u003een\u003c/dc:language\u003e\n  \u003cdwc:basisOfRecord\u003eFossilSpecimen\u003c/dwc:basisOfRecord\u003e\n  \u003cdwc:institutionCode\u003eYPM\u003c/dwc:institutionCode\u003e\n  \u003cdwc:collectionCode\u003eVP\u003c/dwc:collectionCode\u003e\n  \u003cdwc:catalogNumber\u003eVP.057488\u003c/dwc:catalogNumber\u003e\n  \u003cdwc:individualCount\u003e1\u003c/dwc:individualCount\u003e\n  \u003cdwc:locationID xsi:nil=\"true\"/\u003e\n  \u003cdwc:continent\u003eNorth America\u003c/dwc:continent\u003e\n  \u003cdwc:country\u003eUnited States\u003c/dwc:country\u003e\n  \u003cdwc:countryCode\u003eUS\u003c/dwc:countryCode\u003e\n  \u003cdwc:stateProvince\u003eMontana\u003c/dwc:stateProvince\u003e\n  \u003cdwc:county\u003eGarfield\u003c/dwc:county\u003e\n  \u003cdwc:scientificName\u003eTyrannosourus rex\u003c/dwc:scientificName\u003e\n  \u003cdwc:genus\u003eTyrannosourus\u003c/dwc:genus\u003e\n  
\u003cdwc:specificEpithet\u003erex\u003c/dwc:specificEpithet\u003e\n  \u003cdwc:earliestPeriodOrHighestSystem\u003eCreataceous\u003c/dwc:earliestPeriodOrHighestSystem\u003e\n  \u003cdwc:latestPeriodOrHighestSystem\u003eCreataceous\u003c/dwc:latestPeriodOrHighestSystem\u003e\n  \u003cdwc:earliestEonOrHighestEonothem\u003eLate Cretaceous\u003c/dwc:earliestEonOrHighestEonothem\u003e\n  \u003cdwc:latestEonOrHighestEonothem\u003eLate Cretaceous\u003c/dwc:latestEonOrHighestEonothem\u003e\n \u003c/SimpleDarwinRecord\u003e\n\u003c/SimpleDarwinRecordSet\u003e\n```\n\nThis file ships with the package as an example. Get its path, then parse it with `simple_read()`\n\n\n```r\nfile \u003c- system.file(\"examples\", \"example_simple_fossil.xml\", package = \"finch\")\nout \u003c- simple_read(file)\n```\n\nIndex into `meta`, `dc`, or `dwc`\n\n\n```r\nout$dc\n#\u003e [[1]]\n#\u003e [[1]]$type\n#\u003e [1] \"PhysicalObject\"\n#\u003e \n#\u003e \n#\u003e [[2]]\n#\u003e [[2]]$modified\n#\u003e [1] \"2009-02-12T12:43:31\"\n#\u003e \n#\u003e \n#\u003e [[3]]\n#\u003e [[3]]$language\n#\u003e [1] \"en\"\n```\n\n## Parse Darwin Core Archive\n\nTo parse a Darwin Core Archive, such as one downloaded from GBIF, use `dwca_read()`\n\nThe package includes an example Darwin Core Archive:\n\n\n```r\nfile \u003c- system.file(\"examples\", \"0000154-150116162929234.zip\", package = \"finch\")\n(out \u003c- dwca_read(file, read = TRUE))\n#\u003e \u003cgbif dwca\u003e\n#\u003e   Package ID: 6cfaaf9c-d518-4ca3-8dc5-f5aadddc0390\n#\u003e   No. data sources: 10\n#\u003e   No. 
datasets: 3\n#\u003e   Dataset occurrence.txt: [225 X 443]\n#\u003e   Dataset multimedia.txt: [15 X 1]\n#\u003e   Dataset verbatim.txt: [209 X 443]\n```\n\nList files in the archive\n\n\n```r\nout$files\n#\u003e $xml_files\n#\u003e [1] \"/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/meta.xml\"    \n#\u003e [2] \"/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/metadata.xml\"\n#\u003e \n#\u003e $txt_files\n#\u003e [1] \"/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/citations.txt\" \n#\u003e [2] \"/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/multimedia.txt\"\n#\u003e [3] \"/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/occurrence.txt\"\n#\u003e [4] \"/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/rights.txt\"    \n#\u003e [5] \"/Users/sckott/Library/Caches/R/finch/0000154-150116162929234/verbatim.txt\"  \n...\n```\n\nHigh-level metadata for the whole archive\n\n\n```r\nout$emlmeta\n#\u003e additionalMetadata:\n#\u003e   metadata:\n#\u003e     gbif:\n#\u003e       citation:\n#\u003e         identifier: 0000154-150116162929234\n#\u003e         citation: GBIF Occurrence Download 0000154-150116162929234\n#\u003e       physical:\n#\u003e         objectName: []\n#\u003e         characterEncoding: UTF-8\n#\u003e         dataFormat:\n#\u003e           externallyDefinedFormat:\n#\u003e             formatName: Darwin Core Archive\n#\u003e         distribution:\n#\u003e           online:\n#\u003e             url:\n#\u003e               function: download\n#\u003e               url: http://api.gbif.org/v1/occurrence/download/request/0000154-150116162929234.zip\n#\u003e dataset:\n#\u003e   title: GBIF Occurrence Download 0000154-150116162929234\n#\u003e   creator:\n...\n```\n\nHigh-level metadata for each data file. There are many files, but we'll just look at one\n\n\n```r\nhm \u003c- out$highmeta\nhead( hm$occurrence.txt )\n#\u003e   index                                        term 
delimitedBy\n#\u003e 1     0         http://rs.gbif.org/terms/1.0/gbifID        \u003cNA\u003e\n#\u003e 2     1           http://purl.org/dc/terms/abstract        \u003cNA\u003e\n#\u003e 3     2       http://purl.org/dc/terms/accessRights        \u003cNA\u003e\n#\u003e 4     3      http://purl.org/dc/terms/accrualMethod        \u003cNA\u003e\n#\u003e 5     4 http://purl.org/dc/terms/accrualPeriodicity        \u003cNA\u003e\n#\u003e 6     5      http://purl.org/dc/terms/accrualPolicy        \u003cNA\u003e\n```\n\nYou can get the same metadata as above for each dataset that contributed to the downloaded tabular data\n\n\n```r\nout$dataset_meta[[1]]\n```\n\nA brief look at one of the datasets.\n\n\n```r\nhead( out$data[[1]][,c(1:5)] )\n#\u003e      gbifID abstract accessRights accrualMethod accrualPeriodicity\n#\u003e 1  50280003       NA                         NA                 NA\n#\u003e 2 477550574       NA                         NA                 NA\n#\u003e 3 239703844       NA                         NA                 NA\n#\u003e 4 239703843       NA                         NA                 NA\n#\u003e 5 239703833       NA                         NA                 NA\n#\u003e 6 477550692       NA                         NA                 NA\n```\n\nYou can also give `dwca_read()` a local directory or a URL that contains a Darwin Core Archive.\n\n## Meta\n\n* Please [report any issues or bugs](https://github.com/ropensci/finch/issues).\n* License: MIT\n* Get citation information for `finch` in R with `citation(package = 'finch')`\n* Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/). 
By contributing to this project, you agree to abide by its terms.\n\n[![rofooter](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci-archive%2Ffinch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fropensci-archive%2Ffinch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci-archive%2Ffinch/lists"}