{"id":32207065,"url":"https://github.com/ropensci/dwctaxon","last_synced_at":"2026-02-22T19:05:25.740Z","repository":{"id":61844677,"uuid":"434126221","full_name":"ropensci/dwctaxon","owner":"ropensci","description":"R package for working with Darwin Core Taxon data","archived":false,"fork":false,"pushed_at":"2025-12-15T02:06:55.000Z","size":6530,"stargazers_count":9,"open_issues_count":13,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-08T10:00:36.228Z","etag":null,"topics":["database","r-package","rstats"],"latest_commit_sha":null,"homepage":"https://docs.ropensci.org/dwctaxon/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ropensci.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2021-12-02T07:40:14.000Z","updated_at":"2026-01-18T12:54:39.000Z","dependencies_parsed_at":"2023-12-06T05:26:49.074Z","dependency_job_id":"67470ef9-245b-4775-9f9d-4a8063e5c788","html_url":"https://github.com/ropensci/dwctaxon","commit_stats":null,"previous_names":["ropensci/dwctaxon"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/ropensci/dwctaxon","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fdwctaxon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fdwctaxon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fdwctaxon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/ropensci%2Fdwctaxon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ropensci","download_url":"https://codeload.github.com/ropensci/dwctaxon/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fdwctaxon/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29647768,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-20T09:27:29.698Z","status":"ssl_error","status_checked_at":"2026-02-20T09:26:12.373Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","r-package","rstats"],"created_at":"2025-10-22T05:44:29.210Z","updated_at":"2026-02-22T19:05:25.735Z","avatar_url":"https://github.com/ropensci.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\neditor_options: \n  chunk_output_type: console\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/\"\n)\n\n# Increase width for printing tibbles\nold \u003c- options(width = 140)\n\nset.seed(12345)\n```\n\n# dwctaxon \u003cimg src=\"man/figures/logo.png\" align=\"right\" alt=\"\" width=\"120\" /\u003e\n\n\u003c!-- badges: start --\u003e\n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![DOI](https://zenodo.org/badge/434126221.svg)](https://zenodo.org/badge/latestdoi/434126221)\n[![runiverse](https://ropensci.r-universe.dev/badges/dwctaxon)](https://ropensci.r-universe.dev/dwctaxon)\n[![Codecov test coverage](https://codecov.io/gh/ropensci/dwctaxon/branch/main/graph/badge.svg)](https://app.codecov.io/gh/ropensci/dwctaxon?branch=main)\n[![pkgcheck](https://github.com/ropensci/dwctaxon/workflows/pkgcheck/badge.svg)](https://github.com/ropensci/dwctaxon/actions?query=workflow%3Apkgcheck)\n[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/574_status.svg)](https://github.com/ropensci/software-review/issues/574)\n[![JOSS](https://joss.theoj.org/papers/10.21105/joss.06215/status.svg)](https://doi.org/10.21105/joss.06215)\n\u003c!-- badges: end --\u003e\n\nThe goal of dwctaxon is to facilitate working with [Darwin Core Taxon data](https://dwc.tdwg.org/terms/#taxon) in R.\n\n## Statement of need\n\ndwctaxon facilitates **editing** and **validating** Darwin Core Taxon data. There are various reasons one might want to do this. 
Here is a non-exhaustive list of use-cases for dwctaxon:\n\n- To maintain an existing taxonomic database.\n- To prepare a taxonomic database as a reference for taxonomic name resolution, for example with the [taxastand](https://github.com/joelnitta/taxastand) or [U.Taxonstand](https://doi.org/10.1016/j.pld.2022.09.001) R packages.\n- To curate taxonomic data as part of a [Darwin Core Archive](https://en.wikipedia.org/wiki/Darwin_Core_Archive).\n\nIn theory, dwctaxon could be used to create taxonomic databases from scratch, but it is more likely to be useful for updating and validating existing databases (R in general is more suited to data wrangling and analysis as opposed to data entry).\n\n## Resources\n\nFor detailed usage examples, see the vignettes:\n\n- [What is DwC?](https://docs.ropensci.org/dwctaxon/articles/what-is-dwc.html)\n- [Editing DwC taxon data](https://docs.ropensci.org/dwctaxon/articles/editing.html)\n- [Validating DwC taxon data](https://docs.ropensci.org/dwctaxon/articles/validation.html)\n- [Real World Example](https://docs.ropensci.org/dwctaxon/articles/real-data.html)\n\nFor more information about dwctaxon, in particular for using it to maintain a reference database for taxonomic name resolution, see [taxastand and dwctaxon: A pair of R packages for standardizing species names in Darwin Core format (BioDigiCon 2022 talk)](https://www.joelnitta.com/talks/2022-09-27_biodigi.html).\n\n## Installation\n\nThe stable version can be installed from [CRAN](https://cran.r-project.org/package=dwctaxon):\n\n```r\ninstall.packages(\"dwctaxon\")\n```\n\nThe development version can be installed from [r-universe](https://ropensci.r-universe.dev/dwctaxon) or [github](https://github.com/ropensci/dwctaxon).\n\n``` r\noptions(repos = c(\n  ropensci = \"https://ropensci.r-universe.dev/\", \n  CRAN = \"https://cran.rstudio.com/\"\n))\ninstall.packages(\"dwctaxon\", dep = TRUE)\n```\n\nOR\n\n``` r\n# 
install.packages(\"remotes\")\nremotes::install_github(\"ropensci/dwctaxon\")\n```\n\n## Usage\n\nFirst, load packages and a dataset to work with:\n\n```{r load-pkg-data, message = FALSE}\nlibrary(tibble) # recommended for pretty printing of tibbles\nlibrary(dwctaxon)\n\ndct_filmies\n```\n\n`dct_filmies` is a taxonomic dataset of filmy ferns included in dwctaxon.\n\nFor demonstration purposes, we will just use the first five rows:\n\n```{r filmies-small}\nfilmies_small \u003c- head(dct_filmies, 5)\n```\n\nAll functions in dwctaxon start with `dct_`.\n\n### Edit data\n\n`dct_add_row()` adds one or more rows, automatically providing values for `taxonID`.\n\n```{r add-row}\nfilmies_small |\u003e\n  dct_add_row(\n    scientificName = \"Hymenophyllum dwctaxonense Nitta\",\n    taxonomicStatus = \"accepted\"\n  )\n```\n\n`dct_modify_row()` modifies a row, automatically re-mapping synonyms if needed.\n\n```{r modify-row}\n# Change C. densinervium to a synonym of C. crassum\nfilmies_small |\u003e\n  dct_modify_row(\n    scientificName = \"Cephalomanes densinervium (Copel.) Copel.\",\n    taxonomicStatus = \"synonym\",\n    acceptedNameUsage = \"Cephalomanes crassum (Copel.) M. G. Price\"\n  )\n```\n\n`dct_fill_col()` fills in values for columns that have \"term\" - \"termID\" pairs (e.g., `acceptedNameUsage` and `acceptedNameUsageID`).\n\n```{r fill-col}\n# Fill-in the acceptedNameUsage column with scientific names\nfilmies_small |\u003e\n  dct_fill_col(\n    fill_to = \"acceptedNameUsage\",\n    fill_from = \"scientificName\",\n    match_to = \"taxonID\",\n    match_from = \"acceptedNameUsageID\"\n  )\n```\n\n### Validate data\n\n`dct_validate()` is the main function for validation, and automatically conducts a series of checks. 
The individual checks can be run with `dct_check_*()` functions.\n\nThe `dct_filmies` dataset is already well-formatted, so it will pass validation:\n\n```{r validate-pass}\n# Default behavior is to return the original dataset if checks pass\n# For this example, return TRUE instead\ndct_validate(dct_filmies, on_success = \"logical\")\n```\n\nFor demonstration purposes, let's mess up the data:\n\n```{r make-dirty-filmies}\n# Start by duplicating some data\nfilmies_dirty \u003c- rbind(head(dct_filmies), head(dct_filmies, 2))\n# Replace some values of `acceptedNameUsageID` with random letters\nfilmies_dirty$acceptedNameUsageID[sample(1:8, 5)] \u003c- sample(letters, 5)\n```\n\nBy default, `dct_validate()` will stop with an error on the first check that fails:\n\n```{r validate-error, error = TRUE}\ndct_validate(filmies_dirty)\n```\n\nBut it may be useful to get an overview of all the checks that failed. This can be done by setting `on_fail` to `\"summary\"`:\n\n```{r validate-summary-show, eval = FALSE, echo = TRUE}\ndct_validate(filmies_dirty, on_fail = \"summary\")\n```\n\n```{r validate-summary-print, echo = FALSE, message = FALSE}\ndct_validate(filmies_dirty, on_fail = \"summary\") |\u003e\n  dplyr::mutate(error = stringr::str_trunc(error, 40, \"right\"))\n```\n\n### Piping\n\nAll the functions in dwctaxon take a dataframe as their first argument and return a dataframe by default, so they are \"pipe-friendly\" and can be chained together:\n\n```{r pipe}\ndct_filmies |\u003e\n  dct_modify_row(\n    taxonID = \"54133783\",\n    taxonomicStatus = \"accepted\"\n  ) |\u003e\n  dct_add_row(\n    scientificName = \"Hymenophyllum dwctaxonense Nitta\",\n    taxonomicStatus = \"accepted\"\n  ) |\u003e\n  dct_validate()\n```\n\nIt's often a good idea to include `dct_validate()` at the end of a chain to make sure the modified taxonomic database is still correctly formatted.\n\n## Citing this package\n\nIf you use this package, please cite it!\n\n    Nitta, JH and Iwasaki, W 
(2024). dwctaxon, an R package for editing and validating taxonomic data in Darwin Core format. Journal of Open Source Software, 9(93), 6215, https://doi.org/10.21105/joss.06215\n\n## Contributing\n\nContributions to this package are welcome! Please see the [Contribution Guide](https://github.com/ropensci/dwctaxon/blob/main/.github/CONTRIBUTING.md) and [Code of Conduct](https://ropensci.org/code-of-conduct/).\n\n## Note to developers\n\n[roxyglobals](https://github.com/anthonynorth/roxyglobals) is used to maintain [`R/globals.R`](R/globals.R), but is not available on CRAN. You will need to install this package from GitHub and use the `@autoglobal` or `@global` roxygen tags to develop functions with globals.\n\n## Licenses\n\nCode: [MIT License](https://github.com/ropensci/dwctaxon/blob/main/LICENSE.md)\n\nData: \n\n- [`dct_filmies`](https://docs.ropensci.org/dwctaxon/reference/dct_filmies.html): Modified from data downloaded from the [Catalogue of Life](https://www.catalogueoflife.org/) under the [Creative Commons Attribution (CC BY) 4.0](https://creativecommons.org/licenses/by/4.0/) license.\n- [`dct_terms`](https://docs.ropensci.org/dwctaxon/reference/dct_terms.html): Modified from data downloaded from [TDWG Darwin Core](https://dwc.tdwg.org/) under the [Creative Commons Attribution (CC BY) 4.0](https://creativecommons.org/licenses/by/4.0/) license.\n\nImages:\n\n- [DwC archive components image](https://docs.ropensci.org/dwctaxon/articles/dwca.png): Copied from [GBIF Integrated Publishing Toolkit (IPT)](https://github.com/gbif/ipt/) under the [Apache license](https://github.com/gbif/ipt/blob/master/LICENSE.txt).\n\n```{r, include = FALSE}\n# Reset options\noptions(old)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Fdwctaxon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fropensci%2Fdwctaxon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Fdwctaxon/lists"}