{"id":13724554,"url":"https://github.com/brunobrr/bdc","last_synced_at":"2026-02-21T15:32:02.520Z","repository":{"id":39231728,"uuid":"299708814","full_name":"brunobrr/bdc","owner":"brunobrr","description":"Check out the vignettes with detailed documentation on each module of the bdc package","archived":false,"fork":false,"pushed_at":"2026-01-27T14:03:19.000Z","size":187598,"stargazers_count":24,"open_issues_count":7,"forks_count":10,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-02-04T08:03:44.174Z","etag":null,"topics":["bdc","biodiversity-data","workflow"],"latest_commit_sha":null,"homepage":"https://brunobrr.github.io/bdc","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brunobrr.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-09-29T18:48:10.000Z","updated_at":"2026-01-27T14:00:43.000Z","dependencies_parsed_at":"2023-09-21T19:29:37.759Z","dependency_job_id":"5648f85a-5b13-4965-a696-05a992c3dffa","html_url":"https://github.com/brunobrr/bdc","commit_stats":{"total_commits":864,"total_committers":8,"mean_commits":108.0,"dds":"0.49884259259259256","last_synced_commit":"dcab5177c32f7e514445b8116320aa10904a5d74"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/brunobrr/bdc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brunobrr%2Fbdc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/brunobrr%2Fbdc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brunobrr%2Fbdc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brunobrr%2Fbdc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brunobrr","download_url":"https://codeload.github.com/brunobrr/bdc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brunobrr%2Fbdc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29684478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T14:31:22.911Z","status":"ssl_error","status_checked_at":"2026-02-21T14:31:22.570Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bdc","biodiversity-data","workflow"],"created_at":"2024-08-03T01:01:59.207Z","updated_at":"2026-02-21T15:32:02.481Z","avatar_url":"https://github.com/brunobrr.png","language":"R","funding_links":[],"categories":["Biosphere"],"sub_categories":["Biodiversity Data Cleaning and Standardization"],"readme":"---\noutput: github_document\neditor_options: \n  markdown: \n    wrap: 80\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# ***bdc*** \u003ca href='https://github.com/brunobrr/bdc'\u003e\u003cimg src=\"https://raw.githubusercontent.com/brunobrr/bdc/master/man/figures/logo.png\" align=\"right\" width=\"155\"/\u003e\u003c/a\u003e\n\n## **A toolkit for standardizing, integrating, and cleaning biodiversity data**\n\n\u003c!-- badges: start --\u003e\n\n[![CRAN\nstatus](https://www.r-pkg.org/badges/version/bdc)](https://CRAN.R-project.org/package=bdc)\n[![downloads](https://cranlogs.r-pkg.org/badges/grand-total/bdc)](https://cranlogs.r-pkg.org:443/badges/grand-total/bdc)\n\u003c!-- [![rstudio mirror --\u003e\n\u003c!-- downloads](https://cranlogs.r-pkg.org/badges/bdc)](https://cranlogs.r-pkg.org:443/badges/bdc) --\u003e\n[![R-CMD-check](https://github.com/brunobrr/bdc/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/brunobrr/bdc/actions/workflows/R-CMD-check.yaml)\n[![Codecov test\ncoverage](https://codecov.io/gh/brunobrr/bdc/branch/master/graph/badge.svg?token=9AUF86G9LJ)](https://app.codecov.io/gh/brunobrr/bdc)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6450390.svg)](https://doi.org/10.5281/zenodo.6450390)\n[![License](https://img.shields.io/badge/license-GPL%20(%3E=%203)-lightgrey.svg?style=flat)](http://www.gnu.org/licenses/gpl-3.0.html)\n\n\u003c!-- badges: end --\u003e\n\n#### **Overview**\n\nHandling biodiversity data from several different sources is not an easy task.\nHere, we present **B**iodiversity **D**ata **C**leaning (*bdc*), an R\npackage to address quality issues and improve the fitness-for-use of biodiversity\ndatasets. 
*bdc* contains functions to harmonize and integrate data from\ndifferent sources following common standards and protocols, and implements\nvarious tests and tools to flag, document, clean, and correct taxonomic,\nspatial, and temporal data.\n\nCompared to other available R packages, the main strengths of the *bdc* package\nare that it brings together available tools – and a series of new ones – to\nassess the quality of different dimensions of biodiversity data into a single\nand flexible toolkit. The functions can be applied to a multitude of taxonomic\ngroups, datasets (including regional or local repositories), countries, or\nworldwide.\n\n#### **Structure of *bdc***\n\nThe *bdc* toolkit is organized in thematic modules related to different\nbiodiversity dimensions.\n\n--------------------------------------------------------------------------------\n\n\u003e :warning: The modules illustrated, and **functions** within, **were linked to\n\u003e form** a proposed reproducible **workflow** (see\n\u003e [**vignettes**](https://brunobrr.github.io/bdc/)). However, all functions\n\u003e **can also be executed independently**.\n\n--------------------------------------------------------------------------------\n\n#### ![](https://raw.githubusercontent.com/brunobrr/bdc/master/inst/extdata/icon_vignettes/Figure1.png)\n\n\u003cbr/\u003e\n\n#### 1. [**Merge databases**](https://brunobrr.github.io/bdc/articles/integrate_datasets.html)\n\nStandardization and integration of different datasets into a standard database.\n\n-   `bdc_standardize_datasets()` Standardization and integration of different\n    datasets into a new dataset with column names following Darwin Core\n    terminology\n\n#### 2. 
[**Pre-filter**](https://brunobrr.github.io/bdc/articles/prefilter.html)\n\nFlagging and removal of invalid or non-interpretable information, followed by\ndata amendments (e.g., correcting transposed coordinates and standardizing\ncountry names).\n\n-   `bdc_scientificName_empty()` Identification of records lacking names or with\n    non-interpretable names\n-   `bdc_coordinates_empty()` Identification of records lacking information on\n    latitude or longitude\n-   `bdc_coordinates_outOfRange()` Identification of records with out-of-range\n    coordinates (latitude \\\u003e 90 or \\\u003c -90; longitude \\\u003e 180 or \\\u003c -180)\n-   `bdc_basisOfRecords_notStandard()` Identification of records from doubtful\n    sources (e.g., fossil or machine observation) impossible to interpret and\n    not compatible with Darwin Core recommended vocabulary\n-   `bdc_country_from_coordinates()` Derivation of country names from valid\n    geographic coordinates\n-   `bdc_country_standardized()` Standardization of country names and retrieval\n    of country codes\n-   `bdc_coordinates_transposed()` Identification of records with potentially\n    transposed latitude and longitude\n-   `bdc_coordinates_country_inconsistent()` Identification of coordinates in\n    other countries or farther than a specified distance from the coast of a\n    reference country (i.e., in the ocean)\n-   `bdc_coordinates_from_locality()` Identification of records lacking\n    coordinates but with a detailed locality description associated with the\n    records, from which coordinates can be derived\n\n#### 3. 
[**Taxonomy**](https://brunobrr.github.io/bdc/articles/taxonomy.html)\n\nCleaning, parsing, and harmonization of scientific names against multiple\ntaxonomic references.\n\n-   `bdc_clean_names()` Name-checking routines to clean and split a taxonomic\n    name into its binomial and authority components\n-   `bdc_query_names_taxadb()` Harmonization of scientific names by correcting\n    spelling errors and converting nomenclatural synonyms to currently accepted\n    names\n-   `bdc_filter_out_names()` Filtering out of records according to their\n    taxonomic status recorded in the column \"notes\" (for example, to keep\n    only valid names categorized as \"accepted\")\n\n#### 4. [**Space**](https://brunobrr.github.io/bdc/articles/space.html)\n\nFlagging of erroneous, suspicious, and low-precision geographic coordinates.\n\n-   `bdc_coordinates_precision()` Identification of records with a coordinate\n    precision below a specified number of decimal places\n-   `clean_coordinates()` (From the *CoordinateCleaner* package and part of the\n    data-cleaning workflow). Identification of potentially problematic\n    geographic coordinates based on geographic gazetteers and metadata. Includes\n    tests for flagging records: around country capitals or country or province\n    centroids, duplicated, with equal coordinates, around biodiversity\n    institutions, within urban areas, plain zeros in the coordinates, and\n    suspect geographic outliers\n\n#### 5. [**Time**](https://brunobrr.github.io/bdc/articles/time.html)\n\nFlagging and, whenever possible, correction of inconsistent collection dates.\n\n-   `bdc_eventDate_empty()` Identification of records lacking information on\n    event date (i.e., when a record was collected or observed)\n-   `bdc_year_outOfRange()` Identification of records with an illegitimate or\n    potentially imprecise collecting year. 
The year provided can be out-of-range\n    (e.g., in the future) or earlier than a cutoff year supplied by the\n    user (e.g., 1900)\n-   `bdc_year_from_eventDate()` Extraction of a four-digit year from\n    unambiguously interpretable collecting dates\n\n#### [**Other functions**](https://brunobrr.github.io/bdc/reference/index.html)\n\nTo facilitate the **documentation, visualization, and interpretation** of the\nresults of data-quality tests, the package contains functions for documenting\nthese results, including functions for saving i) records needing further\ninspection, ii) figures, and iii) data-quality reports.\n\n-   `bdc_create_report()` Creation of data-quality reports documenting the\n    results of data-quality tests and the taxonomic harmonization process\n-   `bdc_create_figures()` Creation of figures (i.e., bar plots and maps)\n    reporting the results of data-quality tests\n-   `bdc_filter_out_flags()` Removal of columns containing the results of\n    data-quality tests (i.e., columns starting with \".\") or other specified\n    columns\n-   `bdc_quickmap()` Creation of a map of points using ggplot2. 
Helpful in\n    inspecting the results of data-cleaning tests\n-   `bdc_summary_col()` Creation or update of the column summarizing\n    the results of data-quality tests (i.e., the column \".summary\")\n\n#### **Installation**\n\n```{r eval=FALSE}\ninstall.packages(\"bdc\")\nlibrary(bdc)\n```\n\nOr install the development version from [GitHub](https://github.com/brunobrr/bdc) using:\n\n```{r, message=FALSE, warning=FALSE,echo=TRUE,eval=FALSE}\ninstall.packages(\"remotes\")\nremotes::install_github(\"brunobrr/bdc\")\n```\n\nLoad the package with:\n\n```{r, message=FALSE, warning=FALSE,echo=TRUE,eval=TRUE}\nlibrary(bdc)\n```\n\n#### **Package website**\n\nSee the *bdc* package website (\u003chttps://brunobrr.github.io/bdc/\u003e) for a detailed\nexplanation of each module.\n\n#### **Getting help**\n\n\u003e If you encounter a clear bug, please file an issue\n\u003e [**here**](https://github.com/brunobrr/bdc/issues). For questions or\n\u003e suggestions, please send us an email (ribeiro.brr\\@gmail.com).\n\n#### **Citation**\n\nRibeiro, BR; Velazco, SJE; Guidoni-Martins, K; Tessarolo, G; Jardim, L;\nBachman, SP; Loyola, R (2022). bdc: A toolkit for standardizing, integrating,\nand cleaning biodiversity data. Methods in Ecology and Evolution.\n[doi.org/10.1111/2041-210X.13868](https://doi.org/10.1111/2041-210X.13868)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrunobrr%2Fbdc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrunobrr%2Fbdc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrunobrr%2Fbdc/lists"}