{"id":22758210,"url":"https://github.com/ohdsi/dataqualitydashboard","last_synced_at":"2026-05-18T03:14:27.614Z","repository":{"id":37444647,"uuid":"193792186","full_name":"OHDSI/DataQualityDashboard","owner":"OHDSI","description":"A tool to help improve data quality standards in observational data science.","archived":false,"fork":false,"pushed_at":"2025-12-21T17:41:22.000Z","size":17006,"stargazers_count":169,"open_issues_count":55,"forks_count":107,"subscribers_count":24,"default_branch":"main","last_synced_at":"2025-12-22T22:22:51.168Z","etag":null,"topics":["data-quality"],"latest_commit_sha":null,"homepage":"https://ohdsi.github.io/DataQualityDashboard","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OHDSI.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-06-25T22:35:04.000Z","updated_at":"2025-12-21T17:41:29.000Z","dependencies_parsed_at":"2023-10-20T22:09:43.692Z","dependency_job_id":"f61ed884-6f78-4b6b-a460-18588d74408d","html_url":"https://github.com/OHDSI/DataQualityDashboard","commit_stats":null,"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/OHDSI/DataQualityDashboard","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OHDSI%2FDataQualityDashboard","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OHDSI%2FDataQualityDashboard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/OHDSI%2FDataQualityDashboard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OHDSI%2FDataQualityDashboard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OHDSI","download_url":"https://codeload.github.com/OHDSI/DataQualityDashboard/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OHDSI%2FDataQualityDashboard/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28337617,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T06:09:07.588Z","status":"ssl_error","status_checked_at":"2026-01-12T06:05:18.301Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-quality"],"created_at":"2024-12-11T08:13:43.270Z","updated_at":"2026-05-18T03:14:27.603Z","avatar_url":"https://github.com/OHDSI.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"DataQualityDashboard\n====================\n\n\u003c!-- badges: start --\u003e\n[![codecov](https://codecov.io/gh/OHDSI/DataQualityDashboard/branch/main/graph/badge.svg)](https://app.codecov.io/gh/OHDSI/DataQualityDashboard)\n[![Build 
Status](https://github.com/OHDSI/DataQualityDashboard/workflows/R-CMD-check/badge.svg)](https://github.com/OHDSI/DataQualityDashboard/actions?query=workflow%3AR-CMD-check)\n\u003c!-- badges: end --\u003e\n\nDataQualityDashboard is part of [HADES](https://ohdsi.github.io/Hades/).\n\nThe goal of the Data Quality Dashboard (DQD) project is to design and develop an open-source tool to expose and evaluate observational data quality. \n\nIntroduction\n============\n\nThis package runs a series of data quality checks against an OMOP CDM instance (currently supports v5.4, v5.3 and v5.2). It systematically runs the checks, evaluates each against a pre-specified threshold, and then communicates what was done in a transparent and easily understandable way. \n\nOverview\n========\n\nThe quality checks are organized according to the Kahn Framework\u003csup id=\"kahn\"\u003e[1](#f1)\u003c/sup\u003e, which uses a system of categories and contexts that represent strategies for assessing data quality. For an introduction to the Kahn Framework, please click [here](https://pmc.ncbi.nlm.nih.gov/articles/PMC5051581/). \n\nUsing this framework, the Data Quality Dashboard takes a systematic approach to running data quality checks. Instead of writing thousands of individual checks, we use “data quality check types”. These “check types” are more general, parameterized data quality checks into which OMOP tables, fields, and concepts can be substituted to represent a single data quality idea. For example, one check type might be written as \n\n*The number and percent of records with a value in the **cdmFieldName** field of the **cdmTableName** table less than **plausibleValueLow**.*\n\nThis would be considered an atemporal plausibility verification check because we are looking for implausibly low values in some field based on internal knowledge. 
We can substitute values for **cdmFieldName**, **cdmTableName**, and **plausibleValueLow** into this check type to create a unique data quality check. If we apply it to PERSON.YEAR_OF_BIRTH, here is how that might look: \n\n*The number and percent of records with a value in the **year_of_birth** field of the **PERSON** table less than **1850**.* \n\nAnd, since it is parameterized, we can similarly apply it to DRUG_EXPOSURE.days_supply: \n\n*The number and percent of records with a value in the **days_supply** field of the **DRUG_EXPOSURE** table less than **0**.* \n\nVersion 1 of the tool includes 24 different check types organized into Kahn contexts and categories. Additionally, each data quality check type is considered either a table-level, field-level, or concept-level check. Table-level checks are those evaluating the table at a high level without reference to individual fields, or those that span multiple event tables. These include checks making sure required tables are present or that at least some of the people in the PERSON table have records in the event tables. Field-level checks are those related to specific fields in a table. The majority of the check types in version 1 are field-level checks. These include checks evaluating primary key relationships and those investigating whether the concepts in a field conform to the specified domain. Concept-level checks are related to individual concepts. These include checks looking for gender-specific concepts in persons of the wrong gender and plausible values for measurement-unit pairs. For a detailed description and definition of each check type, please click [here](https://ohdsi.github.io/DataQualityDashboard/articles/CheckTypeDescriptions). \n\nAfter systematically applying the 24 check types to an OMOP CDM version, approximately 4,000 individual data quality checks are resolved, run against the database, and evaluated based on a pre-specified threshold. 
The R package then creates a JSON object that is read into an R Shiny application to view the results.\n\n\n\u003cimg src=\"man/figures/dqDashboardScreenshot.png\"/\u003e\n\n\nFeatures\n========\n- Utilizes configurable data check thresholds\n- Analyzes data in the OMOP Common Data Model format for all data checks\n- Produces a set of data check results with supplemental investigation assets\n\nData Requirements\n=================\n\nPrior to execution, the DataQualityDashboard package requires that the CDM_SOURCE table is properly populated. The following table is a guide to the expected contents of the CDM_SOURCE table.\n\ncdmFieldName | userGuidance | etlConventions\n-- | -- | --\ncdm_source_name | The name of the CDM instance. | Descriptive name for the source data.\ncdm_source_abbreviation | The abbreviation of the CDM instance. | The abbreviation should be consistent across different releases from the same source.\ncdm_holder | The holder of the CDM instance. | The institution that controls access to the data. If possible, include contact information for requesting access to the data.\nsource_description | The description of the CDM instance. | Add notes, caveats, and special characteristics about the source data that would not be assumed from the general descriptive name. This description is intended to help analysts determine if the data is suitable for the problem they are studying.\nsource_documentation_reference | Reference to where one can find documentation about the source data. | Can include URLs, file names, or contact information for source data experts (if they agree to it).\ncdm_etl_reference | Reference to where one can find documentation about the ETL from the source into the OMOP CDM. | Assuming there is a document or files (such as Rabbit in the Hat) describing the ETL. May be the location of the ETL source and documentation repository.\nsource_release_date | The release date of the source data. 
| When the source data was made available for ETL'ing. For sites doing incremental updates, the date the last increment was made available. This implies that for sites doing incremental updates the CDM Source table should be updated to reflect that changes were made to the CDM.\ncdm_release_date | The release date of the CDM instance. | When the source data was made available for general use. For sites doing incremental updates, this implies that the CDM Source table should be updated to reflect that changes were made to the CDM.\ncdm_version | Identifies the CDM version. | Enter the numeric portion of the version, e.g. 5.4\ncdm_version_concept_id | The Concept Id representing the version of the CDM. | SELECT concept_id WHERE domain = Metadata and vocabulary_id = CDM and concept_code like %[numeric portion of the version]%\nvocabulary_version | The vocabulary version used in the ETL. | Obtained by SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = 'None'\n\nTechnology\n==========\nDataQualityDashboard is an R package.\n\nSystem Requirements\n===================\nRequires R (version 3.2.2 or higher). Requires [DatabaseConnector](https://github.com/OHDSI/DatabaseConnector) (version 2.0.2 or higher).\n\nA variety of database platforms are supported, as documented [here](https://ohdsi.github.io/Hades/supportedPlatforms.html). \n\nNote that while data quality check threshold files are provided for OMOP CDM versions 5.2, 5.3, and 5.4, the package is currently only tested against versions 5.3 and 5.4.\n\nInstallation\n=============\n1. See the instructions [here](https://ohdsi.github.io/Hades/rSetup.html) for configuring your R environment, including RTools and Java.\n\n2. 
In R, install DataQualityDashboard from CRAN:\n\n  ```r\n  install.packages(\"DataQualityDashboard\")\n  ```\n\n   To install from GitHub instead:\n\n  ```r\n  install.packages(\"remotes\")\n  remotes::install_github(\"OHDSI/DataQualityDashboard\")\n  ```\n\nUser Documentation\n==================\nDocumentation can be found on the [package website](https://ohdsi.github.io/DataQualityDashboard/index.html).\n\nPDF versions of the documentation are also available:\n\n* Vignette: [Add a New Data Quality Check](https://github.com/OHDSI/DataQualityDashboard/raw/main/inst/doc/AddNewCheck.pdf)\n* Vignette: [Check Status Descriptions](https://github.com/OHDSI/DataQualityDashboard/raw/main/inst/doc/CheckStatusDefinitions.pdf)\n* Vignette: [Running the DQD on a Cohort](https://github.com/OHDSI/DataQualityDashboard/raw/main/inst/doc/DqdForCohorts.pdf)\n* Vignette: [Failure Thresholds and How to Change Them](https://github.com/OHDSI/DataQualityDashboard/raw/main/inst/doc/Thresholds.pdf)\n* Vignette: [SqlOnly Mode](https://github.com/OHDSI/DataQualityDashboard/raw/main/inst/doc/SqlOnly.pdf)\n* Package manual: [DataQualityDashboard manual](https://github.com/OHDSI/DataQualityDashboard/raw/main/inst/doc/DataQualityDashboard.pdf) \n\nSupport\n=======\n\n* Developer questions/comments/feedback: \u003ca href=\"http://forums.ohdsi.org/c/developers\"\u003eOHDSI Forum\u003c/a\u003e\n* We use the \u003ca href=\"https://github.com/OHDSI/DataQualityDashboard/issues\"\u003eGitHub issue tracker\u003c/a\u003e for all bugs/issues/enhancements \n \nLicense\n=======\nDataQualityDashboard is licensed under the Apache License 2.0.\n\nDevelopment\n===========\nDataQualityDashboard is being developed in RStudio.\n\n### Development status\n\nThe latest DataQualityDashboard release (representing code in the `main` branch) is ready for use. 
\n\n# Acknowledgements\n- This project is supported in part through the National Science Foundation grant IIS 1251151.\n\n\u003cb id=\"f1\"\u003e1\u003c/b\u003e Kahn, M.G., et al., A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC), 2016. 4(1): p. 1244. [↩](#kahn)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fohdsi%2Fdataqualitydashboard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fohdsi%2Fdataqualitydashboard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fohdsi%2Fdataqualitydashboard/lists"}