UKBBcleanR: Prepare electronic medical record data from the UK Biobank for time-to-event analyses
===================================================

[![R-CMD-check](https://github.com/machiela-lab/UKBBcleanR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/machiela-lab/UKBBcleanR/actions/workflows/R-CMD-check.yaml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![GitHub last commit](https://img.shields.io/github/last-commit/machiela-lab/UKBBcleanR)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7301712.svg)](https://doi.org/10.5281/zenodo.7301712)

**Date repository last updated**: January 26, 2023

### Overview

The `UKBBcleanR` package contains an `R` function that prepares time-to-event data from raw [UK Biobank](https://www.ukbiobank.ac.uk/) electronic medical record data. The prepared data are intended for analyses of cancer outcomes, but the function could be adapted for other health outcomes. This package is not available on [CRAN](https://cran.r-project.org/).

### Installation

To install the development version from GitHub:

``` r
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("machiela-lab/UKBBcleanR")
```

### Available function(s)

| Function | Description |
| -------- | ----------- |
| `tte` | Prepares time-to-event data from raw UK Biobank electronic medical record data. |

The repository also includes the resources and code to create the project hex sticker.

### Authors

* **Alexander Depaulis** - *Integrative Tumor Epidemiology Branch (ITEB), Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), National Institutes of Health (NIH), Rockville, Maryland (MD), USA* - [GitHub](https://github.com/adepaulis1)

* **Derek W. Brown** - *ITEB, DCEG, NCI, NIH, Rockville, MD, USA (original)* - [GitHub](https://github.com/derekbrown12) - [ORCID](https://orcid.org/0000-0001-8393-1713)

* **Aubrey K. Hubbard** - *ITEB, DCEG, NCI, NIH, Rockville, MD, USA* - [ORCID](https://orcid.org/0000-0003-4052-1110)

See also the list of [contributors](https://github.com/machiela-lab/UKBBcleanR/graphs/contributors) who contributed to this package, including:

* **Ian D. Buller** - *Social & Scientific Systems, Inc., a division of DLH Corporation, Silver Spring, Maryland (current)* - *Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland (original)* - [GitHub](https://github.com/idblr) - [ORCID](https://orcid.org/0000-0001-9477-8582)

* **Mitchell J. Machiela** - *ITEB, DCEG, NCI, NIH, Rockville, MD, USA* - [GitHub](https://github.com/machiela) - [ORCID](https://orcid.org/0000-0001-6538-9705)

### Getting Started

The `tte` function requires several raw [UK Biobank](https://www.ukbiobank.ac.uk/) variables to run correctly. A detailed list of the required variables is provided in the [README_required_variables.txt](https://github.com/machiela-lab/UKBBcleanR/blob/main/data-raw/README_required_variables.txt) file.

Data can be supplied to the `tte` function in two ways:

* The user can set the working directory, using `setwd()`, to the folder where each individual data set is stored.
  + NOTE: These individual data sets must contain the specific variables and have names that match the [README_required_variables.txt](https://github.com/machiela-lab/UKBBcleanR/blob/main/data-raw/README_required_variables.txt) file. Example data are available within the [package](https://github.com/machiela-lab/UKBBcleanR/tree/main/inst/extdata).

* The user can generate a single data set containing all the variables of interest and pass it to the `tte` function using the `combined_data` argument. Example data are available within the [package](https://github.com/machiela-lab/UKBBcleanR/tree/main/inst/extdata).
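If your extracts are stored as separate files, one way to build the single data set for `combined_data` is to join them on the participant identifier. The sketch below uses a hypothetical helper, `combine_ukbb_files()`, which is not part of `UKBBcleanR`. It assumes the extracts are CSV files in one folder that share an ID column named `eid`; check [README_required_variables.txt](https://github.com/machiela-lab/UKBBcleanR/blob/main/data-raw/README_required_variables.txt) for the names `tte` actually expects.

``` r
# Hypothetical helper (not part of UKBBcleanR): combine individual UK Biobank
# extracts into one data frame. Assumes CSV files sharing an ID column; the
# default name "eid" is an assumption.
combine_ukbb_files <- function(dir = ".", id_col = "eid", pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) stop("No files matching pattern found in ", dir)

  # check.names = FALSE keeps raw column names (e.g. "31-0.0") unchanged
  data_sets <- lapply(files, read.csv,
                      check.names = FALSE, stringsAsFactors = FALSE)

  # Full outer join on the ID so participants missing from a file are kept
  Reduce(function(x, y) merge(x, y, by = id_col, all = TRUE), data_sets)
}

# Example (hypothetical folder):
# my_data <- combine_ukbb_files("path/to/ukbb_extracts")
# Then pass my_data to tte() via the combined_data argument.
```

Without `check.names = FALSE`, `read.csv` would rewrite a name like `31-0.0` as `X31.0.0`. Any non-ID column that appears in more than one file will be renamed with `merge()` suffixes (`.x`, `.y`), so each variable should be stored in only one file.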

### Usage

``` r
# ------------------ #
# Necessary packages #
# ------------------ #

library(UKBBcleanR)

# -------- #
# Settings #
# -------- #

##### Input UKBBcleanR sample data

# Use the combined example data set
testdata <- as.data.frame(combined_data)

# Set ICD-10 outcome of interest (C91.1: chronic lymphocytic leukaemia of B-cell type)
cancer_outcome <- c("C911")

# Set prevalent cancers to identify in data cleaning
# (D37-D48: neoplasms of uncertain or unknown behaviour)
prevalent_cancers <- c("D37", "D38", "D39", "D40", "D41", "D42",
                       "D43", "D44", "D45", "D46", "D47", "D48")

# Set incident cancers to identify in data cleaning (C90.0: multiple myeloma)
incident_cancers <- c("C900")

# ------- #
# Run tte #
# ------- #

# Run without removing prevalent or self-reported cancers from analysis
test1 <- tte(combined_data = testdata,
             cancer_of_interest_ICD10 = cancer_outcome,
             prevalent_cancer_list = prevalent_cancers,
             prevalent_C_cancers = TRUE,
             incident_cancer_list = incident_cancers,
             remove_prevalent_cancer = FALSE,
             remove_self_reported_cancer = FALSE)

table(test1$case_control_cancer_ignore)  # tte outcome ignoring other incident cancers
table(test1$case_control_cancer_control) # tte outcome controlling for other incident cancers

# Run with prevalent and self-reported cancers removed from analysis
test2 <- tte(combined_data = testdata,
             cancer_of_interest_ICD10 = cancer_outcome,
             prevalent_cancer_list = prevalent_cancers,
             prevalent_C_cancers = TRUE,
             incident_cancer_list = incident_cancers,
             remove_prevalent_cancer = TRUE,
             remove_self_reported_cancer = TRUE)

table(test2$case_control_cancer_ignore)  # tte outcome ignoring other incident cancers
table(test2$case_control_cancer_control) # tte outcome controlling for other incident cancers
```

### Vignette

We provide a [vignette](https://htmlpreview.github.io/?https://github.com/machiela-lab/UKBBcleanR/blob/main/vignettes/vignette.html) with a practical example and a walkthrough using the provided [example data](https://github.com/machiela-lab/UKBBcleanR/tree/main/inst/extdata).

### Funding

The package was developed while the first author was a participant in the 2022 [National Institutes of Health](https://www.nih.gov/) [Summer Internship Program in Biomedical Research](https://www.training.nih.gov/programs/sip), the second author was a postdoctoral fellow supported by the [Cancer Prevention Fellowship Program](https://cpfp.cancer.gov/) at the [National Cancer Institute](https://www.cancer.gov/) (NCI), and the third author was a postdoctoral fellow in the NCI [Division of Cancer Epidemiology and Genetics](https://dceg.cancer.gov/).

### Acknowledgments

When citing this package in a publication, please use the citation returned by:

``` r
citation("UKBBcleanR")
```

### Questions? Feedback?

For questions about the package, please contact the maintainer, Dr. Derek Brown, or [submit a new issue](https://github.com/machiela-lab/UKBBcleanR/issues).