https://github.com/machiela-lab/UKBBcleanR

Prepare electronic medical record data from the UK Biobank for time-to-event analyses
https://github.com/machiela-lab/UKBBcleanR

data-processing electronic-medical-records r r-package rstats rstats-package time-to-event uk-biobank

Last synced: about 1 month ago
JSON representation

Prepare electronic medical record data from the UK Biobank for time-to-event analyses

Host: GitHub
URL: https://github.com/machiela-lab/UKBBcleanR
Owner: machiela-lab
License: mit
Created: 2022-08-02T23:56:25.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-01-26T14:53:31.000Z (over 2 years ago)
Last Synced: 2025-04-01T23:51:47.933Z (about 2 months ago)
Topics: data-processing, electronic-medical-records, r, r-package, rstats, rstats-package, time-to-event, uk-biobank
Language: R
Homepage:
Size: 1.23 MB
Stars: 11
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-uk-biobank - UKBBcleanR - to-event analyses | (Data processing / Optical coherence tomography and fundus)

README

        UKBBcleanR: Prepare electronic medical record data from the UK Biobank for time-to-event analyses 

===================================================

[![R-CMD-check](https://github.com/machiela-lab/UKBBcleanR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/machiela-lab/UKBBcleanR/actions/workflows/R-CMD-check.yaml)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

![GitHub last commit](https://img.shields.io/github/last-commit/machiela-lab/UKBBcleanR)

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7301712.svg)](https://doi.org/10.5281/zenodo.7301712)

**Date repository last updated**: January 26, 2023

### Overview

The `UKBBcleanR` package contains an `R` function that prepares time-to-event data from raw [UK Biobank](https://www.ukbiobank.ac.uk/) electronic medical record data. The prepared data can be used for cancer outcomes, but could be modified for other health outcomes. This package is not available on [CRAN](https://cran.r-project.org/).

### Installation

To install the development version from GitHub:

    devtools::install_github("machiela-lab/UKBBcleanR")

### Available function(s)

Function

Description

tte

Prepares time-to-event data from raw UK Biobank electronic medical record data.

The repository also includes the resources and code to create the project hex sticker.

### Authors

* **Alexander Depaulis** - *Integrative Tumor Epidemiology Branch (ITEB), Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), National Institutes of Health (NIH), Rockville, Maryland (MD), USA* - [GitHub](https://github.com/adepaulis1)

* **Derek W. Brown** - *ITEB, DCEG, NCI, NIH, Rockville, MD, USA (original)* - [GitHub](https://github.com/derekbrown12) - [ORCID](https://orcid.org/0000-0001-8393-1713)

* **Aubrey K. Hubbard** - *ITEB, DCEG, NCI, NIH, Rockville, MD, USA* - [ORCID](https://orcid.org/0000-0003-4052-1110)

See also the list of [contributors](https://github.com/machiela-lab/UKBBcleanR/graphs/contributors) who participated in this package, including:

* **Ian D. Buller** - *Social & Scientific Systems, Inc., a division of DLH Corporation, Silver Spring, Maryland (current)* - *Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland (original)* - [GitHub](https://github.com/idblr) - [ORCID](https://orcid.org/0000-0001-9477-8582)

* **Mitchell J. Machiela** - *ITEB, DCEG, NCI, NIH, Rockville, MD, USA* - [GitHub](https://github.com/machiela) - [ORCID](https://orcid.org/0000-0001-6538-9705)

### Getting Started

The `tte` function requires several raw [UK Biobank](https://www.ukbiobank.ac.uk/) variables to run correctly. A detailed list of required variables are provided in the [README_required_variables.txt](https://github.com/machiela-lab/UKBBcleanR/blob/main/data-raw/README_required_variables.txt) file.   

Data can be loaded in the `tte` function in two ways: 

* The user can specify a working directory using `setwd()` to where each individual data set is stored.  

    + NOTE: These individual data sets must contain the specific variables and have names which match the [README_required_variables.txt](https://github.com/machiela-lab/UKBBcleanR/blob/main/data-raw/README_required_variables.txt) file. Example data is available within the [package](https://github.com/machiela-lab/UKBBcleanR/tree/main/inst/extdata).

* The user can generate a single data set containing all the variables of interest. This data set can then be loaded into the `tte` function using the `combined_data` argument. Example data is available within the [package](https://github.com/machiela-lab/UKBBcleanR/tree/main/inst/extdata).

### Usage

``` r

# ------------------ #

# Necessary packages #

# ------------------ #

library(UKBBcleanR)

# -------- #

# Settings #

# -------- #

##### Input UKBBcleanR sample data

 # Use combined data set

 testdata <- as.data.frame(combined_data)

 

 # Set ICD-10 outcome of interest

 cancer_outcome <- c("C911") 

 

 # Set prevalent cancers to identify in data cleaning

 prevalent_cancers <- c("D37", "D38", "D39", "D40", "D41", "D42",

                        "D43", "D44", "D45", "D46", "D47", "D48") 

 

 # Set incident cancers to identify in data cleaning

 incident_cancers <- c("C900") 

 

# ------- #

# Run tte #

# ------- #

# Run without removing prevalent cancers from analysis

test1 <- tte(combined_data = testdata, 

             cancer_of_interest_ICD10 = cancer_outcome,

             prevalent_cancer_list = prevalent_cancers, 

             prevalent_C_cancers = TRUE, 

             incident_cancer_list = incident_cancers, 

             remove_prevalent_cancer = FALSE, 

             remove_self_reported_cancer = FALSE)

            

table(test1$case_control_cancer_ignore)  # tte outcome ignoring other incident cancers

table(test1$case_control_cancer_control) # tte outcome controlling for other incident cancers

# Run with removing prevalent cancers from analysis

test2 <- tte(combined_data = testdata, 

             cancer_of_interest_ICD10 = cancer_outcome,

             prevalent_cancer_list = prevalent_cancers, 

             prevalent_C_cancers = TRUE, 

             incident_cancer_list = incident_cancers, 

             remove_prevalent_cancer = TRUE, 

             remove_self_reported_cancer = TRUE)

table(test2$case_control_cancer_ignore)  # tte outcome ignoring other incident cancers

table(test2$case_control_cancer_control) # tte outcome controlling for other incident cancers

```

### Vignette

We provide a [vignette](https://htmlpreview.github.io/?https://github.com/machiela-lab/UKBBcleanR/blob/main/vignettes/vignette.html) with a practical example and work through of the provided [example data](https://github.com/machiela-lab/UKBBcleanR/tree/main/inst/extdata).  

    

### Funding

Package was developed while the first author was a participant of the 2022 [National Institutes of Health](https://www.nih.gov/) [Summer Internship Program in Biomedical Research](https://www.training.nih.gov/programs/sip) and while the second author was a postdoctoral fellow supported by the [Cancer Prevention Fellowship Program](https://cpfp.cancer.gov/) at the [National Cancer Institute](https://www.cancer.gov/) (NCI) and the third author was a postdoctoral fellow in the NCI [Division of Cancer Epidemiology and Genetics](https://dceg.cancer.gov/).

### Acknowledgments

When citing this package for publication, please cite follow:

    citation("UKBBcleanR")

### Questions? Feedback?

For questions about the package please contact the maintainer [Dr. Derek Brown](mailto:[email protected]) or [submit a new issue](https://github.com/machiela-lab/UKBBcleanR/issues).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/machiela-lab/UKBBcleanR

Awesome Lists containing this project

README