https://github.com/kenhanscombe/ukbtools

An R package to manipulate and explore UK Biobank data

ukbtools
===

[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/ukbtools)](https://cran.r-project.org/package=ukbtools)
[![codecov](https://codecov.io/gh/kenhanscombe/ukbtools/branch/master/graph/badge.svg?token=4MMpYxggFt)](https://codecov.io/gh/kenhanscombe/ukbtools)
[![R-CMD-check](https://github.com/kenhanscombe/ukbtools/workflows/R-CMD-check/badge.svg)](https://github.com/kenhanscombe/ukbtools/actions)

> **NB. With the advent of the UKB RAP, this package is no longer supported or under active development.**

After downloading and decrypting your UK Biobank (UKB) data with the supplied [UKB programs](http://biobank.ctsu.ox.ac.uk/crystal/docs/UsingUKBData.pdf), you have multiple files that need to be brought together to give you a dataset to explore. The data file has column names that are edited field-codes from the [UKB data showcase](http://www.ukbiobank.ac.uk/data-showcase/). ukbtools makes it easy to collapse the multiple UKB files into a single dataset for analysis, in the process giving meaningful names to the variables. The package also includes functionality to retrieve ICD diagnoses, explore a sample subset in the context of the UKB sample, and collect genetic metadata.

## Installation

```r
# Install from CRAN
install.packages("ukbtools")

# Install latest development version
devtools::install_github("kenhanscombe/ukbtools", dependencies = TRUE)
```

## Prerequisite: Make a UKB fileset

Download§ then decrypt your data and create a "UKB fileset" (.tab, .r, .html):

```bash
ukb_unpack ukbxxxx.enc key
ukb_conv ukbxxxx.enc_ukb r
ukb_conv ukbxxxx.enc_ukb docs
```

`ukb_unpack` decrypts your downloaded `ukbxxxx.enc` file, outputting a `ukbxxxx.enc_ukb` file. `ukb_conv` with the `r` flag converts the decrypted data to a tab-delimited file `ukbxxxx.tab` and an R script `ukbxxxx.r` that reads the tab file. The `docs` flag creates an html file containing a field-code-to-description table (among others).

§ Full details of the data download and decrypt process are given in the [Using UK Biobank Data](http://biobank.ctsu.ox.ac.uk/crystal/docs/UsingUKBData.pdf) documentation.
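Before moving on to R, it can help to confirm that all three fileset files were actually produced. The sketch below is only an illustration: `check_fileset` is a hypothetical helper (not part of the UKB programs or ukbtools), and it assumes the files are named `<stem>.tab`, `<stem>.r`, and `<stem>.html` as described above.

```bash
# Hypothetical helper (not a UKB or ukbtools tool): check that the three
# fileset files for a given stem exist in a directory (default: current).
check_fileset() {
  stem="$1"
  dir="${2:-.}"
  missing=0
  for ext in tab r html; do
    if [ ! -f "$dir/$stem.$ext" ]; then
      echo "missing: $dir/$stem.$ext" >&2
      missing=1
    fi
  done
  # Exit status 0 if all three files exist, 1 otherwise.
  return "$missing"
}
```

For example, `check_fileset ukbxxxx && echo "fileset complete"` prints the message only when the `.tab`, `.r`, and `.html` files are all present in the current directory.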

## Make a UKB dataset

The function `ukb_df()` takes two arguments, the stem of your fileset and, optionally, its path (the current directory by default), and returns a dataframe with usable column names. This takes a few minutes; the rate-limiting step is reading and parsing the code in the UKB-generated .r file, not `ukb_df()` itself.

```r
library(ukbtools)
my_ukb_data <- ukb_df("ukbxxxx")
```

You can also specify the path to your fileset if it is not in the current directory. For example, if your fileset is in a directory called `data`, pass the full path to that directory:

```r
my_ukb_data <- ukb_df("ukbxxxx", path = "/full/path/to/my/data")
```

__Note:__ You can move the three files in your fileset after creating them with `ukb_conv`, but they should be kept together. `ukb_df()` automatically updates the read call in the R source file to point to the correct directory (the current directory by default, or a directory specified by `path`).
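Moving the fileset as a unit could be sketched as follows. `move_fileset` is a hypothetical helper written for illustration only; it assumes the `<stem>.tab`, `<stem>.r`, and `<stem>.html` naming described earlier and refuses to move anything if one of the three files is missing, so the fileset is never split.

```bash
# Hypothetical helper (not a UKB or ukbtools tool): move the three fileset
# files for a stem from one directory to another, all together or not at all.
move_fileset() {
  stem="$1"
  src="$2"
  dest="$3"
  # Refuse to move a partial fileset.
  for ext in tab r html; do
    [ -f "$src/$stem.$ext" ] || { echo "missing: $src/$stem.$ext" >&2; return 1; }
  done
  # Create the destination if needed, then move all three files in one call.
  mkdir -p "$dest" &&
    mv "$src/$stem.tab" "$src/$stem.r" "$src/$stem.html" "$dest/"
}
```

After `move_fileset ukbxxxx . /full/path/to/my/data`, the dataset would be loaded in R with `ukb_df("ukbxxxx", path = "/full/path/to/my/data")`.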

## Other tools

All tools are described on the [ukbtools webpage](https://kenhanscombe.github.io/ukbtools/) and in the package vignette "Explore UK Biobank Data":

```r
vignette("explore-ukb-data", package = "ukbtools")
```

For a list of all functions:

```r
help(package = "ukbtools")
```