https://github.com/umich-cphds/createUKBphenome

Create a PheWAS-code-based phenome using ICD9 and ICD10 data from baskets of the UK Biobank

# createUKBphenome

## Basic concepts
1. ICD code / PheWAS code mapping from phewascatalog (https://phewascatalog.org/phecodes and https://phewascatalog.org/phecodes_icd10)
2. Collection of information about PheWAS codes and their inclusion / exclusion filters
3. Collection and harmonization of ICD codes from UKB
4. Extraction of all ICD codes from the available fields in your UKB baskets
5. Generation of a phenome: a case-control study for each phecode
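
To make the last concept concrete, here is a minimal sketch of how a case-control status for one phecode could be derived with and without exclusion criteria. It is not the repository's code: the toy data, the phecodes, the exclusion range and the helper `build_status` are all made up for illustration, and the rule "an individual with a code in the exclusion range who is not a case is removed from the controls" is the usual PheWAS convention, assumed here.

```r
# Toy long-format diagnoses: one row per (individual, phecode).
# All ids and phecodes are made up for illustration.
diagnoses <- data.frame(
  id      = c(1, 1, 2, 3),
  phecode = c("100", "200", "100.1", "300"),
  stringsAsFactors = FALSE
)
all_ids <- 1:4  # individual 4 has no diagnoses at all

# Hypothetical target phecode and its exclusion range.
target     <- "100"
excl_range <- c(100, 100.99)

# Cases: individuals carrying the target phecode.
case_ids <- unique(diagnoses$id[diagnoses$phecode == target])

# Excluded controls: non-cases with any phecode inside the exclusion range.
codes_num    <- as.numeric(diagnoses$phecode)
in_range     <- codes_num >= excl_range[1] & codes_num <= excl_range[2]
excluded_ids <- setdiff(unique(diagnoses$id[in_range]), case_ids)

# 1 = case, 0 = control, NA = excluded from the controls.
build_status <- function(apply_exclusion) {
  status <- ifelse(all_ids %in% case_ids, 1L, 0L)
  if (apply_exclusion) status[all_ids %in% excluded_ids] <- NA_integer_
  setNames(status, all_ids)
}

with_exclusion    <- build_status(TRUE)
without_exclusion <- build_status(FALSE)
print(rbind(with_exclusion, without_exclusion))
```

In this toy example individual 2 carries a related code (`100.1`) but not the target code, so they count as a control only when the exclusion criteria are not applied.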

## Required R libraries
- data.table
- tidyr
- parallel
- intervals
- htmltab
- bitops
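
Before running the script, it may help to check that these libraries can be loaded. The snippet below is a small sketch that only reports what is missing; it does not install anything, and the variable names are illustrative.

```r
# Libraries listed above; `parallel` ships with R itself.
required <- c("data.table", "tidyr", "parallel", "intervals", "htmltab", "bitops")

# requireNamespace() returns TRUE/FALSE without attaching the package.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```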

## Step 1: Describe your data
- Add the absolute paths (e.g. `/driveA/UKB/ukb####.tab`) of your TAB-delimited UKB baskets to a single text file, `./data/baskets.txt`.
- Add the latest file of withdrawn samples (`w#####_########.csv`) to the `./data/` folder.
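
A quick sanity check of the basket list can catch mistyped paths before the long run starts. The sketch below assumes one path per line in the list file; `check_baskets` is a hypothetical helper, not part of this repository, and the demo uses temporary files instead of real baskets.

```r
# Hypothetical helper: return the listed basket paths that do not exist.
# Assumes one path per line; blank lines are ignored.
check_baskets <- function(list_file = "./data/baskets.txt") {
  if (!file.exists(list_file)) stop("Basket list not found: ", list_file)
  paths <- trimws(readLines(list_file, warn = FALSE))
  paths <- paths[nzchar(paths)]
  paths[!file.exists(paths)]
}

# Demo with temporary files (no real UKB data involved).
demo_dir <- tempfile("ukb_demo_")
dir.create(demo_dir)
basket <- file.path(demo_dir, "basket_demo.tab")
writeLines("id", basket)
list_file <- file.path(demo_dir, "baskets.txt")
writeLines(c(basket, file.path(demo_dir, "not_there.tab")), list_file)

missing_baskets <- check_baskets(list_file)
print(basename(missing_baskets))
```

Calling `check_baskets()` without arguments from the repository root would check `./data/baskets.txt` itself.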

## Step 2: Create Phenome
1. `cd createUKBphenome`
2. `Rscript ./scripts/function.createUKBphenome.r`

## Output
1. Full ICD / PheWAS code tables with descriptions (which ICD codes underlie each phecode)
2. UKB phenome with exclusion criteria applied to controls
3. UKB phenome without applying exclusion criteria to controls
4. Overview of all phecodes, their categories and general descriptions
5. Output of all ICD codes that were NOT mapped to phecodes (incl. sample sizes)
6. Output of all individuals that had sex-specific diagnosis codes that did not match their sex

## Notes:
- This script requires a lot of memory (~20-30 GB), because it reads and collects a large amount of data into memory.
- This script requires the ICD data of the UK Biobank (ideally the most comprehensive list), plus the `Genetic Sex` and `Sex` fields.
- Only samples whose `Genetic Sex` equals `Sex` are kept, because the reason for a mismatch is unclear (potential sources: gender identity, bone marrow transplant, sample swap).
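
The sex concordance filter described in the last note can be sketched as follows. This is an illustration, not the script's actual code: the column names `sex` and `genetic_sex`, the 0/1 coding and the toy data are assumptions, and treating a missing value as a mismatch is a choice made for this example.

```r
# Toy sample table; column names and 0/1 coding are assumptions.
samples <- data.frame(
  id          = 1:5,
  sex         = c(0, 1, 1, 0, 1),
  genetic_sex = c(0, 1, 0, NA, 1)
)

# Keep a sample only if both values are present and equal.
# A missing value is treated as a mismatch here (an assumption).
concordant <- !is.na(samples$sex) & !is.na(samples$genetic_sex) &
  samples$sex == samples$genetic_sex

kept    <- samples[concordant, ]
dropped <- samples[!concordant, ]

print(kept$id)
print(dropped$id)
```

Here sample 3 is dropped for a mismatch and sample 4 for a missing `genetic_sex` value.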