https://github.com/borgwardtlab/biobank_genomics
- Host: GitHub
- URL: https://github.com/borgwardtlab/biobank_genomics
- Owner: BorgwardtLab
- Created: 2020-09-30T14:54:00.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-02T07:21:34.000Z (over 5 years ago)
- Last Synced: 2025-01-22T04:14:01.324Z (over 1 year ago)
- Language: Shell
- Size: 759 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
# Pre-processing of genomic data from the UK Biobank

## Software requirements
This pre-processing procedure requires:
* QCTOOL, version 1.0
* GTOOL, version 1.0
* PLINK, version 2.00
* Python, version 3.7.4
## Setup and workflow
Before starting the pre-processing procedure, edit the script `paths_and_parameters.sh` to set the following:
* paths:
  * main project directory: `prj_dir`
  * QCTOOL software: `oxford_dir`
  * GTOOL software: `oxford_dir2`
  * PLINK software: `plink_dir`
  * genotype data: `geno_dir`
  * imputed data: `imp_dir`
  * output data: `out_dir`
* parameters:
  * list of chromosomes: `CHR_LIST`
  * SNP call rate: `SNP_CALLRATE`
  * SNP minor allele frequency: `SNP_MAF`
  * sample call rate: `SAMPLE_CALLRATE`
  * window size for linkage disequilibrium analysis: `LD_WINDOWSIZE`
  * step size for linkage disequilibrium analysis: `LD_STEPSIZE`
  * squared correlation coefficient (r²) for linkage disequilibrium analysis: `LD_R2`
  * threshold for Hardy-Weinberg equilibrium analysis: `HWE`
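
For orientation, a filled-in `paths_and_parameters.sh` might look like the sketch below. Every path and value is a placeholder chosen for illustration, not a setting taken from this repository. In particular, the format of `CHR_LIST` (assumed here to be a space-separated string), the directory layout, and the numeric thresholds are assumptions that should be checked against the step scripts.

```shell
#!/usr/bin/env bash
# Hypothetical example values; replace them with the paths and thresholds
# that apply to your own setup.

# --- Paths (placeholders) ---
prj_dir="/path/to/project"            # main project directory
oxford_dir="/path/to/qctool"          # QCTOOL software
oxford_dir2="/path/to/gtool"          # GTOOL software
plink_dir="/path/to/plink2"           # PLINK software
geno_dir="${prj_dir}/data/genotype"   # genotype data (assumed layout)
imp_dir="${prj_dir}/data/imputed"     # imputed data (assumed layout)
out_dir="${prj_dir}/output"           # output data (assumed layout)

# --- Parameters (illustrative values, not recommendations) ---
# Assumed format: space-separated autosome numbers.
CHR_LIST="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22"
SNP_CALLRATE=0.95      # SNP call rate
SNP_MAF=0.01           # SNP minor allele frequency
SAMPLE_CALLRATE=0.95   # sample call rate
LD_WINDOWSIZE=50       # window size for LD analysis
LD_STEPSIZE=5          # step size for LD analysis
LD_R2=0.2              # r^2 threshold for LD analysis
HWE=1e-6               # Hardy-Weinberg equilibrium threshold
```

Deriving the data and output directories from `prj_dir` is only one possible convention; absolute paths work equally well.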
Pre-processing can be performed by running the script `main_script.sh` as follows:
```
bash ./code/main_script.sh
```
It will successively call the scripts for the different steps, which can be found under `./code/steps_code`. Below is a description of each step:

Output examples for each step can be found under [`./code/README.md`](https://github.com/lbourguignon/Preprocessing_UKBiobank_genotype500k/tree/master/code).
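
To show how the parameters in `paths_and_parameters.sh` could map onto a PLINK 2 quality-control call, the sketch below builds and prints a command line without running it. This is not the repository's actual step code. The `plink2` binary name, the file prefixes, the parameter values, and the helper functions `to_missingness` and `plink_qc_command` are all hypothetical. It also assumes that `SNP_CALLRATE` and `SAMPLE_CALLRATE` are call rates, which have to be converted to the maximum missingness that `--geno` and `--mind` expect.

```shell
#!/usr/bin/env bash
# Illustrative only: prints a PLINK 2 command line, does not execute PLINK.

# Hypothetical parameter values (normally set in paths_and_parameters.sh).
SNP_CALLRATE=0.95
SAMPLE_CALLRATE=0.95
SNP_MAF=0.01
HWE=1e-6
LD_WINDOWSIZE=50
LD_STEPSIZE=5
LD_R2=0.2

# Convert a call rate (e.g. 0.95) into a missingness threshold (0.05).
# Assumption: the *_CALLRATE parameters are call rates, not missingness.
to_missingness() {
    awk -v c="$1" 'BEGIN { printf "%g", 1 - c }'
}

# Print the QC command.
#   $1: input fileset prefix (hypothetical)
#   $2: output prefix (hypothetical)
plink_qc_command() {
    printf 'plink2 --bfile %s --geno %s --mind %s --maf %s --hwe %s --indep-pairwise %s %s %s --out %s\n' \
        "$1" \
        "$(to_missingness "$SNP_CALLRATE")" \
        "$(to_missingness "$SAMPLE_CALLRATE")" \
        "$SNP_MAF" "$HWE" \
        "$LD_WINDOWSIZE" "$LD_STEPSIZE" "$LD_R2" \
        "$2"
}

plink_qc_command "chr1" "chr1_qc"
```

Printing the command before executing it makes it easy to confirm that each threshold ends up on the intended flag.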
## External resources
* [UK Biobank](https://www.ukbiobank.ac.uk/)
* [PLINK](https://www.cog-genomics.org/plink/2.0/)
## Contact
This repository is maintained by:
* [Lucie Bourguignon](https://github.com/lbourguignon)
* [Caroline Weis](https://github.com/cvweis)
* [Catherine Jutzeler](https://github.com/jutzca)

Data acquisition and analyses in the present study were conducted under UK Biobank Application #14762.