https://github.com/vccri/dv-trio

# dv-trio

dv-trio provides a pipeline to call variants for a trio (father-mother-child) using DeepVariant [1]. The genomic VCFs (gVCFs) that DeepVariant produces for each sample are then jointly called with GATK [2]. The resulting trio VCF is post-processed with FamSeq [3] to eliminate Mendelian errors. The final output is a VCF in which each sample's GT value represents the genotype called by FamSeq.
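To make "Mendelian error" concrete: at an autosomal, biallelic site, a child's genotype is Mendelian-consistent only if one of its alleles can come from the father and the other from the mother. The Python sketch below shows that basic check on hard genotype calls written as hypothetical `(allele, allele)` tuples. It only illustrates the concept. It is not FamSeq's method, which works with genotype likelihoods across the pedigree rather than with hard calls.

```python
from typing import Iterable, Iterator, Tuple

# A diploid genotype as a pair of allele indices, e.g. (0, 1) for a 0/1 heterozygote.
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Return True if the child's genotype can be formed from one paternal and
    one maternal allele (autosomal diploid site, hard calls only)."""
    c1, c2 = child
    # Try both ways of assigning the child's two alleles to the two parents.
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def find_mendelian_errors(sites: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> Iterator[int]:
    """Yield the index of every (child, father, mother) site that violates Mendelian inheritance."""
    for i, (child, father, mother) in enumerate(sites):
        if not is_mendelian_consistent(child, father, mother):
            yield i


if __name__ == "__main__":
    trio_sites = [
        ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from the father, 1 from the mother
        ((1, 1), (0, 0), (0, 1)),  # error: the father carries no copy of allele 1
        ((0, 0), (0, 1), (0, 1)),  # consistent
    ]
    print(list(find_mendelian_errors(trio_sites)))  # prints [1]
```

Sex chromosomes need different rules (for example, a son inherits his X only from his mother), which is why the sketch is restricted to autosomes.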

## Installation
Clone this repository onto your cloud instance and run `bash install_dependencies.sh`. The script installs all dependencies so that they are available on the instance's PATH.

## Usage
```
Usage:
dv-trio.sh -i <input_file> -r <reference> -d <dbsnp_vcf> [ -o <output_dir> ] [ -t <threshold> ] [ -b <s3_path> ]

Post-processes trio calls made by DeepVariant to correct for Mendelian errors.

Required arguments:

-i path to input file containing trio details.
See the Input Parameter File section below for details
-r path to reference file.
The directory holding the reference file must contain the fa, fai and dict files
-d path to dbSNP VCF file.

Options:
-o path to desired output directory (defaults to current directory)
-t likelihood ratio cutoff threshold for Mendelian error correction (float between 0 [use single-individual-based method] and 1 [use pedigree information], default is 1.0)
-b S3 bucket path to write output to
-h this help message
```
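When several trios are launched from a script, it can help to assemble and sanity-check these options programmatically. The Python sketch below is hypothetical: `build_dv_trio_command` and `missing_reference_files` are not part of dv-trio, and the example paths are placeholders. It uses only the flags listed above, checks that `-t` lies between 0 and 1, and checks for the reference companion files. It assumes the usual naming, where the index is `<ref>.fa.fai` (as written by `samtools faidx`) and the dictionary is `<ref>.dict` (as written by Picard/GATK `CreateSequenceDictionary`).

```python
from pathlib import Path
from typing import List, Optional


def missing_reference_files(reference: str) -> List[str]:
    """Return the expected reference files that do not exist on disk.

    Assumes samtools-style '<ref>.fa.fai' and Picard/GATK-style '<ref>.dict' names.
    """
    fasta = Path(reference)
    expected = [fasta, Path(str(fasta) + ".fai"), fasta.with_suffix(".dict")]
    return [str(p) for p in expected if not p.exists()]


def build_dv_trio_command(
    input_file: str,
    reference: str,
    dbsnp_vcf: str,
    output_dir: Optional[str] = None,
    threshold: Optional[float] = None,
    s3_path: Optional[str] = None,
) -> List[str]:
    """Assemble a dv-trio.sh argument list from the documented options (does not run it)."""
    cmd = ["dv-trio.sh", "-i", input_file, "-r", reference, "-d", dbsnp_vcf]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if threshold is not None:
        # -t is documented as a float between 0 and 1.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"-t must be between 0 and 1, got {threshold}")
        cmd += ["-t", str(threshold)]
    if s3_path is not None:
        cmd += ["-b", s3_path]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own reference and dbSNP files.
    cmd = build_dv_trio_command(
        "GIAB_trio_file.txt", "/refs/GRCh38.fa", "/refs/dbsnp.vcf.gz",
        output_dir="results", threshold=0.5,
    )
    print(" ".join(cmd))
    # prints: dv-trio.sh -i GIAB_trio_file.txt -r /refs/GRCh38.fa -d /refs/dbsnp.vcf.gz -o results -t 0.5
    print(missing_reference_files("/refs/GRCh38.fa"))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems. Passing the list to `subprocess.run(cmd, check=True)` would launch the pipeline, assuming `dv-trio.sh` is on the PATH.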
## Input Parameter File
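The input file is a **tab-delimited** text file with one row per trio member. The first line, starting with `#`, is a column header. Each row contains:

- Sample role (CHILD, FATHER or MOTHER)
- Sample ID
- Sample BAM location
- Sample gender (1 = male, 2 = female)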

    #Sample	Sample_ID	Sample_bam_location	Sample_gender
    CHILD	HG002	/home/ubuntu/GIAB_bams/HG002.GRCh38.60x.1.RG.bam	1
    FATHER	HG003	/home/ubuntu/GIAB_bams/HG003.GRCh38.60x.1.RG.bam	1
    MOTHER	HG004	/home/ubuntu/GIAB_bams/HG004.GRCh38.60x.1.RG.bam	2

See the template input file `GIAB_trio_file.txt`.
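Because a malformed input file may only surface as an error late in a long run, it can be worth checking the file up front. The Python sketch below is a hypothetical validator (`read_trio_file` and `TrioMember` are not part of dv-trio). It assumes the layout shown above: lines starting with `#` are comments, every other line has four tab-separated fields, and the roles are exactly CHILD, FATHER and MOTHER. The final check that the father is coded 1 and the mother 2 is an extra sanity check of our own, not a rule stated by dv-trio.

```python
import csv
import os
import tempfile
from typing import Dict, NamedTuple

EXPECTED_ROLES = {"CHILD", "FATHER", "MOTHER"}


class TrioMember(NamedTuple):
    sample_id: str
    bam_path: str
    gender: int  # 1 = male, 2 = female


def read_trio_file(path: str) -> Dict[str, TrioMember]:
    """Parse a tab-delimited trio file into {role: TrioMember}, raising ValueError on problems."""
    members: Dict[str, TrioMember] = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and the '#' header line
            if len(row) != 4:
                raise ValueError(f"expected 4 tab-separated fields, got {len(row)}: {row}")
            role, sample_id, bam_path, gender = (field.strip() for field in row)
            role = role.upper()
            if role not in EXPECTED_ROLES:
                raise ValueError(f"unknown role {role!r}")
            if role in members:
                raise ValueError(f"duplicate role {role!r}")
            if gender not in ("1", "2"):
                raise ValueError(f"{sample_id}: gender must be 1 (male) or 2 (female)")
            members[role] = TrioMember(sample_id, bam_path, int(gender))

    missing = EXPECTED_ROLES - set(members)
    if missing:
        raise ValueError(f"missing rows for: {sorted(missing)}")
    # Assumed sanity check (not stated by dv-trio): parent genders match their roles.
    if members["FATHER"].gender != 1 or members["MOTHER"].gender != 2:
        raise ValueError("expected FATHER with gender 1 and MOTHER with gender 2")
    return members


if __name__ == "__main__":
    # Demo on the example rows above, written to a temporary file.
    example = (
        "#Sample\tSample_ID\tSample_bam_location\tSample_gender\n"
        "CHILD\tHG002\t/home/ubuntu/GIAB_bams/HG002.GRCh38.60x.1.RG.bam\t1\n"
        "FATHER\tHG003\t/home/ubuntu/GIAB_bams/HG003.GRCh38.60x.1.RG.bam\t1\n"
        "MOTHER\tHG004\t/home/ubuntu/GIAB_bams/HG004.GRCh38.60x.1.RG.bam\t2\n"
    )
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(example)
    try:
        trio = read_trio_file(tmp.name)
    finally:
        os.unlink(tmp.name)
    print(trio["CHILD"].sample_id)  # prints HG002
```

Parsing with `csv.reader(..., delimiter="\t")` rather than `str.split()` keeps the check strict: a file that uses spaces instead of tabs is rejected instead of being silently accepted.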

## Cloud instance recommendation
We successfully ran dv-trio on a whole-genome sequencing (WGS) trio with the following setup:

- Samples: Genome in a Bottle Consortium's Ashkenazim Trio (HG002/HG003/HG004)
- Virtual machine: **AWS** Ubuntu Server 18.04 LTS (HVM), SSD Volume Type, 64-bit (x86)
- Instance type: Compute Optimized c5.9xlarge (36 vCPUs, 72 GB memory)
- Instance storage: 1000 GB (at least twice the combined size of the BAM files)
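The storage guideline is a rule of thumb: provision at least twice the size of the input BAMs. The Python sketch below applies it to BAM files on disk. `minimum_storage_gb` and `storage_for_bams` are hypothetical helpers. Reading "the size of the BAM files" as their combined size, and using decimal gigabytes (10^9 bytes), are both our assumptions.

```python
import os
from typing import Iterable

BYTES_PER_GB = 1000 ** 3  # decimal gigabytes (assumption)


def minimum_storage_gb(bam_sizes_bytes: Iterable[int], factor: float = 2.0) -> float:
    """Suggested minimum volume size in GB: `factor` times the combined BAM size."""
    return factor * sum(bam_sizes_bytes) / BYTES_PER_GB


def storage_for_bams(bam_paths: Iterable[str]) -> float:
    """Apply the rule of thumb to BAM files that already exist on disk."""
    return minimum_storage_gb(os.path.getsize(p) for p in bam_paths)


if __name__ == "__main__":
    # Hypothetical sizes for illustration: three BAMs of 120, 110 and 115 GB.
    sizes = [120 * BYTES_PER_GB, 110 * BYTES_PER_GB, 115 * BYTES_PER_GB]
    print(minimum_storage_gb(sizes))  # prints 690.0
```

In practice you would call `storage_for_bams` with the BAM paths from the trio input file, then round up to the next convenient volume size.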

## Application Note Details
For details on replicating the results shown in the application note, see `testing-README.md`.

## Citation
Eddie K K Ip, Clinton Hadinata, Joshua W K Ho, Eleni Giannoulatou.
dv-trio: a family-based variant calling pipeline using DeepVariant.
Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3549–3551, https://doi.org/10.1093/bioinformatics/btaa116

## References

1. R. Poplin, P.-C. Chang, D. Alexander, S. Schwartz, T. Colthurst, A. Ku, D. Newburger, J. Dijamco, N. Nguyen, P. T. Afshar, et al. A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 2018.

2. M. A. DePristo, E. Banks, R. Poplin, K. V. Garimella, J. R. Maguire, C. Hartl, A. A. Philippakis, G. Del Angel, M. A. Rivas, M. Hanna, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5):491–498, 2011.

3. G. Peng, Y. Fan, and W. Wang. FamSeq: a variant calling program for family-based sequencing data using graphics processing units. PLoS Computational Biology, 10(10):e1003880, 2014.