https://github.com/vccri/dv-trio
dv-trio provides a pipeline to call variants for a trio (father-mother-child) using DeepVariant [1]. Genomic VCF files (gVCFs) created by DeepVariant are then co-called using GATK [2]. The resulting trio VCF is then post-processed with FamSeq [3] to eliminate Mendelian errors.
- Host: GitHub
- URL: https://github.com/vccri/dv-trio
- Owner: VCCRI
- License: bsd-3-clause
- Created: 2018-11-05T21:12:01.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-02-03T10:28:27.000Z (almost 4 years ago)
- Last Synced: 2024-03-26T22:59:13.991Z (10 months ago)
- Topics: father-mother-child, mendelian-genetics, variant-calling, variants
- Language: Shell
- Size: 165 KB
- Stars: 8
- Watchers: 8
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# dv-trio
dv-trio provides a pipeline to call variants for a trio (father-mother-child) using DeepVariant [1]. Genomic VCF files (gVCFs) created by DeepVariant are then co-called using GATK [2]. The resulting trio VCF is then post-processed with FamSeq [3] to eliminate Mendelian errors. The final output is a VCF in which each sample's GT value reflects the genotype called by FamSeq.
## Installation
Clone this repository onto your cloud instance and run `bash install_dependencies.sh`. This installs all dependencies onto your instance's PATH.
## Usage
```
Usage:
dv-trio.sh -i -r -d [ -o ] [ -t ] [ -b ]

Post-processes trio calls made by DeepVariant to correct for Mendelian errors.

Required arguments:
-i path to input file containing trio details.
   See the Input Parameter File section below for details
-r path to reference file.
   The directory holding the reference file needs to contain the fa, fai and dict files
-d path to dbSNP VCF file.

Options:
-o path to desired output directory (defaults to current directory)
-t likelihood ratio cutoff threshold for Mendelian error correction (float between 0 [use single individual based method] and 1 [use pedigree information], default is 1.0)
-b S3 bucket path to write output to
-h this help message
```
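As a minimal sketch of how the required arguments could be checked before a run, the Bash helpers below verify the reference files and the `-t` value. They are not part of dv-trio: the function names `check_reference` and `check_threshold` are hypothetical, and the sketch assumes the common naming convention in which the index is `<reference>.fai` and the dictionary replaces the final extension with `.dict`.

```bash
#!/usr/bin/env bash
# Pre-flight checks for a dv-trio run (illustrative sketch, not part of dv-trio).

# check_reference REF
# Succeeds if REF exists alongside an index and a sequence dictionary.
# Assumption: the index is named REF.fai and the dictionary replaces the
# final extension with .dict (e.g. ref.fa -> ref.fa.fai and ref.dict).
check_reference() {
  local ref=$1
  local dict="${ref%.*}.dict"
  local f
  for f in "$ref" "$ref.fai" "$dict"; do
    if [ ! -f "$f" ]; then
      echo "missing reference file: $f" >&2
      return 1
    fi
  done
}

# check_threshold VALUE
# Succeeds if VALUE is a plain decimal number between 0 and 1 inclusive,
# the range documented for the -t option.
check_threshold() {
  awk -v t="$1" 'BEGIN {
    if (t !~ /^[0-9]*\.?[0-9]+$/) exit 1   # digits with an optional decimal point
    exit !(t + 0 >= 0 && t + 0 <= 1)       # exit status 0 only when in range
  }'
}
```

With placeholder paths, a guarded invocation might look like `check_reference /path/to/ref.fa && check_threshold 1.0 && dv-trio.sh -i GIAB_trio_file.txt -r /path/to/ref.fa -d /path/to/dbsnp.vcf -o /path/to/output -t 1.0`. Failing early on a missing index or dictionary is cheaper than discovering it partway through a long variant-calling run.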
## Input Parameter File
A **tab-delimited** text file containing the following details for each trio sample:
- Sample role (CHILD, FATHER or MOTHER, as in the example below)
- Sample ID
- Sample BAM location
- Sample gender (1 - male, 2 - female)

Example:

    #Sample Sample_ID Sample_bam_location Sample_gender
    CHILD HG002 /home/ubuntu/GIAB_bams/HG002.GRCh38.60x.1.RG.bam 1
    FATHER HG003 /home/ubuntu/GIAB_bams/HG003.GRCh38.60x.1.RG.bam 1
    MOTHER HG004 /home/ubuntu/GIAB_bams/HG004.GRCh38.60x.1.RG.bam 2

See the template input file GIAB_trio_file.txt.
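Because the file must be tab-delimited, a stray space-separated line is an easy mistake to make. The following is a minimal sketch of a format check, not part of dv-trio: the function name `validate_trio_file` is hypothetical, and the rules it enforces (four fields, one row each for CHILD, FATHER and MOTHER, gender 1 or 2) are inferred from the example above rather than taken from the pipeline's own parser.

```bash
#!/usr/bin/env bash
# validate_trio_file FILE
# Checks a dv-trio style input file (illustrative sketch, not part of dv-trio):
#   - lines starting with '#' and blank lines are skipped
#   - every other line has exactly four tab-separated fields
#   - the first field is CHILD, FATHER or MOTHER, each appearing exactly once
#   - the fourth field (gender) is 1 or 2
# Problems are reported on stderr; the exit status is non-zero if any are found.
validate_trio_file() {
  awk -F'\t' '
    /^#/ || /^[ \t]*$/ { next }
    NF != 4 {
      printf "line %d: expected 4 tab-separated fields, found %d\n", NR, NF
      bad = 1; next
    }
    $1 != "CHILD" && $1 != "FATHER" && $1 != "MOTHER" {
      printf "line %d: unknown role %s\n", NR, $1
      bad = 1; next
    }
    $4 != "1" && $4 != "2" {
      printf "line %d: gender must be 1 or 2, found %s\n", NR, $4
      bad = 1; next
    }
    { seen[$1]++ }
    END {
      n = split("CHILD FATHER MOTHER", roles, " ")
      for (i = 1; i <= n; i++)
        if (seen[roles[i]] != 1) {
          printf "role %s appears %d times, expected once\n", roles[i], seen[roles[i]]
          bad = 1
        }
      exit (bad ? 1 : 0)
    }
  ' "$1" >&2
}
```

Running `validate_trio_file GIAB_trio_file.txt` before launching the pipeline gives a quick check that the file uses tabs and lists each family member once. It does not check that the BAM paths exist.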
## Cloud instance recommendation
We successfully ran dv-trio on a WGS trio with the following machine configuration.
- Samples: Genome in a Bottle Consortium's Ashkenazim trio (HG002/HG003/HG004)
- Virtual machine: **AWS**, Ubuntu Server 18.04 LTS (HVM), SSD Volume Type, 64-bit (x86)
- Instance type: Compute Optimized, C5.9xlarge, 36 vCPUs, 72 GB memory
- Instance storage: 1000 GB (at least twice the size of the BAM files)
## Application Note Details
For more detail on how to replicate the results shown in the application note, please see testing-README.md.
## Citation
Eddie K K Ip, Clinton Hadinata, Joshua W K Ho, Eleni Giannoulatou
dv-trio: a family-based variant calling pipeline using DeepVariant
Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3549–3551, https://doi.org/10.1093/bioinformatics/btaa116
## References
1. R. Poplin, P.-C. Chang, D. Alexander, S. Schwartz, T. Colthurst, A. Ku, D. Newburger, J. Dijamco, N. Nguyen, P. T. Afshar, et al. A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 2018.
2. M. A. DePristo, E. Banks, R. Poplin, K. V. Garimella, J. R. Maguire, C. Hartl, A. A. Philippakis, G. Del Angel, M. A. Rivas, M. Hanna, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5):491–498, 2011.
3. G. Peng, Y. Fan, and W. Wang. FamSeq: a variant calling program for family-based sequencing data using graphics processing units. PLoS Computational Biology, 10(10):e1003880, 2014.