{"id":22580203,"url":"https://github.com/vccri/dv-trio","last_synced_at":"2025-04-10T18:31:41.870Z","repository":{"id":151121803,"uuid":"156283672","full_name":"VCCRI/dv-trio","owner":"VCCRI","description":"dv-trio provides a pipeline to call variants for a trio (father-mother-child) using DeepVariants [1]. Genomic Variant Calling Files (gVCFs) created by DeepVariants are then co_called together using GATK[2]. The resultant trio VCF is then post-processing with FamSeq[3] to eliminate mendelian errors.","archived":false,"fork":false,"pushed_at":"2021-02-03T10:28:27.000Z","size":169,"stargazers_count":10,"open_issues_count":2,"forks_count":1,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-24T16:11:05.618Z","etag":null,"topics":["father-mother-child","mendelian-genetics","variant-calling","variants"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VCCRI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-11-05T21:12:01.000Z","updated_at":"2024-12-09T16:42:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"f3cb1f59-dd9c-43f1-bc1e-8a8bb1775268","html_url":"https://github.com/VCCRI/dv-trio","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VCCRI%2Fdv-trio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VCCRI%2Fdv-trio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VCCRI%2Fdv-trio/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/VCCRI%2Fdv-trio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VCCRI","download_url":"https://codeload.github.com/VCCRI/dv-trio/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248271571,"owners_count":21075800,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["father-mother-child","mendelian-genetics","variant-calling","variants"],"created_at":"2024-12-08T05:14:27.449Z","updated_at":"2025-04-10T18:31:41.840Z","avatar_url":"https://github.com/VCCRI.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# dv-trio\n\ndv-trio provides a pipeline to call variants for a trio (father-mother-child) using DeepVariant [1]. Genomic VCF files (gVCFs) created by DeepVariant are then co-called using GATK [2]. The resulting trio VCF is then post-processed with FamSeq [3] to eliminate Mendelian errors. The final output is a VCF in which each sample's GT value is the genotype called by FamSeq.\n\n## Installation\nClone this repository onto your cloud instance and run the `bash install_dependencies.sh` script. 
This will install all dependencies onto your instance's PATH.\n\n## Usage\n```\nUsage:\n       dv-trio.sh -i \u003cinput parameter file\u003e -r \u003creference\u003e -d \u003cdbSNP VCF\u003e [ -o \u003coutput directory name\u003e ] [ -t \u003cthreshold\u003e ] [ -b \u003cbucket\u003e ]\n\nPost-processes trio calls made by DeepVariant to correct for Mendelian errors.\n\nRequired arguments:\n\n  -i \u003cinput parameter file\u003e   path to input file contain trio details. \n                              See input file creation section below for details\n  -r \u003creference\u003e              path to reference file. \n                              The directory holding the reference file need to contain the fa, fai and dict files\n  -d \u003cdbSNP VCF\u003e              path to dbSNP VCF file. \n                                                        \n\nOptions:\n  -o \u003coutput\u003e     path to desired output directory (defaults to current directory)\n  -t \u003cthreshold\u003e  likelihood ratio cutoff threshold for mendelian error correction (float between 0 [use single individual based method] and 1 [use pedigree information], default is 1.0)\n  -b \u003cbucket\u003e     S3 bucket path to write output to\n  -h              this help message\n```\n## Input Parameter File\nA **tab-delimited** text file containing the following details for each sample in the trio:\n\n - Sample ID\n - Sample BAM location\n - Sample Gender (1 - male, 2 - female)\n\n#Sample\u0026nbsp;Sample_ID\u0026nbsp;Sample_bam_location\u0026nbsp;Sample_gender  \nCHILD\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;HG002 \u0026nbsp;/home/ubuntu/GIAB_bams/HG002.GRCh38.60x.1.RG.bam \u0026nbsp;1  \nFATHER\u0026nbsp;\u0026nbsp;HG003 \u0026nbsp;/home/ubuntu/GIAB_bams/HG003.GRCh38.60x.1.RG.bam \u0026nbsp;1  \nMOTHER\u0026nbsp;HG004 \u0026nbsp;/home/ubuntu/GIAB_bams/HG004.GRCh38.60x.1.RG.bam \u0026nbsp;2  \n\nSee the template input file `GIAB_trio_file.txt`.\n\n## Cloud instance recommendation\nWe were able to successfully run 
dv-trio for a WGS trio under the following machine configuration.\n\nSamples : Genome in a Bottle Consortium's AshkenazimTrio - HG002/HG003/HG004  \nVirtual Machine :  **AWS** - Ubuntu Server 18.04 LTS (HVM), SSD Volume Type - 64-bit (x86)  \nInstance Type : Compute Optimized - C5.9xlarge - 36 vCPUs, 72GB Memory  \nInstance Storage : 1000GB (at least twice the total size of the BAM files)   \n\n## Application Note Details\nFor more details on how to replicate the results shown in the application note, please see testing-README.md.\n\n## Citation\nEddie K K Ip, Clinton Hadinata, Joshua W K Ho, Eleni Giannoulatou  \ndv-trio: a family-based variant calling pipeline using DeepVariant  \nBioinformatics, Volume 36, Issue 11, June 2020, Pages 3549–3551, https://doi.org/10.1093/bioinformatics/btaa116\n\n## References \n\n 1. R. Poplin, P.-C. Chang, D. Alexander, S. Schwartz, T. Colthurst, A. Ku, D. Newburger, J. Dijamco, N. Nguyen, P. T. Afshar, et al. A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 2018.\n \n 2. M. A. DePristo, E. Banks, R. Poplin, K. V. Garimella, J. R. Maguire, C. Hartl, A. A. Philippakis, G. Del Angel, M. A. Rivas, M. Hanna, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5):491–498, 2011.\n \n 3. G. Peng, Y. Fan, and W. Wang. FamSeq: a variant calling program for family-based sequencing data using graphics processing units. PLoS Computational Biology, 10(10):e1003880, 2014.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvccri%2Fdv-trio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvccri%2Fdv-trio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvccri%2Fdv-trio/lists"}