{"id":13510706,"url":"https://github.com/google/deepvariant","last_synced_at":"2025-05-13T21:05:08.673Z","repository":{"id":37406194,"uuid":"111751293","full_name":"google/deepvariant","owner":"google","description":"DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.","archived":false,"fork":false,"pushed_at":"2025-05-13T00:24:42.000Z","size":909240,"stargazers_count":3391,"open_issues_count":11,"forks_count":747,"subscribers_count":150,"default_branch":"r1.8","last_synced_at":"2025-05-13T01:23:38.864Z","etag":null,"topics":["bioinformatics","deep-learning","deep-neural-network","deepvariant","dna","genome","genomics","machine-learning","ngs","science","sequencing","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-23T01:56:22.000Z","updated_at":"2025-05-12T07:59:43.000Z","dependencies_parsed_at":"2023-10-21T11:37:42.887Z","dependency_job_id":"9e321126-4e19-4413-b729-3362194cf7f3","html_url":"https://github.com/google/deepvariant","commit_stats":{"total_commits":2374,"total_committers":31,"mean_commits":76.58064516129032,"dds":0.6419545071609098,"last_synced_commit":"bf9ed7e6de97cf6c8381694cb996317a740625ad"},"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fdeepvariant","tags_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fdeepvariant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fdeepvariant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fdeepvariant/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google","download_url":"https://codeload.github.com/google/deepvariant/tar.gz/refs/heads/r1.8","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254028533,"owners_count":22002275,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","deep-learning","deep-neural-network","deepvariant","dna","genome","genomics","machine-learning","ngs","science","sequencing","tensorflow"],"created_at":"2024-08-01T02:01:50.840Z","updated_at":"2025-05-13T21:05:03.663Z","avatar_url":"https://github.com/google.png","language":"Python","readme":"\u003cimg src=\"docs/images/dv_logo.png\" width=50% height=50%\u003e\n\n[![release](https://img.shields.io/badge/release-v1.6.1-green?logo=github)](https://github.com/google/deepvariant/releases)\n[![announcements](https://img.shields.io/badge/announcements-blue)](https://groups.google.com/d/forum/deepvariant-announcements)\n[![blog](https://img.shields.io/badge/blog-orange)](https://goo.gl/deepvariant)\n\nDeepVariant is a deep learning-based variant caller that takes aligned reads (in\nBAM or CRAM format), produces pileup image tensors from them, classifies each\ntensor using a convolutional neural network, and finally reports the results in\na 
standard VCF or gVCF file.\n\nDeepVariant supports germline variant-calling in diploid organisms.\n\n*   NGS (Illumina or Element) data for either a\n    [whole genome](docs/deepvariant-case-study.md) or\n    [whole exome](docs/deepvariant-exome-case-study.md).\n*   [RNA-seq Case Study](docs/deepvariant-rnaseq-case-study.md) for Illumina\n    RNA-seq.\n*   PacBio HiFi data, see the\n    [PacBio case study](docs/deepvariant-pacbio-model-case-study.md).\n*   Oxford Nanopore R10.4.1 Simplex or Duplex data, see the\n    [ONT R10.4.1 Simplex case study](docs/deepvariant-ont-r104-simplex-case-study.md)\n    and\n    [ONT R10.4.1 Duplex case study](docs/deepvariant-ont-r104-duplex-case-study.md).\n*   Hybrid PacBio HiFi + Illumina WGS, see the\n    [hybrid case study](docs/deepvariant-hybrid-case-study.md).\n*   Oxford Nanopore R9.4.1 data by using\n    [PEPPER-DeepVariant](https://github.com/kishwarshafin/pepper).\n*   To map using a pangenome to improve accuracy, use this\n    [vg case study](docs/deepvariant-vg-case-study.md).\n*   Complete Genomics data:\n    [T7 case study](docs/deepvariant-complete-t7-case-study.md);\n    [G400 case study](docs/deepvariant-complete-g400-case-study.md)\n\nPlease also note:\n\n*   For somatic data or any other samples where the genotypes go beyond two\n    copies of DNA, DeepVariant will not work out of the box because the only\n    genotypes supported are hom-alt, het, and hom-ref.\n*   The models included with DeepVariant are only trained on human data. For\n    other organisms, see the\n    [blog post on non-human variant-calling](https://google.github.io/deepvariant/posts/2018-12-05-improved-non-human-variant-calling-using-species-specific-deepvariant-models/)\n    for some possible pitfalls and how to handle them.\n\n## DeepTrio\n\nDeepTrio is a deep learning-based trio variant caller built on top of\nDeepVariant. 
DeepTrio extends DeepVariant's functionality, allowing it to\nutilize the power of neural networks to predict genomic variants in trios or\nduos. See [this page](docs/deeptrio-details.md) for more details and\ninstructions on how to run DeepTrio.\n\nDeepTrio supports germline variant-calling in diploid organisms for the\nfollowing types of input data:\n\n*   NGS (Illumina) data for either\n    [whole genome](docs/deeptrio-wgs-case-study.md) or whole exome.\n*   PacBio HiFi data, see the\n    [PacBio case study](docs/deeptrio-pacbio-case-study.md).\n\nPlease also note:\n\n*   All DeepTrio models were trained on human data.\n*   It is possible to use DeepTrio with only 2 samples (child, and one parent).\n*   External tool [GLnexus](https://github.com/dnanexus-rnd/GLnexus) is used to\n    merge output VCFs.\n\n## How to run DeepVariant\n\nWe recommend using our Docker solution. The command will look like this:\n\n```\nBIN_VERSION=\"1.6.1\"\ndocker run \\\n  -v \"YOUR_INPUT_DIR\":\"/input\" \\\n  -v \"YOUR_OUTPUT_DIR:/output\" \\\n  google/deepvariant:\"${BIN_VERSION}\" \\\n  /opt/deepvariant/bin/run_deepvariant \\\n  --model_type=WGS \\ **Replace this string with exactly one of the following [WGS,WES,PACBIO,ONT_R104,HYBRID_PACBIO_ILLUMINA]**\n  --ref=/input/YOUR_REF \\\n  --reads=/input/YOUR_BAM \\\n  --output_vcf=/output/YOUR_OUTPUT_VCF \\\n  --output_gvcf=/output/YOUR_OUTPUT_GVCF \\\n  --num_shards=$(nproc) \\ **This will use all your cores to run make_examples. Feel free to change.**\n  --logging_dir=/output/logs \\ **Optional. This saves the log output for each stage separately.\n  --haploid_contigs=\"chrX,chrY\" \\ **Optional. Heterozygous variants in these contigs will be re-genotyped as the most likely of reference or homozygous alternates. For a sample with karyotype XY, it should be set to \"chrX,chrY\" for GRCh38 and \"X,Y\" for GRCh37. For a sample with karyotype XX, this should not be used.\n  --par_regions_bed=\"/input/GRCh3X_par.bed\" \\ **Optional. 
If --haploid_contigs is set, then this can be used to provide PAR regions to be excluded from genotype adjustment. Download links to these files are available on this page.\n  --dry_run=false **Default is false. If set to true, commands will be printed out but not executed.\n```\n\nFor details on X,Y support, please see\n[DeepVariant haploid support](docs/deepvariant-haploid-support.md) and the case\nstudy in\n[DeepVariant X, Y case study](docs/deepvariant-xy-calling-case-study.md). You\ncan download the PAR bed files from here:\n[GRCh38_par.bed](https://storage.googleapis.com/deepvariant/case-study-testdata/GRCh38_PAR.bed),\n[GRCh37_par.bed](https://storage.googleapis.com/deepvariant/case-study-testdata/GRCh37_PAR.bed).\n\nTo see all flags you can use, run: `docker run\ngoogle/deepvariant:\"${BIN_VERSION}\"`\n\nIf you're using GPUs, or want to use Singularity instead, see\n[Quick Start](docs/deepvariant-quick-start.md) for more details or see all the\n[setup options](#deepvariant-setup) available.\n\nFor more information, also see:\n\n*   [Full documentation list](docs/README.md)\n*   [Detailed usage guide](docs/deepvariant-details.md) with more information on\n    the input and output file formats and how to work with them.\n*   [Best practices for multi-sample variant calling with DeepVariant](docs/trio-merge-case-study.md)\n*   [(Advanced) Training tutorial](docs/deepvariant-training-case-study.md)\n*   [DeepVariant's Frequently Asked Questions, FAQ](docs/FAQ.md)\n\n## How to cite\n\nIf you're using DeepVariant in your work, please cite:\n\n[A universal SNP and small-indel variant caller using deep neural networks. *Nature Biotechnology* 36, 983–987 (2018).](https://rdcu.be/7Dhl) \u003cbr/\u003e\nRyan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T. Afshar, Sam S. Gross, Lizzie Dorfman, Cory Y. McLean, and Mark A. 
DePristo.\u003cbr/\u003e\ndoi: https://doi.org/10.1038/nbt.4235\n\nAdditionally, if you are generating multi-sample calls using our\n[DeepVariant and GLnexus Best Practices](docs/trio-merge-case-study.md), please\ncite:\n\n[Accurate, scalable cohort variant calls using DeepVariant and GLnexus.\n_Bioinformatics_ (2021).](https://doi.org/10.1093/bioinformatics/btaa1081)\u003cbr/\u003e\nTaedong Yun, Helen Li, Pi-Chuan Chang, Michael F. Lin, Andrew Carroll, and Cory\nY. McLean.\u003cbr/\u003e\ndoi: https://doi.org/10.1093/bioinformatics/btaa1081\n\n## Why Use DeepVariant?\n\n*   **High accuracy** - DeepVariant won 2020\n    [PrecisionFDA Truth Challenge V2](https://precision.fda.gov/challenges/10/results)\n    for All Benchmark Regions for ONT, PacBio, and Multiple Technologies\n    categories, and 2016\n    [PrecisionFDA Truth Challenge](https://precision.fda.gov/challenges/truth/results)\n    for best SNP Performance. DeepVariant maintains high accuracy across data\n    from different sequencing technologies, prep methods, and species. For\n    [lower coverage](https://google.github.io/deepvariant/posts/2019-09-10-twenty-is-the-new-thirty-comparing-current-and-historical-wgs-accuracy-across-coverage/),\n    using DeepVariant makes an especially great difference. 
See\n    [metrics](docs/metrics.md) for the latest accuracy numbers on each of the\n    sequencing types.\n*   **Flexibility** - Out-of-the-box use for\n    [PCR-positive](https://ai.googleblog.com/2018/04/deepvariant-accuracy-improvements-for.html)\n    samples and\n    [low quality sequencing runs](https://blog.dnanexus.com/2018-01-16-evaluating-the-performance-of-ngs-pipelines-on-noisy-wgs-data/),\n    and easy adjustments for\n    [different sequencing technologies](https://google.github.io/deepvariant/posts/2019-01-14-highly-accurate-snp-and-indel-calling-on-pacbio-ccs-with-deepvariant/)\n    and\n    [non-human species](https://google.github.io/deepvariant/posts/2018-12-05-improved-non-human-variant-calling-using-species-specific-deepvariant-models/).\n*   **Ease of use** - No filtering is needed beyond setting your preferred\n    minimum quality threshold.\n*   **Cost effectiveness** - With a single non-preemptible n1-standard-16\n    machine on Google Cloud, it costs ~$11.8 to call a 30x whole genome and\n    ~$0.89 to call an exome. With preemptible pricing, the cost is $2.84 for a\n    30x whole genome and $0.21 for a whole exome (not considering preemption).\n*   **Speed** - See [metrics](docs/metrics.md) for the runtime of all supported\n    datatypes on a 64-core CPU-only machine\u003csup\u003e[(1)](#myfootnote1)\u003c/sup\u003e. 
Multiple options for\n    acceleration exist.\n*   **Usage options** - DeepVariant can be run via Docker or binaries, using\n    either on-premises hardware or the cloud, with support for hardware\n    accelerators like GPUs and TPUs.\n\n\u003ca name=\"myfootnote1\"\u003e(1)\u003c/a\u003e: Time estimates do not include mapping.\n\n## How DeepVariant works\n\n![Stages in DeepVariant](docs/images/inference_flow_diagram.svg)\n\nFor more information on the pileup images and how to read them, please see the\n[\"Looking through DeepVariant's Eyes\" blog post](https://google.github.io/deepvariant/posts/2020-02-20-looking-through-deepvariants-eyes/).\n\nDeepVariant relies on [Nucleus](https://github.com/google/nucleus), a library of\nPython and C++ code for reading and writing data in common genomics file formats\n(like SAM and VCF) designed for painless integration with the\n[TensorFlow](https://www.tensorflow.org/) machine learning framework. Nucleus\nwas built with DeepVariant in mind and open-sourced separately so it can be used\nby anyone in the genomics research community for other projects. 
See this blog\npost on\n[Using Nucleus and TensorFlow for DNA Sequencing Error Correction](https://google.github.io/deepvariant/posts/2019-01-31-using-nucleus-and-tensorflow-for-dna-sequencing-error-correction/).\n\n## DeepVariant Setup\n\n### Prerequisites\n\n*   Unix-like operating system (cannot run on Windows)\n*   Python 3.8\n\n### Official Solutions\n\nBelow are the official solutions provided by the\n[Genomics team in Google Health](https://health.google/health-research/).\n\nName                                                                                                | Description\n:-------------------------------------------------------------------------------------------------: | -----------\n[Docker](docs/deepvariant-quick-start.md)           | This is the recommended method.\n[Build from source](docs/deepvariant-build-test.md) | DeepVariant comes with scripts to build it on Ubuntu 20.04. To build and run on other Unix-based systems, you will need to modify these scripts.\nPrebuilt Binaries                                                                                   | Available at [`gs://deepvariant/`](https://console.cloud.google.com/storage/browser/deepvariant). These are compiled to use SSE4 and AVX instructions, so you will need a CPU (such as Intel Sandy Bridge) that supports them. You can check the `/proc/cpuinfo` file on your computer, which lists these features under \"flags\".\n\n## Contribution Guidelines\n\nPlease [open a pull request](https://github.com/google/deepvariant/compare) if\nyou wish to contribute to DeepVariant. Note, we have not set up the\ninfrastructure to merge pull requests externally. If you agree, we will test and\nsubmit the changes internally and mention your contributions in our\n[release notes](https://github.com/google/deepvariant/releases). We apologize\nfor any inconvenience.\n\nIf you have any difficulty using DeepVariant, feel free to\n[open an issue](https://github.com/google/deepvariant/issues/new). 
If you have\ngeneral questions not specific to DeepVariant, we recommend that you post on a\ncommunity discussion forum such as [BioStars](https://www.biostars.org/).\n\n## License\n\n[BSD-3-Clause license](LICENSE)\n\n## Acknowledgements\n\nDeepVariant happily makes use of many open source packages. We would like to\nspecifically call out a few key ones:\n\n*   [Boost Graph Library](http://www.boost.org/doc/libs/1_65_1/libs/graph/doc/index.html)\n*   [abseil-cpp](https://github.com/abseil/abseil-cpp) and\n    [abseil-py](https://github.com/abseil/abseil-py)\n*   [CLIF](https://github.com/google/clif)\n*   [GNU Parallel](https://www.gnu.org/software/parallel/)\n*   [htslib \u0026 samtools](http://www.htslib.org/)\n*   [Nucleus](https://github.com/google/nucleus)\n*   [numpy](http://www.numpy.org/)\n*   [SSW Library](https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library)\n*   [TensorFlow and Slim](https://www.tensorflow.org/)\n\nWe thank all of the developers and contributors to these packages for their\nwork.\n\n## Disclaimer\n\nThis is not an official Google product.\n\nNOTE: the content of this research code repository (i) is not intended to be a\nmedical device; and (ii) is not intended for clinical use of any kind, including\nbut not limited to diagnosis or prognosis.\n","funding_links":[],"categories":["Python","Ranked by starred repositories","Next Generation Sequencing","Variant Callers","医疗领域","其他_生物医药","bioinformatics","Genomics Software","Genomics","Variant calling \u0026 alternative splicing tools"],"sub_categories":["Variant Calling","Germline SNP/Indel Callers","网络服务_其他","Articles and References","Mutations","Learning tools"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle%2Fdeepvariant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle%2Fdeepvariant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle%2Fdeepvariant/lists"}