https://github.com/illumina/gvcfgenotyper
A utility for merging and genotyping Illumina-style GVCFs.
https://github.com/illumina/gvcfgenotyper
Last synced: 2 months ago
JSON representation
A utility for merging and genotyping Illumina-style GVCFs.
- Host: GitHub
- URL: https://github.com/illumina/gvcfgenotyper
- Owner: Illumina
- License: apache-2.0
- Created: 2018-02-07T14:38:52.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-26T14:40:15.000Z (over 6 years ago)
- Last Synced: 2025-03-24T07:07:29.934Z (3 months ago)
- Language: C++
- Homepage:
- Size: 7.92 MB
- Stars: 33
- Watchers: 5
- Forks: 2
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# gvcfgenotyper
A utility for merging and genotyping [strelka2](https://github.com/Illumina/strelka) GVCFs.
This source code is provided under the [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/#). Copyright (c) 2018, Illumina, Inc. All rights reserved.
This tool provides basic genome VCF (GVCF) merging and genotyping functionality to provide a multisample BCF/VCF suitable for cohort analysis. Variants are normalised and decomposed on-the-fly before merging. Samples that do not have a particular variant have their homozygous reference confidence estimated from the GVCF depth blocks using some simple heuristics.
#### Caution:
This software is in early development, it is largely functional but may contain bugs.
There are various flavours of GVCF in the wild, this tool only works with the format [produced by Illumina pipelines](https://sites.google.com/site/gvcftools/home/about-gvcf).
### Installation
The only requirement is a C++11 compatible compiler.
```
git clone https://github.com/Illumina/gvcfgenotyper.git
cd gvcfgenotyper/
make
bin/gvcfgenotyper
```### Running
```
find directory/ -name '*genome.vcf.gz' > gvcfs.txt
time ./gvcfgenotyper -f genome.fa -l gvcfs.txt -Ob -o output.bcf
```or with some trivial parallelism:
```
for i in {1..22} X;
do
echo -r $i -f genome.fa -l gvcfs.txt -Ob -o output.chr${i}.bcf;
done | xargs -l -P 23 ./gvcfgenotyper
```If you are looking for a sequencing cohort to try this out, have a look at [Polaris](https://github.com/Illumina/Polaris).
### Known issues
Homozygous reference confidence (`GQ` and `DP`) works well for SNPs but is less reliable for indels. Our homozygous reference likelihoods are currently just dummy values eg. `PL=0,255,255` and should not be used for any sophisticated analysis such as denovo mutation calling (Strelka has good joint-calling-from BAM functionality for small pedigrees).
Complex variants can occasionally contain primitive alleles called in other samples. We are investigating decomposition approaches for this problem.
We are working on multi-threading to improve performance.
### Feedback
Please open an [issue](https://github.com/Illumina/gvcfgenotyper/issues) on github to provide feedback or ask questions.
### Acknowledgements
This tool depends on [htslib](http://www.htslib.org), [googletest](https://github.com/google/googletest) and [spdlog](https://github.com/gabime/spdlog). We also borrowed some variant normalisation code from [BCFtools](https://samtools.github.io/bcftools/bcftools.html).