Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dnanexus-rnd/GLnexus
Scalable gVCF merging and joint variant calling for population sequencing projects
- Host: GitHub
- URL: https://github.com/dnanexus-rnd/GLnexus
- Owner: dnanexus-rnd
- License: apache-2.0
- Created: 2015-04-08T05:05:05.000Z (over 9 years ago)
- Default Branch: main
- Last Pushed: 2024-04-12T00:01:44.000Z (7 months ago)
- Last Synced: 2024-06-21T18:40:48.500Z (5 months ago)
- Language: C++
- Homepage:
- Size: 9.83 MB
- Stars: 137
- Watchers: 15
- Forks: 36
- Open Issues: 87
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Bioinformatics - GLNexus - Scalable gVCF merging and joint variant calling for population sequencing projects. [ [paper-2018](https://www.biorxiv.org/content/10.1101/343970v1.abstract) ] (Next Generation Sequencing / Data Analysis)
README
# GLnexus
**From DNAnexus R&D: scalable gVCF merging and joint variant calling for population sequencing projects.**
(GL, genotype likelihood)

### Reading
[Our 2018 manuscript](http://dx.doi.org/10.1101/343970) with collaborators at [Regeneron Genetics Center](https://www.regeneron.com/genetics-center) and [Baylor College of Medicine](https://www.hgsc.bcm.edu/) details the design of GLnexus and its scientific validation using up to 240,000 human exomes and 22,600 genomes. Compared to the DNAnexus cloud-native deployment used for such large projects, this open-source version produces identical scientific results but lacks some of the scalability and production-oriented features.
NEW for 2020: [Accurate, scalable cohort variant calls using DeepVariant and GLnexus](https://doi.org/10.1101/2020.02.10.942086) (by the Google Health team), including a [public bucket](https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP/) with 1000 Genomes Project [modern resequencing](https://www.internationalgenome.org/data-portal/data-collection/30x-grch38) products.
### [Getting Started](https://github.com/dnanexus-rnd/GLnexus/wiki/Getting-Started)
The [Getting Started](https://github.com/dnanexus-rnd/GLnexus/wiki/Getting-Started) wiki page has a tutorial for first-time users.
### [Prebuilt executables](https://github.com/dnanexus-rnd/GLnexus/releases)
For each tagged revision, the [Releases](https://github.com/dnanexus-rnd/GLnexus/releases) page has a static executable suitable for most Linux x86-64 hosts; just download it and `chmod +x glnexus_cli`. Each release also provides a lightweight Docker image wrapping `glnexus_cli`.
### Build & test
[![Coverage Status](https://coveralls.io/repos/dnanexus-rnd/GLnexus/badge.svg?branch=master&service=github)](https://coveralls.io/github/dnanexus-rnd/GLnexus?branch=master)
The GLnexus build process has a number of dependencies, but produces a standalone, statically-linked executable `glnexus_cli`. The easiest way to build it is to use our Dockerfile to control all the compile-time dependencies, then simply copy the static executable out of the resulting Docker container and put it anywhere you like.
```
# Clone repo
git clone https://github.com/dnanexus-rnd/GLnexus.git
cd GLnexus
git checkout vX.Y.Z # optional, check out desired revision

# Build GLnexus in docker
docker build --target builder -t glnexus_tests .

# Run GLnexus unit tests.
docker run --rm glnexus_tests

# Copy the static GLnexus executable to the current working directory.
docker run --rm -v $(pwd):/io glnexus_tests cp glnexus_cli /io

# Run it to see its usage message.
./glnexus_cli
```

**To build GLnexus without Docker**, make sure you have [gcc 5+](http://askubuntu.com/a/581497), [CMake 3.2+](http://askubuntu.com/questions/610291/how-to-install-cmake-3-2-on-ubuntu-14-04), and all the dependencies indicated in the [Dockerfile](https://github.com/dnanexus-rnd/GLnexus/blob/master/Dockerfile).
Then:

```
git clone https://github.com/dnanexus-rnd/GLnexus.git
cd GLnexus
cmake -Dtest=ON . && make -j$(nproc) && ctest -V
```

This also leaves the `glnexus_cli` executable in the `GLnexus` directory.
### Coding conventions
* C++14 - take advantage of [the goodies](http://shop.oreilly.com/product/0636920033707.do)
* Use smart pointers to avoid passing resources needing manual deallocation across function/class boundaries
* Prefer references over pointers for values that should never be null or reassigned.
* Avoid exceptions; prefer returning a `Status`, defined early in [types.h](https://github.com/dnanexus-rnd/GLnexus/blob/master/include/types.h)
* Note the frequently used convenience macro `S()`, defined just below `Status`
* Avoid public constructors with nontrivial bodies; prefer a static initializer function returning `Status`
* Avoid elaborate templated class hierarchies

### Libraries used
* [htslib](https://github.com/samtools/htslib)
* [rocksdb](https://github.com/facebook/rocksdb)
* [yaml-cpp](https://github.com/jbeder/yaml-cpp)
* [Capnproto](https://github.com/sandstorm-io/capnproto)
* [CTPL](https://github.com/vit-vit/CTPL)
* [fcmm](https://github.com/giacomodrago/fcmm)
* [zstd](https://github.com/facebook/zstd)
* [Catch](https://github.com/philsquared/Catch) test framework

### Performance profiling
The [Performance](https://github.com/dnanexus-rnd/GLnexus/wiki/Performance) wiki page has practical advice for deploying GLnexus on a powerful server.
The code has some hooks for performance profiling using
[`perf`](https://en.wikipedia.org/wiki/Perf_(Linux)) and
[FlameGraph](http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html).

To profile performance within the DNAnexus applet, run the applet as
usual, adding `-i perf=true`. This produces an output file
`genotype.stacks` containing sampling observation counts for common call
stacks. To generate an SVG visualization with FlameGraph:

```
git clone https://github.com/brendangregg/FlameGraph
FlameGraph/flamegraph.pl < genotype.stacks > genotype.svg
```