https://github.com/bigdatagenomics/avocado
A Variant Caller, Distributed. Apache 2 licensed.
- Host: GitHub
- URL: https://github.com/bigdatagenomics/avocado
- Owner: bigdatagenomics
- License: apache-2.0
- Created: 2013-09-10T21:44:45.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2019-03-11T21:33:58.000Z (almost 6 years ago)
- Last Synced: 2024-08-05T17:23:37.383Z (6 months ago)
- Language: Scala
- Homepage: http://bdgenomics.org/projects/avocado/
- Size: 2.24 MB
- Stars: 71
- Watchers: 22
- Forks: 42
- Open Issues: 25
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
avocado
=======

[![Coverage Status](https://coveralls.io/repos/github/bigdatagenomics/avocado/badge.svg?branch=master)](https://coveralls.io/github/bigdatagenomics/avocado?branch=master)
# A Variant Caller, Distributed
This README represents the TL;DR docs for avocado. More detailed documentation
is hosted at [Read the Docs](http://bdg-avocado.readthedocs.io/).

# Who/What/When/Where/Why avocado?
Avocado is a distributed variant caller built on top of the [ADAM format and
APIs](http://www.github.com/bigdatagenomics/adam) and [Apache
Spark](http://spark.apache.org/). Avocado is an open source project and is
released under the [Apache 2.0 license](https://github.com/bigdatagenomics/avocado/blob/master/LICENSE).

Avocado can be used for single sample germline variant calling, trio calling,
and joint variant calling. Avocado has >99% SNP calling accuracy, and >96%
INDEL calling accuracy when paired with ADAM's INDEL realignment pipeline.
When run on a single 32 core machine, Avocado can call variants on a 60x
coverage whole genome sequencing (WGS) dataset in approximately 7 hours. By
using Apache Spark to scale across multiple machines, Avocado can process the
same WGS dataset in approximately 15 minutes when using 1,024 cores.

# How avocado?
## Building Avocado
Avocado uses [Maven](http://maven.apache.org/) to build. To build avocado, `cd`
into the repository and run `mvn package`.

## Avocado binaries
Nightly builds of Avocado are available from the [OSS Sonatype
repository](https://oss.sonatype.org/content/repositories/snapshots/org/bdgenomics/avocado/).
Additionally, we make a Docker image available from [Quay](https://quay.io/repository/ucsc_cgl/avocado?tag=latest&tab=tags).

# License
Avocado is released under the [Apache License, Version 2.0](LICENSE.txt).
# Citing Avocado
Avocado has been described in a PhD thesis. To cite this thesis, please cite:
```
@phdthesis{nothaft17,
title={Scalable Systems and Algorithms for Genomic Variant Analysis},
author={Nothaft, Frank Austin},
school = {EECS Department, University of California, Berkeley},
year = {2017},
month = {Dec},
URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-204.html},
number = {UCB/EECS-2017-204}
}
```

A preprint describing Avocado should be released by the end of January 2018.