https://github.com/bigdatagenomics/gnocchi
- Host: GitHub
- URL: https://github.com/bigdatagenomics/gnocchi
- Owner: bigdatagenomics
- License: apache-2.0
- Created: 2017-01-26T17:36:15.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-04-24T19:29:35.000Z (over 6 years ago)
- Last Synced: 2024-06-30T20:37:48.580Z (6 months ago)
- Language: Scala
- Size: 74.7 MB
- Stars: 6
- Watchers: 9
- Forks: 10
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE
# gnocchi
[![Coverage Status](https://coveralls.io/repos/github/bigdatagenomics/gnocchi/badge.svg?branch=master)](https://coveralls.io/github/bigdatagenomics/gnocchi?branch=master)
Genotype-phenotype analysis using the [ADAM](https://github.com/bigdatagenomics/adam) genomics analysis platform.
This is a work in progress. Currently, we implement a simple case/control analysis using a chi-squared test.

# Build
To build, install [Maven](http://maven.apache.org). Then run:
```
mvn package
```

Maven will automatically pull down and install all of the necessary dependencies.
Occasionally, building in Maven will fail due to memory issues. You can work around this
by setting the `MAVEN_OPTS` environment variable to `-Xmx2g -XX:MaxPermSize=1g`.

# Run
To run, you'll need to install Spark. If you are just evaluating locally, you can use
[a prebuilt Spark distribution](http://spark.apache.org/downloads.html). If you'd like to
use a cluster, refer to Spark's [cluster overview](http://spark.apache.org/docs/latest/cluster-overview.html).

Once Spark is installed, set the environment variable `SPARK_HOME` to point to the Spark
installation root directory. Then, you can run `gnocchi` via `./bin/gnocchi-submit`.

We include test data. You can run with the test data by running:
```
./bin/gnocchi-submit regressPhenotypes testData/sample.vcf testData/samplePhenotypes.csv testData/associations -saveAsText
```

## Phenotype Input
We accept phenotype inputs in a CSV format:
```
Sample,Phenotype,Has Phenotype
mySample,a phenotype,true
```

The `Has Phenotype` column is a binary `true`/`false` value. See the test data for more examples.
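As a rough pre-flight check, the phenotype layout described above can be validated before the file is passed to `gnocchi-submit`. The sketch below is not part of gnocchi: `check_phenotypes` is a hypothetical helper, `otherSample` is made-up demo data, and the script assumes plain comma-separated fields with no quoting and lowercase `true`/`false` values, none of which this README confirms.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with gnocchi): checks that every data row
# of a phenotype CSV has three columns and a true/false third column, then
# reports how many cases and controls it found.
check_phenotypes() {
  awk -F',' '
    BEGIN { bad = 0; cases = 0; controls = 0 }
    NR == 1 { next }   # skip the header row
    NF != 3 { printf "line %d: expected 3 columns, got %d\n", NR, NF; bad = 1; next }
    $3 != "true" && $3 != "false" {
      printf "line %d: Has Phenotype must be true or false, got \"%s\"\n", NR, $3
      bad = 1; next
    }
    $3 == "true"  { cases++ }
    $3 == "false" { controls++ }
    END {
      printf "cases=%d controls=%d\n", cases, controls
      exit bad         # non-zero exit status if any row was malformed
    }
  ' "$1"
}

# Demo on a throwaway file that follows the documented layout.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
Sample,Phenotype,Has Phenotype
mySample,a phenotype,true
otherSample,a phenotype,false
EOF
check_phenotypes "$demo_file"   # prints: cases=1 controls=1
rm -f "$demo_file"
```

Because the function exits non-zero on a malformed row, it can gate the real run, for example `check_phenotypes testData/samplePhenotypes.csv && ./bin/gnocchi-submit ...`, provided the test data follows the same assumptions.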
# License
This project is released under an [Apache 2.0 license](LICENSE.txt).