https://github.com/PapenfussLab/bionix

Functional highly reproducible bioinformatics pipelines

BioNix is a tool for reproducible bioinformatics that unifies workflow
engines, package managers, and containers. It is implemented as a
lightweight library on top of the [Nix](https://nixos.org/nix/)
deployment system.

BioNix is currently a work in progress, so documentation is sparse.
Please get in contact with us if you need more information or help, or
would like to contribute (see the bottom of this page).

## Installation

BioNix requires no dependencies beyond [Nix](http://nixos.org/nix),
which may be installed by:
```{sh}
curl -L https://nixos.org/nix/install | sh
```
If you do not have root access, a variety of [rootless
install](https://nixos.wiki/wiki/Nix_Installation_Guide#Installing_without_root_permissions)
options are available.

API docs can be generated by executing `nix build` in the `doc`
directory; the output can then be viewed at `result/OEBPS/index.html`.

## Examples

Several examples are available in `./examples/`. The main example is
presented in `./examples/default.nix` and can be built using `nix build`
in `./examples/`. This sample pipeline performs variant calling using
[`platypus`](https://github.com/andyrimmer/Platypus), alignment using
[`bwa mem`](https://github.com/lh3/bwa), and preprocessing using
[`samtools`](http://www.htslib.org/).

See the documentation in `./examples/README.md` for more detail about
this pipeline and the other examples.

- The pipeline itself is specified in `examples/call.nix` and
`examples/default.nix`.
- The BioNix wrapper to run `platypus` is in
`tools/platypus-callVariants.nix`.
- The Nix expression for the `platypus` software itself can be found in
[nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/science/biology/platypus/default.nix).

## Constructing workflows

Writing workflows requires some familiarity with the Nix
programming language and deployment system. Good introductions can be
found [here](https://learnxinyminutes.com/docs/nix/) and
[here](https://github.com/tazjin/nix-1p).

The best way to learn how to construct workflows is to study the
examples provided. Thanks to the flexibility of Nix, workflows can be
constructed in different ways to suit their intended purpose, and the
examples illustrate some of the ways one might approach various
problems.

For constructing tool wrappers, take a look at the existing wrappers in
the `./tools/` directory. The wrappers for BWA are a good starting
point.

## HPC execution

BioNix supports submitting jobs to computing queues instead of building
them directly with the Nix build engine. The two supported engines are
Slurm and PBS, represented by the `slurm` and `qsub` entries in the root
BioNix tree. Each takes an attribute set of default parameters and
returns a new tree of tools. To submit jobs, use tools from these trees
and specify resource requirements as ordinary configuration options to
the tools.

The following resource parameters can be specified:

- *ppn*: The number of cores to request;
- *mem*: The amount of memory to request (GB);
- *walltime*: A string defining the maximum walltime.

As we rely on side effects to submit jobs, sandboxed builds cannot be
used and the sandbox must be disabled (`--option sandbox false` with
`nix-build` or `--no-sandbox` with `nix build`).

### Slurm specifics

Slurm jobs are submitted by executing the `salloc` binary on the
cluster. By default this is assumed to be `/usr/bin/salloc`; if this is
not the case on your cluster then you need to additionally specify the
path to salloc via the `salloc` parameter.

When launching the build, it is important that the `TMPDIR`
environment variable points to a location which is on shared storage
(i.e., available from all nodes). This will be the location used for
temporary files during the execution of stages.
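
The two launch requirements described so far (a usable `TMPDIR` and a disabled sandbox) can be checked in a small wrapper before the build starts. The following is a minimal sketch only: the function name `run_cluster_build` is hypothetical and not part of BioNix, and the script cannot verify that the directory really is on shared storage, only that it exists and is writable.

```sh
#!/bin/sh
# Hypothetical pre-flight wrapper; not part of BioNix.
# Refuses to start unless TMPDIR is set to a writable directory,
# then launches the build with the sandbox disabled.
run_cluster_build() {
  if [ -z "${TMPDIR:-}" ]; then
    echo "TMPDIR is not set; point it at shared storage" >&2
    return 1
  fi
  if [ ! -d "$TMPDIR" ] || [ ! -w "$TMPDIR" ]; then
    echo "TMPDIR ($TMPDIR) is not a writable directory" >&2
    return 1
  fi
  # Job submission relies on side effects, so the sandbox is disabled.
  nix build --no-sandbox "$@"
}
```

To use it, export `TMPDIR` to a directory on your cluster's shared file system and call `run_cluster_build` from the directory containing the workflow; any extra arguments are passed through to `nix build`.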

### PBS specifics

The PBS wrapper is considerably more complicated, as initiating
interactive processes under PBS is not as reliable as with Slurm's `salloc`.
Consequently, jobs are submitted via non-interactive queue submissions
and the queue polled to determine when the submitted job has completed.

The path to the PBS executables (i.e., `qsub` and `qstat`) has to be
given in the `qsubPath` attribute. Furthermore, a temporary directory
that's shared across all nodes must be specified in `tmpDir`.
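
The submit-then-poll approach described above can be illustrated with a short shell function. This is a sketch of the general technique, not the code BioNix uses: the name `submit_and_wait` and the polling interval are hypothetical, `qsub` and `qstat` are assumed to be on `PATH`, `qsub` is assumed to print the job identifier on standard output, and `qstat <id>` is assumed to exit with a non-zero status once the job has left the queue (some PBS variants keep finished jobs visible for a while, in which case the loop would need a different completion check).

```sh
#!/bin/sh
# Illustrative only; not the BioNix PBS wrapper.
# Submit a job script non-interactively, then poll the queue until the
# scheduler no longer lists the job.
submit_and_wait() {
  script=$1
  interval=${2:-30}   # seconds between polls (hypothetical default)
  job_id=$(qsub "$script") || return 1
  # Assumption: qstat exits non-zero once the job is no longer queued.
  while qstat "$job_id" >/dev/null 2>&1; do
    sleep "$interval"
  done
  printf '%s\n' "$job_id"
}
```

Calling `submit_and_wait job.sh 60` would submit `job.sh`, check the queue once a minute, and print the job identifier when the job is no longer listed. Note that this sketch only detects that the job has left the queue, not whether it succeeded.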

## Distributed execution

Nix has support for distributing builds across a collection of
remote machines. See the
[manual](https://nixos.org/nix/manual/#chap-distributed-builds) and
[wiki](https://nixos.wiki/wiki/Distributed_build) for more information.

## Citing

1. Bedő, J., Di Stefano, L., & Papenfuss, A. T. (2020). Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix. GigaScience, 9(11). https://doi.org/10.1093/gigascience/giaa121

## Getting help and contributing

For general questions and to report problems, please open an issue. For real-time
help there is a chat room at
[#bionix:nixos.org](https://matrix.to/#/#bionix:nixos.org).