https://github.com/alkc/parallel-structure

A probably over-engineered bash script for starting parallel runs of Structure for different numbers of populations (K) and numbers of replicates.

# parallel-structure

A glued-together bash script for starting parallel runs of Structure (population genetics inference) across different values of `K` and numbers of replicates.

You will need both GNU `parallel` and `structure` installed, which is easy with conda:

```
conda install -c bioconda parallel structure
```
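
Before launching a long batch, it can help to confirm that both tools are actually on the `PATH`. The following is a minimal sketch, assuming the conda packages install executables named `parallel` and `structure`. The helper `missing_tools` is hypothetical and not part of this script:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of parallel-structure.sh):
# print the name of every given command that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: the executables are called "parallel" and "structure".
missing=$(missing_tools parallel structure)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "parallel and structure are both available"
fi
```

If either name is reported as missing, activate the conda environment that holds the packages before running the script.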

## Example

To check that the script works, use the included example data set (`testdata1`). More information about the sample data is available at: https://web.stanford.edu/group/pritchardlab/software/structure-data_v.2.3.1.html

Please run the following command from the script directory:

```
bash parallel-structure.sh example-data/mainparams example-data/extraparams example-data/testdata1 output_dir 1 3 5 8
```

The last four arguments of the command above set, in order: the minimum K, the maximum K, the number of replicates, and the number of parallel jobs.

The example therefore tests K=1 to K=3 with 5 replicates for each value of K (15 runs in total), running up to 8 jobs in parallel.

All output is saved to `output_dir/`.
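
To make the run count concrete, the following sketch enumerates the (K, replicate) pairs implied by the last four arguments of the example command. It is an illustration only: `list_runs` is a hypothetical helper, not a function in `parallel-structure.sh`, and the script itself may build its job list differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of parallel-structure.sh):
# print one "K replicate" pair per line for K = min_k..max_k.
list_runs() {
    local min_k=$1 max_k=$2 reps=$3
    local k=$min_k rep
    while [ "$k" -le "$max_k" ]; do
        rep=1
        while [ "$rep" -le "$reps" ]; do
            printf '%s %s\n' "$k" "$rep"
            rep=$((rep + 1))
        done
        k=$((k + 1))
    done
}

# Same values as the example command: K=1..3 with 5 replicates each.
list_runs 1 3 5
```

Each printed line corresponds to one Structure run, so the example values give 15 lines. A list like this could be handed to GNU parallel, for instance with `list_runs 1 3 5 | parallel -j 8 --colsep ' ' echo K={1} rep={2}`, which caps the number of simultaneous jobs at 8.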

## Citation

[![DOI](https://zenodo.org/badge/125206866.svg)](https://zenodo.org/badge/latestdoi/125206866)

If this script has been useful to you and you think more researchers would benefit from knowing about it, feel free to cite it as follows:

* Alexander Koc. (2021, April 16). alkc/parallel-structure: (Version v0.6.1). Zenodo. http://doi.org/10.5281/zenodo.4697229

More importantly, you should probably cite both Structure and GNU parallel, on which this script relies.

For more information on how to cite Structure, see page 37 of the official [Structure 2.3.4 manual (PDF)](https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/structure_doc.pdf).

For more information on how to cite GNU parallel, see https://doi.org/10.5281/zenodo.1146014 (or run `parallel --citation` at the bash prompt).

## CHANGELOG

### version 0.6.1

* Prepare for release on Zenodo
* UPDATED README with better description

### version 0.6 (2021-04-13)

* FIXED bug where replicate runs started with the same seed, which defeated the purpose of reps.
* ADDED Ability to set min K, max K, number of reps and number of parallel jobs from the command line
* ADDED More informative error messages if files missing at specified paths

## TODO:

* Add installation instructions?
* Add long named parameters (probably requires moving away from using bash?)
* Add some parameter validation (e.g. exit with informative error if input files do not exist)
* Make the number of parallel jobs an optional parameter (default to nproc - 1?)