https://github.com/althonos/smatrix
Not the slurm job dispatcher you need, but the one you deserve.
https://github.com/althonos/smatrix
cli-app python slurm slurm-cluster
Last synced: 10 months ago
JSON representation
Not the slurm job dispatcher you need, but the one you deserve.
- Host: GitHub
- URL: https://github.com/althonos/smatrix
- Owner: althonos
- License: mit
- Created: 2020-06-04T18:46:34.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-06-24T10:39:11.000Z (over 5 years ago)
- Last Synced: 2025-01-25T18:44:29.795Z (12 months ago)
- Topics: cli-app, python, slurm, slurm-cluster
- Language: Python
- Size: 35.2 KB
- Stars: 1
- Watchers: 4
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: COPYING
Awesome Lists containing this project
README
# `smatrix`
*Not the slurm job dispatcher you need, but the one you deserve.*
[](https://choosealicense.com/licenses/mit/)
[](https://github.com/althonos/smatrix/)
[](https://github.com/althonos/smatrix/issues)
## Introduction
`slurm` is a workload manager, and it is typically used to parallelize work
at a job level. It is heavily configurable, but can sometimes be quite
overwhelming when the work to be performed is simple.
A typical usecase in our lab is to use `slurm` to process metagenomes in the
EMBL cluster: we want to run a single command (like `hmmsearch` or `gecco`)
on a very large number of files, and also possibly with different threshold
values. Doing so efficiently requires writing a custom script that ends up
being copied and pasted around. **As a programmer, I found this unacceptable.**
`smatrix` leverages the most common tasks of splitting the workload evenly,
generating a job script with the parameters, and launching the jobs to the
cluster. Think `xargs`, except it spawns `slurm` jobs instead of processes.
## Usage
`smatrix` uses the same names as `sbatch` or `srun` for parameters if needed,
and some additional flags to pass parameters. A quick example:
```console
$ smatrix --cpus-per-task 2 -P:f1 0.02 0.01 -P:file /data/seq1.fa /data/seq2.fa \
--wrap 'hmmsearch --F1=$f1 Pfam.hmm $file'
```
This command will launch 4 jobs, using 2 CPUs per job (using the same option
as with `sbatch`), for all possible combinations of `$f1` and `$file` as given
in the CLI arguments. `--cpus-per-task` is a builtin `sbatch` option, so it
will be transparently given to SLURM when we queue the job.
The other arguments however are being used by `smatrix` to setup the job array.
### `smatrix`-hijacked options
#### `--wrap` flag
The `--wrap` CLI flag is used to pass the command to wrap in a script. It will
get executed once for every element of the job matrix created with the parameters
given to the CLI.
### `smatrix`-specific options
#### `-P` / `--param` flag
The `-P` flag is the only new flag introduced by `smatrix`. Use it to specify
parameter arrays
The format for the `--param` flag is designed to accommodate globing and
sub-command calls in the shell:
```console
$ smatrix --param:n $(seq 1 100) --param:file /etc/*.conf --wrap '...'
```
Note that, in this example, the glob pattern expansion is done by the shell
and may have escaping issues if the filenames contain whitespace characters.
#### `--wrap` flag
`--wrap` was already there in `sbatch`, but `smatrix` wraps the command
differently, since it will also expose the parameters you request with `-P`.