# Pwrake

Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.
* Author: Masahiro Tanaka

[README in Japanese](https://github.com/masa16/pwrake/wiki/Pwrakeとは),
[GitHub Repository](https://github.com/masa16/pwrake),
[RubyGems](https://rubygems.org/gems/pwrake)

## Features

* Pwrake executes a workflow written in a Rakefile in parallel.
  * The Rakefile specification is the same as in Rake.
  * Tasks that do not depend on each other are automatically executed in parallel.
  * Rake's `multitask`, its parallel task definition, is no longer necessary.
* Parallel and distributed execution is possible on a computer cluster consisting of multiple compute nodes.
  * Cluster settings: SSH login (or MPI), and directory sharing through a shared filesystem, e.g., NFS or Gfarm.
  * Pwrake automatically connects to remote hosts using SSH. You do not need to start a daemon.
  * Remote host names and the number of cores to use are provided in a hostfile.
* The [Gfarm file system](http://sourceforge.net/projects/gfarm/) utilizes the storage of compute nodes and provides high-performance parallel I/O.
  * Parallel I/O access to the local storage of compute nodes enables scalable I/O performance.
  * Gfarm selects the compute node that stores each output file, preferring local storage.
  * Pwrake schedules each task on a compute node where its input files are stored.
  * Other Gfarm support: automatic mounting of the Gfarm file system, etc.
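
The idea of scheduling a task on a node that already holds its input files can be illustrated with a few lines of Ruby. This is only a sketch of the general technique, not Pwrake's actual scheduler; `preferred_host`, the `locations` table, and the file and node names are all hypothetical.

```ruby
# Pick the idle host that already stores the most input files of a task.
# `locations` maps each input file to the hosts holding a copy (made-up data).
def preferred_host(inputs, locations, idle_hosts)
  score = Hash.new(0)
  inputs.each do |file|
    (locations[file] || []).each { |host| score[host] += 1 }
  end
  # Choose the idle host with the highest locality score.
  idle_hosts.max_by { |host| score[host] }
end

locations = {
  "a.fits" => ["node1"],
  "b.fits" => ["node2"],
  "c.fits" => ["node2", "node3"],
}
idle = ["node1", "node2", "node3"]
puts preferred_host(["b.fits", "c.fits"], locations, idle)  # prints "node2"
```

Here `node2` wins because it stores both inputs, so the task would read no data over the network.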

## Requirement

* Ruby version 2.2.3 or later
* UNIX-like OS
* For distributed processing using multiple computers:
  * SSH command
  * distributed file system (NFS, Gfarm, etc.)

## Installation

Install with RubyGems:

    $ gem install pwrake

Or download the source tgz/zip, expand it, cd to the extracted directory, and install:

    $ ruby setup.rb

If you use rbenv, your shell may fail to find the `pwrake` command after installation:

    -bash: pwrake: command not found

In this case, rehash the command paths:

    $ rbenv rehash

## Usage

### Parallel execution using 4 cores on localhost:

    $ pwrake -j 4

### Parallel execution using all cores on localhost:

    $ pwrake -j

### Parallel execution using a total of 2*2 cores on 2 remote hosts:

1. Share your directory among the remote hosts via a distributed file system such as NFS or Gfarm.
2. Allow passphrase-less access via SSH in either of these ways:
   * Add a passphrase-less key generated by `ssh-keygen`. (Be careful)
   * Add the passphrase using `ssh-add`.
3. Make a `hosts` file listing the remote host names and the number of cores on each:

        $ cat hosts
        host1 2
        host2 2

4. Run `pwrake` with the option `--hostfile` or `-F`:

        $ pwrake -F hosts
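
The `hosts` file format shown in step 3 can be read with a few lines of Ruby, for example to check the total core count before a run. This is a minimal sketch, not Pwrake's own parser; `parse_hostfile` is a hypothetical helper, and defaulting to one core when the count is omitted is an assumption.

```ruby
# Parse hostfile text of the form "hostname ncores", one host per line.
# Illustrative only; Pwrake's real hostfile handling may differ.
def parse_hostfile(text)
  text.each_line.with_object({}) do |line, hosts|
    name, cores = line.split
    next if name.nil?                   # skip blank lines
    hosts[name] = Integer(cores || 1)   # assumption: 1 core if omitted
  end
end

hosts = parse_hostfile("host1 2\nhost2 2\n")
total = hosts.values.reduce(0, :+)
puts "#{hosts.size} hosts, #{total} cores"  # prints "2 hosts, 4 cores"
```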

### Substitute MPI for SSH to start remote workers (Experimental)

1. Set up MPI on your cluster.
2. Install the [MPipe gem](https://rubygems.org/gems/mpipe). (requires `mpicc`)
3. Run the `pwrake-mpi` command.

        $ pwrake-mpi -F hosts

## Options

### Pwrake command line options (in addition to Rake option)

    -F, --hostfile FILE              [Pw] Read hostnames from FILE
    -j, --jobs [N]                   [Pw] Number of threads at localhost (default: # of processors)
    -L, --log, --log-dir [DIRECTORY] [Pw] Write log to DIRECTORY
        --ssh-opt, --ssh-option OPTION
                                     [Pw] Option passed to SSH
        --filesystem FILESYSTEM      [Pw] Specify FILESYSTEM (nfs|gfarm2fs)
        --gfarm                      [Pw] (obsolete; Start pwrake on Gfarm FS)
    -A, --disable-affinity           [Pw] Turn OFF affinity (AFFINITY=off)
    -S, --disable-steal              [Pw] Turn OFF task steal
    -d, --debug                      [Pw] Output Debug messages
        --pwrake-conf [FILE]         [Pw] Pwrake configuration file in YAML
        --show-conf, --show-config   [Pw] Show Pwrake configuration options
        --report LOGDIR              [Pw] Generate `report.html' (Report of workflow statistics) in LOGDIR and exit.
        --report-image IMAGE_TYPE    [Pw] Gnuplot output format (png,jpg,svg etc.) in report.html.
        --clear-gfarm2fs             [Pw] Clear gfarm2fs mountpoints left after failure.

### pwrake_conf.yaml

* If `pwrake_conf.yaml` exists in the current directory, Pwrake reads options from it.
* Example (in YAML form):

        HOSTFILE: hosts
        LOG_DIR: true
        DISABLE_AFFINITY: true
        DISABLE_STEAL: true
        FAILED_TARGET: delete
        PASS_ENV:
        - ENV1
        - ENV2
* Option list:

        HOSTFILE, HOSTS         nil(default, localhost)|filename
        LOG_DIR, LOG            nil(default, No log output)|true(dirname="Pwrake%Y%m%d-%H%M%S")|dirname
        LOG_FILE                default="pwrake.log"
        TASK_CSV_FILE           default="task.csv"
        COMMAND_CSV_FILE        default="command.csv"
        GC_LOG_FILE             default="gc.log"
        WORK_DIR                default=$PWD
        FILESYSTEM              default(autodetect)|gfarm
        SSH_OPTION              SSH option
        PASS_ENV                (Array) Environment variables passed to SSH
        HEARTBEAT               default=240 - Heartbeat interval in seconds
        RETRY                   default=1 - Number of task retries
        HOST_FAILURE            default=2 - Number of consecutive host failures allowed (since v2.3)
        FAILED_TARGET           rename(default)|delete|leave - Treatment of failed target files
        FAILURE_TERMINATION     wait(default)|kill|continue - Behavior of other tasks when a task fails
        QUEUE_PRIORITY          LIFO(default)|FIFO|LIHR(LIfo&Highest-Rank-first; obsolete)
        DISABLE_RANK_PRIORITY   false(default)|true - Disable rank-aware task scheduling (since v2.3)
        RESERVE_NODE            false(default)|true - Reserve a node for tasks with ncore>1 (since v2.3)
        NOACTION_QUEUE_PRIORITY FIFO(default)|LIFO|RAND
        SHELL_START_INTERVAL    default=0.012 (sec)
        GRAPH_PARTITION         false(default)|true
        REPORT_IMAGE            default=png
* Options for Gfarm system:

        DISABLE_AFFINITY        default=false
        DISABLE_STEAL           default=false
        GFARM_BASEDIR           default="/tmp"
        GFARM_PREFIX            default="pwrake_$USER"
        GFARM_SUBDIR            default='/'
        MAX_GFWHERE_WORKER      default=8
        GFARM2FS_COMMAND        default='gfarm2fs'
        GFARM2FS_OPTION         default=""
        GFARM2FS_DEBUG          default=false
        GFARM2FS_DEBUG_WAIT     default=1
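
To see how a `pwrake_conf.yaml` file combines with the defaults listed above, the following Ruby sketch loads YAML text, merges it over a few of those defaults, and resolves `LOG_DIR: true` to the timestamped directory name. It is an illustration under stated assumptions, not Pwrake's own option handling; `DEFAULTS`, `load_conf`, and `log_dir` are hypothetical names, and only a handful of options are included.

```ruby
require "yaml"

# A few defaults copied from the option list above; the rest are omitted.
DEFAULTS = {
  "LOG_FILE"       => "pwrake.log",
  "HEARTBEAT"      => 240,
  "RETRY"          => 1,
  "FAILED_TARGET"  => "rename",
  "QUEUE_PRIORITY" => "LIFO",
}.freeze

# Merge settings from YAML text over the defaults (hypothetical helper).
def load_conf(yaml_text)
  DEFAULTS.merge(YAML.safe_load(yaml_text) || {})
end

# LOG_DIR: nil -> no log, true -> timestamped name, string -> used as given.
def log_dir(conf, now = Time.now)
  case conf["LOG_DIR"]
  when nil, false then nil
  when true       then now.strftime("Pwrake%Y%m%d-%H%M%S")
  else conf["LOG_DIR"].to_s
  end
end

conf = load_conf(<<~CONF)
  HOSTFILE: hosts
  LOG_DIR: true
  FAILED_TARGET: delete
  PASS_ENV:
    - ENV1
    - ENV2
CONF

puts conf["FAILED_TARGET"]                         # prints "delete"
puts conf["RETRY"]                                 # prints 1
puts log_dir(conf, Time.new(2024, 1, 2, 3, 4, 5))  # prints "Pwrake20240102-030405"
```

To inspect the configuration Pwrake itself resolves, use the `--show-conf` option.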

## Task Properties

* Task properties are specified in the `desc` string placed immediately before a task definition in the Rakefile.

Example of Rakefile:

``` ruby
# In plain Rake, desc has no effect on a rule; Pwrake reads task properties from it.
desc "ncore=4 allow=ourhost*"
rule ".o" => ".c" do
  sh "..."
end

n = 3 # example task count (assumed value; not given in the original)
(1..n).each do |i|
  # desc applies only to the next task, so it must be inside the loop.
  desc "ncore=2 steal=no"
  file "task#{i}" do
    sh "..."
  end
end
```

Properties (the leftmost item is the default):

    ncore=integer|rational      - The number of cores used by this task.
    exclusive=no|yes            - Exclusively execute this task on a single node.
    reserve=no|yes              - Give higher priority to this task if ncore>1. (reserve a host)
    allow=hostname              - Allow this host to execute this task. (accepts wildcards)
    deny=hostname               - Deny this host from executing this task. (accepts wildcards)
    order=deny,allow|allow,deny - The order of evaluation.
    steal=yes|no                - Allow task stealing for this task.
    retry=integer               - The number of retries for this task.
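
The `key=value` format of these properties is simple enough to sketch in Ruby. The code below splits a `desc` string into properties and checks a host name against the `allow` and `deny` wildcards. It is a simplified illustration, not Pwrake's implementation: `parse_properties` and `host_allowed?` are hypothetical helpers, and the `order` property is ignored (a matching `deny` always wins here).

```ruby
# Split a desc string such as "ncore=4 allow=ourhost*" into key/value pairs.
def parse_properties(desc)
  desc.scan(/(\w+)=(\S+)/).to_h
end

# Check a host name against the allow/deny wildcard patterns.
# Simplification: `order` is not evaluated; a matching deny always wins.
def host_allowed?(props, host)
  return false if props["deny"] && File.fnmatch(props["deny"], host)
  props["allow"].nil? || File.fnmatch(props["allow"], host)
end

props = parse_properties("ncore=4 allow=ourhost*")
puts Rational(props["ncore"])            # prints "4/1" (ncore may be a rational)
puts host_allowed?(props, "ourhost01")   # prints true
puts host_allowed?(props, "otherhost")   # prints false
```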

## Note for Gfarm

* Gfarm file-affinity scheduling is achieved by the `gfwhere-pipe` script bundled in the Pwrake package.
  Since Pwrake ver.2.2.7, this script accesses `libgfarm.so.1` through Fiddle (a Ruby standard library module).
  Set the environment variable `LD_LIBRARY_PATH` so that `libgfarm.so.1` can be found.

## Scheduling with Graph Partitioning

* Compile and install METIS 5.1.0 (http://www.cs.umn.edu/~metis/). This requires CMake.

* Install RbMetis (https://github.com/masa16/rbmetis) by

        gem install rbmetis -- \
         --with-metis-include=/usr/local/include \
         --with-metis-lib=/usr/local/lib

* Option (`pwrake_conf.yaml`):

        GRAPH_PARTITION: true

* See publication: [M. Tanaka and O. Tatebe, “Workflow Scheduling to Minimize Data Movement Using Multi-constraint Graph Partitioning,” in CCGrid 2012](http://ieeexplore.ieee.org/abstract/document/6217406/)

## [Publications](https://github.com/masa16/pwrake/wiki/Publications)

## Acknowledgment

This work is supported by:
* JST CREST, research themes:
  * ["Statistical Computational Cosmology with Big Astronomical Imaging Data,"](http://www.jst.go.jp/kisoken/crest/en/project/44/14532369.html)
  * ["System Software for Post Petascale Data Intensive Science,"](http://postpeta.jst.go.jp/en/researchers/tatebe22.html)
* MEXT Promotion of Research for Next Generation IT Infrastructure "Resources Linkage for e-Science (RENKEI)."