{"id":13806210,"url":"https://github.com/masa16/pwrake","last_synced_at":"2025-05-13T21:32:37.792Z","repository":{"id":2998065,"uuid":"4015087","full_name":"masa16/pwrake","owner":"masa16","description":"Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.","archived":false,"fork":false,"pushed_at":"2020-01-16T07:12:37.000Z","size":784,"stargazers_count":57,"open_issues_count":2,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-14T03:19:42.163Z","etag":null,"topics":["cluster","distributed-computing","gfarm","parallel","parallel-computing","pwrake","rake","ruby","scientific-computing","workflow"],"latest_commit_sha":null,"homepage":"http://masa16.github.io/pwrake/","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/masa16.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES_V2.md","contributing":null,"funding":null,"license":"MIT-LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-04-13T11:00:54.000Z","updated_at":"2023-10-16T10:25:56.000Z","dependencies_parsed_at":"2022-09-02T14:22:55.900Z","dependency_job_id":null,"html_url":"https://github.com/masa16/pwrake","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masa16%2Fpwrake","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masa16%2Fpwrake/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masa16%2Fpwrake/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masa16%2Fpwrake/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/masa16","download_url":"https://
codeload.github.com/masa16/pwrake/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225260388,"owners_count":17446089,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster","distributed-computing","gfarm","parallel","parallel-computing","pwrake","rake","ruby","scientific-computing","workflow"],"created_at":"2024-08-04T01:01:08.907Z","updated_at":"2024-11-18T22:31:32.070Z","avatar_url":"https://github.com/masa16.png","language":"Ruby","funding_links":[],"categories":["NLP Pipeline Subtasks"],"sub_categories":["Pipeline Generation"],"readme":"# Pwrake\n\nParallel Workflow extension for Rake, runs on multicores, clusters, clouds.\n* Author: Masahiro Tanaka\n\n[README in Japanese](https://github.com/masa16/pwrake/wiki/Pwrakeとは),\n[GitHub Repository](https://github.com/masa16/pwrake),\n[RubyGems](https://rubygems.org/gems/pwrake)\n\n## Features\n\n* Pwrake executes a workflow written in a Rakefile in parallel.\n  * The Rakefile specification is the same as in Rake.\n  * Tasks that do not depend on each other are automatically executed in parallel.\n  * Rake's parallel task definition, `multitask`, is no longer necessary.\n* Parallel and distributed execution is possible using a computer cluster which consists of multiple compute nodes.\n  * Cluster settings: SSH login (or MPI), and directory sharing through a shared filesystem, e.g., NFS, Gfarm.\n  * Pwrake automatically connects to remote hosts using SSH. 
You do not need to start a daemon.\n  * Remote host names and the number of cores to use are provided in a hostfile.\n* The [Gfarm file system](http://sourceforge.net/projects/gfarm/) uses the storage of compute nodes and provides high-performance parallel I/O.\n  * Parallel I/O access to the local storage of compute nodes enables scalable I/O performance.\n  * Gfarm schedules a compute node to store each output file on its local storage.\n  * Pwrake schedules each task on a compute node where its input files are stored.\n  * Other Gfarm support: automatic mounting of the Gfarm file system, etc.\n\n## Requirement\n\n* Ruby version 2.2.3 or later\n* UNIX-like OS\n* For distributed processing using multiple computers:\n  * SSH command\n  * distributed file system (NFS, Gfarm, etc.)\n\n## Installation\n\nInstall with RubyGems:\n\n    $ gem install pwrake\n\nOr download the source tgz/zip, expand it, cd to the subdirectory, and install:\n\n    $ ruby setup.rb\n\nIf you use rbenv, your system may fail to find the pwrake command after installation:\n\n    -bash: pwrake: command not found\n\nIn this case, rehash the command paths:\n\n    $ rbenv rehash\n\n\n## Usage\n\n### Parallel execution using 4 cores at localhost:\n\n    $ pwrake -j 4\n\n### Parallel execution using all cores at localhost:\n\n    $ pwrake -j\n\n### Parallel execution using a total of 2*2 cores on 2 remote hosts:\n\n1. Share your directory among the remote hosts via a distributed file system such as NFS or Gfarm.\n2. Allow passphrase-less access via SSH in either of these ways:\n   * Add a passphrase-less key generated by `ssh-keygen`.  (Be careful)\n   * Add the passphrase using `ssh-add`.\n3. Create a `hosts` file in which remote host names and the number of cores are listed:\n\n        $ cat hosts\n        host1 2\n        host2 2\n\n4. Run `pwrake` with the option `--hostfile` or `-F`:\n\n        $ pwrake -F hosts\n\n### Substitute MPI for SSH to start remote workers (Experimental)\n\n1. 
Set up MPI on your cluster.\n2. Install the [MPipe gem](https://rubygems.org/gems/mpipe). (requires `mpicc`)\n3. Run the `pwrake-mpi` command.\n\n        $ pwrake-mpi -F hosts\n\n## Options\n\n### Pwrake command line options (in addition to Rake options)\n\n    -F, --hostfile FILE              [Pw] Read hostnames from FILE\n    -j, --jobs [N]                   [Pw] Number of threads at localhost (default: # of processors)\n    -L, --log, --log-dir [DIRECTORY] [Pw] Write log to DIRECTORY\n        --ssh-opt, --ssh-option OPTION\n                                     [Pw] Option passed to SSH\n        --filesystem FILESYSTEM      [Pw] Specify FILESYSTEM (nfs|gfarm2fs)\n        --gfarm                      [Pw] (obsolete; Start pwrake on Gfarm FS)\n    -A, --disable-affinity           [Pw] Turn OFF affinity (AFFINITY=off)\n    -S, --disable-steal              [Pw] Turn OFF task steal\n    -d, --debug                      [Pw] Output Debug messages\n        --pwrake-conf [FILE]         [Pw] Pwrake configuration file in YAML\n        --show-conf, --show-config   [Pw] Show Pwrake configuration options\n        --report LOGDIR              [Pw] Generate `report.html' (Report of workflow statistics) in LOGDIR and exit.\n        --report-image IMAGE_TYPE    [Pw] Gnuplot output format (png,jpg,svg etc.) 
in report.html.\n        --clear-gfarm2fs             [Pw] Clear gfarm2fs mountpoints left after failure.\n\n### pwrake_conf.yaml\n\n* If `pwrake_conf.yaml` exists in the current directory, Pwrake reads options from it.\n* Example (in YAML form):\n\n        HOSTFILE: hosts\n        LOG_DIR: true\n        DISABLE_AFFINITY: true\n        DISABLE_STEAL: true\n        FAILED_TARGET: delete\n        PASS_ENV :\n         - ENV1\n         - ENV2\n\n* Option list:\n\n        HOSTFILE, HOSTS   nil(default, localhost)|filename\n        LOG_DIR, LOG      nil(default, No log output)|true(dirname=\"Pwrake%Y%m%d-%H%M%S\")|dirname\n        LOG_FILE          default=\"pwrake.log\"\n        TASK_CSV_FILE     default=\"task.csv\"\n        COMMAND_CSV_FILE  default=\"command.csv\"\n        GC_LOG_FILE       default=\"gc.log\"\n        WORK_DIR          default=$PWD\n        FILESYSTEM        default(autodetect)|gfarm\n        SSH_OPTION        SSH option\n        PASS_ENV          (Array) Environment variables passed to SSH\n        HEARTBEAT         default=240 - Heartbeat interval in seconds\n        RETRY             default=1 - The number of task retries\n        HOST_FAILURE      default=2 - The number of consecutive host failures allowed (since v2.3)\n        FAILED_TARGET     rename(default)|delete|leave - Treatment of failed target files\n        FAILURE_TERMINATION wait(default)|kill|continue - Behavior of other tasks when a task fails\n        QUEUE_PRIORITY          LIFO(default)|FIFO|LIHR(LIfo\u0026Highest-Rank-first; obsolete)\n        DISABLE_RANK_PRIORITY   false(default)|true - Disable rank-aware task scheduling (since v2.3)\n        RESERVE_NODE            false(default)|true - Reserve a node for tasks with ncore\u003e1 (since v2.3)\n        NOACTION_QUEUE_PRIORITY FIFO(default)|LIFO|RAND\n        SHELL_START_INTERVAL    default=0.012 (sec)\n        GRAPH_PARTITION         false(default)|true\n        REPORT_IMAGE            default=png\n\n* Options for Gfarm system:\n\n  
      DISABLE_AFFINITY    default=false\n        DISABLE_STEAL       default=false\n        GFARM_BASEDIR       default=\"/tmp\"\n        GFARM_PREFIX        default=\"pwrake_$USER\"\n        GFARM_SUBDIR        default='/'\n        MAX_GFWHERE_WORKER  default=8\n        GFARM2FS_COMMAND    default='gfarm2fs'\n        GFARM2FS_OPTION     default=\"\"\n        GFARM2FS_DEBUG      default=false\n        GFARM2FS_DEBUG_WAIT default=1\n\n## Task Properties\n\n* Task properties are specified in `desc` strings above task definition in Rakefile.\n\nExample of Rakefile:\n\n``` ruby\ndesc \"ncore=4 allow=ourhost*\" # desc has no effect on rule in original Rake, but it is used for task property in Pwrake.\nrule \".o\" =\u003e \".c\" do\n  sh \"...\"\nend\n\n(1..n).each do |i|\n  desc \"ncore=2 steal=no\" # desc should be inside of loop because it is effective only for the next task.\n  file \"task#{i}\" do\n    sh \"...\"\n  end\nend\n```\n\nProperties (The leftmost item is default):\n\n    ncore=integer|rational - The number of cores used by this task.\n    exclusive=no|yes       - Exclusively execute this task in a single node.\n    reserve=no|yes         - Gives higher priority to this task if ncore\u003e1. (reserve a host)\n    allow=hostname         - Allow this host to execute this task. (accepts wild card)\n    deny=hostname          - Deny this host to execute this task. 
(accepts wild card)\n    order=deny,allow|allow,deny - The order of evaluation.\n    steal=yes|no           - Allow task stealing for this task.\n    retry=integer          - The number of retries for this task.\n\n## Note for Gfarm\n\n* Gfarm file-affinity scheduling is achieved by the `gfwhere-pipe` script bundled in the Pwrake package.\n  Since Pwrake ver.2.2.7, this script accesses `libgfarm.so.1` through Fiddle (a Ruby standard library module).\n  Set the environment variable `LD_LIBRARY_PATH` so that `libgfarm.so.1` can be found.\n\n## Scheduling with Graph Partitioning\n\n* Compile and install METIS 5.1.0 (http://www.cs.umn.edu/~metis/). This requires CMake.\n\n* Install RbMetis (https://github.com/masa16/rbmetis) with:\n\n        gem install rbmetis -- \\\n         --with-metis-include=/usr/local/include \\\n         --with-metis-lib=/usr/local/lib\n\n* Option (`pwrake_conf.yaml`):\n\n        GRAPH_PARTITION: true\n\n* See publication: [M. Tanaka and O. Tatebe, “Workflow Scheduling to Minimize Data Movement Using Multi-constraint Graph Partitioning,” in CCGrid 2012](http://ieeexplore.ieee.org/abstract/document/6217406/)\n\n## [Publications](https://github.com/masa16/pwrake/wiki/Publications)\n\n## Acknowledgment\n\nThis work is supported by:\n* JST CREST, research themes:\n  * [\"Statistical Computational Cosmology with Big Astronomical Imaging Data,\"](http://www.jst.go.jp/kisoken/crest/en/project/44/14532369.html)\n  * [\"System Software for Post Petascale Data Intensive Science,\"](http://postpeta.jst.go.jp/en/researchers/tatebe22.html)\n* MEXT Promotion of Research for Next Generation IT Infrastructure \"Resources Linkage for e-Science (RENKEI).\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasa16%2Fpwrake","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmasa16%2Fpwrake","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasa16%2Fpwrake/lists"}