{"id":22569079,"url":"https://github.com/henrikbengtsson/future.batchjobs","last_synced_at":"2025-04-10T12:36:11.390Z","repository":{"id":27015191,"uuid":"30479500","full_name":"HenrikBengtsson/future.BatchJobs","owner":"HenrikBengtsson","description":":rocket: R package: future.BatchJobs: A Future API for Parallel and Distributed Processing using BatchJobs [Intentionally archived on CRAN on 2021-01-08]","archived":false,"fork":false,"pushed_at":"2021-02-08T03:23:50.000Z","size":759,"stargazers_count":8,"open_issues_count":3,"forks_count":0,"subscribers_count":4,"default_branch":"develop","last_synced_at":"2025-03-31T15:24:25.629Z","etag":null,"topics":["distributed-computing","hpc","job-scheduler","package","parallel","parallel-computing","pbs","r","sge","slurm","torque"],"latest_commit_sha":null,"homepage":"https://cran.r-project.org/package=future.BatchJobs","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HenrikBengtsson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-02-08T03:28:56.000Z","updated_at":"2021-02-08T03:23:53.000Z","dependencies_parsed_at":"2022-08-31T22:21:52.056Z","dependency_job_id":null,"html_url":"https://github.com/HenrikBengtsson/future.BatchJobs","commit_stats":null,"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenrikBengtsson%2Ffuture.BatchJobs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenrikBengtsson%2Ffuture.BatchJobs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenrikBengtsson%2Ffuture.BatchJobs/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenrikBengtsson%2Ffuture.BatchJobs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HenrikBengtsson","download_url":"https://codeload.github.com/HenrikBengtsson/future.BatchJobs/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248217150,"owners_count":21066633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-computing","hpc","job-scheduler","package","parallel","parallel-computing","pbs","r","sge","slurm","torque"],"created_at":"2024-12-08T00:17:54.397Z","updated_at":"2025-04-10T12:36:11.356Z","avatar_url":"https://github.com/HenrikBengtsson.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n\n# future.BatchJobs: A Future API for Parallel and Distributed Processing using BatchJobs\n\n![Life cycle: superseded](vignettes/imgs/lifecycle-superseded-blue.svg)\n\n_NOTE: The [BatchJobs](https://cran.r-project.org/package=BatchJobs) package is deprecated in favor of the [batchtools](https://cran.r-project.org/package=batchtools) package. Because of this, it is recommended to use the [future.batchtools](https://cran.r-project.org/package=future.batchtools) package instead of this package. 
This [future.BatchJobs](https://cran.r-project.org/package=future.BatchJobs) package is formally deprecated and is archived on CRAN as of 2021-01-08._\n \n\n## Introduction\n\nThe [future] package provides a generic API for using futures in R.\nA future is a simple yet powerful mechanism to evaluate an R expression\nand retrieve its value at some point in time.  Futures can be resolved\nin many different ways depending on which strategy is used.\nThere are various types of synchronous and asynchronous futures to\nchoose from in the [future] package.\n\nThis package, [future.BatchJobs], provides a type of futures that\nutilizes the [BatchJobs] package.  This means that _any_ type of\nbackend that the BatchJobs package supports can be used as a future.\nMore specifically, future.BatchJobs will allow you or users of your\npackage to leverage the compute power of high-performance computing\n(HPC) clusters via a simple switch in settings - without having to\nchange any code at all.\n\nFor instance, if BatchJobs is properly configured, the two\nexpressions below for futures `x` and `y` will be processed on two different\ncompute nodes:\n```r\n\u003e library(\"future.BatchJobs\")\n\u003e plan(batchjobs_custom)\n\u003e\n\u003e x %\u003c-% { Sys.sleep(5); 3.14 }\n\u003e y %\u003c-% { Sys.sleep(5); 2.71 }\n\u003e x + y\n[1] 5.85\n```\nThis is obviously a toy example to illustrate what futures look like\nand how to work with them.\n\nA more realistic example comes from the field of cancer research\nwhere very large FASTQ data files, which hold a large number of short\nDNA sequence reads, are produced.  The first step toward a biological\ninterpretation of these data is to align the reads in each sample\n(one FASTQ file) to the human genome.  In order to speed this up,\nwe can have each file be processed by a separate compute node, and on each\nnode we can use 24 parallel processes such that each process aligns a\nseparate chromosome.  
Here is an outline of how this nested parallelism\ncould be implemented using futures.\n```r\nlibrary(\"future.BatchJobs\")\nlibrary(\"listenv\")\n## The first level of futures should be submitted to the\n## cluster using BatchJobs.  The second level of futures\n## should be using multiprocessing, where the number of\n## parallel processes is automatically decided based on\n## what the cluster grants to each compute node.\nplan(list(batchjobs_custom, multisession))\n\n## Find all samples (one FASTQ file per sample)\nfqs \u003c- dir(pattern=\"[.]fastq$\")\n\n## The aligned results are stored in BAM files\nbams \u003c- listenv()\n\n## For all samples (FASTQ files) ...\nfor (ss in seq_along(fqs)) {\n  fq \u003c- fqs[ss]\n\n  ## ... use futures to align them ...\n  bams[[ss]] %\u003c-% {\n    bams_ss \u003c- listenv()\n    ## ... and for each FASTQ file use a second layer\n    ## of futures to align the individual chromosomes\n    for (cc in 1:24) {\n      bams_ss[[cc]] %\u003c-% htseq::align(fq, chr=cc)\n    }\n    ## Resolve the \"chromosome\" futures and return as a list\n    as.list(bams_ss)\n  }\n}\n## Resolve the \"sample\" futures and return as a list\nbams \u003c- as.list(bams)\n```\nNote that a user who does not have access to a cluster could use the same script, processing samples sequentially and chromosomes in parallel on a single machine, using:\n```r\nplan(list(sequential, multisession))\n```\nor samples in parallel and chromosomes sequentially using:\n```r\nplan(list(multisession, sequential))\n```\n\nFor an introduction as well as full details on how to use futures,\nplease consult the package vignettes of the [future] package.\n\n\n\n## Choosing BatchJobs backend\nThe future.BatchJobs package implements a generic future wrapper\nfor all BatchJobs backends.  
Below are the most common types of\nBatchJobs backends.\n\n\n| Backend                 | Description                                                              | Alternative in future package\n|:------------------------|:-------------------------------------------------------------------------|:------------------------------------\n| _generic:_              |                                                                          |\n| `batchjobs_custom`      | Uses custom BatchJobs configuration script files, e.g. `.BatchJobs.R`    | N/A\n| _predefined:_           |                                                                          |\n| `batchjobs_torque`      | Futures are evaluated via a [TORQUE] / PBS job scheduler                 | N/A\n| `batchjobs_slurm`       | Futures are evaluated via a [Slurm] job scheduler                        | N/A\n| `batchjobs_sge`         | Futures are evaluated via a [Sun/Oracle Grid Engine (SGE)] job scheduler | N/A\n| `batchjobs_lsf`         | Futures are evaluated via a [Load Sharing Facility (LSF)] job scheduler  | N/A\n| `batchjobs_openlava`    | Futures are evaluated via an [OpenLava] job scheduler                    | N/A\n| `batchjobs_interactive` | synchronous evaluation in the calling R environment                      | `plan(transparent)`\n| `batchjobs_local`       | synchronous evaluation in a separate R process (on current machine)      | `plan(cluster, workers=\"localhost\")`\n\nIn addition to the above, there is also `batchjobs_multicore` (which on Windows and Solaris falls back to `batchjobs_local`), which runs BatchJobs tasks asynchronously in background R sessions (sic!) on the current machine.  We _advise to not use_ this and instead use `multisession` of the [future] package.  For details, see `help(\"batchjobs_multicore\")`.\n\n\n### Examples\n\nBelow are two examples illustrating how to use `batchjobs_custom` and `batchjobs_torque` to configure the BatchJobs backend.  
For further details and examples on how to configure BatchJobs, see the [BatchJobs configuration] wiki page.\n\n### Example: A .BatchJobs.R file using local BatchJobs\nThe most general way of configuring BatchJobs is via a `.BatchJobs.R` file.\nThis file should be located in the current directory or in the user's\nhome directory.  For example, as an alternative to `batchjobs_local`,\nwe can manually configure local BatchJobs futures via a `.BatchJobs.R` file\nthat contains\n```r\ncluster.functions \u003c- makeClusterFunctionsLocal()\n```\nThis will then be found and used when specifying\n```r\n\u003e plan(batchjobs_custom)\n```\nTo specify this BatchJobs configuration file explicitly, one can use\n```r\n\u003e plan(batchjobs_custom, pathname=\"./.BatchJobs.R\")\n```\n\nThis follows the naming convention set up by the BatchJobs package.\n\n\n\n### Example: A .BatchJobs.*.tmpl template file for TORQUE / PBS\nTo configure BatchJobs for job schedulers we need to set up a template\nfile that is used to generate the script used by the scheduler.\nThis is what a template file for TORQUE / PBS may look like:\n```sh\n## Job name:\n#PBS -N \u003c%= job.name %\u003e\n\n## Merge standard error and output:\n#PBS -j oe\n\n## Resource parameters:\n\u003c% for (name in names(resources)) { %\u003e\n#PBS -l \u003c%= name %\u003e=\u003c%= resources[[name]] %\u003e\n\u003c% } %\u003e\n\n## Run R:\nR CMD BATCH --no-save --no-restore \"\u003c%= rscript %\u003e\" /dev/stdout\n```\nIf this template is saved to file `.BatchJobs.torque.tmpl` in the\nworking directory or the user's home directory, then it will be\nautomatically located and loaded when doing:\n```r\n\u003e plan(batchjobs_torque)\n```\nResource parameters can be specified via the `resources` argument, which should be a named list and is passed as is to the template file.  
For example, to request that each job is allotted 12 cores (on a single machine) and up to 5 GiB of memory, use:\n```r\n\u003e plan(batchjobs_torque, resources=list(nodes=\"1:ppn=12\", vmem=\"5gb\"))\n```\n\nTo specify the `resources` argument at the same time as using nested future strategies, one can use `tweak()` to tweak the default arguments.  For instance,\n```r\nplan(list(\n  tweak(batchjobs_torque, resources=list(nodes=\"1:ppn=12\", vmem=\"5gb\")),\n  multisession\n))\n```\ncauses the first level of futures to be submitted via the TORQUE job scheduler requesting 12 cores and 5 GiB of memory per job.  The second level of futures will be evaluated with multiprocessing on the 12 cores given to each job by the scheduler.\n\nA similar filename format is used for the other types of job schedulers supported.  For instance, for Slurm the template file should be named `.BatchJobs.slurm.tmpl` in order for\n```r\n\u003e plan(batchjobs_slurm)\n```\nto locate the file automatically.  To specify this template file explicitly, use argument `pathname`, e.g.\n```r\n\u003e plan(batchjobs_slurm, pathname=\"./.BatchJobs.slurm.tmpl\")\n```\n\n\nNote that it is still possible to use a `.BatchJobs.R` file and load the template file using a standard BatchJobs approach for maximum control.  For further details and examples on how to configure BatchJobs per se, see the [BatchJobs configuration] wiki page.\n\n\n\n## Demos\nThe [future] package provides a demo using futures for calculating a\nset of Mandelbrot planes.  
The demo does not assume anything about\nwhat type of futures are used.\n_The user has full control of how futures are evaluated_.\nFor instance, to use local BatchJobs futures, run the demo as:\n```r\nlibrary(\"future.BatchJobs\")\nplan(batchjobs_local)\ndemo(\"mandelbrot\", package=\"future\", ask=FALSE)\n```\n\n\n[BatchJobs]: https://cran.r-project.org/package=BatchJobs\n[brew]: https://cran.r-project.org/package=brew\n[future]: https://cran.r-project.org/package=future\n[future.BatchJobs]: https://cran.r-project.org/package=future.BatchJobs\n[BatchJobs configuration]: https://github.com/tudo-r/BatchJobs/wiki/Configuration\n[TORQUE]: https://en.wikipedia.org/wiki/TORQUE\n[Slurm]: https://en.wikipedia.org/wiki/Slurm_Workload_Manager\n[Sun/Oracle Grid Engine (SGE)]: https://en.wikipedia.org/wiki/Oracle_Grid_Engine\n[Load Sharing Facility (LSF)]: https://en.wikipedia.org/wiki/Platform_LSF\n[OpenLava]: https://en.wikipedia.org/wiki/OpenLava\n\n\n## Installation\nR package future.BatchJobs is only available via [GitHub](https://github.com/HenrikBengtsson/future.BatchJobs) and can be installed in R as:\n```r\nremotes::install_github(\"HenrikBengtsson/future.BatchJobs\", ref=\"master\")\n```\n\n\n### Pre-release version\n\nTo install the pre-release version that is available in Git branch `develop` on GitHub, use:\n```r\nremotes::install_github(\"HenrikBengtsson/future.BatchJobs\", ref=\"develop\")\n```\nThis will install the package from source.  \n\n## Contributions\n\nThis Git repository uses the [Git Flow](https://nvie.com/posts/a-successful-git-branching-model/) branching model (the [`git flow`](https://github.com/petervanderdoes/gitflow-avh) extension is useful for this).  
The [`develop`](https://github.com/HenrikBengtsson/future.BatchJobs/tree/develop) branch contains the latest contributions and other code that will appear in the next release, and the [`master`](https://github.com/HenrikBengtsson/future.BatchJobs) branch contains the code of the latest release.\n\nContributing to this package is easy.  Just send a [pull request](https://help.github.com/articles/using-pull-requests/).  When you send your PR, make sure `develop` is the destination branch on the [future.BatchJobs repository](https://github.com/HenrikBengtsson/future.BatchJobs).  Your PR should pass `R CMD check --as-cran`, which will also be checked by \u003ca href=\"https://travis-ci.org/HenrikBengtsson/future.BatchJobs\"\u003eTravis CI\u003c/a\u003e and \u003ca href=\"https://ci.appveyor.com/project/HenrikBengtsson/future-batchjobs\"\u003eAppVeyor CI\u003c/a\u003e when the PR is submitted.\n\nWe abide to the [Code of Conduct](https://www.contributor-covenant.org/version/2/0/code_of_conduct/) of Contributor Covenant.\n\n\n## Software status\n\n| Resource      | GitHub        | GitHub Actions      | Travis CI       | AppVeyor CI      |\n| ------------- | ------------------- | ------------------- | --------------- | ---------------- |\n| _Platforms:_  | _Multiple_          | _Multiple_          | _Linux \u0026 macOS_ | _Windows_        |\n| R CMD check   |  |        | \u003ca href=\"https://travis-ci.org/HenrikBengtsson/future.BatchJobs\"\u003e\u003cimg src=\"https://travis-ci.org/HenrikBengtsson/future.BatchJobs.svg\" alt=\"Build status\"\u003e\u003c/a\u003e   | \u003ca href=\"https://ci.appveyor.com/project/HenrikBengtsson/future-batchjobs\"\u003e\u003cimg src=\"https://ci.appveyor.com/api/projects/status/github/HenrikBengtsson/future.BatchJobs?svg=true\" alt=\"Build status\"\u003e\u003c/a\u003e |\n| Test coverage |                     |                     | \u003ca href=\"https://codecov.io/gh/HenrikBengtsson/future.BatchJobs\"\u003e\u003cimg 
src=\"https://codecov.io/gh/HenrikBengtsson/future.BatchJobs/branch/develop/graph/badge.svg\" alt=\"Coverage Status\"/\u003e\u003c/a\u003e     |                  |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenrikbengtsson%2Ffuture.batchjobs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhenrikbengtsson%2Ffuture.batchjobs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenrikbengtsson%2Ffuture.batchjobs/lists"}