{"id":24529948,"url":"https://github.com/mschubert/clustermq","last_synced_at":"2026-04-23T13:01:23.991Z","repository":{"id":47050038,"uuid":"61437975","full_name":"mschubert/clustermq","owner":"mschubert","description":"R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH","archived":false,"fork":false,"pushed_at":"2026-04-23T08:15:26.000Z","size":6385,"stargazers_count":153,"open_issues_count":21,"forks_count":29,"subscribers_count":6,"default_branch":"master","last_synced_at":"2026-04-23T10:20:28.942Z","etag":null,"topics":["cluster","high-performance-computing","lsf","r-package","sge","slurm","ssh"],"latest_commit_sha":null,"homepage":"https://mschubert.github.io/clustermq/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mschubert.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-06-18T14:48:56.000Z","updated_at":"2026-03-28T19:24:53.000Z","dependencies_parsed_at":"2023-10-14T14:13:05.016Z","dependency_job_id":"e77a5205-10c2-499b-874e-1215cf3f510d","html_url":"https://github.com/mschubert/clustermq","commit_stats":{"total_commits":1063,"total_committers":14,"mean_commits":75.92857142857143,"dds":"0.024459078080903085","last_synced_commit":"3f4c72b00a3758f699ab9f3c74fa091f05033a47"},"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"purl":"pkg:github/mschubert/clustermq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/mschubert%2Fclustermq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mschubert%2Fclustermq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mschubert%2Fclustermq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mschubert%2Fclustermq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mschubert","download_url":"https://codeload.github.com/mschubert/clustermq/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mschubert%2Fclustermq/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32181374,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-23T11:42:27.955Z","status":"ssl_error","status_checked_at":"2026-04-23T11:42:18.877Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster","high-performance-computing","lsf","r-package","sge","slurm","ssh"],"created_at":"2025-01-22T07:53:12.354Z","updated_at":"2026-04-23T13:01:23.985Z","avatar_url":"https://github.com/mschubert.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"ClusterMQ: send R function calls as cluster jobs\n================================================\n\n[![CRAN 
version](https://www.r-pkg.org/badges/version/clustermq)](https://cran.r-project.org/package=clustermq)\n[![Build Status](https://github.com/mschubert/clustermq/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/mschubert/clustermq/actions)\n[![CRAN downloads](https://cranlogs.r-pkg.org/badges/clustermq)](https://cran.r-project.org/package=clustermq)\n[![DOI](https://zenodo.org/badge/DOI/10.1093/bioinformatics/btz284.svg)](https://doi.org/10.1093/bioinformatics/btz284)\n\nThis package will allow you to send function calls as jobs on a computing\ncluster with a minimal interface provided by the `Q` function:\n\n```r\n# install the package if you haven't done so yet\nSys.setenv(CLUSTERMQ_AUTO_LIBZMQ=1)\ninstall.packages('clustermq')\n\n# queue a function call on your scheduler\nlibrary(clustermq)\nfx = function(x) x * 2\nQ(fx, x=1:3, n_jobs=1)\n# list(2,4,6)\n```\n\nComputations are done [entirely on the network](https://zeromq.org/)\nand without any temporary files on network-mounted storage, so there is no\nstrain on the file system apart from starting up R once per job. All\ncalculations are load-balanced, i.e. workers that get their jobs done faster\nwill also receive more function calls to work on. This is especially useful if\nnot all calls return after the same time, or one worker has a high load.\n\nBrowse the vignettes here:\n\n* [User Guide](https://mschubert.github.io/clustermq/articles/userguide.html)\n* [Technical Documentation](https://mschubert.github.io/clustermq/articles/technicaldocs.html)\n* [FAQ](https://mschubert.github.io/clustermq/articles/faq.html)\n\nSchedulers\n----------\n\nAn HPC cluster's scheduler ensures that computing jobs are distributed to\navailable worker nodes. 
Hence, this is what clustermq interfaces with in order\nto do computations.\n\nWe currently support the [following\nschedulers](https://mschubert.github.io/clustermq/articles/userguide.html#configuration)\n(either locally or via SSH):\n\n* [Multiprocess](https://mschubert.github.io/clustermq/articles/userguide.html#local-parallelization) -\n  *test your calls and parallelize on cores using* `options(clustermq.scheduler=\"multiprocess\")`\n* [SLURM](https://mschubert.github.io/clustermq/articles/userguide.html#slurm) - *should work without setup*\n* [LSF](https://mschubert.github.io/clustermq/articles/userguide.html#lsf) - *should work without setup*\n* [SGE](https://mschubert.github.io/clustermq/articles/userguide.html#sge) - *may require configuration*\n* [GCS](https://mschubert.github.io/clustermq/articles/userguide.html#gcs)/[OCS](https://mschubert.github.io/clustermq/articles/userguide.html#ocs) - *needs* `options(clustermq.scheduler=\"GCS\"/\"OCS\")`\n* [PBS](https://mschubert.github.io/clustermq/articles/userguide.html#pbs)/[Torque](https://mschubert.github.io/clustermq/articles/userguide.html#torque) - *needs* `options(clustermq.scheduler=\"PBS\"/\"Torque\")`\n* via [SSH](https://mschubert.github.io/clustermq/articles/userguide.html#ssh-connector) -\n*needs* `options(clustermq.scheduler=\"ssh\", clustermq.ssh.host=\u003cyourhost\u003e)`\n\n\u003e [!TIP]\n\u003e Follow the links above to configure your scheduler in case it is not working\n\u003e out of the box and check the\n\u003e [FAQ](https://mschubert.github.io/clustermq/articles/faq.html) if\n\u003e your job submission errors or gets stuck\n\nUsage\n-----\n\nThe most common arguments for `Q` are:\n\n * `fun` - The function to call. This needs to be self-sufficient (because it\n        will not have access to the `master` environment)\n * `...` - All iterated arguments passed to the function. 
If there is more than\n        one, all of them need to be named\n * `const` - A named list of non-iterated arguments passed to `fun`\n * `export` - A named list of objects to export to the worker environment\n\nThe documentation for other arguments can be accessed by typing `?Q`. Examples\nof using `const` and `export` would be:\n\n```r\n# adding a constant argument\nfx = function(x, y) x * 2 + y\nQ(fx, x=1:3, const=list(y=10), n_jobs=1)\n\n# exporting an object to workers\nfx = function(x) x * 2 + y\nQ(fx, x=1:3, export=list(y=10), n_jobs=1)\n```\n\nWe can also use `clustermq` as a parallel backend in\n[`foreach`](https://cran.r-project.org/package=foreach) or\n[`BiocParallel`](https://bioconductor.org/packages/release/bioc/html/BiocParallel.html):\n\n```r\n# using foreach\nlibrary(foreach)\nregister_dopar_cmq(n_jobs=2, memory=1024) # see `?workers` for arguments\nforeach(i=1:3) %dopar% sqrt(i) # this will be executed as jobs\n\n# using BiocParallel\nlibrary(BiocParallel)\nregister(DoparParam()) # after register_dopar_cmq(...)\nbplapply(1:3, sqrt)\n```\n\nMore examples are available in [the\nUser Guide](https://mschubert.github.io/clustermq/articles/userguide.html).\n\nComparison to other packages\n----------------------------\n\nThere are some packages that provide high-level parallelization of R function calls\non a computing cluster. 
We compared `clustermq` to `BatchJobs` and `batchtools` for\nprocessing many short-running jobs, and found it to have approximately 1000x less\noverhead cost.\n\n![Overhead comparison](http://image.ibb.co/cRgYNR/plot.png)\n\nIn short, use `clustermq` if you want:\n\n* a one-line solution to run cluster jobs with minimal setup\n* to access cluster functions from your local RStudio via SSH\n* fast processing of many function calls without network storage I/O\n\nUse [`batchtools`](https://github.com/mlr-org/batchtools) if you:\n\n* want to use a mature and well-tested package\n* don't mind that arguments to every call are written to/read from disk\n* don't mind there's no load-balancing at run-time\n\nUse [Snakemake](https://snakemake.readthedocs.io/en/latest/) or\n[`targets`](https://github.com/ropensci/targets) if:\n\n* you want to design and run a workflow on HPC\n\nDon't use [`batch`](https://cran.r-project.org/package=batch)\n(last updated 2013) or [`BatchJobs`](https://github.com/tudo-r/BatchJobs)\n(issues with SQLite on network-mounted storage).\n\nContributing\n------------\n\nContributions are welcome and they come in many different forms, shapes, and\nsizes. These include, but are not limited to:\n\n* **Questions**: Ask on the [GitHub\n  Discussions](https://github.com/mschubert/clustermq/discussions) board. If\n  you are an advanced user, please also consider answering questions there.\n* **Bug reports**: [File an issue](https://github.com/mschubert/clustermq/issues)\n  if something does not work as expected. Be sure to\n  include a self-contained [Minimal Reproducible\n  Example](https://stackoverflow.com/help/minimal-reproducible-example) and set\n  `log_worker=TRUE`.\n* **Code contributions**: Have a look at the [`good first\n  issue`](https://github.com/mschubert/clustermq/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)\n  tag. 
Please discuss anything more complicated before putting a lot of work\n  in; I'm happy to help you get started.\n\n\u003e [!TIP]\n\u003e Check the\n\u003e [User Guide](https://mschubert.github.io/clustermq/articles/userguide.html) and the\n\u003e [FAQ](https://mschubert.github.io/clustermq/articles/faq.html) first; maybe\n\u003e your query is already answered there\n\nCitation\n--------\n\nThis project is part of my academic work, for which I will be evaluated on\ncitations. If you would like me to be able to continue working on research support\ntools like `clustermq`, please cite the article when using it for publications:\n\n\u003e M Schubert. clustermq enables efficient parallelisation of genomic analyses.\n\u003e *Bioinformatics* (2019).\n\u003e [doi:10.1093/bioinformatics/btz284](https://doi.org/10.1093/bioinformatics/btz284)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmschubert%2Fclustermq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmschubert%2Fclustermq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmschubert%2Fclustermq/lists"}