{"id":32202539,"url":"https://github.com/akersting/bettermc","last_synced_at":"2026-03-07T22:31:28.807Z","repository":{"id":50362290,"uuid":"317818875","full_name":"akersting/bettermc","owner":"akersting","description":"Enhanced Fork-Based Parallelization for R","archived":false,"fork":false,"pushed_at":"2025-06-23T11:53:47.000Z","size":504,"stargazers_count":16,"open_issues_count":4,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2026-02-20T19:03:01.948Z","etag":null,"topics":["parallelization","r"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/akersting.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-12-02T09:57:12.000Z","updated_at":"2025-09-08T18:20:25.000Z","dependencies_parsed_at":"2025-09-08T16:35:53.796Z","dependency_job_id":null,"html_url":"https://github.com/akersting/bettermc","commit_stats":null,"previous_names":["akersting/bettermc"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/akersting/bettermc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akersting%2Fbettermc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akersting%2Fbettermc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akersting%2Fbettermc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akersting%2Fbettermc/
manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/akersting","download_url":"https://codeload.github.com/akersting/bettermc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akersting%2Fbettermc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30234499,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T19:01:10.287Z","status":"ssl_error","status_checked_at":"2026-03-07T18:59:58.103Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parallelization","r"],"created_at":"2025-10-22T04:02:36.814Z","updated_at":"2026-03-07T22:31:28.506Z","avatar_url":"https://github.com/akersting.png","language":"R","readme":"---\noutput:\n  md_document:\n    variant: gfm\n---\n\n```{r, include=FALSE}\nknitr::opts_chunk$set(cache = TRUE)\n\n# knitr hook function to allow an output.lines option\n# e.g., \n#   output.lines=12 prints lines 1:12 ...\n#   output.lines=1:12 does the same\n#   output.lines=3:15 prints lines ... 3:15 ...\n#   output.lines=-(1:8) removes lines 1:8 and prints ... 
9:n ...\n#   No allowance for anything but a consecutive range of lines\n#   \n# adopted from https://stackoverflow.com/a/23205752\n\ncreate_output_hook \u003c- function(type) {\n  hook_output \u003c- knitr::knit_hooks$get(type)\n  function(x, options) {\n    lines \u003c- options$output.lines\n    if (is.null(lines)) {\n      return(hook_output(x, options))  # pass to default hook\n    }\n    x \u003c- unlist(strsplit(x, \"\\n\"))\n    more \u003c- \"...\"\n    if (length(lines) == 1) {  # first n lines\n      if (length(x) \u003e lines) {\n        # truncate the output, but add ...\n        x \u003c- c(head(x, lines), more)\n      }\n    } else {\n      x \u003c- c(if (abs(lines[1]) \u003e 1) more else NULL, \n             x[lines], \n             if (length(x) \u003e lines[abs(length(lines))]) more else NULL\n      )\n    }\n    # paste these lines together\n    x \u003c- paste(c(x, \"\"), collapse = \"\\n\")\n    hook_output(x, options)\n  }\n}\n\nknitr::knit_hooks$set(output = create_output_hook(\"output\"))\nknitr::knit_hooks$set(error = create_output_hook(\"error\"))\nknitr::knit_hooks$set(warning = create_output_hook(\"warning\"))\nknitr::knit_hooks$set(message = create_output_hook(\"message\"))\n```\n\n# bettermc\n\n[![R build status](https://github.com/gfkse/bettermc/workflows/R-CMD-check/badge.svg)](https://github.com/gfkse/bettermc/actions?workflow=R-CMD-check)\n[![codecov](https://codecov.io/gh/gfkse/bettermc/branch/master/graph/badge.svg?token=FYYM156COF)](https://codecov.io/gh/gfkse/bettermc)\n\nThe `bettermc` package provides a wrapper around the `parallel::mclapply` function for better performance, error handling, seeding and UX. \n\n## Installation of the Development Version\n```{r, eval = FALSE}\n# install.packages(\"devtools\")\ndevtools::install_github(\"gfkse/bettermc\")\n```\n\n## Supported Platforms\n`bettermc` was originally developed for 64-bit Linux. 
\nBy now it should also compile and run on 32-bit systems, and on macOS and Solaris.\nHowever, as stated in the respective help pages, not all features are supported on macOS.\nPorting to other POSIX-compliant Unix flavors should be fairly straightforward.\nWindows support is very limited and provided mainly for compatibility reasons, i.e. to allow the *serial* execution of code using `bettermc::mclapply` that was originally developed for Linux or macOS.\n\n## Features\nHere is a short overview of its main features.\n\n### Progress Bar\n![progress bar](progress.png)\n\n### Error Handling, Tracebacks and Crashdumps\nBy default, crashdumps and full tracebacks are generated on errors in child processes:\n```{r traceback, error=TRUE, output.lines=10}\ng \u003c- function(x) x + 1\nf \u003c- function(x) g(as.character(x))\nbettermc::mclapply(1:2, f, mc.dumpto = \"last.dump\")\n```\n```{r crashdump}\n# in an interactive session use debugger() instead of print() for actual debugging\nprint(attr(bettermc::crash_dumps[[\"last.dump\"]][[1]], \"dump.frames\"))\n```\n\nAs shown in the example above, `bettermc` by default fails if there are errors in child processes.\nThis behavior can be changed to merely warn about both fatal and non-fatal errors:\n```{r allow_errors, output.lines=10}\nret \u003c- bettermc::mclapply(1:4, function(i) {\n  if (i == 1L)\n    stop(i)\n  else if (i == 4L)\n    system(paste0(\"kill \", Sys.getpid()))\n  NULL\n}, mc.allow.fatal = TRUE, mc.allow.error = TRUE, mc.preschedule = FALSE)\n```\n\nAlso in this case, full tracebacks and crash dumps are available:\n```{r crashdump_allow_errors}\nstopifnot(inherits(ret[[1]], \"try-error\"))\nnames(attributes(ret[[1L]]))\n```\n\nAdditionally, results affected by fatal errors are clearly indicated and can be differentiated from legitimate `NULL` values:\n```{r fatal-error}\nlapply(ret, class)\n```\n\nYou can use `mc.allow.fatal = NULL` to instead return `NULL` on fatal errors as does 
`parallel::mclapply`.\n\n### Output, Messages and Warnings\nIn contrast to `parallel::mclapply`, neither output nor messages nor warnings from the child processes are lost.\nAll of these can be forwarded to the parent process and are prefixed with the index of the function invocation from which they originate:\n```{r output}\nf \u003c- function(i) {\n  if (i == 1) message(letters[i])\n  else if (i == 2) warning(letters[i])\n  else print(letters[i])\n  \n  i\n}\nret \u003c- bettermc::mclapply(1:4, f)\n```\nSimilarly, other conditions can also be re-signaled in the parent process.\n\n### Reproducible Seeding\nBy default, `bettermc` reproducibly seeds all function calls:\n```{r seeding}\nset.seed(538)\na \u003c- bettermc::mclapply(1:4, function(i) runif(1), mc.cores = 3)\nset.seed(538)\nb \u003c- bettermc::mclapply(1:4, function(i) runif(1), mc.cores = 1)\na\nstopifnot(identical(a, b))\n```\n\nCompare with `parallel`:\n```{r seeding_parallel, error=TRUE}\nset.seed(594)\na \u003c- parallel::mclapply(1:4, function(i) runif(1), mc.cores = 3)\nset.seed(594)\nb \u003c- parallel::mclapply(1:4, function(i) runif(1), mc.cores = 3)\nstopifnot(identical(a, b))\n```\n\n\n### POSIX Shared Memory\nMany types of objects can be returned from the child processes using POSIX shared memory.\nThis includes e.g. 
logical, numeric, complex and raw vectors and arrays as well as factors.\nIn doing so, the overhead of getting larger results back into the parent R process is reduced:\n```{r shm_performance}\nX \u003c- data.frame(\n  x = runif(3e7),\n  y = sample(c(TRUE, FALSE), 3e7, TRUE),\n  z = 1:3e7\n)\nf \u003c- function(i) X\n\nmicrobenchmark::microbenchmark(\n  bettermc1 = bettermc::mclapply(1:2, f, mc.share.copy = FALSE),\n  bettermc2 = bettermc::mclapply(1:2, f),\n  bettermc3 = bettermc::mclapply(1:2, f, mc.share.vectors = FALSE),\n  bettermc4 = bettermc::mclapply(1:2, f, mc.share.vectors = FALSE, mc.shm.ipc = FALSE),\n  parallel = parallel::mclapply(1:2, f),\n  times = 10, setup = gc()\n)\n```\n\nIn examples `bettermc1` and `bettermc2`, the child processes place the columns of the return value `X` in shared memory. \nThe object which needs to be serialized for transfer from child to parent processes hence becomes:\n```{r vectors2shm}\nX_shm \u003c- bettermc:::vectors2shm(X, name_prefix = \"/bettermc_README_\")\nstr(X_shm)\n```\n\nColumn `z` is an `ALTREP` and, because it can be serialized efficiently, is left alone by default. \nThe parent process can recover the original object from `X_shm`:\n```{r shm2vectors}\nY \u003c- bettermc:::shm2vectors(X_shm)\nstopifnot(identical(X, Y))\n```\n\nThe internal functions `vectors2shm()` and `shm2vectors()` recursively walk the return value and apply the exported functions `copy2shm()` and `allocate_from_shm()`, respectively.\n\nIn `bettermc1`, the shared memory objects are used directly by the parent process. \nIn `bettermc2`, which is the default, new vectors are allocated in the parent process and the data is merely copied from the shared memory objects, which are freed afterwards. 
See `?copy2shm` for more details on this topic and why the slower `mc.share.copy = TRUE` might be a sensible default.\n\nIn `bettermc3`, the original `X` is serialized and the resulting raw vector is placed in shared memory, from where it is deserialized in the parent process.\n\n`bettermc4` does not involve any POSIX shared memory and hence is equivalent to `parallel`, i.e. the original `X` is serialized and transferred to the parent process using pipes.\n\n### Character Compression\nIn practice, character vectors often contain a substantial amount of duplicated values.\nThis is exploited by `bettermc` to speed up the returning of larger character vectors from child processes:\n```{r char_compression}\nX \u003c- rep(as.character(runif(1e6)), 30)\nf \u003c- function(i) X\nmicrobenchmark::microbenchmark(\n  bettermc1 = bettermc::mclapply(1:2, f),\n  parallel =  parallel::mclapply(1:2, f),\n  times = 1, setup = gc()\n)\n```\n\nBy default, `bettermc` replaces character vectors with objects of type `char_map` before returning them to the parent process:\n```{r compress_chars}\nX_comp \u003c- bettermc::compress_chars(X)\nstr(X_comp)\n```\n\nThe important detail here is the length of the `chars` vector, which just contains the unique elements of `X` and hence is significantly faster to (de)serialize than the original vector. 
The parent process can recover the original character vectors:\n\n```{r uncompress_chars}\nY \u003c- bettermc::uncompress_chars(X_comp)\nstopifnot(identical(X, Y))\n```\n\nThe functions `compress_chars()` and `uncompress_chars()` recursively walk the return value and apply the functions `char_map()` and `map2char()`, respectively.\n\n`char_map()` is implemented using a radix sort, which makes it very efficient:\n```{r char_map}\nmicrobenchmark::microbenchmark(\n  char_map = bettermc::char_map(X),\n  match = {chars \u003c- unique(X); idx \u003c- match(X, chars)},\n  times = 3, setup = gc()\n)\n```\n\n### Retries\n`bettermc` supports automatic retries on both fatal and non-fatal errors. \n`mc.force.fork` ensures that `FUN` is called in a child process, even if `X` is of length 1.\nThis is useful if `FUN` might encounter a fatal error and we want to protect the parent process against it.\nWith retries, `length(X)` might drop to 1 if all other values have already been processed successfully.\nThis is also why we need `mc.force.fork` in the following example:\n```{r retry, output.lines=10}\nset.seed(456)\nres \u003c-\n  bettermc::mclapply(1:20, function(i) {\n    r \u003c- runif(1)\n    if (r \u003c 0.25)\n      system(paste0(\"kill \", Sys.getpid()))\n    else if (r \u003c 0.5)\n      stop(i)\n    else\n      i\n  }, mc.retry = 50, mc.cores = 10, mc.force.fork = TRUE)\nstopifnot(identical(res, as.list(1:20)))\n```\n\nAdditionally, it is possible to automatically decrease the number of cores with every retry by specifying a negative value for `mc.retry`.\nThis is useful if we expect failures to be caused simply by too many concurrent processes, e.g. 
if system load or the size of input data is unpredictable and might lead to the Linux Out Of Memory Killer stepping in.\nIn such a case it makes sense to retry using fewer concurrent processes:\n```{r neg_retry}\nppid \u003c- Sys.getpid()\nres \u003c-\n  bettermc::mclapply(1:20, function(i) {\n    Sys.sleep(0.25)  # wait for the other child processes\n    number_of_child_processes \u003c- length(system(paste0(\"pgrep -P \", ppid), intern = TRUE))\n    if (number_of_child_processes \u003e= 5) system(paste0(\"kill \", Sys.getpid()))\n    i\n  }, mc.retry = -3, mc.cores = 10, mc.force.fork = TRUE)\n\nstopifnot(identical(res, as.list(1:20)))\n```\n\nIf there are still errors after the retries, `bettermc` fails as usual:\n```{r retry_failing, error=TRUE, output.lines=10}\nset.seed(123)\nres \u003c-\n  bettermc::mclapply(1:20, function(i) {\n    r \u003c- runif(1)\n    if (r \u003c 0.25)\n      system(paste0(\"kill \", Sys.getpid()))\n    else if (r \u003c 0.5)\n      stop(i)\n    else\n      i\n  }, mc.retry = 1, mc.cores = 10, mc.force.fork = TRUE, mc.dumpto = \"last.dump\")\n```\n\n\n### Timeouts\n`bettermc` can kill child processes after a certain time has elapsed:\n\n```{r timeout}\nbettermc::mclapply(X = 1:4, function(i) {\n  Sys.sleep(i * 2)\n  i\n},\n  mc.preschedule = FALSE, mc.allow.fatal = NA,\n  mc.timeout.elapsed = 5, mc.force.fork = TRUE\n)\n```\n\nIt is recommended to use `mc.force.fork = TRUE` so as not to accidentally kill the main R process if `X` is of length 1.\n\nIn contrast to `base::setTimeLimit()` and hence `R.utils::withTimeout()`, this timeout does *not* suffer from the following limitation:\n\n\u003e Time limits are checked whenever a user interrupt could occur. 
This will happen frequently in R code and during Sys.sleep(*), but only at points in compiled C and Fortran code identified by the code author.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakersting%2Fbettermc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakersting%2Fbettermc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakersting%2Fbettermc/lists"}