{"id":19629117,"url":"https://github.com/jiefei-wang/sharedobject","last_synced_at":"2025-04-28T06:31:28.861Z","repository":{"id":37493943,"uuid":"181734164","full_name":"Jiefei-Wang/SharedObject","owner":"Jiefei-Wang","description":"Sharing R objects across multiple R processes without duplicating the object in memory","archived":false,"fork":false,"pushed_at":"2024-06-05T15:16:11.000Z","size":512,"stargazers_count":42,"open_issues_count":4,"forks_count":2,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-06-05T17:21:20.951Z","etag":null,"topics":["sharedobject"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jiefei-Wang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-16T17:15:16.000Z","updated_at":"2024-06-05T15:16:15.000Z","dependencies_parsed_at":"2024-06-05T17:22:44.042Z","dependency_job_id":null,"html_url":"https://github.com/Jiefei-Wang/SharedObject","commit_stats":{"total_commits":228,"total_committers":4,"mean_commits":57.0,"dds":0.07017543859649122,"last_synced_commit":"4ca85743117d9b6ae72a72869e7a61ae0b72d55a"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jiefei-Wang%2FSharedObject","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jiefei-Wang%2FSharedObject/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jiefei-Wang%2FSharedObject/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/Jiefei-Wang%2FSharedObject/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jiefei-Wang","download_url":"https://codeload.github.com/Jiefei-Wang/SharedObject/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224099628,"owners_count":17255578,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["sharedobject"],"created_at":"2024-11-11T11:57:56.330Z","updated_at":"2024-11-11T11:57:57.091Z","avatar_url":"https://github.com/Jiefei-Wang.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\n`SharedObject` is designed for sharing data across many R workers. It allows multiple workers to read and write the same R object located in the same memory location. This feature is useful in parallel computing when a large R object needs to be read by all R workers. It has the potential to reduce the memory consumption and the overhead of data transmission. \n\n\n# Quick example\n## Creating a shared object from an existing object\nTo share an R object, all you need to do is to call the `share` function with the object you want to share. In this example, we will create a 3-by-3 matrix `A1` and use the function `share` to make a shared object `A2`\n\n```r\n## Create data\nA1 \u003c- matrix(1:9, 3, 3)\n## Create a shared object\nA2 \u003c- share(A1)\n```\nThere is no visible difference between the matrix `A1` and the shared matrix `A2`. There is no need to change the existing code to work with the shared object. 
We can verify this through\n\n```r\n## Check the data\nA1\n#\u003e      [,1] [,2] [,3]\n#\u003e [1,]    1    4    7\n#\u003e [2,]    2    5    8\n#\u003e [3,]    3    6    9\nA2\n#\u003e      [,1] [,2] [,3]\n#\u003e [1,]    1    4    7\n#\u003e [2,]    2    5    8\n#\u003e [3,]    3    6    9\n\n## Check if they are identical\nidentical(A1, A2)\n#\u003e [1] TRUE\n```\nUsers can treat the shared object `A2` as a regular matrix and do operations on it as usual. The function `is.shared` can be used to check whether an object is shared.\n\n```r\n## Check if an object is shared\nis.shared(A1)\n#\u003e [1] FALSE\nis.shared(A2)\n#\u003e [1] TRUE\n```\nThe object `A2` should work with any parallel package including `BiocParallel`. In this vignette we will simply use the `parallel` package to export the object `A2`.\n\n```r\nlibrary(parallel)\n## Create a cluster with only 1 worker\ncl \u003c- makeCluster(1)\nclusterExport(cl, \"A2\")\n## Check if the object is still a shared object\nclusterEvalQ(cl, SharedObject::is.shared(A2))\n#\u003e [[1]]\n#\u003e [1] TRUE\nstopCluster(cl)\n```\nWhen a shared object is exported to the other R workers, only the data ID along with some basic information of the shared object will be sent to the workers. We can see the exported data from the `serialize` function.\n\n```r\n## make a larger vector\nx1 \u003c- rep(0, 10000)\nx2 \u003c- share(x1)\n\n## This is the actual data that will\n## be sent to the other R workers\ndata1 \u003c-serialize(x1, NULL)\ndata2 \u003c-serialize(x2, NULL)\n\n## Check the size of the data\nlength(data1)\n#\u003e [1] 80032\nlength(data2)\n#\u003e [1] 391\n```\nAs we see from the example, the size of the shared object `x2` is significantly smaller than the size of the regular R object `x1`. When workers receive the shared object `x2`, they can get the data from the memory using the memory ID. Therefore, there is no memory allocation for the data of `x2` in the workers. 
\n## Creating a shared object from scratch\nAnalogous to the `vector` function in R, a shared object can also be created from scratch.\n\n```r\nSharedObject(mode = \"integer\", length = 6)\n#\u003e [1] 0 0 0 0 0 0\n```\nYou can attach attributes when creating the empty shared object. For example\n\n```r\nSharedObject(mode = \"integer\", length = 6, attrib = list(dim = c(2L, 3L)))\n#\u003e      [,1] [,2] [,3]\n#\u003e [1,]    0    0    0\n#\u003e [2,]    0    0    0\n```\nPlease refer to `?SharedObject` for the details of the function.\n\n\n## Properties of the shared object\nThere are several properties associated with a shared object. You can check them via\n\n```r\n## get a summary report\nsharedObjectProperties(A2)\n#\u003e $dataId\n#\u003e [1] \"28\"\n#\u003e \n#\u003e $length\n#\u003e [1] 9\n#\u003e \n#\u003e $totalSize\n#\u003e [1] 36\n#\u003e \n#\u003e $dataType\n#\u003e [1] 13\n#\u003e \n#\u003e $ownData\n#\u003e [1] TRUE\n#\u003e \n#\u003e $copyOnWrite\n#\u003e [1] TRUE\n#\u003e \n#\u003e $sharedSubset\n#\u003e [1] FALSE\n#\u003e \n#\u003e $sharedCopy\n#\u003e [1] FALSE\n```\nHere, `dataId` is the memory ID used to find the shared memory, `length` and `totalSize` are self-explanatory, `dataType` is the type ID of the R object, and `ownData` determines whether the shared memory will be released after the shared object is freed in the current process. `copyOnWrite`, `sharedSubset` and `sharedCopy` control how data writing, subsetting and duplication behave. Please see the `Package options` and `Advanced topics` sections for the meaning of these properties and how to use them properly.\n\nNote that most properties of a shared object are immutable; only `copyOnWrite`, `sharedSubset` and `sharedCopy` can be changed. 
These properties can be read with `getCopyOnWrite`, `getSharedSubset` and `getSharedCopy` and set with `setCopyOnWrite`, `setSharedSubset` and `setSharedCopy`.\n\n```r\n## get the individual properties\ngetCopyOnWrite(A2)\n#\u003e [1] TRUE\ngetSharedSubset(A2)\n#\u003e [1] FALSE\ngetSharedCopy(A2)\n#\u003e [1] FALSE\n\n## set the individual properties\nsetCopyOnWrite(A2, FALSE)\nsetSharedSubset(A2, TRUE)\nsetSharedCopy(A2, TRUE)\n\n## Check if the change has been made\ngetCopyOnWrite(A2)\n#\u003e [1] FALSE\ngetSharedSubset(A2)\n#\u003e [1] TRUE\ngetSharedCopy(A2)\n#\u003e [1] TRUE\n```\n\n# Supported data types and structures\nFor the basic R types, the package supports `raw`, `logical`, `integer`, `numeric`, `complex` and `character`. Note that sharing a character vector is beneficial only when the vector contains many repeated elements. Due to the complicated structure of a character vector, you cannot set an element of a shared character vector to a value that is not already present in the vector. Therefore, it is recommended to treat a shared character vector as read-only.\n\nFor containers, the package supports `list`, `pairlist` and `environment`. Sharing a container is equivalent to sharing all elements in the container; the container itself is not shared. Therefore, adding or replacing an element of a shared container in one worker will not implicitly change the shared container in the other workers. Since a data frame is fundamentally a list object, sharing a data frame follows the same principle. \n\nMore complicated data structures such as `S3` and `S4` classes are supported out of the box, so there is no need to customize the `share` function to support an S3/S4 class. However, if the S3/S4 class has a special design (e.g. 
on-disk data), the function `share` is an S4 generic and developers are free to define their own `share` method.\n\nWhen an object is not sharable, no error is raised and the same object is returned. This should be rare, as most data types are supported. The argument `mustWork = TRUE` can be used if you want to make sure the return value is a shared object.\n\n```r\n## the element `A` is sharable and `B` is not\nx \u003c- list(A = 1:3, B = as.symbol(\"x\"))\n\n## No error will be given, \n## but the element `B` is not shared\nshared_x \u003c- share(x)\n\n## Use the `mustWork` argument\n## An error will be given for the non-sharable object `B`\ntryCatch({\n  shared_x \u003c- share(x, mustWork = TRUE)\n},\nerror=function(msg)message(msg$message)\n)\n#\u003e The object of the class \u003cname\u003e cannot be shared.\n#\u003e To suppress this error and return the same object, \n#\u003e provide `mustWork = FALSE` as a function argument\n#\u003e or change its default value in the package settings\n```\nAs mentioned before, the package provides the `is.shared` function to identify a shared object.\nBy default, `is.shared` returns a single logical value indicating whether the object is a shared object or contains any shared objects. If the object is a container (e.g. 
list), you can explore the details using the `depth` parameter.\n\n```r\n## A single logical is returned\nis.shared(shared_x)\n#\u003e [1] TRUE\n## Check each element in x\nis.shared(shared_x, depth = 1)\n#\u003e $A\n#\u003e [1] TRUE\n#\u003e \n#\u003e $B\n#\u003e [1] FALSE\n```\n\n# Package options\nSeveral options control the default behavior of a shared object. You can view them via\n\n```r\nsharedObjectPkgOptions()\n#\u003e $mustWork\n#\u003e [1] FALSE\n#\u003e \n#\u003e $sharedAttributes\n#\u003e [1] TRUE\n#\u003e \n#\u003e $copyOnWrite\n#\u003e [1] TRUE\n#\u003e \n#\u003e $sharedSubset\n#\u003e [1] FALSE\n#\u003e \n#\u003e $sharedCopy\n#\u003e [1] FALSE\n#\u003e \n#\u003e $minLength\n#\u003e [1] 3\n```\nAs we have seen previously, the option `mustWork = FALSE` suppresses the error when the function `share` encounters a non-sharable object and forces the function to return the same object. `sharedSubset` controls whether a subset of a shared object is still a shared object. `minLength` determines the minimum length of a shared object; an R object will not be shared if its length is less than the minimum length.\n\nWe will talk about the options `copyOnWrite` and `sharedCopy` in the advanced section, but for most users it is safe to ignore them. The global settings can be modified via `sharedObjectPkgOptions`\n\n```r\n## change the default setting\nsharedObjectPkgOptions(mustWork = TRUE)\n\n## Check if the change is made\nsharedObjectPkgOptions(\"mustWork\")\n#\u003e [1] TRUE\n\n## Restore the default\nsharedObjectPkgOptions(mustWork = FALSE)\n```\nNote that the package options can be temporarily overridden by providing named parameters to the function `share`. 
For example, you can override the package option `mustWork` via `share(x, mustWork = TRUE)`.\n\n# Advanced topics\n## Copy-On-Write\nSince all workers use shared objects located in the same memory location, a change made to a shared object in one worker can affect the value of the object in the other workers. To prevent users from changing the values of a shared object unintentionally, a shared object will duplicate itself when its value is changed. For example\n\n```r\nx1 \u003c- share(1:4)\nx2 \u003c- x1\n\n## x2 becomes a regular R object after the change\nis.shared(x2)\n#\u003e [1] TRUE\nx2[1] \u003c- 10L\nis.shared(x2)\n#\u003e [1] FALSE\n\n## x1 is not changed\nx1\n#\u003e [1] 1 2 3 4\nx2\n#\u003e [1] 10  2  3  4\n```\nWhen we change the value of `x2`, R first duplicates the object `x2` and then applies the change. Therefore, although `x1` and `x2` share the same data, the change in `x2` does not affect the value of `x1`. This default behavior can be overridden by the parameter `copyOnWrite`.\n\n```r\nx1 \u003c- share(1:4, copyOnWrite = FALSE)\nx2 \u003c- x1\n\n## x2 will not be duplicated when a change is made\nis.shared(x2)\n#\u003e [1] TRUE\nx2[1] \u003c- 0L\nis.shared(x2)\n#\u003e [1] TRUE\n\n## x1 has been changed\nx1\n#\u003e [1] 0 2 3 4\nx2\n#\u003e [1] 0 2 3 4\n```\nIf copy-on-write is off, a change in the vector `x2` causes a change in `x1`. This feature can be useful for collecting results from workers. For example, you can pre-allocate an empty shared object with `copyOnWrite = FALSE` and let the workers write their results back to the shared object. This avoids the need to send the data from the workers to the main process. However, due to a limitation of R, it is possible to change the value of a shared object unexpectedly. 
For example\n\n```r\nx \u003c- share(1:4, copyOnWrite = FALSE)\nx\n#\u003e [1] 1 2 3 4\n-x\n#\u003e [1] -1 -2 -3 -4\nx\n#\u003e [1] -1 -2 -3 -4\n```\nThe above example shows a surprising result when the copy-on-write feature is off. Simply applying a unary operator can change the values of a shared object. Therefore, users must use this feature with caution. The copy-on-write feature of an object can be set via the `setCopyOnWrite` function or the `copyOnWrite` parameter of the `share` function.\n\n\n```r\n## Create x1 with copy-on-write off\nx1 \u003c- share(1:4, copyOnWrite = FALSE)\nx2 \u003c- x1\n## change the value of x2\nx2[1] \u003c- 0L\n## Both x1 and x2 are affected\nx1\n#\u003e [1] 0 2 3 4\nx2\n#\u003e [1] 0 2 3 4\n\n## Enable copy-on-write\n## x2 is now independent of x1\nsetCopyOnWrite(x2, TRUE)\nx2[2] \u003c- 0L\n## only x2 is affected\nx1\n#\u003e [1] 0 2 3 4\nx2\n#\u003e [1] 0 0 3 4\n```\nThis flexibility provides a way to do safe operations during the computation and return the results without memory duplication.\n\n### Warning\nIf a high-precision value is assigned to a low-precision shared object (e.g. assigning a numeric value to an integer shared object), an implicit type conversion will be triggered to store the change correctly. The resulting object is a regular R object, not a shared object. Therefore, the change will not be broadcast even if the copy-on-write feature is off. Users should be cautious about the data type that a shared object is using.\n\n## Shared copy\nThe option `sharedCopy` determines whether a duplicate of a shared object is still a shared object. 
For example\n\n```r\nx1 \u003c- share(1:4)\nx2 \u003c- x1\n## x2 is not shared after the duplication\nis.shared(x2)\n#\u003e [1] TRUE\nx2[1] \u003c- 0L\nis.shared(x2)\n#\u003e [1] FALSE\n\n\nx1 \u003c- share(1:4, sharedCopy = TRUE)\nx2 \u003c- x1\n## x2 is still shared (but different from x1) \n## after the duplication\nis.shared(x2)\n#\u003e [1] TRUE\nx2[1] \u003c- 0L\nis.shared(x2)\n#\u003e [1] TRUE\n```\nFor performance reasons, the default setting is `sharedCopy = FALSE`, but you can turn it on and off at any time via `setSharedCopy`. Please note that `sharedCopy` is only available when `copyOnWrite = TRUE`.\n\n## Listing the shared object \nYou can list the IDs of the shared objects you have created via\n\n```r\nlistSharedObjects()\n#\u003e    Id  size\n#\u003e 1  28  4096\n#\u003e 2  29 81920\n#\u003e 3  32  4096\n#\u003e 4  34  4096\n#\u003e 5  35  4096\n#\u003e 6  36  4096\n#\u003e 7  37  4096\n#\u003e 8  38  4096\n#\u003e 9  39  4096\n#\u003e 10 40  4096\n```\nListing the shared objects is rarely needed, but it can be useful if you have a memory leak: shared memory can then be released manually with `freeSharedMemory(ID)`.\n\n# Developing packages based upon SharedObject\nThe package offers three levels of API to help package developers build their own shared objects. \n\n## User API\nThe simplest and recommended way to make your own shared object is to define an S4 method for `share` in your own package, where you can rely on the existing `share` methods to quickly add support for an S4 class that is not supported by `SharedObject`. We recommend this approach because developers do not have to deal with memory management: the package automatically frees the shared object after use.\n\n## R's shared memory API\nIt is a common request to have low-level control over the shared memory. 
To achieve that, the package exports some low-level R functions for developers who want fine control over their shared objects. These functions are `allocateSharedMemory`, `mapSharedMemory`, `unmapSharedMemory`, `freeSharedMemory`, `hasSharedMemory` and `getSharedMemorySize`. Note that developers are responsible for freeing the shared memory after use. Please see the function documentation for more information.\n\n## C++ shared memory API\nFor the most sophisticated package developers, it might be more comfortable to use the C++ API rather than the R API. All the R functions in `SharedObject` are based upon its C++ API. Here is how to use the `SharedObject` C++ API in your package. \n\n### Step 1\nTo use the C++ API, you must add `SharedObject` to the LinkingTo field of the DESCRIPTION file, e.g.,\n```\nLinkingTo: SharedObject\n```\n### Step 2\nIn your C++ files, include the `SharedObject` header via `#include \"SharedObject/sharedMemory.h\"`.\n\n### Step 3\nTo compile and link your package successfully against the `SharedObject` C++ library, you must include a src/Makevars file.\n```\nSHARED_OBJECT_LIBS = $(shell echo 'SharedObject:::pkgconfig(\"PKG_LIBS\")'|\\\n\"${R_HOME}/bin/R\" --vanilla --slave)\nSHARED_OBJECT_CPPFLAGS = $(shell echo 'SharedObject:::pkgconfig(\"PKG_CPPFLAGS\")'|\\\n\"${R_HOME}/bin/R\" --vanilla --slave)\n\nPKG_LIBS := $(PKG_LIBS) $(SHARED_OBJECT_LIBS)\nPKG_CPPFLAGS := $(PKG_CPPFLAGS) $(SHARED_OBJECT_CPPFLAGS)\n```\nNote that `$(shell ...)` is GNU make syntax so you should add GNU make to the SystemRequirements field of the DESCRIPTION file of your package, e.g.,\n```\nSystemRequirements: GNU make\n```\n\nYou can find the documentation of the C++ functions in the header file.\n\n# Session Information\n\n```r\nsessionInfo()\n#\u003e R Under development (unstable) (2020-09-03 r79126)\n#\u003e Platform: x86_64-w64-mingw32/x64 (64-bit)\n#\u003e Running under: Windows 10 x64 (build 19041)\n#\u003e \n#\u003e Matrix 
products: default\n#\u003e \n#\u003e locale:\n#\u003e [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   \n#\u003e [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          \n#\u003e [5] LC_TIME=English_United States.1252    \n#\u003e \n#\u003e attached base packages:\n#\u003e [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     \n#\u003e \n#\u003e other attached packages:\n#\u003e [1] SharedObject_1.5.3 testthat_2.3.2    \n#\u003e \n#\u003e loaded via a namespace (and not attached):\n#\u003e  [1] Rcpp_1.0.5          rstudioapi_0.11     knitr_1.29          magrittr_1.5        BiocGenerics_0.35.4\n#\u003e  [6] pkgload_1.1.0       R6_2.4.1            rlang_0.4.7         fansi_0.4.1         stringr_1.4.0      \n#\u003e [11] tools_4.1.0         xfun_0.16           cli_2.0.2           withr_2.2.0         htmltools_0.5.0    \n#\u003e [16] yaml_2.2.1          assertthat_0.2.1    rprojroot_1.3-2     digest_0.6.25       crayon_1.3.4       \n#\u003e [21] BiocManager_1.30.10 glue_1.4.2          evaluate_0.14       rmarkdown_2.3       stringi_1.4.6      \n#\u003e [26] compiler_4.1.0      desc_1.2.0          backports_1.1.9     BiocStyle_2.17.0\n```\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjiefei-wang%2Fsharedobject","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjiefei-wang%2Fsharedobject","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjiefei-wang%2Fsharedobject/lists"}