{"id":22569021,"url":"https://github.com/henrikbengtsson/profmem","last_synced_at":"2025-04-10T12:36:03.667Z","repository":{"id":56935382,"uuid":"60438716","full_name":"HenrikBengtsson/profmem","owner":"HenrikBengtsson","description":"🔧 R package: profmem - Simple Memory Profiling for R","archived":false,"fork":false,"pushed_at":"2025-02-17T08:34:18.000Z","size":188,"stargazers_count":36,"open_issues_count":4,"forks_count":2,"subscribers_count":4,"default_branch":"develop","last_synced_at":"2025-03-31T15:24:27.690Z","etag":null,"topics":["cran","memory-profiler","package","performance","r","ram"],"latest_commit_sha":null,"homepage":"https://cran.r-project.org/package=profmem","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HenrikBengtsson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-06-05T02:00:24.000Z","updated_at":"2025-02-17T08:34:24.000Z","dependencies_parsed_at":"2022-08-21T01:10:12.008Z","dependency_job_id":null,"html_url":"https://github.com/HenrikBengtsson/profmem","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenrikBengtsson%2Fprofmem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenrikBengtsson%2Fprofmem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenrikBengtsson%2Fprofmem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenrikBengtsson%2Fprofmem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HenrikBengtsson","download_url":"http
s://codeload.github.com/HenrikBengtsson/profmem/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248217149,"owners_count":21066633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cran","memory-profiler","package","performance","r","ram"],"created_at":"2024-12-08T00:17:07.623Z","updated_at":"2025-04-10T12:36:03.642Z","avatar_url":"https://github.com/HenrikBengtsson.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n\u003cdiv id=\"badges\"\u003e\u003c!-- pkgdown markup --\u003e\n\u003ca href=\"https://CRAN.R-project.org/web/checks/check_results_profmem.html\"\u003e\u003cimg border=\"0\" src=\"https://www.r-pkg.org/badges/version/profmem\" alt=\"CRAN check status\"/\u003e\u003c/a\u003e \u003ca href=\"https://github.com/HenrikBengtsson/profmem/actions?query=workflow%3AR-CMD-check\"\u003e\u003cimg border=\"0\" src=\"https://github.com/HenrikBengtsson/profmem/actions/workflows/R-CMD-check.yaml/badge.svg?branch=develop\" alt=\"R CMD check status\"/\u003e\u003c/a\u003e     \u003ca href=\"https://app.codecov.io/gh/HenrikBengtsson/profmem\"\u003e\u003cimg border=\"0\" src=\"https://codecov.io/gh/HenrikBengtsson/profmem/branch/develop/graph/badge.svg\" alt=\"Coverage Status\"/\u003e\u003c/a\u003e \n\u003c/div\u003e\n\n# profmem: Simple Memory Profiling for R \n\n## Introduction\n\nThe `profmem()` function of the [profmem] package provides an easy way to profile the memory usage of an R expression.  It logs all memory allocations done in R.  
Profiling memory allocations is helpful when we, for instance, try to understand why a certain piece of R code consumes more memory than expected.\n\nThe `profmem()` function builds upon existing memory profiling features available in R.  It logs _every_ memory allocation done by plain R code as well as those done by native code such as C and Fortran.  For each entry, it records the size (in bytes) and the names of the functions on the call stack.\nFor example,\n\n```r\n\u003e library(\"profmem\")\n\u003e options(profmem.threshold = 2000)\n\u003e p \u003c- profmem({\n+     x \u003c- integer(1000)\n+     Y \u003c- matrix(rnorm(n = 10000), nrow = 100)\n+ })\n\u003e p\nRprofmem memory profiling of:\n{\n    x \u003c- integer(1000)\n    Y \u003c- matrix(rnorm(n = 10000), nrow = 100)\n}\nMemory allocations (\u003e= 2000 bytes):\n       what  bytes               calls\n1     alloc   4048           integer()\n2     alloc  80048 matrix() -\u003e rnorm()\n3     alloc   2552 matrix() -\u003e rnorm()\n4     alloc  80048            matrix()\ntotal       166696                    \n```\nFrom this, we find that 4048 bytes are allocated for integer vector `x`, which is because each of the 1000 integer values occupies 4 bytes of memory.  The additional 48 bytes are due to the internal data structure used for each variable in R.  The size of this allocation can also be confirmed by the value of `object.size(x)`.\nWe also see that `rnorm()`, which is called via `matrix()`, allocates 80048 + 2552 bytes, where the first one reflects the 10000 double values each occupying 8 bytes.  The second one reflects some unknown allocation done internally by the native code that `rnorm()` uses.\nFinally, the last entry reflects the memory allocation of 80048 bytes done by `matrix()` itself.\n\n\n## An example where memory profiling can make a difference\n\nAssume we want to create a 100-by-100 matrix with missing values except for element (1,1), which we set to zero.  
This can be done as:\n```r\n\u003e x \u003c- matrix(nrow = 100, ncol = 100)\n\u003e x[1, 1] \u003c- 0\n\u003e x[1:3, 1:3]\n     [,1] [,2] [,3]\n[1,]    0   NA   NA\n[2,]   NA   NA   NA\n[3,]   NA   NA   NA\n```\nThis looks fairly innocent, but it turns out that it is very inefficient, both in memory and in speed.  The reason is that the default value used by `matrix()` is `NA`, which is of type _logical_.  This means that initially `x` is a _logical_ matrix, not a _numeric_ matrix.  When we then assign the (1,1) element the value `0`, which is a _numeric_, the matrix first has to be coerced to _numeric_ internally and then the zero is assigned.  Profiling the memory will reveal this:\n\n\n```r\n\u003e p \u003c- profmem({\n+     x \u003c- matrix(nrow = 100, ncol = 100)\n+     x[1, 1] \u003c- 0\n+ })\n\u003e print(p, expr = FALSE)\nMemory allocations (\u003e= 2000 bytes):\n       what  bytes      calls\n1     alloc  40048   matrix()\n2     alloc  80048 \u003cinternal\u003e\ntotal       120096           \n```\nThe first entry is for the logical matrix with 10,000 elements (= 4 \\* 10,000 bytes + small header) that we allocate.  
The second entry reveals the coercion of this matrix to a numeric matrix (= 8 \\* 10,000 bytes + small header).\n\nTo avoid this, we make sure to create a numeric matrix upfront as:\n```r\n\u003e p \u003c- profmem({\n+     x \u003c- matrix(NA_real_, nrow = 100, ncol = 100)\n+     x[1, 1] \u003c- 0\n+ })\n\u003e print(p, expr = FALSE)\nMemory allocations (\u003e= 2000 bytes):\n       what bytes    calls\n1     alloc 80048 matrix()\ntotal       80048         \n```\n\nUsing the [microbenchmark] package, we can also quantify the extra processing time introduced by the logical-to-numeric coercion:\n```r\n\u003e library(\"microbenchmark\")\n\u003e stats \u003c- microbenchmark(bad = {\n+     x \u003c- matrix(nrow = 100, ncol = 100)\n+     x[1, 1] \u003c- 0\n+ }, good = {\n+     x \u003c- matrix(NA_real_, nrow = 100, ncol = 100)\n+     x[1, 1] \u003c- 0\n+ }, times = 100, unit = \"ms\")\n\u003e stats\nUnit: milliseconds\n expr    min    lq  mean median    uq   max neval\n  bad 0.0141 0.018 0.022  0.019 0.022 0.052   100\n good 0.0095 0.011 0.016  0.013 0.016 0.067   100\n```\nIn this run, the inefficient approach is roughly 1.5 times slower than the efficient one (median 0.019 ms versus 0.013 ms).\n\n\nThe above illustrates the value of profiling your R code's memory usage.  Thanks to `profmem()`, we can compare the amount of memory allocated by two alternative implementations.  Being able to write memory-efficient R code becomes particularly important when working with large data sets, where an inefficient implementation may even prevent us from performing an analysis because we end up running out of memory.  Moreover, each memory allocation will eventually have to be deallocated and in R this is done automatically by the garbage collector, which runs in the background and recovers any blocks of memory that are allocated but no longer in use.  
Garbage collection takes time and therefore slows down the overall processing in R even further.\n\n\n\n## What is logged?\n\nThe `profmem()` function uses the `utils::Rprofmem()` function for logging memory allocation events to a temporary file.  The logged events are parsed and returned as an in-memory R object in a format that is convenient to work with.  All memory allocations that are done via `allocVector3()`, which is part of R's native API, are logged, which means that nearly all memory allocations are logged.  Any objects allocated this way are automatically deallocated by R's garbage collector at some point.  Garbage collection events are _not_ logged by `profmem()`.\nAllocations _not_ logged are those done by non-R native libraries or R packages that use native code `Calloc() / Free()` for internal objects.  Such objects are _not_ handled by the R garbage collector.\n\n### Difference between `utils::Rprofmem()` and `utils::Rprof(memory.profiling = TRUE)`\nIn addition to `utils::Rprofmem()`, R also provides `utils::Rprof(memory.profiling = TRUE)`.  Despite the close similarity of their names, they use completely different approaches for profiling the memory usage.  As explained above, the former logs _all individual_ (`allocVector3()`) memory allocations, whereas the latter probes the _total_ memory usage of R at regular time intervals.  If memory is allocated and deallocated between two such probing time points, `utils::Rprof(memory.profiling = TRUE)` will not log that memory whereas `utils::Rprofmem()` will pick it up.  On the other hand, with `utils::Rprofmem()` it is not possible to quantify the total memory _usage_ at a given time because it only logs _allocations_ and therefore does not reflect deallocations done by the garbage collector.\n\n\n## Requirements\n\nIn order for `profmem()` to work, R must have been built with memory profiling enabled.  If not, `profmem()` will produce an error with an informative message.  
To manually check whether an R binary was built with this enabled or not, do:\n```r\n\u003e capabilities(\"profmem\")\nprofmem \n   TRUE \n```\nThe overhead of running an R installation with memory profiling enabled compared to one without is negligible / non-measurable.\n\nVolunteers of the R Project provide and distribute pre-built binaries of the R software for all the major operating systems via [CRAN].  [It has been confirmed](https://github.com/HenrikBengtsson/profmem/issues/2) that the R binaries for Windows, macOS (both by CRAN and by the AT\u0026T Research Lab), and for Linux (\\*) all have been built with memory profiling enabled.  (\\*) For Linux, this has been confirmed for the Debian/Ubuntu distributions but not yet for other Linux distributions.\n\n\nIn all other cases, to enable memory profiling, which is _only_ needed if `capabilities(\"profmem\")` returns `FALSE`, R needs to be _configured_ and _built from source_ using:\n```sh\n$ ./configure --enable-memory-profiling\n$ make\n```\nFor more information, please see the 'R Installation and Administration' documentation that comes with all R installations.\n\n\n\n[CRAN]: https://cran.r-project.org/\n[profmem]: https://cran.r-project.org/package=profmem\n[microbenchmark]: https://cran.r-project.org/package=microbenchmark\n\n\n## Installation\nR package profmem is available on [CRAN](https://cran.r-project.org/package=profmem) and can be installed in R as:\n```r\ninstall.packages(\"profmem\")\n```\n\n\n### Pre-release version\n\nTo install the pre-release version that is available in Git branch `develop` on GitHub, use:\n```r\nremotes::install_github(\"HenrikBengtsson/profmem\", ref=\"develop\")\n```\nThis will install the package from source.  
\n\n\u003c!-- pkgdown-drop-below --\u003e\n\n\n## Contributing\n\nTo contribute to this package, please see [CONTRIBUTING.md](CONTRIBUTING.md).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenrikbengtsson%2Fprofmem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhenrikbengtsson%2Fprofmem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenrikbengtsson%2Fprofmem/lists"}