---
output:
  md_document:
    variant: markdown_github
    includes:
      in_header: header.md
    toc: true
    toc_depth: 4
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(digits = 4)
```

# Setup

## Software Libraries

### Mac OS X

```bash
## To install Homebrew:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
## Then:
brew tap homebrew/science && brew update
# brew install gcc --enable-cxx && brew link --overwrite gcc && brew link cmake
brew install boost --c++11
# Installing cmake may require: sudo chown -R `whoami` /usr/local/share/man/man7
brew install mlpack
```

You may need to `brew reinstall --build-from-source mlpack` if you get a "Reason: image not found" error about a missing Armadillo library.
When I came back to these notes, Armadillo had been updated from v7 to v8 on my system, and that was causing issues.

### Ubuntu/Debian

```bash
sudo apt-get install libmlpack-dev libdlib-dev
```

## R Packages

```R
install.packages(c(
  "BH", # Header files for 'Boost' C++ library
  "Rcpp", # R and C++ integration
  "RcppArmadillo", # Rcpp integration for 'Armadillo' linear algebra library
  "Rcereal", # Header files of 'cereal', a C++11 library for serialization
  "microbenchmark", # For benchmarking performance
  "devtools", # For installing packages from GitHub
  "magrittr", # For piping
  "knitr", # For printing tables & data.frames as Markdown
  "toOrdinal" # Cardinal to ordinal number conversion (e.g. 1 => "1st")
), repos = "https://cran.rstudio.com/")
devtools::install_github("yihui/printr") # Prettier table printing
```

If you get an "ld: library not found for -lgfortran" error when trying to install RcppArmadillo, run:

```bash
curl -O http://r.research.att.com/libs/gfortran-4.8.2-darwin13.tar.bz2
sudo tar fvxz gfortran-4.8.2-darwin13.tar.bz2 -C /
```

See "[Rcpp, RcppArmadillo and OS X Mavericks "-lgfortran" and "-lquadmath" error](http://thecoatlessprofessor.com/programming/rcpp-rcpparmadillo-and-os-x-mavericks-lgfortran-and-lquadmath-error/)" for more info.

# Rcpp

See [this section](http://rmarkdown.rstudio.com/authoring_knitr_engines.html#rcpp) of the [RMarkdown documentation](http://rmarkdown.rstudio.com/) for details on Rcpp chunks.

```{r load_pkg}
library(magrittr)
library(BH)
library(Rcpp)
library(RcppArmadillo)
library(Rcereal)
library(microbenchmark)
library(knitr)
library(printr)
```

## Basics

```{Rcpp x2_source, cache = TRUE}
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector cumSum(NumericVector x) {
  int n = x.size();
  NumericVector out(n);
  out[0] = x[0];
  for (int i = 1; i < n; ++i) {
    out[i] = out[i-1] + x[i];
  }
  return out;
}
```

```{r x2_example, dependson = 'x2_source', cache = TRUE}
x <- 1:1000
microbenchmark(
  native = cumsum(x),
  loop = (function(x) {
    output <- numeric(length(x))
    output[1] <- x[1]
    for (i in 2:length(x)) {
      output[i] <- output[i - 1] + x[i]
    }
    return(output)
  })(x),
  Rcpp = cumSum(x)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```

## Using Libraries

### Armadillo vs RcppArmadillo

Use the **depends** attribute to bring in [RcppArmadillo](https://cran.r-project.org/package=RcppArmadillo), an Rcpp integration of the templated linear algebra library [Armadillo](http://arma.sourceforge.net/). The code below is Dirk Eddelbuettel's example of a fast linear model fit.

```{Rcpp fastlm_source, cache = TRUE}
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
List fastLm(const colvec& y, const mat& X) {
  int n = X.n_rows, k = X.n_cols;
  colvec coef = solve(X, y);
  colvec resid = y - X*coef;
  double sig2 = as_scalar(trans(resid) * resid/(n-k));
  colvec stderrest = sqrt(sig2 * diagvec( inv(trans(X)*X) ));
  return List::create(_["coefficients"] = coef,
                      _["stderr"]       = stderrest,
                      _["df.residual"]  = n - k );
}
```

```{r fastlm_example, dependson = 'fastlm_source', cache = TRUE}
data("mtcars", package = "datasets")
microbenchmark(
  lm = lm(mpg ~ wt + disp + cyl + hp, data = mtcars),
  fastLm = fastLm(mtcars$mpg, cbind(1, as.matrix(mtcars[, c("wt", "disp", "cyl", "hp")]))),
  RcppArm = fastLmPure(cbind(1, as.matrix(mtcars[, c("wt", "disp", "cyl", "hp")])), mtcars$mpg)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```

### Fast K-Means

Unfortunately,
[RcppMLPACK](https://cran.r-project.org/package=RcppMLPACK) uses version 1 of [MLPACK](http://www.mlpack.org/) (now at version 2) and only makes the unsupervised learning methods accessible. (Supervised methods would require returning a trained classifier object to R, which is actually a really difficult problem.)

Okay, let's try to get a fast version of <span title = "Bradley, P. S., & Fayyad, U. M. (1998). Refining Initial Points for K-Means Clustering. Icml." style = "font-weight: bold;">k-means</span>.

First, install the MLPACK library (see [&sect; Software Libraries](#software-libraries)), then:

```{r mlpack11, cache = FALSE}
# Thanks to Kevin Ushey for suggesting Rcpp plugins (e.g. Rcpp:::.plugins$openmp)
registerPlugin("mlpack11", function() {
  return(list(env = list(
    USE_CXX1X = "yes",
    CXX1XSTD = "-std=c++11",
    PKG_LIBS = "-lmlpack"
  )))
})
```

The documentation for [KMeans](http://www.mlpack.org/docs/mlpack-1.0.6/doxygen.php?doc=kmtutorial.html#kmeans_kmtut) seems to incorrectly use `arma::Col<size_t>` for the cluster assignments, while in practice the cluster assignments are returned as an `arma::Row<size_t>` object.

```{Rcpp kmeans_source, cache = TRUE}
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>
using namespace Rcpp;
#include <mlpack/core/util/log.hpp>
#include <mlpack/methods/kmeans/kmeans.hpp>
using namespace mlpack::kmeans;
using namespace arma;

// [[Rcpp::export]]
NumericVector mlpackKM(const arma::mat& data, const size_t& clusters) {
  Row<size_t> assignments;
  KMeans<> k;
  k.Cluster(data, clusters, assignments);
  // Let's change the format of the output to be a little nicer:
  NumericVector results(data.n_cols);
  for (size_t i = 0; i < assignments.n_cols; ++i) {
    results[i] = assignments(i) + 1; // cluster assignments are 0-based
  }
  return results;
}
```

(Alternatively: `sourceCpp("src/fastKm.cpp")`, which creates `fastKm()` from [src/fastKm.cpp](src/fastKm.cpp).)

I'm getting:

```
Symbol not found: __ZN6mlpack3Log6AssertEbRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Expected in: flat namespace
```

which (via `c++filt __ZN6mlpack3Log6AssertEbRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE`) demangles to: "mlpack::Log::Assert(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)"

```{r kmeans_example, dependson = 'kmeans_source', cache = TRUE}
data(trees, package = "datasets"); data(faithful, package = "datasets")
# kmeans coerces data frames to matrix, so it's worth doing that beforehand
mtrees <- as.matrix(trees)
mfaithful <- as.matrix(faithful)
# KMeans in MLPACK requires observations to be in columns, not rows:
ttrees <- t(trees); tfaithful <- t(faithful)
microbenchmark(
  kmeans_trees = kmeans(mtrees, 3),
  mlpackKM_trees = mlpackKM(ttrees, 3),
  kmeans_faithful = kmeans(mfaithful, 2),
  mlpackKM_faithful = mlpackKM(tfaithful, 2)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```

## Fast Classification

In this exercise, we will train a [Naive Bayes classifier from MLPACK](http://www.mlpack.org/docs/mlpack-2.1.0/doxygen.php?doc=classmlpack_1_1naive__bayes_1_1NaiveBayesClassifier.html). First, we train and classify in a single step. Then we store the trained classifier in memory, and later we will be able to save the model. Storing the trained model requires [serialization](https://en.wikipedia.org/wiki/Serialization), the topic of the next section.

**Update (2017-09-05)**: there is now -- apparently -- a [RcppMLPACK2](https://github.com/rcppmlpack/rcppmlpack2).
For more details, refer to the article [RcppMLPACK2 and the MLPACK Machine Learning Library](http://gallery.rcpp.org/articles/using-rcppmlpack2/).

```{Rcpp nbc_source, cache = TRUE}
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>
using namespace Rcpp;

#include <mlpack/core/util/log.hpp>
#include <mlpack/methods/naive_bayes/naive_bayes_classifier.hpp>
using namespace mlpack::naive_bayes;

// [[Rcpp::export]]
NumericVector mlpackNBC(const arma::mat& training_data, const arma::Row<size_t>& labels, const size_t& classes, const arma::mat& new_data) {
  // Initialization & training:
  NaiveBayesClassifier<> nbc(training_data, labels, classes);
  // Prediction:
  arma::Row<size_t> predictions;
  nbc.Classify(new_data, predictions);
  // Let's change the format of the output to be a little nicer:
  NumericVector results(predictions.n_cols);
  for (size_t i = 0; i < predictions.n_cols; ++i) {
    results[i] = predictions(i);
  }
  return results;
}
```

For the classification example, we'll use the Iris dataset. (Of course.)

```{r iris, cache = TRUE}
data(iris, package = "datasets")
set.seed(0)
training_idx <- sample.int(nrow(iris), 0.8 * nrow(iris), replace = FALSE)
training_x <- unname(as.matrix(iris[training_idx, 1:4]))
training_y <- unname(iris$Species[training_idx])
testing_x <- unname(as.matrix(iris[-training_idx, 1:4]))
testing_y <- unname(iris$Species[-training_idx])
# For fastNBC:
ttraining_x <- t(training_x)
ttraining_y <- matrix(as.numeric(training_y) - 1, nrow = 1)
classes <- length(levels(training_y))
ttesting_x <- t(testing_x)
ttesting_y <- matrix(as.numeric(testing_y) - 1, nrow = 1)
```

I kept getting a "Mat::col(): index out of bounds" error when trying to use this. I debugged the heck out of it until I finally looked in **naive_bayes_classifier_impl.hpp** and saw:

```cpp
for (size_t j = 0; j < data.n_cols; ++j)
{
  const size_t label = labels[j];
  ++probabilities[label];

  arma::vec delta = data.col(j) - means.col(label);
  means.col(label) += delta / probabilities[label];
  variances.col(label) += delta % (data.col(j) - means.col(label));
}
```

Hence we run into a problem when we use `as.numeric(training_y)` in R and turn that factor into 1s, 2s, and 3s: `labels[j]` is used directly as a column index. This makes sense in retrospect, but it would have been nice to know explicitly that MLPACK expects training data class labels to be 0-based.

```{r nbc_example, dependson = c('nbc_source', 'iris'), cache = TRUE}
# Naive Bayes via e1071
naive_bayes <- e1071::naiveBayes(training_x, training_y)
predictions <- e1071:::predict.naiveBayes(naive_bayes, testing_x, type = "class")
confusion_matrix <- caret::confusionMatrix(
  data = predictions,
  reference = testing_y
)
confusion_matrix$table
print(confusion_matrix$overall["Accuracy"])

# Naive Bayes via MLPACK
predictions <- mlpackNBC(ttraining_x, ttraining_y, classes, ttesting_x)
confusion_matrix <- caret::confusionMatrix(
  data = predictions,
  reference = ttesting_y
)
confusion_matrix$table
print(confusion_matrix$overall["Accuracy"])

# Performance Comparison
microbenchmark(
  naiveBayes = {
    naive_bayes <- e1071::naiveBayes(training_x, training_y)
    predictions <- e1071:::predict.naiveBayes(naive_bayes, testing_x, type = "class")
  },
  fastNBC = mlpackNBC(ttraining_x, ttraining_y, classes, ttesting_x)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```

### External Pointers

In the next step, we'll train a Naive Bayes classifier and keep the trained object in memory to make classification a separate step.
Notice that we have to:

- declare a pointer: `NaiveBayesClassifier<>* nbc = new NaiveBayesClassifier<>(...)`,
- use Rcpp's external pointers (`Rcpp::XPtr`), and
- return an [S-expression](http://adv-r.had.co.nz/C-interface.html#c-data-structures) (`SEXP`).

```{Rcpp nb_train_source, cache = TRUE}
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>
using namespace Rcpp;

#include <mlpack/core/util/log.hpp>
#include <mlpack/methods/naive_bayes/naive_bayes_classifier.hpp>
using namespace mlpack::naive_bayes;

// [[Rcpp::export]]
SEXP mlpackNBTrainXPtr(const arma::mat& training_data, const arma::Row<size_t>& labels, const size_t& classes) {
  // Initialization & training:
  NaiveBayesClassifier<>* nbc = new NaiveBayesClassifier<>(training_data, labels, classes);
  Rcpp::XPtr<NaiveBayesClassifier<>> p(nbc, true);
  return p;
}
```

```{r nbc_example_2a, dependson = 'nbc_example', cache = FALSE}
fit <- mlpackNBTrainXPtr(ttraining_x, ttraining_y, classes)
str(fit)
```

`fit` is an external pointer to some memory. When we pass it to a C++ function, it arrives as an R data type (`SEXP`) that we have to convert back to an external pointer before we can use the object's methods.
Notice that we're now calling `nbc->Classify()` instead of `nbc.Classify()`.

```{Rcpp nb_classify_source, dependson = 'nb_train_source', cache = TRUE}
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>
using namespace Rcpp;

#include <mlpack/core/util/log.hpp>
#include <mlpack/methods/naive_bayes/naive_bayes_classifier.hpp>
using namespace mlpack::naive_bayes;

// [[Rcpp::export]]
NumericVector mlpackNBClassifyXPtr(SEXP xp, const arma::mat& new_data) {
  XPtr<NaiveBayesClassifier<>> nbc(xp);
  // Prediction:
  arma::Row<size_t> predictions;
  nbc->Classify(new_data, predictions);
  // Let's change the format of the output to be a little nicer:
  NumericVector results(predictions.n_cols);
  for (size_t i = 0; i < predictions.n_cols; ++i) {
    results[i] = predictions(i);
  }
  return results;
}
```

```{r nbc_example_2b, dependson = c('nb_classify_source', 'nbc_example_2a'), cache = FALSE}
fit_e1071 <- e1071::naiveBayes(training_x, training_y)
# Performance Comparison
microbenchmark(
  `e1071 prediction` = e1071:::predict.naiveBayes(fit_e1071, testing_x, type = "class"),
  `MLPACK prediction` = mlpackNBClassifyXPtr(fit, ttesting_x)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```

See [Exposing C++ functions and classes with Rcpp modules](http://dirk.eddelbuettel.com/code/rcpp/Rcpp-modules.pdf) for more information.

## Random Number Generation

[Rcpp sugar](http://adv-r.had.co.nz/Rcpp.html#rcpp-sugar) provides a bunch of useful features and high-level abstractions, such as statistical distribution functions.
Let's create a function that yields a sample of _n_ independent draws from Bernoulli(*p*):

```{Rcpp rng_source, cache = TRUE}
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector bernie(const int n, const double p) {
  return rbinom(n, 1, p);
}
```

> Section 6.3 of [Writing R Extensions](http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Random-number-generation) describes an additional requirement for calling the R random number generation functions: you must call GetRNGState prior to using them and then PutRNGState afterwards. These functions (respectively) read .Random.seed and then write it out after use. When using Rcpp attributes (as we do via the // [[Rcpp::export]] annotation on the functions above) it is not necessary to call GetRNGState and PutRNGState because this is done automatically within the wrapper code generated for exported functions. In fact, since these calls don’t nest it is actually an error to call them when within a function exported via Rcpp attributes. ([Random number generation](http://gallery.rcpp.org/articles/random-number-generation/))

Let's just do a quick test to see if setting the seed does what it should:

```{r rng_example, dependson = 'rng_source', cache = TRUE}
n <- 100; p <- 0.5; x <- list()
set.seed(0); x[[1]] <- bernie(n, p)
set.seed(0); x[[2]] <- bernie(n, p)
set.seed(42); x[[3]] <- bernie(n, p)
set.seed(42); x[[4]] <- bernie(n, p)
```
```{r rng_results, dependson = 'rng_example', cache = TRUE, results = 'asis', echo = FALSE}
y <- matrix(logical(length(x)^2), nrow = length(x))
for (i in seq_along(x)) {
  y[i, ] <- vapply(x, function(y) {
    return(identical(x[[i]], y))
  }, TRUE)
}
rownames(y) <- colnames(y) <- paste0("Seed: ", c(rep(0, 2), rep(42, 2)), ", ", vapply(1:2, toOrdinal::toOrdinal, ""), " draw")
y[y] <- "All draws match"
y[y == "FALSE"] <- "Draws don't match"
knitr::kable(y, format = "markdown")
```

Let's see how its performance compares to an R equivalent when _n_ = 1,000:

```{r rng_benchmark, dependson = 'rng_source', cache = TRUE}
rbern <- function(n, p) {
  return(rbinom(n, 1, p))
}
microbenchmark(
  R = rbern(1e3, 0.49),
  Rcpp = bernie(1e3, 0.49)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```

## Serialization

[Serialization](https://en.wikipedia.org/wiki/Serialization) is the process of translating data structures and objects into a format that can be stored.
Earlier, we trained a Naive Bayes classifier and kept the trained object in memory, returning an external pointer to it, which lets us classify new observations as long as we stay within the same session.

This one requires C++11 (if your compiler supports it, enable it via `// [[Rcpp::plugins(cpp11)]]`) and the [cereal](http://uscilab.github.io/cereal/) serialization library, available via the [Rcereal](https://cran.r-project.org/package=Rcereal) package.

Roughly, we're going to create a wrapper for NumericMatrix that is serializable. **Note**: unlike the previous sections, where each Rcpp chunk was complete, this section has multiple Rcpp chunks that are stitched together using the `ref.label` knitr chunk option, allowing me to have notes in between functions that belong together. See [Combining Chunks](http://rmarkdown.rstudio.com/authoring_knitr_engines.html#combining-chunks) for more details.

```{Rcpp cerealization, ref.label = c('myclass_definition', 'myclass_test', 'myclass_serialization', 'myclass_deserialization', 'cerealizations_source'), cache = TRUE, include = FALSE}
```

```{Rcpp myclass_definition, eval = FALSE}
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(Rcereal)]]

// Enables us to keep the serialization method internal:
#include <cereal/access.hpp>
// see http://uscilab.github.io/cereal/serialization_functions.html#non-public-serialization

// Use std::vector and make it serializable:
#include <vector>
#include <cereal/types/vector.hpp>
// see http://uscilab.github.io/cereal/stl_support.html for more info

// Cereal's binary archiving:
#include <sstream>
#include <cereal/archives/binary.hpp>
// see http://uscilab.github.io/cereal/serialization_archives.html
// and http://uscilab.github.io/cereal/quickstart.html for more info

#include <Rcpp.h>
using namespace Rcpp;

class SerializableNumericMatrix
{
private:
  int nrows;
  int ncols;
  std::vector<double> data;
  friend class cereal::access;
  // This method lets cereal know which data members to serialize
  template<class Archive>
  void serialize(Archive& ar)
  {
    ar( data, nrows, ncols ); // serialize things by passing them to the archive
  }
public:
  SerializableNumericMatrix() {}
  SerializableNumericMatrix(NumericMatrix x) {
    data = as<std::vector<double>>(x); // Rcpp::as
    nrows = x.nrow();
    ncols = x.ncol();
  }
  // Alternative conversion via the dim attribute; thanks to Kevin Ushey for
  // the tip: http://stackoverflow.com/a/19866956/1091835
  NumericVector numericMatrixAlt() {
    NumericVector d = wrap(data);
    d.attr("dim") = Dimension(nrows, ncols);
    return d;
  }
  NumericMatrix numericMatrix() {
    NumericMatrix d(nrows, ncols, data.begin());
    return d;
  }
};
```

Let's just test that everything works before adding [de-]serialization capabilities.

```{Rcpp myclass_test, eval = FALSE}
// [[Rcpp::export]]
NumericMatrix testSNM(NumericMatrix x) {
  SerializableNumericMatrix snm(x);
  return snm.numericMatrix();
}
```

```{r myclass_example, dependson = 'cerealization', cache = TRUE}
x <- matrix(0:9, nrow = 2, ncol = 5)
(y <- testSNM(x))
```

Yay!
Okay, for this we are going to use the serialization/deserialization example from the [Rcereal README.md](https://github.com/wush978/Rcereal/blob/master/README.md).

```{Rcpp myclass_serialization, eval = FALSE}
// [[Rcpp::export]]
RawVector serializeSNM(NumericMatrix x) {
  SerializableNumericMatrix snm(x);
  std::stringstream ss;
  {
    cereal::BinaryOutputArchive oarchive(ss); // Create an output archive
    oarchive(snm);
  }
  ss.seekg(0, ss.end);
  RawVector retval(ss.tellg());
  ss.seekg(0, ss.beg);
  ss.read(reinterpret_cast<char*>(&retval[0]), retval.size());
  return retval;
}
```

```{Rcpp myclass_deserialization, eval = FALSE}
// [[Rcpp::export]]
NumericMatrix deserializeSNM(RawVector src) {
  std::stringstream ss;
  ss.write(reinterpret_cast<char*>(&src[0]), src.size());
  ss.seekg(0, ss.beg);
  SerializableNumericMatrix snm;
  {
    cereal::BinaryInputArchive iarchive(ss);
    iarchive(snm);
  }
  return snm.numericMatrix();
}
```

```{r cerealization_example, dependson = c('myclass_example', 'cerealization'), cache = TRUE}
(raw_vector <- serializeSNM(x))
deserializeSNM(raw_vector)
```

### Armadillo Serialization (Not Working)

Basically, we want to be able to serialize Armadillo matrices, because then we can serialize things like the Naive Bayes classifier that rely on `arma::Mat`s.

```{r}
# Enables us to include files in src/
registerPlugin("local", function() {
  return(list(env = list(
    PKG_CXXFLAGS = paste0('-I"', getwd(), '/src"')
  )))
})
```

```{Rcpp boosted_serialization, ref.label = c('bs_includes', 'bs_boost_serialization_includes', 'bs_RcppArmadilloForward', 'bs_armadillo_include', 'bs_boost_archive_includes', 'bs_mlpack_includes', 'bs_ns', 'bs_test_src', 'bs_serialize_src', 'bs_deserialize_src'), cache = TRUE, include = FALSE}
```
```{Rcpp bs_includes, eval = FALSE}
// [[Rcpp::plugins(local)]]
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(BH)]]
// [[Rcpp::depends(RcppArmadillo)]]
```

Include everything we'll need for `serialize()`:

```{Rcpp bs_boost_serialization_includes, eval = FALSE}
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/array.hpp>
```

In the next chunk we copy and modify the `RcppArmadillo__RcppArmadilloForward__h` definition from **RcppArmadilloForward.h** so as to use our own Mat extra proto and meat, which is really just:

```cpp
//! Add a serialization operator.
template<typename Archive>
void serialize(Archive& ar, const unsigned int version);

#include <RcppArmadillo/Mat_proto.h>
```

**mat_extra_bones.hpp** is included for serialization, and **Mat_proto.h** is included so RcppArmadillo plays nice.

```{Rcpp bs_RcppArmadilloForward, eval = FALSE}
#ifndef RcppArmadillo__RcppArmadilloForward__h
#define RcppArmadillo__RcppArmadilloForward__h

#include <RcppCommon.h>
#include <Rconfig.h>
#include <RcppArmadilloConfig.h>

// Custom Mat extension that combines MLPACK with RcppArmadillo's Mat extensions:
#define ARMA_EXTRA_MAT_PROTO mat_extra_bones.hpp
#define ARMA_EXTRA_MAT_MEAT  mat_extra_meat.hpp

// Everything else the same:
#define ARMA_EXTRA_COL_PROTO RcppArmadillo/Col_proto.h
#define ARMA_EXTRA_COL_MEAT  RcppArmadillo/Col_meat.h
#define ARMA_EXTRA_ROW_PROTO RcppArmadillo/Row_proto.h
#define ARMA_EXTRA_ROW_MEAT  RcppArmadillo/Row_meat.h
#define ARMA_RNG_ALT         RcppArmadillo/Alt_R_RNG.h
#include <armadillo>

/* forward declarations */
namespace Rcpp {
    /* support for wrap */
    template <typename T> SEXP wrap ( const arma::Mat<T>& ) ;
    template <typename T> SEXP wrap ( const arma::Row<T>& ) ;
    template <typename T> SEXP wrap ( const arma::Col<T>& ) ;
    template <typename T> SEXP wrap ( const arma::field<T>& ) ;
    template <typename T> SEXP wrap ( const arma::Cube<T>& ) ;
    template <typename T> SEXP wrap ( const arma::subview<T>& ) ;
    template <typename T> SEXP wrap ( const arma::SpMat<T>& ) ;

    template <typename T1, typename T2, typename glue_type>
    SEXP wrap(const arma::Glue<T1, T2, glue_type>& X ) ;

    template <typename T1, typename op_type>
    SEXP wrap(const arma::Op<T1, op_type>& X ) ;

    template <typename T1, typename T2, typename glue_type>
    SEXP wrap(const arma::eGlue<T1, T2, glue_type>& X ) ;

    template <typename T1, typename op_type>
    SEXP wrap(const arma::eOp<T1, op_type>& X ) ;

    template <typename T1, typename op_type>
    SEXP wrap(const arma::OpCube<T1,op_type>& X ) ;

    template <typename T1, typename T2, typename glue_type>
    SEXP wrap(const arma::GlueCube<T1,T2,glue_type>& X ) ;

    template <typename T1, typename op_type>
    SEXP wrap(const arma::eOpCube<T1,op_type>& X ) ;

    template <typename T1, typename T2, typename glue_type>
    SEXP wrap(const arma::eGlueCube<T1,T2,glue_type>& X ) ;

    template<typename out_eT, typename T1, typename op_type>
    SEXP wrap( const arma::mtOp<out_eT,T1,op_type>& X ) ;

    template<typename out_eT, typename T1, typename T2, typename glue_type>
    SEXP wrap( const arma::mtGlue<out_eT,T1,T2,glue_type>& X );

    template <typename eT, typename gen_type>
    SEXP wrap( const arma::Gen<eT,gen_type>& X) ;

    template<typename eT, typename gen_type>
    SEXP wrap( const arma::GenCube<eT,gen_type>& X) ;

    namespace traits {

    /* support for as */
    template <typename T> class Exporter< arma::Mat<T> > ;
    template <typename T> class Exporter< arma::Row<T> > ;
    template <typename T> class Exporter< arma::Col<T> > ;
    template <typename T> class Exporter< arma::SpMat<T> > ;

    template <typename T> class Exporter< arma::field<T> > ;
    // template <typename T> class Exporter< arma::Cube<T> > ;

    } // namespace traits

    template <typename T> class ConstReferenceInputParameter< arma::Mat<T> > ;
    template <typename T> class ReferenceInputParameter< arma::Mat<T> > ;
    template <typename T> class ConstInputParameter< arma::Mat<T> > ;

    template <typename T> class ConstReferenceInputParameter< arma::Col<T> > ;
    template <typename T> class ReferenceInputParameter< arma::Col<T> > ;
    template <typename T> class ConstInputParameter< arma::Col<T> > ;

    template <typename T> class ConstReferenceInputParameter< arma::Row<T> > ;
    template <typename T> class ReferenceInputParameter< arma::Row<T> > ;
    template <typename T> class ConstInputParameter< arma::Row<T> > ;

}

#endif
```

Okay, now that that's been defined, let's include **RcppArmadillo.h**, which will include **RcppArmadilloForward.h** but will (hopefully) *not* redefine our slightly customized `RcppArmadillo__RcppArmadilloForward__h`, thanks to the include guard.

```{Rcpp bs_armadillo_include, eval = FALSE}
#include <RcppArmadillo.h>
```

MLPACK's Naive Bayes Classifier:

```{Rcpp bs_mlpack_includes, eval = FALSE}
#include <mlpack/core/util/log.hpp>
#include <mlpack/methods/naive_bayes/naive_bayes_classifier.hpp>
using namespace mlpack::naive_bayes;
```

Boost's archives:

```{Rcpp bs_boost_archive_includes, eval = FALSE}
// Include everything we'll need for archiving.
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <sstream>
```
```{Rcpp bs_ns, eval = FALSE}
using namespace Rcpp;
using namespace arma;
```
```{Rcpp bs_test_src, eval = FALSE}
// [[Rcpp::export]]
mat test() {
  mat A(3, 6, fill::randu);
  return A;
}
```

```{r bs_test_ex}
set.seed(0); (x1 <- test())
set.seed(0); x2 <- test()
identical(x1, x2)
```

```{Rcpp bs_serialize_src, eval = FALSE}
// [[Rcpp::export]]
RawVector test_serialization(Mat<double> m) {
  std::stringstream ss;
  // save data to archive
  {
    boost::archive::binary_oarchive oarchive(ss); // create an output archive
    oarchive & m; // write class instance to archive
    // archive and stream closed when destructors are called
  }
  ss.seekg(0, ss.end);
  RawVector retval(ss.tellg());
  ss.seekg(0, ss.beg);
  ss.read(reinterpret_cast<char*>(&retval[0]), retval.size());
  return retval;
}
```

# References

- Eddelbuettel, D. (2013). *Seamless R and C++ Integration with Rcpp*. New York, NY: Springer Science & Business Media. http://doi.org/10.1007/978-1-4614-6868-4
- Wickham, H. A. (2014). *Advanced R*. Chapman and Hall/CRC. http://doi.org/10.1201/b17487