Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bearloga/learning-rcpp
My notes as I learn C++ and Rcpp for fast machine learning in R
https://github.com/bearloga/learning-rcpp
c-plus-plus learning-by-doing notes r rcpp
Last synced: about 2 months ago
JSON representation
My notes as I learn C++ and Rcpp for fast machine learning in R
- Host: GitHub
- URL: https://github.com/bearloga/learning-rcpp
- Owner: bearloga
- Created: 2016-10-23T00:56:52.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-12-12T18:10:55.000Z (about 7 years ago)
- Last Synced: 2023-03-24T14:15:26.073Z (almost 2 years ago)
- Topics: c-plus-plus, learning-by-doing, notes, r, rcpp
- Language: C++
- Homepage:
- Size: 50.8 KB
- Stars: 70
- Watchers: 4
- Forks: 6
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project
README
---
output:
md_document:
variant: markdown_github
includes:
in_header: header.md
toc: true
toc_depth: 4
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(digits = 4)
```# Setup
## Software Libraries
## Mac OS X
```bash
## To install Homebrew:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
## Then:
brew tap homebrew/science && brew update
# brew install gcc --enable-cxx && brew link --overwrite gcc && brew link cmake
brew install boost --c++11
# Installing cmake may require: sudo chown -R `whoami` /usr/local/share/man/man7
brew install mlpack
```May need to `brew reinstall --build-from-source mlpack` if you get "Reason: image not found" error about lack of Armadillo library. When I came back to these notes, Armadillo has been updated to v8 from v7 on my system and that was causing issues.
## Ubuntu/Debian
```bash
sudo apt-get install libmlpack-dev libdlib-dev
```## R Packages
```R
install.packages(c(
"BH", # Header files for 'Boost' C++ library
"Rcpp", # R and C++ integration
"RcppArmadillo", # Rcpp integration for 'Armadillo' linear algebra library
"Rcereal", # header files of 'cereal', a C++11 library for serialization
"microbenchmark", # For benchmarking performance
"devtools", # For installing packages from GitHub
"magrittr", # For piping
"knitr", # For printing tables & data.frames as Markdown
"toOrdinal" # Cardinal to ordinal number conversion function (e.g. 1 => "1st")
), repos = "https://cran.rstudio.com/")
devtools::install_github("yihui/printr") # Prettier table printing
```If you get "ld: library not found for -lgfortran" error when trying to install RcppArmadillo, run:
```bash
curl -O http://r.research.att.com/libs/gfortran-4.8.2-darwin13.tar.bz2
sudo tar fvxz gfortran-4.8.2-darwin13.tar.bz2 -C /
```See "[Rcpp, RcppArmadillo and OS X Mavericks "-lgfortran" and "-lquadmath" error](http://thecoatlessprofessor.com/programming/rcpp-rcpparmadillo-and-os-x-mavericks-lgfortran-and-lquadmath-error/)"" for more info.
# Rcpp
See [this section](http://rmarkdown.rstudio.com/authoring_knitr_engines.html#rcpp) in [RMarkdown documentation](http://rmarkdown.rstudio.com/) for details on Rcpp chunks.
```{r load_pkg}
library(magrittr)
library(BH)
library(Rcpp)
library(RcppArmadillo)
library(Rcereal)
library(microbenchmark)
library(knitr)
library(printr)
```## Basics
```{Rcpp x2_source, cache = TRUE}
#include
using namespace Rcpp;// [[Rcpp::export]]
NumericVector cumSum(NumericVector x) {
int n = x.size();
NumericVector out(n);
out[0] = x[0];
for (int i = 1; i < n; ++i) {
out[i] = out[i-1] + x[i];
}
return out;
}
``````{r x2_example, dependson = 'x2_source', cache = TRUE}
x <- 1:1000
microbenchmark(
native = cumsum(x),
loop = (function(x) {
output <- numeric(length(x))
output[1] <- x[1]
for (i in 2:length(x)) {
output[i] <- output[i - 1] + x[i]
}
return(output)
})(x),
Rcpp = cumSum(x)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```## Using Libraries
### Armadillo vs RcppArmadillo
Use the **depends** attribute to bring in [RcppArmadillo](https://cran.r-project.org/package=RcppArmadillo), which is an Rcpp integration of the templated linear algebra library [Armadillo](http://arma.sourceforge.net/). The code below is an example of a fast linear model from Dirk Eddelbuettel.
```{Rcpp fastlm_source, cache = TRUE}
// [[Rcpp::depends(RcppArmadillo)]]
#include
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
List fastLm(const colvec& y, const mat& X) {
int n = X.n_rows, k = X.n_cols;
colvec coef = solve(X, y);
colvec resid = y - X*coef;
double sig2 = as_scalar(trans(resid) * resid/(n-k));
colvec stderrest = sqrt(sig2 * diagvec( inv(trans(X)*X)) );
return List::create(_["coefficients"] = coef,
_["stderr"] = stderrest,
_["df.residual"] = n - k );
}
``````{r fastlm_example, dependson = 'fastlm_source', cache = TRUE}
data("mtcars", package = "datasets")
microbenchmark(
lm = lm(mpg ~ wt + disp + cyl + hp, data = mtcars),
fastLm = fastLm(mtcars$mpg, cbind(1, as.matrix(mtcars[, c("wt", "disp", "cyl", "hp")]))),
RcppArm = fastLmPure(cbind(1, as.matrix(mtcars[, c("wt", "disp", "cyl", "hp")])), mtcars$mpg)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```### Fast K-Means
Unfortunately, [RcppMLPACK](https://cran.r-project.org/package=RcppMLPACK) uses version 1 of [MLPACK](http://www.mlpack.org/) (now in version 2) and only makes the unsupervised learning methods accessible. (Supervised methods would require returning a trained classifier object to R, which is actually a really difficult problem.)
Okay, let's try to get a fast version of k-means.
First, install the MLPACK library (see [§ Software Libraries](#software-libraries)), then:
```{r mlpack11, cache = FALSE}
# Thanks to Kevin Ushey for suggesting Rcpp plugins (e.g. Rcpp:::.plugins$openmp)
registerPlugin("mlpack11", function() {
return(list(env = list(
USE_CXX1X = "yes",
CXX1XSTD = "-std=c++11",
PKG_LIBS = "-lmlpack"
)))
})
```We refer to the documentation for [KMeans](http://www.mlpack.org/docs/mlpack-1.0.6/doxygen.php?doc=kmtutorial.html#kmeans_kmtut) shows, although it seems to incorrectly use `arma::Col` for cluster assigments while in practice the cluster assignments are returned as an `arma::Row` object.
```{Rcpp kmeans_source, cache = TRUE}
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(RcppArmadillo)]]#include
using namespace Rcpp;
#include
#include
using namespace mlpack::kmeans;
using namespace arma;// [[Rcpp::export]]
NumericVector mlpackKM(const arma::mat& data, const size_t& clusters) {
Row assignments;
KMeans<> k;
k.Cluster(data, clusters, assignments);
// Let's change the format of the output to be a little nicer:
NumericVector results(data.n_cols);
for (int i = 0; i < assignments.n_cols; i++) {
results[i] = assignments(i) + 1; // cluster assignments are 0-based
}
return results;
}
```(Alternatively: `sourceCpp("src/fastKm.cpp")` which creates `fastKm()` from [src/fastKm.cpp](src/fastKm.cpp))
Getting:
```
Symbol not found: __ZN6mlpack3Log6AssertEbRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Expected in: flat namespace
```Which (via `c++filt __ZN6mlpack3Log6AssertEbRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE`) translates to: "mlpack::Log::Assert(bool, std::__cxx11::basic_string, std::allocator > const&)"
```{r kmeans_example, dependson = 'kmeans_source', cache = TRUE}
data(trees, package = "datasets"); data(faithful, package = "datasets")
# kmeans coerces data frames to matrix, so it's worth doing that beforehand
mtrees <- as.matrix(trees)
mfaithful <- as.matrix(faithful)
# KMeans in MLPACK requires observations to be in columns, not rows:
ttrees <- t(trees); tfaithful <- t(faithful)
microbenchmark(
kmeans_trees = kmeans(mtrees, 3),
mlpackKM_trees = mlpackKM(ttrees, 3),
kmeans_faithful = kmeans(mfaithful, 2),
mlpackKM_faithful = mlpackKM(tfaithful, 2)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```## Fast Classification
In this exercise, we will train a [Naive Bayes classifier from MLPACK](http://www.mlpack.org/docs/mlpack-2.1.0/doxygen.php?doc=classmlpack_1_1naive__bayes_1_1NaiveBayesClassifier.html). First, we train and classify in a single step. Then we will store the trained classifier in memory, and then later we will be able to save the model. Storing the trained model requires [serialization](https://en.wikipedia.org/wiki/Serialization), the topic of the next section.
**Update (2017-09-05)**: there is now -- apparently -- a [RcppMLPACK2](https://github.com/rcppmlpack/rcppmlpack2). For more details, refer to [RcppMLPACK2 and the MLPACK Machine Learning Library article](http://gallery.rcpp.org/articles/using-rcppmlpack2/).
```{Rcpp nbc_source, cache = TRUE}
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(RcppArmadillo)]]#include
using namespace Rcpp;#include
#include
using namespace mlpack::naive_bayes;// [[Rcpp::export]]
NumericVector mlpackNBC(const arma::mat& training_data, const arma::Row& labels, const size_t& classes, const arma::mat& new_data) {
// Initialization & training:
NaiveBayesClassifier<> nbc(training_data, labels, classes);
// Prediction:
arma::Row predictions;
nbc.Classify(new_data, predictions);
// Let's change the format of the output to be a little nicer:
NumericVector results(predictions.n_cols);
for (int i = 0; i < predictions.n_cols; ++i) {
results[i] = predictions(i);
}
return results;
}
```For the classification example, we'll use the Iris dataset. (Of course.)
```{r iris, cache = TRUE}
data(iris, package = "datasets")
set.seed(0)
training_idx <- sample.int(nrow(iris), 0.8 * nrow(iris), replace = FALSE)
training_x <- unname(as.matrix(iris[training_idx, 1:4]))
training_y <- unname(iris$Species[training_idx])
testing_x <- unname(as.matrix(iris[-training_idx, 1:4]))
testing_y <- unname(iris$Species[-training_idx])
# For fastNBC:
ttraining_x <- t(training_x)
ttraining_y <- matrix(as.numeric(training_y) - 1, nrow = 1)
classes <- length(levels(training_y))
ttesting_x <- t(testing_x)
ttesting_y <- matrix(as.numeric(testing_y) - 1, nrow = 1)
```I kept getting "Mat::col(): index out of bounds" error when trying to compile. I debugged the heck out of it until I finally looked in **naive_bayes_classifier_impl.hpp** and saw:
```cpp
for (size_t j = 0; j < data.n_cols; ++j)
{
const size_t label = labels[j];
++probabilities[label];
arma::vec delta = data.col(j) - means.col(label);
means.col(label) += delta / probabilities[label];
variances.col(label) += delta % (data.col(j) - means.col(label));
}
```Hence why we run into a problem when we use `as.numeric(training_y)` in R and turn that factor into 1s, 2s, and 3s. This makes sense in retrospect but would have been nice to explicitly know that MLPACK expects training data class labels to be 0-based.
```{r nbc_example, dependson = c('nbc_source', 'iris'), cache = TRUE}
# Naive Bayes via e1071
naive_bayes <- e1071::naiveBayes(training_x, training_y)
predictions <- e1071:::predict.naiveBayes(naive_bayes, testing_x, type = "class")
confusion_matrix <- caret::confusionMatrix(
data = predictions,
reference = testing_y
)
confusion_matrix$table
print(confusion_matrix$overall["Accuracy"])# Naive Bayes via MLPACK
predictions <- mlpackNBC(ttraining_x, ttraining_y, classes, ttesting_x)
confusion_matrix <- caret::confusionMatrix(
data = predictions,
reference = ttesting_y
)
confusion_matrix$table
print(confusion_matrix$overall["Accuracy"])# Performance Comparison
microbenchmark(
naiveBayes = {
naive_bayes <- e1071::naiveBayes(training_x, training_y)
predictions <- e1071:::predict.naiveBayes(naive_bayes, testing_x, type = "class")
},
fastNBC = mlpackNBC(ttraining_x, ttraining_y, classes, ttesting_x)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```### External Pointers
In the next step, we'll train a Naive Bayes classifier and keep that trained object in memory to make classification a separate step. Notice that we have to:
- declare a pointer: `NaiveBayesClassifier<>* nbc = new NaiveBayesClassifier<>(...)`
- use Rcpp's external pointers (`Rcpp::XPtr`) and
- return an [S-expression](http://adv-r.had.co.nz/C-interface.html#c-data-structures) (`SEXP`).```{Rcpp nb_train_source, cache = TRUE}
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(RcppArmadillo)]]#include
using namespace Rcpp;#include
#include
using namespace mlpack::naive_bayes;// [[Rcpp::export]]
SEXP mlpackNBTrainXPtr(const arma::mat& training_data, const arma::Row& labels, const size_t& classes) {
// Initialization & training:
NaiveBayesClassifier<>* nbc = new NaiveBayesClassifier<>(training_data, labels, classes);
Rcpp::XPtr> p(nbc, true);
return p;
}
``````{r nbc_example_2a, dependson = 'nbc_example', cache = FALSE}
fit <- mlpackNBTrainXPtr(ttraining_x, ttraining_y, classes)
str(fit)
````fit` is an external pointer to some memory. When we pass it to a C++ function, it's passed as an R data type (SEXP) that we have to convert to an external pointer before we can use the object's methods. Notice that we're now calling `nbc->Classify()` instead of `nbc.Classify()`.
```{Rcpp nb_classify_source, dependson = 'nb_train_source', cache = TRUE}
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(RcppArmadillo)]]#include
using namespace Rcpp;#include
#include
using namespace mlpack::naive_bayes;// [[Rcpp::export]]
NumericVector mlpackNBClassifyXPtr(SEXP xp, const arma::mat& new_data) {
XPtr> nbc(xp);
// Prediction:
arma::Row predictions;
nbc->Classify(new_data, predictions);
// Let's change the format of the output to be a little nicer:
NumericVector results(predictions.n_cols);
for (int i = 0; i < predictions.n_cols; ++i) {
results[i] = predictions(i);
}
return results;
}
``````{r nbc_example_2b, dependson = c('nb_classify_source', 'nbc_example_2a'), cache = FALSE}
fit_e1071 <- e1071::naiveBayes(training_x, training_y)
# Performance Comparison
microbenchmark(
`e1071 prediction` = e1071:::predict.naiveBayes(fit_e1071, testing_x, type = "class"),
`MLPACK prediction` = mlpackNBClassifyXPtr(fit, ttesting_x)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```See [Exposing C++ functions and classes with Rcpp modules](http://dirk.eddelbuettel.com/code/rcpp/Rcpp-modules.pdf) for more information.
## Random Number Generation
[Rcpp sugar](http://adv-r.had.co.nz/Rcpp.html#rcpp-sugar) provides a bunch of useful features and high-level abstractions, such as statistical distribution functions. Let's create a function that yields a sample of _n_ independent draws from Bernoulli(*p*):
```{Rcpp rng_source, cache = TRUE}
#include
using namespace Rcpp;// [[Rcpp::export]]
NumericVector bernie(const int& n, const float& p) {
return rbinom(n, 1, p);
}
```> Section 6.3 of [Writing R Extensions](http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Random-number-generation) describes an additional requirement for calling the R random number generation functions: you must call GetRNGState prior to using them and then PutRNGState afterwards. These functions (respectively) read .Random.seed and then write it out after use. When using Rcpp attributes (as we do via the // [[Rcpp::export]] annotation on the functions above) it is not necessary to call GetRNGState and PutRNGState because this is done automatically within the wrapper code generated for exported functions. In fact, since these calls don’t nest it is actually an error to call them when within a function exported via Rcpp attributes. ([Random number generation](http://gallery.rcpp.org/articles/random-number-generation/))
Let's just do a quick test to see if setting seed does what it should:
```{r rng_example, dependson = 'rng_source', cache = TRUE}
n <- 100; p <- 0.5; x <- list()
set.seed(0); x[[1]] <- bernie(n, p)
set.seed(0); x[[2]] <- bernie(n, p)
set.seed(42); x[[3]] <- bernie(n, p)
set.seed(42); x[[4]] <- bernie(n, p)
```
```{r rng_results, dependson = 'rng_example', cache = TRUE, results = 'asis', echo = FALSE}
y <- matrix(logical(length(x)^2), nrow = length(x))
for (i in seq_along(x)) {
y[i, ] <- vapply(x, function(y) {
return(identical(x[[i]], y))
}, TRUE)
}
rownames(y) <- colnames(y) <- paste0("Seed: ", c(rep(0, 2), rep(42, 2)), ", ", vapply(1:2, toOrdinal::toOrdinal, ""), " draw")
y[y] <- "All draws match"
y[y == "FALSE"] <- "Draws don't match"
knitr::kable(y, format = "markdown")
```Let's see how the performance differs between it and an R-equivalent when _n_=1,000:
```{r rng_benchmark, dependson = 'rng_source', cache = TRUE}
rbern <- function(n, p) {
return(rbinom(n, 1, p))
}
microbenchmark(
R = rbern(1e3, 0.49),
Rcpp = bernie(1e3, 0.49)
) %>% summary(unit = "ms") %>% knitr::kable(format = "markdown")
```## Serialization
[Serialization](https://en.wikipedia.org/wiki/Serialization) is the process of translating data structures and objects into a format that can be stored. Earlier, we trained a Naive Bayes classifier and kept the trained object in memory, returning an external pointer to it, allowing us to classify new observations as long as it is done within the same session.
This one requires: C++11 capability (if compiled supports it, enable via `// [[Rcpp::plugins(cpp11)]]`) and [cereal](http://uscilab.github.io/cereal/) serialization library, available via [Rcereal package](https://cran.r-project.org/package=Rcereal).
Roughly, we're going to create a wrapper for NumericMatrix that is serializable. **Note**: unlike previous section where the Rcpp chunks were complete, this section has multiple Rcpp chunks that are stitched together using the "ref.label" knitr chunk parameter, allowing me to have notes in-between different functions that should be together. See [Combining Chunks](http://rmarkdown.rstudio.com/authoring_knitr_engines.html#combining-chunks) for more details.
```{Rcpp cerealization, ref.label = c('myclass_definition', 'myclass_test', 'myclass_serialization', 'myclass_deserialization', 'cerealizations_source'), cache = TRUE, include = FALSE}
``````{Rcpp myclass_definition, eval = FALSE}
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(Rcereal)]]// Enables us to keep serialization method internally:
#include
// see http://uscilab.github.io/cereal/serialization_functions.html#non-public-serialization// Use std::vector and make it serializable:
#include
#include
// see http://uscilab.github.io/cereal/stl_support.html for more info// Cereal's binary archiving:
#include
#include
// see http://uscilab.github.io/cereal/serialization_archives.html
// and http://uscilab.github.io/cereal/quickstart.html for more info#include
using namespace Rcpp;class SerializableNumericMatrix
{
private:
int nrows;
int ncols;
std::vector data;
friend class cereal::access;
// This method lets cereal know which data members to serialize
template
void serialize(Archive& ar)
{
ar( data, nrows, ncols ); // serialize things by passing them to the archive
}
public:
SerializableNumericMatrix(){};
SerializableNumericMatrix(NumericMatrix x) {
std::vector y = as>(x); // Rcpp::as
data = y;
nrows = x.nrow();
ncols = x.ncol();
};
NumericVector neuemericMatrix() {
// Thanks to Kevin Ushey for the tip to use dim attribute
// http://stackoverflow.com/a/19866956/1091835
NumericVector d = wrap(data);
d.attr("dim") = Dimension(nrows, ncols);
return d;
}
NumericMatrix numericMatrix() {
NumericMatrix d(nrows, ncols, data.begin());
return d;
}
};
```Let's just test that everything works before adding [de-]serialization capabilities.
```{Rcpp myclass_test, eval = FALSE}
// [[Rcpp::export]]
NumericMatrix testSNM(NumericMatrix x) {
SerializableNumericMatrix snm(x);
return snm.numericMatrix();
}
``````{r myclass_example, dependson = 'cerealization', cache = TRUE}
x <- matrix(0:9, nrow = 2, ncol = 5)
(y <- testSNM(x))
```Yay! Okay, for this we are going to use the serialization/deserialization example from [Rcereal README.md](https://github.com/wush978/Rcereal/blob/master/README.md).
```{Rcpp myclass_serialization, eval = FALSE}
// [[Rcpp::export]]
RawVector serializeSNM(NumericMatrix x) {
SerializableNumericMatrix snm(x);
std::stringstream ss;
{
cereal::BinaryOutputArchive oarchive(ss); // Create an output archive
oarchive(snm);
}
ss.seekg(0, ss.end);
RawVector retval(ss.tellg());
ss.seekg(0, ss.beg);
ss.read(reinterpret_cast(&retval[0]), retval.size());
return retval;
}
``````{Rcpp myclass_deserialization, eval = FALSE}
//[[Rcpp::export]]
NumericVector deserializeSNM(RawVector src) {
std::stringstream ss;
ss.write(reinterpret_cast(&src[0]), src.size());
ss.seekg(0, ss.beg);
SerializableNumericMatrix snm;
{
cereal::BinaryInputArchive iarchive(ss);
iarchive(snm);
}
return snm.numericMatrix();
}
``````{r cerealization_example, dependson = c('myclass_example', 'cerealization'), cache = TRUE}
(raw_vector <- serializeSNM(x))
deserializeSNM(raw_vector)
```### Armadillo Serialization (Not Working)
Basically, we want to be able to serialize Armadillo matrices because then we can serialize things like the Naive Bayes classifier that rely on `arma::Mat`'s.
```{r}
# Enables us to include files in src/
registerPlugin("local", function() {
return(list(env = list(
PKG_CXXFLAGS = paste0('-I"', getwd(), '/src"')
)))
})
``````{Rcpp boosted_serialization, ref.label = c('bs_includes', 'bs_boost_serialization_includes', 'bs_RcppArmadilloForward', 'bs_armadillo_include', 'bs_boost_archive_includes', 'bs_mlpack_includes', 'bs_ns', 'bs_test_src', 'bs_serialize_src', 'bs_deserialize_src'), cache = TRUE, include = FALSE}
```
```{Rcpp bs_includes, eval = FALSE}
// [[Rcpp::plugins(local)]]
// [[Rcpp::plugins(mlpack11)]]
// [[Rcpp::depends(BH)]]
// [[Rcpp::depends(RcppArmadillo)]]
```Include everything we'll need for `serialize()`:
```{Rcpp bs_boost_serialization_includes, eval = FALSE}
#include
#include
#include
```In the next chunk we copy and modify the `RcppArmadillo__RcppArmadilloForward__h` definition from **RcppArmadilloForward.h** so as to use our own Mat extra proto and meat, which is really just:
```cpp
//! Add a serialization operator.
template
void serialize(Archive& ar, const unsigned int version);#include
```The **Mat_extra_bones.hpp** is included for serialization and **Mat_proto.h** is included so RcppArmadillo plays nice.
```{Rcpp bs_RcppArmadilloForward, eval = FALSE}
#ifndef RcppArmadillo__RcppArmadilloForward__h
#define RcppArmadillo__RcppArmadilloForward__h#include
#include
#include// Costom Mat extension that combines MLPACK with RcppArmadillo's Mat extensions:
#define ARMA_EXTRA_MAT_PROTO mat_extra_bones.hpp
#define ARMA_EXTRA_MAT_MEAT mat_extra_meat.hpp// Everything else the same:
#define ARMA_EXTRA_COL_PROTO RcppArmadillo/Col_proto.h
#define ARMA_EXTRA_COL_MEAT RcppArmadillo/Col_meat.h
#define ARMA_EXTRA_ROW_PROTO RcppArmadillo/Row_proto.h
#define ARMA_EXTRA_ROW_MEAT RcppArmadillo/Row_meat.h
#define ARMA_RNG_ALT RcppArmadillo/Alt_R_RNG.h
#include
/* forward declarations */
namespace Rcpp {
/* support for wrap */
template SEXP wrap ( const arma::Mat& ) ;
template SEXP wrap ( const arma::Row& ) ;
template SEXP wrap ( const arma::Col& ) ;
template SEXP wrap ( const arma::field& ) ;
template SEXP wrap ( const arma::Cube& ) ;
template SEXP wrap ( const arma::subview& ) ;
template SEXP wrap ( const arma::SpMat& ) ;
template
SEXP wrap(const arma::Glue& X ) ;
template
SEXP wrap(const arma::Op& X ) ;
template
SEXP wrap(const arma::eGlue& X ) ;
template
SEXP wrap(const arma::eOp& X ) ;
template
SEXP wrap(const arma::OpCube& X ) ;
template
SEXP wrap(const arma::GlueCube& X ) ;
template
SEXP wrap(const arma::eOpCube& X ) ;
template
SEXP wrap(const arma::eGlueCube& X ) ;template
SEXP wrap( const arma::mtOp& X ) ;template
SEXP wrap( const arma::mtGlue& X );
template
SEXP wrap( const arma::Gen& X) ;
template
SEXP wrap( const arma::GenCube& X) ;
namespace traits {/* support for as */
template class Exporter< arma::Mat > ;
template class Exporter< arma::Row > ;
template class Exporter< arma::Col > ;
template class Exporter< arma::SpMat > ;
template class Exporter< arma::field > ;
// template class Exporter< arma::Cube > ;} // namespace traits
template class ConstReferenceInputParameter< arma::Mat > ;
template class ReferenceInputParameter< arma::Mat > ;
template class ConstInputParameter< arma::Mat > ;
template class ConstReferenceInputParameter< arma::Col > ;
template class ReferenceInputParameter< arma::Col > ;
template class ConstInputParameter< arma::Col > ;
template class ConstReferenceInputParameter< arma::Row > ;
template class ReferenceInputParameter< arma::Row > ;
template class ConstInputParameter< arma::Row > ;
}#endif
```Okay, now that that's been defined, let's include **RcppArmadillo.h** which will include **RcppForward.h** but which will (hopefully) *not* redefine our slightly customized `RcppArmadillo__RcppArmadilloForward__h`.
```{Rcpp bs_armadillo_include, eval = FALSE}
#include
```MLPACK's Naive Bayes Classifier:
```{Rcpp bs_mlpack_includes, eval = FALSE}
#include
#include
using namespace mlpack::naive_bayes;
```Boost's Archives:
```{Rcpp bs_boost_archive_includes, eval = FALSE}
// Include everything we'll need for archiving.
#include
#include
#include
```
```{Rcpp bs_ns, eval = FALSE}
using namespace Rcpp;
using namespace arma;
```
```{Rcpp bs_test_src, eval = FALSE}
// [[Rcpp::export]]
mat test() {
mat A(3, 6, fill::randu);
return A;
}
``````{r bs_test_ex}
set.seed(0); (x1 <- test())
set.seed(0); x2 <- test()
identical(x1, x2)
``````{Rcpp bs_serialize_src, eval = FALSE}
// [[Rcpp::export]]
RawVector test_serialization(Mat m) {
std::stringstream ss;
// save data to archive
{
boost::archive::binary_oarchive oarchive(ss); // create an output archive
oarchive & m; // write class instance to archive
// archive and stream closed when destructors are called
}
ss.seekg(0, ss.end);
RawVector retval(ss.tellg());
ss.seekg(0, ss.beg);
ss.read(reinterpret_cast(&retval[0]), retval.size());
return retval;
}
```# References
- Eddelbuettel, D. (2013). Seamless R and C++ Integration with Rcpp. New York, NY: Springer Science & Business Media. http://doi.org/10.1007/978-1-4614-6868-4
- Wickham, H. A. (2014). Advanced R. Chapman and Hall/CRC. http://doi.org/10.1201/b17487