{"id":32202302,"url":"https://github.com/boennecd/mssm","last_synced_at":"2026-02-21T13:01:47.040Z","repository":{"id":44905170,"uuid":"181791373","full_name":"boennecd/mssm","owner":"boennecd","description":"R package for multivariate state space models","archived":false,"fork":false,"pushed_at":"2022-01-31T12:33:06.000Z","size":20151,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-22T04:04:42.841Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/boennecd.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-17T01:05:07.000Z","updated_at":"2025-10-21T15:52:07.000Z","dependencies_parsed_at":"2022-08-26T08:51:10.789Z","dependency_job_id":null,"html_url":"https://github.com/boennecd/mssm","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/boennecd/mssm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boennecd%2Fmssm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boennecd%2Fmssm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boennecd%2Fmssm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boennecd%2Fmssm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/boennecd","download_url":"https://codeload.github.com/boennecd/mssm/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boennecd%2Fmssm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29681468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T12:30:22.644Z","status":"ssl_error","status_checked_at":"2026-02-21T12:29:55.402Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-22T04:01:27.105Z","updated_at":"2026-02-21T13:01:47.033Z","avatar_url":"https://github.com/boennecd.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput:\n  github_document:\n    pandoc_args: --webtex=https://latex.codecogs.com/svg.latex?\nbibliography: README.bib\nnocite: |\n  @Polyak92\n---\n\n# Multivariate State Space Models\n[![R-CMD-check](https://github.com/boennecd/mssm/workflows/R-CMD-check/badge.svg)](https://github.com/boennecd/mssm/actions)\n[![](https://www.r-pkg.org/badges/version/mssm)](https://www.r-pkg.org/badges/version/mssm)\n[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/mssm)](https://cran.r-project.org/package=mssm)\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(\n  error = FALSE, cache = \"./README-cache/\", fig.path = \"man/figures/README-\", \n  echo = TRUE)\noptions(digits = 4, scipen = 7)\n.fig_height_small \u003c- 4\nsource(\"render_toc.R\")\npalette(c(\"#000000\", \"#009E73\", \"#e79f00\", \"#9ad0f3\", \"#0072B2\", \"#D55E00\", \n          \"#CC79A7\", \"#F0E442\"))\n```\n\nThis package provides methods to estimate models of the form \n\n$$y_{it} \\sim g(\\eta_{it}),\\qquad i\\in I_t$$ \n$$\\eta_{it} = \\vec\\gamma^\\top\\vec x_{it} +\\vec\\beta_t^\\top\\vec z_{it}$$\n$$\\vec\\beta_t = F\\vec\\beta_{t-1}+\\vec\\epsilon_t, \\qquad \\vec\\epsilon_t\\sim N(\\vec 0, Q)$$\n\nwhere $g$ is simple distribution, we observe $t=1,\\dots,T$ periods, and $I_t$, \n$y_{it}$, $\\vec x_{it}$, and \n$\\vec z_{it}$ are known. What is multivariate is \n$\\vec y_t = \\{y_{it}\\}_{i\\in I_t}$ (though, $\\vec \\beta_t$ can also be \nmultivariate) and this package is written to scale well \nin the cardinality of $I_t$. The package uses independent \nparticle filters as suggested by @Lin05. This particular type of filter \ncan be used in the method suggested by @Poyiadjis11. I will show an example\nof how to use the package through the rest of the document and highlight some \nimplementation details. \n\nThe package can be installed from Github e.g., by \ncalling\n\n```{r github_dl, eval = FALSE}\ndevtools::install_github(\"boennecd/mssm\")\n```\n\nor from CRAN by calling \n\n```{r cran_inst, eval = FALSE}\ninstall.packages(\"mssm\")\n```\n\n\n## Table of Contents\n\n```{r echo = FALSE}\nrender_toc(\"README.Rmd\", toc_header_name = \"Table of Contents\", toc_depth = 3L, base_level = 2L)\n```\n\n## Poisson Example\n\nWe simulate data as follows.\n\n```{r simulate, fig.height = .fig_height_small}\n# simulate path of state variables \nset.seed(78727269)\nn_periods \u003c- 312L\n(F. \u003c- matrix(c(.5, .1, 0, .8), 2L))\n(Q \u003c- matrix(c(.5^2, .1, .1, .7^2), 2L))\n(Q_0 \u003c- matrix(c(0.333, 0.194, 0.194, 1.46), 2L))\n\nbetas \u003c- cbind(crossprod(chol(Q_0),        rnorm(2L)                      ), \n               crossprod(chol(Q  ), matrix(rnorm((n_periods - 1L) * 2), 2)))\nbetas \u003c- t(betas)\nfor(i in 2:nrow(betas))\n  betas[i, ] \u003c- betas[i, ] + F. %*% betas[i - 1L, ]\npar(mar = c(5, 4, 1, 1))\nmatplot(betas, lty = 1, type = \"l\")\n\n# simulate observations\ncfix \u003c- c(-1, .2, .5, -1) # gamma\nn_obs \u003c- 100L\ndat \u003c- lapply(1:n_obs, function(id){\n  x \u003c- runif(n_periods, -1, 1)\n  X \u003c- cbind(X1 = x, X2 = runif(1, -1, 1))\n  z \u003c- runif(n_periods, -1, 1)\n  \n  eta \u003c- drop(cbind(1, X, z) %*% cfix + rowSums(cbind(1, z) * betas))\n  y \u003c- rpois(n_periods, lambda = exp(eta))\n  \n  # randomly drop some\n  keep \u003c- .2 \u003e runif(n_periods)\n  \n  data.frame(y = y, X, Z = z, id = id, time_idx = 1:n_periods)[keep, ]\n})\ndat \u003c- do.call(rbind, dat)\n\n# show some properties \nnrow(dat)\nhead(dat)\ntable(dat$y)\n\n# quick smooth of number of events vs. time\npar(mar = c(5, 4, 1, 1))\nplot(smooth.spline(dat$time_idx, dat$y), type = \"l\", xlab = \"Time\", \n     ylab = \"Number of events\")\n\n# and split by those with `Z` above and below 0\nwith(dat, {\n  z_large \u003c- ifelse(Z \u003e 0, \"large\", \"small\")\n  smooths \u003c- lapply(split(cbind(dat, z_large), z_large), function(x){\n    plot(smooth.spline(x$time_idx, x$y), type = \"l\", xlab = \"Time\", \n     ylab = paste(\"Number of events -\", unique(x$z_large)))\n  })\n})\n```\n\nIn the above, we simulate `r n_periods` (`n_periods`) with `r n_obs` (`n_obs`)\nindividuals. Each individual has a fixed covariate, `X2`, and two time-varying\ncovariates, `X1` and `Z`. One of the time-varying covariates, `Z`, has a \nrandom slope. Further, the intercept is also random. \n\n### Log-Likelihood Approximations\n\nWe start by estimating a generalized linear model without random effects.\n\n```{r fit_glm}\nglm_fit \u003c- glm(y ~ X1 + X2 + Z, poisson(), dat)\nsummary(glm_fit)\nlogLik(glm_fit)\n```\n\nNext, we make a log-likelihood approximation with the implemented particle at \nthe true parameters with the `mssm` function.\n\n```{r fit_mssm}\nlibrary(mssm)\nll_func \u003c- mssm(\n  fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat, \n  # make it explict that there is an intercept (not needed)\n  random = ~ 1 + Z, ti = time_idx, control = mssm_control(\n    n_threads = 5L, N_part = 500L, what = \"log_density\"))\n\nsystem.time(\n  mssm_obj \u003c- ll_func$pf_filter(\n    cfix = cfix, disp = numeric(), F. = F., Q = Q))\n\n# returns the log-likelihood approximation\nlogLik(mssm_obj)\n\n# also shown by print\nmssm_obj\n```\n\nWe get a much larger log-likelihood as expected. We can plot the predicted \nvalues of state variables from the filter distribution.\n\n```{r plot_filter, fig.height = .fig_height_small, fig.width = 12L}\n# get predicted mean and prediction interval \nfilter_means \u003c- plot(mssm_obj, do_plot = FALSE)\n\n# plot with which also contains the true paths\nfor(i in 1:ncol(betas)){\n  be \u003c- betas[, i]\n  me \u003c- filter_means$means[i, ]\n  lb \u003c- filter_means$lbs[i, ]\n  ub \u003c- filter_means$ubs[i, ]\n  \n  #     dashed: true paths\n  # continuous: predicted mean from filter distribution \n  #     dotted: prediction interval\n  par(mar = c(5, 4, 1, 1))\n  matplot(cbind(be, me, lb, ub), lty = c(2, 1, 3, 3), type = \"l\", \n          col = \"black\", ylab = rownames(filter_means$lbs)[i])\n}\n```\n\nWe can also get predicted values from the smoothing distribution. \n\n```{r show_smooths, fig.height = .fig_height_small, fig.width = 12L}\n# get smoothing weights\nmssm_obj \u003c- ll_func$smoother(mssm_obj)\n\n# get predicted mean and prediction interval from smoothing distribution\nsmooth_means \u003c- plot(mssm_obj, do_plot = FALSE, which_weights = \"smooth\")\n\nfor(i in 1:ncol(betas)){\n  be  \u003c- betas[, i]\n  me  \u003c- filter_means$means[i, ]\n  lb  \u003c- filter_means$lbs[i, ]\n  ub  \u003c- filter_means$ubs[i, ]\n  mes \u003c- smooth_means$means[i, ]\n  lbs \u003c- smooth_means$lbs[i, ]\n  ubs \u003c- smooth_means$ubs[i, ]\n  \n  #     dashed: true paths\n  # continuous: predicted mean from filter distribution \n  #     dotted: prediction interval\n  # \n  # smooth predictions are in a different color\n  par(mar = c(5, 4, 1, 1))\n  matplot(cbind(be, me, lb, ub, mes, lbs, ubs), \n          lty = c(2, 1, 3, 3, 1, 3, 3), type = \"l\", \n          col = c(rep(1, 4), rep(2, 3)), ylab = rownames(filter_means$lbs)[i])\n}\n\n# compare mean square error of the two means\nrbind(filter = colMeans((t(filter_means$means) - betas)^2), \n      smooth = colMeans((t(smooth_means$means) - betas)^2))\n```\n\nWe can get the effective sample size at each point in time with the `get_ess`\nfunction.\n\n```{r show_ess, fig.height = .fig_height_small}\n(ess \u003c- get_ess(mssm_obj))\nplot(ess)\n```\n\nWe can compare this what we get by using a so-called bootstrap (like) filter\ninstead.\n\n```{r comp_boot, fig.height = .fig_height_small}\nlocal({\n  ll_boot \u003c- mssm(\n    fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat, \n    random = ~ Z, ti = time_idx, control = mssm_control(\n      n_threads = 5L, N_part = 500L, what = \"log_density\", \n      which_sampler = \"bootstrap\"))\n  \n  print(system.time(\n    boot_fit \u003c- ll_boot$pf_filter(\n      cfix = coef(glm_fit), disp = numeric(), F. = F., Q = Q)))\n  \n  plot(get_ess(boot_fit))\n})\n```\n\nThe above is not much faster (and maybe slower in this run) as the bulk of \nthe computation is not in the sampling step. We can also compare the \nlog-likelihood approximation with what we get if we choose parameters close \nto the GLM estimates.\n\n```{r comp_close_glm}\nmssm_glm \u003c- ll_func$pf_filter(\n  cfix = coef(glm_fit), disp = numeric(), F. = diag(1e-8, 2), \n  Q = diag(1e-4^2, 2))\nlogLik(mssm_glm)\n```\n\n### Antithetic Variables\n\nOne way to reduce the variance of the Monte Carlo estimate is to use \n[antithetic variables](https://en.wikipedia.org/wiki/Antithetic_variates). \nTwo types of antithetic variables are implemented as in @Durbin97. \nThat is, one balanced for location and two balanced for scale. This is currently \nonly implemented with a t-distribution as the proposal distribution. \n\nWe start by giving some details on the locations balanced variable. Suppose \nwe use a t-distribution with $\\nu$ degrees of freedom, a $d$ dimensional mean \nof $\\mu$ and a scale matrix $\\Sigma$. We can then generate a sample by setting\n\n$$\\begin{aligned}  \\vec x \u0026= \\vec\\mu + C \\frac{\\vec z}{\\sqrt{a / \\nu}} \u0026 \\Sigma \u0026= CC^\\top \\\\ \\vec z \u0026\\sim N(\\vec 0, I) \u0026 a \u0026\\sim \\chi^2_\\nu \\end{aligned}$$\n\nThen the location balanced variable is\n\n$$\\widehat{\\vec x} = \\vec\\mu - C \\frac{\\vec z}{\\sqrt{a / \\nu}}$$\n\nFor the scaled balanced variables we use that \n\n$$u = \\frac{\\vec z^\\top\\vec z/ d}{a / \\nu} \\sim F(d, \\nu)$$\n\nWe then define the cumulative distribution function\n\n$$k = P(U \\leq u) = Q(u)$$\n\nand set \n\n$$u' = Q^{-1}(1 - k)$$\n\nThen the two scaled balanced variables are\n\n$$\\begin{aligned} \\widetilde{\\vec x}_1 \u0026= \\vec\\mu + \\sqrt{u'/u} \\cdot C \\frac{\\vec z}{\\sqrt{a / \\nu}} \\\\ \\widetilde{\\vec x}_2 \u0026= \\vec\\mu - \\sqrt{u'/u} \\cdot C \\frac{\\vec z}{\\sqrt{a / \\nu}} \\end{aligned}$$\n\nWe will illustrate the reduction in variance of the log-likelihood estimate. \nTo do so, we run the particle filter with and without antithetic variables\nmultiple times below\nto get an estimate of the error of the approximation.\n\n```{r anti_ex, cache = 1}\nset.seed(12769550)\ncompare_anti \u003c- local({\n  ll_func_no_anti \u003c- mssm(\n    fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat, \n    random = ~ Z, ti = time_idx, control = mssm_control(\n      n_threads = 5L, N_part = 500L, what = \"log_density\")) \n  \n  ll_func_anti \u003c- mssm(\n    fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat,\n    random = ~ Z, ti = time_idx, control = mssm_control(\n      n_threads = 5L, N_part = 500L, what = \"log_density\", \n      use_antithetic = TRUE)) \n  \n  no_anti \u003c- replicate(\n    100, {\n      ti \u003c- system.time(x \u003c- logLik(ll_func_no_anti$pf_filter(\n        cfix = cfix, disp = numeric(), F. = F., Q = Q, seed = NULL)))\n      list(ti = ti, x = x)\n    }, simplify = FALSE)\n  w_anti  \u003c- replicate(\n    100, {\n      ti \u003c- system.time(x \u003c- logLik(ll_func_anti$pf_filter(\n        cfix = cfix, disp = numeric(), F. = F., Q = Q, seed = NULL)))\n      list(ti = ti, x = x)\n    }, simplify = FALSE)\n  \n  list(no_anti = no_anti, w_anti = w_anti)\n})\n```\n\nThe mean estimate of the log-likelihood and standard error of the estimate is\nshown below with and without antithetic variables.\n\n```{r show_anti_ex}\nsapply(compare_anti, function(x){ \n  x \u003c- sapply(x, \"[[\", \"x\")\n  c(mean = mean(x), se = sd(x) / sqrt(length(x)))\n})\n```\n\nUsing antithetic variables is slower. Below we show summary statistics for the \nelapsed time without using antithetic variables and with antithetic variables.\n\n```{r run_anti_ex}\nsapply(compare_anti, function(x){ \n  x \u003c- sapply(x, \"[[\", \"ti\")\n  z \u003c- x[c(\"elapsed\"), ]\n  c(mean = mean(z), quantile(z, c(.5, .25, .75, 0, 1)))\n})\n```\n\n### Parameter Estimation\n\nWe will need to estimate the parameters for real applications. We could do \nthis e.g., with a Monte Carlo expectation-maximization algorithm or by using \na Monte Carlo approximation of the gradient. Currently, the latter is only \navailable and the user will have to write a custom function to perform \nthe estimation. \nI will provide an example below. The `sgd` function is not a part of the \npackage. Instead the package provides a way to approximate the gradient and \nallows the user to perform subsequent maximization (e.g., with constraints or \npenalties). The definition of the \n`sgd` is given at the end of this file as it is somewhat long. We start \nby using a Laplace approximation to get the starting values.\n\n```{r mc_grad_est, echo = FALSE}\n# Stochastic gradient descent for mssm object. The function assumes that the \n# state vector is stationary. The default values are rather arbitrary.\n# \n# Args:\n#   object: an object of class mssmFunc. \n#   n_it: number of iterations. \n#   lrs: learning rates to use. Must have n_it elements. Like problem specific. \n#   avg_start: index to start averaging. See Polyak et al. (1992) for arguments\n#              for averaging.\n#   cfix: starting values for fixed coefficients. \n#   F.: starting value for transition matrix in conditional distribution of the \n#       current state given the previous state.\n#   Q: starting value for covariance matrix in conditional distribution of the \n#      current state given the previous. \n#   verbose: TRUE if output should be printed during estmation. \n#   disp: starting value for dispersion parameter.\n# \n# Returns:\n#   List with estimates and the log-likelihood approximation at each iteration. \nsgd \u003c- function(\n  object, n_it = 150L, \n  lrs = 1e-2 * (1:n_it)^(-1/2), avg_start = max(1L, as.integer(n_it * 4L / 5L)),\n  cfix, F., Q,  verbose = FALSE, disp = numeric())\n{\n  # make checks\n  stopifnot(\n    inherits(object, \"mssmFunc\"), object$control$what == \"gradient\", n_it \u003e 0L, \n    all(lrs \u003e 0.), avg_start \u003e 1L, length(lrs) == n_it)\n  n_fix \u003c- nrow(object$X)\n  n_rng \u003c- nrow(object$Z)\n  \n  # objects for estimates at each iteration, gradient norms, and\n  # log-likelihood approximations\n  has_dispersion \u003c-\n    any(sapply(c(\"^Gamma\", \"^gaussian\"), grepl, x = object$family))\n  n_params \u003c-\n    has_dispersion + n_fix + n_rng * n_rng + n_rng * (n_rng  + 1L) / 2L\n  ests \u003c- matrix(NA_real_, n_it + 1L, n_params)\n  ests[1L, ] \u003c- c(\n    cfix,  if(has_dispersion) disp else NULL, F.,\n    Q[lower.tri(Q, diag = TRUE)])\n  grad_norm \u003c- lls \u003c- rep(NA_real_, n_it)\n  \n  # indices of the different components\n  idx_fix   \u003c- 1:n_fix\n  if(has_dispersion)\n    idx_dip \u003c- n_fix + 1L\n  idx_F     \u003c- 1:(n_rng * n_rng) + n_fix + has_dispersion\n  idx_Q     \u003c- 1:(n_rng * (n_rng  + 1L) / 2L) + n_fix + n_rng * n_rng +\n    has_dispersion\n  \n  # function to set the parameters\n  library(matrixcalc) # TODO: get rid of this\n  dup_mat \u003c- duplication.matrix(ncol(Q))\n  set_parems \u003c- function(i){\n    # select whether or not to average\n    idx \u003c- if(i \u003e avg_start) avg_start:i else i\n    \n    # set new parameters\n    cfix   \u003c\u003c-             colMeans(ests[idx, idx_fix, drop = FALSE])\n    if(has_dispersion)\n      disp \u003c\u003c-             colMeans(ests[idx, idx_dip, drop = FALSE])\n    F.[]   \u003c\u003c-             colMeans(ests[idx, idx_F  , drop = FALSE])\n    Q[]    \u003c\u003c- dup_mat %*% colMeans(ests[idx, idx_Q  , drop = FALSE])\n    \n  }\n    \n  # run algorithm\n  max_half \u003c- 25L\n  for(i in 1:n_it + 1L){\n    # get gradient. First, run the particle filter\n    filter_out \u003c- object$pf_filter(\n      cfix = cfix, disp = disp, F. = F., Q = Q, seed = NULL)\n    lls[i - 1L] \u003c- c(logLik(filter_out))\n    \n    # then get the gradient associated with each particle and the log \n    # normalized weight of the particles\n    grads \u003c- tail(filter_out$pf_output, 1L)[[1L]]$stats\n    ws    \u003c- tail(filter_out$pf_output, 1L)[[1L]]$ws_normalized\n    \n    # compute the gradient and take a small step\n    grad \u003c- colSums(t(grads) * drop(exp(ws)))\n    grad_norm[i - 1L] \u003c- norm(t(grad))\n    \n    lr_i \u003c- lrs[i - 1L]\n    k \u003c- 0L\n    while(k \u003c max_half){\n      ests[i, ] \u003c- ests[i - 1L, ] + lr_i * grad \n      set_parems(i)\n      \n      # check that Q is positive definite and the system is stationary\n      c1 \u003c- all(abs(eigen(F.)$values) \u003c 1)\n      c2 \u003c- all(eigen(Q)$values \u003e 0)\n      c3 \u003c- !has_dispersion || disp \u003e 0.\n      if(c1 \u0026\u0026 c2 \u0026\u0026 c3)\n        break\n      \n      # decrease learning rate\n      lr_i \u003c- lr_i * .5\n      k \u003c- k + 1L\n    }\n    \n    # check if we failed to find a value within our constraints\n    if(k == max_half)\n      stop(\"failed to find solution within constraints\")\n    \n    # print information if requested \n    if(verbose){\n      cat(sprintf(\n        \"\\nIt %5d: log-likelihood (current, max) %12.2f, %12.2f\\n\",\n        i - 1L, logLik(filter_out), max(lls, na.rm = TRUE)),\n        rep(\"-\", 66), \"\\n\", sep = \"\")\n      cat(\"cfix\\n\")\n      print(cfix)\n      if(has_dispersion)\n        cat(sprintf(\"Dispersion: %20.8f\\n\", disp))\n      cat(\"F\\n\")\n      print(F.)\n      cat(\"Q\\n\")\n      print(Q)\n      cat(sprintf(\"Gradient norm: %10.4f\\n\", grad_norm[i - 1L]))\n      print(get_ess(filter_out))\n\n    }\n  } \n  \n  list(estimates = ests, logLik = lls, F. = F., Q = Q, cfix = cfix, \n       disp = disp, grad_norm = grad_norm)\n}\n\n# Stochastic gradient descent for mssm object using the Adam algorithm. The  \n# function assumes that the state vector is stationary.\n# \n# Args:\n#   object: an object of class mssmFunc. \n#   n_it: number of iterations. \n#   mp: decay rate for first moment.\n#   vp: decay rate for secod moment.\n#   lr: learning rate.\n#   cfix: starting values for fixed coefficients. \n#   F.: starting value for transition matrix in conditional distribution of the \n#       current state given the previous state.\n#   Q: starting value for covariance matrix in conditional distribution of the \n#      current state given the previous. \n#   verbose: TRUE if output should be printed during estmation. \n#   disp: starting value for dispersion parameter.\n# \n# Returns:\n#   List with estimates and the log-likelihood approximation at each iteration. \nadam \u003c- function(\n  object, n_it = 150L, mp = .9, vp = .999, lr = .01, cfix, F., Q,\n  verbose = FALSE, disp = numeric())\n{\n  # make checks\n  stopifnot(\n    inherits(object, \"mssmFunc\"), object$control$what == \"gradient\", n_it \u003e 0L,\n    lr \u003e 0., mp \u003e 0, mp \u003c 1, vp \u003e 0, vp \u003c 1)\n  n_fix \u003c- nrow(object$X)\n  n_rng \u003c- nrow(object$Z)\n\n  # objects for estimates at each iteration, gradient norms, and\n  # log-likelihood approximations\n  has_dispersion \u003c-\n    any(sapply(c(\"^Gamma\", \"^gaussian\"), grepl, x = object$family))\n  n_params \u003c-\n    has_dispersion + n_fix + n_rng * n_rng + n_rng * (n_rng  + 1L) / 2L\n  ests \u003c- matrix(NA_real_, n_it + 1L, n_params)\n  ests[1L, ] \u003c- c(\n    cfix,  if(has_dispersion) disp else NULL, F.,\n    Q[lower.tri(Q, diag = TRUE)])\n  grad_norm \u003c- lls \u003c- rep(NA_real_, n_it)\n\n  # indices of the different components\n  idx_fix   \u003c- 1:n_fix\n  if(has_dispersion)\n    idx_dip \u003c- n_fix + 1L\n  idx_F     \u003c- 1:(n_rng * n_rng) + n_fix + has_dispersion\n  idx_Q     \u003c- 1:(n_rng * (n_rng  + 1L) / 2L) + n_fix + n_rng * n_rng +\n    has_dispersion\n\n  # function to set the parameters\n  library(matrixcalc) # TODO: get rid of this\n  dup_mat \u003c- duplication.matrix(ncol(Q))\n  set_parems \u003c- function(i){\n    cfix   \u003c\u003c-             ests[i, idx_fix]\n    if(has_dispersion)\n      disp \u003c\u003c-             ests[i, idx_dip]\n    F.[]   \u003c\u003c-             ests[i, idx_F  ]\n    Q[]    \u003c\u003c- dup_mat %*% ests[i, idx_Q  ]\n\n  }\n\n  # run algorithm\n  max_half \u003c- 25L\n  m \u003c- NULL\n  v \u003c- NULL\n  failed \u003c- FALSE\n  for(i in 1:n_it + 1L){\n    # get gradient. First, run the particle filter\n    filter_out \u003c- object$pf_filter(\n      cfix = cfix, disp = disp, F. = F., Q = Q, seed = NULL)\n    lls[i - 1L] \u003c- c(logLik(filter_out))\n\n    # then get the gradient associated with each particle and the log\n    # normalized weight of the particles\n    grads \u003c- tail(filter_out$pf_output, 1L)[[1L]]$stats\n    ws    \u003c- tail(filter_out$pf_output, 1L)[[1L]]$ws_normalized\n\n    # compute the gradient and take a small step\n    grad \u003c- colSums(t(grads) * drop(exp(ws)))\n    grad_norm[i - 1L] \u003c- norm(t(grad))\n\n    m \u003c- if(is.null(m)) (1 - mp) * grad   else mp * m + (1 - mp) * grad\n    v \u003c- if(is.null(v)) (1 - vp) * grad^2 else vp * v + (1 - vp) * grad^2\n    mh \u003c- m / (1 - mp^(i - 1))\n    vh \u003c- v / (1 - vp^(i - 1))\n    de \u003c- mh / sqrt(vh + 1e-8)\n\n    lr_i \u003c- lr\n    k \u003c- 0L\n    while(k \u003c max_half){\n      ests[i, ] \u003c- ests[i - 1L, ] + lr_i * de\n      set_parems(i)\n\n      # check that Q is positive definite, the dispersion parameter is\n      # positive, and the system is stationary\n      c1 \u003c- all(abs(eigen(F.)$values) \u003c 1)\n      c2 \u003c- all(eigen(Q)$values \u003e 0)\n      c3 \u003c- !has_dispersion || disp \u003e 0.\n      if(c1 \u0026\u0026 c2 \u0026\u0026 c3)\n        break\n\n      # decrease learning rate\n      lr_i \u003c- lr_i * .5\n      k \u003c- k + 1L\n    }\n\n    # check if we failed to find a value within our constraints\n    if(k == max_half){\n      warning(\"failed to find solution within constraints\")\n      failed \u003c- TRUE\n      break\n    }\n\n    # print information if requested\n    if(verbose){\n      cat(sprintf(\n        \"\\nIt %5d: log-likelihood (current, max) %12.2f, %12.2f\\n\",\n        i - 1L, logLik(filter_out), max(lls, na.rm = TRUE)),\n        rep(\"-\", 66), \"\\n\", sep = \"\")\n      cat(\"cfix\\n\")\n      print(cfix)\n      if(has_dispersion)\n        cat(sprintf(\"Dispersion: %20.8f\\n\", disp))\n      cat(\"F\\n\")\n      print(F.)\n      cat(\"Q\\n\")\n      print(Q)\n      cat(sprintf(\"Gradient norm: %10.4f\\n\", grad_norm[i - 1L]))\n      print(get_ess(filter_out))\n\n    }\n  }\n\n  list(estimates = ests, logLik = lls, F. = F., Q = Q, cfix = cfix, \n       disp = disp, failed = failed, grad_norm = grad_norm)\n}\n```\n\n```{r laplace, cache = 1}\n# setup mssmFunc object to use \nll_func \u003c- mssm(  \n  fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat, \n  random = ~ Z, ti = time_idx, control = mssm_control(\n    n_threads = 5L, N_part = 200L, what = \"gradient\", use_antithetic = TRUE))\n\n# use Laplace approximation to get starting values\nsystem.time(\n  sta \u003c- ll_func$Laplace(\n    F. = diag(.5, 2), Q = diag(1, 2), cfix = coef(glm_fit), disp = numeric()))\n```\n\n```{r show_laplace}\n# the function returns an object with the estimated parameters and  \n# approximation log-likelihood\nsta\nsta$Q\n```\n\n\u003c!--\nlibrary(knitr)\nopts_knit$set(output.dir = \".\")\nopts_chunk$set(cache.path = paste0(\n    file.path(\"README_cache\", \"markdown_github\"), .Platform$file.sep))\nload_cache(\"sgd\")\n--\u003e\n\n```{r sgd, cache = 1, dependson = \"laplace\"}\n# use stochastic gradient descent with averaging\nset.seed(25164416)\nsystem.time( \n  res \u003c- sgd(\n    ll_func, F. = sta$F., Q = sta$Q, cfix = sta$cfix, \n    lrs = .001 * (1:50)^(-1/2), n_it = 50L, avg_start = 30L))\n\n# use Adam algorithm instead\nset.seed(25164416)\nsystem.time( \n  resa \u003c- adam(\n    ll_func, F. = sta$F., Q = sta$Q, cfix = sta$cfix, \n    lr = .01, n_it = 50L))\n```\n\nPlots of the approximate log-likelihoods at each iteration is shown below \nalong with the final estimates. \n\n```{r show_use_sgd, fig.height = .fig_height_small}\nprint(tail(res$logLik), digits = 6) # final log-likelihood approximations\npar(mar = c(5, 4, 1, 1))\nplot(res$logLik, type = \"l\", ylab = \"log-likelihood approximation\")\nplot(res$grad_norm, type = \"l\", ylab = \"approximate gradient norm\")\n\n# final estimates\nres$F. \nres$Q\nres$cfix\n\n# compare with output from Adam algorithm\nprint(tail(resa$logLik), digits = 6) # final log-likelihood approximations\nplot(resa$logLik, type = \"l\", ylab = \"log-likelihood approximation\")\nplot(resa$grad_norm, type = \"l\", ylab = \"approximate gradient norm\")\nresa$F. \nresa$Q\nresa$cfix\n```\n\nWe may want to use more particles towards the end when we estimate the \nparameters. To do, we use the approximation described in the next section \nat the final estimates that we arrived at before. \n\n\u003c!--\nlibrary(knitr)\nopts_knit$set(output.dir = \".\")\nopts_chunk$set(cache.path = paste0(\n    file.path(\"README_cache\", \"markdown_github\"), .Platform$file.sep))\nload_cache(\"cont_est\")\n--\u003e\n\n```{r cont_est, cache = 1, dependson = \"use_sgd\"}\nll_func \u003c- mssm(\n  fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat, \n  random = ~ Z, ti = time_idx, control = mssm_control(\n    n_threads = 5L, N_part = 10000L, what = \"gradient\",\n    which_ll_cp = \"KD\", aprx_eps = .01, use_antithetic = TRUE))\n\nset.seed(25164416)\nsystem.time( \n  res_final \u003c- adam(\n    ll_func, F. = resa$F., Q = resa$Q, cfix = resa$cfix, \n    lr = .001, n_it = 25L))\n```\n\n```{r show_cont_est}\nplot(res_final$logLik, type = \"l\", ylab = \"log-likelihood approximation\")\nplot(res_final$grad_norm, type = \"l\", ylab = \"approximate gradient norm\")\nres_final$F. \nres_final$Q\nres_final$cfix\n```\n\n```{r clean_param_ests, echo = FALSE}\n# do this to not use them later by an error\nrm(res, resa)\n```\n\n### Faster Approximation\nOne drawback with the particle filter we use is that it has $\\mathcal{O}(N^2)$ \ncomputational complexity where $N$ is the number of particles. We can see \nthis by changing the number of particles. \n\n```{r comp_w_n_part, cache = 1}\nlocal({\n  # assign function that returns a function that uses a given number of \n  # particles\n  func \u003c- function(N){\n    ll_func \u003c- mssm(\n      fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat, \n      random = ~ Z, ti = time_idx, control = mssm_control(\n        n_threads = 5L, N_part = N, what = \"log_density\", \n        use_antithetic = TRUE))\n    function()\n      ll_func$pf_filter(\n        cfix = coef(glm_fit), disp = numeric(), F. = diag(1e-8, 2), \n        Q = diag(1e-4^2, 2))\n      \n  }\n  \n  f_100  \u003c- func( 100L)\n  f_200  \u003c- func( 200L)\n  f_400  \u003c- func( 400L)\n  f_800  \u003c- func( 800L)\n  f_1600 \u003c- func(1600L)\n  \n  # benchmark. Should ĩncrease at ~ N^2 rate\n  microbenchmark::microbenchmark(\n    `100` = f_100(), `200` = f_200(), `400` = f_400(), `800` = f_800(),\n    `1600` = f_1600(), times = 3L)\n})\n```\n\nA solution is to use the dual k-d tree method I cover later. The computational\ncomplexity is $\\mathcal{O}(N \\log N)$ for this method which is somewhat \nindicated by the run times shown below.\n\n```{r KD_comp_w_n_part, cache = 1}\nlocal({\n  # assign function that returns a function that uses a given number of \n  # particles\n  func \u003c- function(N){\n    ll_func \u003c- mssm(\n      fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat, \n      random = ~ Z, ti = time_idx, control = mssm_control(\n        n_threads = 5L, N_part = N, what = \"log_density\", \n        which_ll_cp = \"KD\", KD_N_max = 6L, aprx_eps = 1e-2, \n        use_antithetic = TRUE))\n    function()\n      ll_func$pf_filter(\n        cfix = coef(glm_fit), disp = numeric(), F. = diag(1e-8, 2), \n        Q = diag(1e-4^2, 2))\n      \n  }\n  \n  f_100   \u003c- func(  100L)\n  f_200   \u003c- func(  200L)\n  f_400   \u003c- func(  400L)\n  f_800   \u003c- func(  800L)\n  f_1600  \u003c- func( 1600L)\n  f_12800 \u003c- func(12800L) # \u003c-- much larger\n  \n  # benchmark. Should increase at ~ N log N rate\n  microbenchmark::microbenchmark(\n    `100` = f_100(), `200` = f_200(), `400` = f_400(), `800` = f_800(), \n    `1600` = f_1600(), `12800` = f_12800(), times = 3L)\n})\n```\n\nThe `aprx_eps` controls the size of the error. To be precise about what this\nvalue does then we need to some notation for the complete likelihood \n(the likelihood where we observe $\\vec\\beta_1,\\dots,\\vec\\beta_T$s). This is\n\n$$L = \\mu_1(\\vec \\beta_1)g_1(\\vec y_1 \\mid \\vec \\beta_1)\\prod_{t=2}^Tf(\\vec\\beta_t \\mid\\vec\\beta_{t-1})g_t(y_t\\mid\\beta_t)$$\n\nwhere $g_t$ is conditional distribution $\\vec y_t$ given $\\vec\\beta_t$, $f$ is \nthe conditional distribution of $\\vec\\beta_t$ given $\\vec\\beta_{t-1}$, and\n$\\mu$ is the time-invariant distribution of $\\vec\\beta_t$. \nLet $w_t^{(j)}$ be the weight of particle \n$j$ at time $t$ and $\\vec \\beta_t^{(j)}$ be the $j$th particle at time $t$. \nThen we ensure the error in our evaluation of terms \n$w_{t-1}^{(j)}f(\\vec\\beta_t^{(i)} \\mid \\vec\\beta_{t-1}^{(j)})$ never \nexceeds\n\n$$w_{t-1}^{(j)} \\frac{u - l}{(u + l)/2}$$\nwhere $u$ and $l$ are respectively an upper and lower bound of \n$f(\\vec\\beta_t^{(i)} \\mid \\vec\\beta_{t-1}^{(j)})$. \nThe question is how big the error is. \nThus, we consider the error in the log-likelihood approximation at the \ntrue parameters. \n\n```{r comp_all_vs_aprx, cache = 1, fig.height = .fig_height_small, message = FALSE}\nll_compare \u003c- local({\n  N_use \u003c- 500L \n  # we alter the seed in each run. First, the exact method\n  ll_no_approx \u003c- sapply(1:200, function(seed){\n    ll_func \u003c- mssm(\n      fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat,\n      random = ~ Z, ti = time_idx, control = mssm_control(\n        n_threads = 5L, N_part = N_use, what = \"log_density\", \n        seed = seed, use_antithetic = TRUE))\n    \n    logLik(ll_func$pf_filter(\n      cfix = cfix, disp = numeric(), F. = F., Q = Q))\n  })\n  \n  # then the approximation\n  ll_approx \u003c- sapply(1:200, function(seed){\n    ll_func \u003c- mssm(\n      fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat,\n      random = ~ Z, ti = time_idx, control = mssm_control(\n        n_threads = 5L, N_part = N_use, what = \"log_density\", \n        KD_N_max = 6L, aprx_eps = 1e-2, seed = seed, \n        which_ll_cp = \"KD\", use_antithetic = TRUE))\n    \n    logLik(ll_func$pf_filter(\n      cfix = cfix, disp = numeric(), F. = F., Q = Q))\n  })\n  \n  list(ll_no_approx = ll_no_approx, ll_approx = ll_approx)\n})\n```\n\n```{r show_comp_arell_aprx, fig.height = .fig_height_small}\npar(mar = c(5, 4, 1, 1))\nhist(\n  ll_compare$ll_no_approx, main = \"\", breaks = 20L, \n  xlab = \"Log-likelihood approximation -- no aprox\")\nhist(\n  ll_compare$ll_approx   , main = \"\", breaks = 20L, \n  xlab = \"Log-likelihood approximation -- aprox\")\n```\n\nWe can make a t-test for whether there is a difference between the two \nlog-likelihood estimates\n\n```{r t_test_comp_ll}\nwith(ll_compare, t.test(ll_no_approx, ll_approx))\n```\n\nThe fact that there may only be a small difference if any is nice because \nnow we can get a much better approximation (in terms of variance) quickly \nof e.g., the log-likelihood as shown below.\n\n```{r show_ll_quick, cache = 1, message = FALSE}\nll_approx \u003c- sapply(1:10, function(seed){\n  N_use \u003c- 10000L # many more particles\n  \n  ll_func \u003c- mssm(\n    fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat,\n    random = ~ Z, ti = time_idx, control = mssm_control(\n      n_threads = 5L, N_part = N_use, what = \"log_density\", \n      KD_N_max = 100L, aprx_eps = 1e-2, seed = seed, \n      which_ll_cp = \"KD\", use_antithetic = TRUE))\n  \n  logLik(ll_func$pf_filter(\n    cfix = cfix, disp = numeric(), F. = F., Q = Q))\n}) \n\n# approximate log-likelihood\nsd(ll_approx)\nprint(mean(ll_approx), digits = 6)\n\n# compare sd with \nsd(ll_compare$ll_no_approx)\nprint(mean(ll_compare$ll_no_approx), digits = 6)\n```\n\n### Approximate Observed Information Matrix\n\nNext, we look at approximating the observed information matrix with the method \nsuggested by @Poyiadjis11.\n\n\u003c!--\nlibrary(knitr)\nopts_knit$set(output.dir = \".\")\nopts_chunk$set(cache.path = paste0(\n    file.path(\"README_cache\", \"markdown_github\"), .Platform$file.sep))\nload_cache(\"apprx_obs_info\")\n--\u003e\n\n```{r apprx_obs_info, cache = 1, dependson=c(\"use_sgd\", \"cont_est\")}\nll_func \u003c- mssm(\n  fixed = y ~ X1 + X2 + Z, family = poisson(), data = dat,\n  random = ~ Z, ti = time_idx, control = mssm_control(\n    n_threads = 5L, N_part = 10000L, what = \"Hessian\",\n    which_ll_cp = \"KD\", aprx_eps = .01, use_antithetic = TRUE))\n\nset.seed(46658529)\nsystem.time(\n  mssm_obs_info \u003c- ll_func$pf_filter(\n    cfix = res_final$cfix, disp = numeric(), F. = res_final$F., \n    Q = res_final$Q))\n```\n\nWe define a function below to get the approximate gradient and approximate \nobserved information matrix from the returned object. Then we compare \nthe output to the GLM we estimated and to the true parameters.\n\n```{r show_apprx_obs_info}\n# Function to subtract the approximate gradient elements and approximate \n# observed information matrix. \n# \n# Args:\n#   object: an object of class mssm.\n#\n# Returns:\n#   list with the approximate gradient elements and approximate observed \n# information matrix. \nget_grad_n_obs_info \u003c- function(object){\n  stopifnot(inherits(object, \"mssm\"))\n\n  # get dimension of the components\n  n_rng   \u003c- nrow(object$Z)\n  dim_fix \u003c- nrow(object$X)\n  dim_rng \u003c- n_rng * n_rng + ((n_rng + 1L) * n_rng) / 2L\n  has_dispersion \u003c-\n    any(sapply(c(\"^Gamma\", \"^gaussian\"), grepl, x = object$family))\n  \n  # get quantities for each particle\n  quants \u003c-     tail(object$pf_output, 1L)[[1L]]$stats\n  ws     \u003c- exp(tail(object$pf_output, 1L)[[1L]]$ws_normalized)\n\n  # get mean estimates\n  meas \u003c- colSums(t(quants) * drop(ws))\n\n  # separate out the different components. Start with the gradient\n  idx \u003c- dim_fix + dim_rng + has_dispersion\n  grad \u003c- meas[1:idx]\n  dim_names \u003c- names(grad)\n\n  # then the observed information matrix\n  I1 \u003c- matrix(\n    meas[-(1:idx)], dim_fix + dim_rng + has_dispersion, \n    dimnames = list(dim_names, dim_names))\n  \n  I2 \u003c- matrix(0., nrow(I1), ncol(I1))\n  for(i in seq_along(ws))\n    I2 \u003c- I2 + ws[i] * tcrossprod(quants[1:idx, i])\n\n  list(grad = grad, obs_info = tcrossprod(grad) - I1 - I2)\n}\n\n# use function\nout \u003c- get_grad_n_obs_info(mssm_obs_info)\n\n# approximate gradient\nout$grad\n\n# approximate standard errors\n(ses \u003c- sqrt(diag(solve(out$obs_info))))\n\n# look at output for parameters in the observational equation. First, compare \n# with glm's standard errors\nsqrt(diag(vcov(glm_fit)))\n\n# and relative to true parameters vs. estimated\nrbind(\n  true             = cfix, \n  glm              = coef(glm_fit), \n  mssm             = res_final$cfix, \n  `standard error` = ses[1:4])\n\n# next look at parameters in state equation. First four are for F.\nrbind(\n  true             = c(F.), \n  mssm             = c(res_final$F.), \n  `standard error` = ses[5:8])\n\n# next three are w.r.t. the lower diagonal part of Q\nrbind(\n  true             =           Q[lower.tri(Q, diag = TRUE)], \n  mssm             = res_final$Q[lower.tri(Q, diag = TRUE)], \n  `standard error` = ses[9:11])\n```\n\n## Supported Families\nThe following families are supported:\n\n - The binomial distribution is supported with logit, probit, and cloglog link. \n - The Poisson distribution is supported with square root and log link. \n - The gamma distribution is supported with log link.\n - The normal distribution with identity link (to compare with e.g., a Kalman \n filter), log link, and the inverse link function.\n\n## Fast Sum-Kernel Approximation\n\nThis package contains a simple implementation of the dual-tree method like the one \nsuggested by @Gray03 and shown in @Klaas06. The problem we want to solve is\nthe sum-kernel problem in @Klaas06. Particularly, we consider the situation \nwhere we have $1,\\dots,N_q$ query particles denoted by \n$\\{\\vec Y_i\\}_{i=1,\\dots,N_q}$ and $1,\\dots,N_s$ source particles denoted by\n$\\{\\vec X_j\\}_{j=1,\\dots,N_s}$. For each query particle, we want to compute \nthe weights \n\n$$W_i = \\frac{\\tilde W_i}{\\sum_{k = 1}^{N_q} \\tilde W_i},\\qquad \\tilde W_i = \\sum_{j=1}^{N_s} \\bar W_j K(\\vec Y_i, \\vec X_j)$$\n\nwhere each source particle has an associated weight $\\bar W_j$ and $K$ is a\nkernel. Computing the above is $\\mathcal{O}(N_sN_q)$ which is major \nbottleneck if $N_s$ and $N_q$ is large. However, one can use a \n[k-d tree](https://en.wikipedia.org/wiki/K-d_tree) for the query particles\nand source particles and exploit that:\n\n- $W_j K(\\vec Y_i, \\vec X_j)$ is almost zero for some pairs of nodes in the \ntwo k-d trees. \n- $K(\\cdot, \\vec X_j)$ is almost identical for some nodes in the k-d tree \nfor the source particles. \n\nThus, a substantial amount of computation can be skipped or approximated \nwith e.g., the centroid in the source node with only a minor\nloss of precision. The dual-tree approximation method is \n$\\mathcal{O}(N_s\\log N_s)$ and $\\mathcal{O}(N_q\\log N_q)$.\nWe start by defining a function to simulate the source \nand query particles (we will let the two sets be identical for simplicity). \nFurther, we plot one draw of simulated points and illustrate the leafs in \nthe k-d tree.\n\n```{r sim_func}\n######\n# define function to simulate data\nmus \u003c- matrix(c(-1, -1, \n                 1, 1, \n                -1, 1), 3L, byrow = FALSE)\nmus \u003c- mus * .75\nSig \u003c- diag(c(.5^2, .25^2))\n\nget_sims \u003c- function(n_per_grp){\n  # simulate X\n  sims \u003c- lapply(1:nrow(mus), function(i){\n    mu \u003c- mus[i, ]\n    X \u003c- matrix(rnorm(n_per_grp * 2L), nrow = 2L)\n    X \u003c- t(crossprod(chol(Sig), X) + mu)\n    \n    data.frame(X, grp = i)\n  })\n  sims \u003c- do.call(rbind, sims)\n  X \u003c- t(as.matrix(sims[, c(\"X1\", \"X2\")]))\n  \n  # simulate weights\n  ws \u003c- exp(rnorm(ncol(X)))\n  ws \u003c- ws / sum(ws)\n  \n  list(sims = sims, X = X, ws = ws)\n}\n\n#####\n# show example \nset.seed(42452654)\ninvisible(list2env(get_sims(5000L), environment()))\n\n# plot points\npar(mar = c(5, 4, 1, 1))\nplot(as.matrix(sims[, c(\"X1\", \"X2\")]), col = sims$grp + 1L)\n\n# find k-d tree and add borders \nout \u003c- mssm:::test_KD_note(X, 50L)\nout$indices \u003c- out$indices + 1L\nn_ele \u003c- drop(out$n_elems)\nidx \u003c- mapply(`:`, cumsum(c(1L, head(n_ele, -1))), cumsum(n_ele))\nstopifnot(all(sapply(idx, length) == n_ele))\nidx \u003c- lapply(idx, function(i) out$indices[i])\nstopifnot(!anyDuplicated(unlist(idx)), length(unlist(idx)) == ncol(X))\n\ngrps \u003c- lapply(idx, function(i) X[, i])\n\nborders \u003c- lapply(grps, function(x) apply(x, 1, range))\ninvisible(lapply(borders, function(b) \n  rect(b[1, \"X1\"], b[1, \"X2\"], b[2, \"X1\"], b[2, \"X2\"])))\n```\n\nNext, we compute the run-times for the previous examples and compare the \napproximations of the un-normalized log weights, $\\log \\tilde W_i$, and \nnormalized weights, $W_i$. The `n_threads` sets the number of threads to \nuse in the methods.\n\n```{r comp_run_times, cache = 1}\n# run-times\nmicrobenchmark::microbenchmark(\n  `dual tree 1` = mssm:::FSKA (X = X, ws = ws, Y = X, N_min = 10L, \n                               eps = 5e-3, n_threads = 1L),\n  `dual tree 4` = mssm:::FSKA (X = X, ws = ws, Y = X, N_min = 10L, \n                               eps = 5e-3, n_threads = 4L),\n  `naive 1`     = mssm:::naive(X = X, ws = ws, Y = X, n_threads = 1L),\n  `naive 4`     = mssm:::naive(X = X, ws = ws, Y = X, n_threads = 4L),\n  times = 10L)\n\n# The functions return the un-normalized log weights. We first compare\n# the result on this scale\no1 \u003c- mssm:::FSKA  (X = X, ws = ws, Y = X, N_min = 10L, eps = 5e-3, \n                    n_threads = 1L)\no2 \u003c- mssm:::naive(X = X, ws = ws, Y = X, n_threads = 4L)\n\nall.equal(o1, o2)\npar(mar = c(5, 4, 1, 1))\nhist((o1 - o2)/ abs((o1 + o2) / 2), breaks = 50, main = \"\", \n     xlab = \"Delta un-normalized log weights\")\n\n# then we compare the normalized weights\nfunc \u003c- function(x){\n  x_max \u003c- max(x)\n  x \u003c- exp(x - x_max)\n  x / sum(x)\n}\n\no1 \u003c- func(o1)\no2 \u003c- func(o2)\nall.equal(o1, o2)\nhist((o1 - o2)/ abs((o1 + o2) / 2), breaks = 50, main = \"\", \n     xlab = \"Delta normalized log weights\")\n```\n\nFinally, we compare the run-times as function of $N = N_s = N_q$. The dashed \nline is \"naive\" method, the continuous line is the dual-tree method, and the \ndotted line is dual-tree method using 1 thread.\n\n```{r run_times_N, cache = 1}\nNs \u003c- 2^(7:14)\nrun_times \u003c- lapply(Ns, function(N){ \n  invisible(list2env(get_sims(N), environment()))\n  microbenchmark::microbenchmark(\n    `dual-tree`   = mssm:::FSKA (X = X, ws = ws, Y = X, N_min = 10L, eps = 5e-3, \n                                 n_threads = 4L),\n    naive         = mssm:::naive(X = X, ws = ws, Y = X, n_threads = 4L),\n    `dual-tree 1` = mssm:::FSKA (X = X, ws = ws, Y = X, N_min = 10L, eps = 5e-3, \n                                 n_threads = 1L), \n    times = 5L)\n}) \n\nNs_xtra \u003c- 2^(15:19) \nrun_times_xtra \u003c- lapply(Ns_xtra, function(N){\n  invisible(list2env(get_sims(N), environment()))\n  microbenchmark::microbenchmark(\n    `dual-tree` = mssm:::FSKA (X = X, ws = ws, Y = X, N_min = 10L, eps = 5e-3, \n                               n_threads = 4L),\n    times = 5L)\n}) \n```\n \n```{r plot_run_times_N}\nlibrary(microbenchmark)\nmeds \u003c- t(sapply(run_times, function(x) summary(x, unit = \"s\")[, \"median\"]))\nmeds_xtra \u003c- \n  sapply(run_times_xtra, function(x) summary(x, unit = \"s\")[, \"median\"])\nmeds \u003c- rbind(meds, cbind(meds_xtra, NA_real_, NA_real_))\ndimnames(meds) \u003c- list(\n  N = c(Ns, Ns_xtra) * 3L, method = c(\"Dual-tree\", \"Naive\", \"Dual-tree 1\"))\nmeds\npar(mar = c(5, 4, 1, 1))\nmatplot(c(Ns, Ns_xtra) * 3L, meds, lty = 1:3, type = \"l\", log = \"xy\", \n        ylab = \"seconds\", xlab = \"N\", col = \"black\")\n```\n\n## Function Definitions\n\n```{r mc_grad_est, eval = FALSE}\n```\n\n## References\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboennecd%2Fmssm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboennecd%2Fmssm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboennecd%2Fmssm/lists"}