{"id":13857378,"url":"https://github.com/MikeJaredS/hermiter","last_synced_at":"2025-07-13T21:32:26.700Z","repository":{"id":46027606,"uuid":"286930814","full_name":"MikeJaredS/hermiter","owner":"MikeJaredS","description":"Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)","archived":false,"fork":false,"pushed_at":"2024-08-31T10:10:58.000Z","size":9253,"stargazers_count":15,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-10-13T22:04:39.629Z","etag":null,"topics":["cumulative-distribution-function","kendall-correlation-coefficient","online-algorithms","probability-density-function","quantile","spearman-correlation-coefficient","statistics","streaming-algorithms","streaming-data"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MikeJaredS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-08-12T06:05:53.000Z","updated_at":"2024-08-31T10:11:01.000Z","dependencies_parsed_at":"2023-01-23T11:01:02.977Z","dependency_job_id":"363c9423-4bcf-463a-911b-a79eaf3045c6","html_url":"https://github.com/MikeJaredS/hermiter","commit_stats":{"total_commits":111,"total_committers":2,"mean_commits":55.5,"dds":0.036036036036036,"last_synced_commit":"be400e363ede93795b1c41f615bf9c10e403c1b7"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/MikeJaredS%2Fhermiter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MikeJaredS%2Fhermiter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MikeJaredS%2Fhermiter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MikeJaredS%2Fhermiter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MikeJaredS","download_url":"https://codeload.github.com/MikeJaredS/hermiter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225920299,"owners_count":17545465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cumulative-distribution-function","kendall-correlation-coefficient","online-algorithms","probability-density-function","quantile","spearman-correlation-coefficient","statistics","streaming-algorithms","streaming-data"],"created_at":"2024-08-05T03:01:34.987Z","updated_at":"2024-11-22T15:30:37.137Z","avatar_url":"https://github.com/MikeJaredS.png","language":"R","readme":"# hermiter\n\n\u003c!-- badges: start --\u003e\n[![codecov](https://codecov.io/gh/MikeJaredS/hermiter/branch/master/graph/badge.svg)](https://app.codecov.io/gh/MikeJaredS/hermiter)\n[![CRANstatus](https://www.r-pkg.org/badges/version/hermiter)](https://cran.r-project.org/package=hermiter)\n![](https://cranlogs.r-pkg.org/badges/grand-total/hermiter?color=green)\n\u003c!-- badges: end --\u003e\n\n\n## What does hermiter do?\n\n`hermiter` is an R package that facilitates the estimation of the probability 
\ndensity function and cumulative distribution function in univariate and \nbivariate settings using Hermite series based estimators. In addition, \n`hermiter` allows the estimation of the quantile function in the univariate case\nand nonparametric correlation coefficients in the bivariate case. The package is\napplicable to streaming, batch and grouped data. The core methods of the package\nare written in C++ for speed.\n\nThese estimators are particularly useful in the sequential setting (both \nstationary and non-stationary data streams). In addition, they are useful in \nefficient, one-pass batch estimation which is particularly relevant in the \ncontext of large data sets. Finally, the Hermite series based estimators are \napplicable in decentralized (distributed) settings in that estimators formed on \nsubsets of the data can be consistently merged. The Hermite series based \nestimators have the distinct advantage of being able to estimate the full \ndensity function, distribution function and quantile function (univariate \nsetting) along with the Spearman Rho and Kendall Tau correlation coefficients\n(bivariate setting) in an online manner. The theoretical and empirical \nproperties of most of these estimators have been studied in-depth in the \narticles below. The investigations demonstrate that the Hermite series based \nestimators are particularly effective in distribution function, quantile \nfunction and Spearman correlation estimation.\n\n* [Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. \"Sequential quantiles via Hermite series density estimation.\" Electronic Journal of Statistics 11.1 (2017): 570-607.](https://projecteuclid.org/euclid.ejs/1488531636) \n* [Stephanou, Michael and Varughese, Melvin. \"On the properties of hermite series based distribution function estimators.\" Metrika (2020).](https://link.springer.com/article/10.1007/s00184-020-00785-z)\n* [Stephanou, Michael and Varughese, Melvin. 
\"Sequential estimation of Spearman rank correlation using Hermite series estimators.\" Journal of Multivariate Analysis (2021)](https://www.sciencedirect.com/science/article/pii/S0047259X21000610)\n\nA summary of the estimators and algorithms in `hermiter` can be found in the \narticle below.\n\n* [Stephanou, Michael and Varughese, Melvin. \"hermiter: R package for Sequential Nonparametric Estimation.\" Computational Statistics (2023)](https://doi.org/10.1007/s00180-023-01382-0)\n\n## Features\n\n### Univariate\n\n* fast batch estimation of pdf, cdf and quantile function\n* consistent merging of estimates\n* fast sequential estimation of pdf, cdf and quantile function on streaming data\n* adaptive sequential estimation on non-stationary streams via exponential \nweighting\n* provides online, O(1) time complexity estimates of arbitrary quantiles e.g. \nmedian at any point in time along with probability densities and cumulative \nprobabilities at arbitrary x\n* uses small and constant memory for the estimator\n* provides a very compact, simultaneous representation of the pdf, cdf and \nquantile function that can be efficiently stored and communicated using e.g. 
\nsaveRDS and readRDS functions\n\n### Bivariate\n\n* fast batch estimation of bivariate pdf, cdf and nonparametric correlation \ncoefficients (Spearman Rho and Kendall Tau)\n* consistent merging of estimates\n* fast sequential estimation of bivariate pdf, cdf and nonparametric correlation \ncoefficients on streaming data\n* adaptive sequential estimation on non-stationary bivariate streams via \nexponential weighting\n* provides online, O(1) time complexity estimates of bivariate probability \ndensities and cumulative probabilities at arbitrary points, x\n* provides online, O(1) time complexity estimates of the Spearman and Kendall \nrank correlation coefficients\n* uses small and constant memory for the estimator\n\n## Installation\n\nThe release version of `hermiter` can be installed from CRAN with:\n\n```r\ninstall.packages(\"hermiter\")\n```\n\nThe development version of `hermiter` can be installed using `devtools` with:\n\n```r\ndevtools::install_github(\"MikeJaredS/hermiter\")\n```\n\n## Load Package\n\nIn order to utilize the hermiter package, the package must be loaded using the \nfollowing command:\n\n```{r}\nlibrary(hermiter)\n```\n\n## Construct Estimator\n\nA hermite_estimator S3 object is constructed as below. The argument, N, adjusts \nthe number of terms in the Hermite series based estimator and controls the \ntrade-off between bias and variance. A lower N value implies a higher bias but \nlower variance and vice versa for higher values of N. The argument, standardize,\ncontrols whether or not to standardize observations before applying the \nestimator. Standardization usually yields better results and is recommended \nfor most estimation settings. 
\n\nA univariate estimator is constructed as follows (note that the default \nestimator type is univariate, so this argument does not need to be explicitly \nset):\n\n```{r}\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 est_type = \"univariate\")\n```\n\nSimilarly for constructing a bivariate estimator:\n\n```{r}\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 est_type = \"bivariate\")\n```\n\n## Batch Estimator Updating\n\nA hermite_estimator object can be initialized with a batch of observations as \nbelow.\n\nFor univariate observations:\n\n```{r}\nobservations \u003c- rlogis(n=1000)\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE, observations = \n                                   observations)\n```\n\nFor bivariate observations:\n\n```{r}\nobservations \u003c- matrix(data = rnorm(2000),nrow = 1000, ncol=2)\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 est_type = \"bivariate\", observations = \n                                   observations)\n```\n\n## Sequential Estimator Updating\n\nIn the sequential setting, observations are revealed one at a time. A \nhermite_estimator object can be updated sequentially with a single new \nobservation by utilizing the update_sequential method. 
Note that when updating \nthe Hermite series based estimator sequentially, observations are also \nstandardized sequentially if the standardize argument is set to true in the \nconstructor.\n\n### Standard syntax\n\nFor univariate observations:\n\n```{r}\nobservations \u003c- rlogis(n=1000)\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE)\nfor (idx in seq_along(observations)) {\n  hermite_est \u003c- update_sequential(hermite_est,observations[idx])\n}\n```\n\nFor bivariate observations:\n\n```{r}\nobservations \u003c- matrix(data = rnorm(2000),nrow = 1000, ncol=2)\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 est_type = \"bivariate\")\nfor (idx in seq_len(nrow(observations))) {\n  hermite_est \u003c- update_sequential(hermite_est,observations[idx,])\n}\n```\n\n### Piped syntax\n\nFor univariate observations:\n\n```{r}\nobservations \u003c- rlogis(n=1000)\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE)\nfor (idx in seq_along(observations)) {\n  hermite_est \u003c- hermite_est %\u003e% update_sequential(observations[idx])\n}\n```\n\nFor bivariate observations:\n\n```{r}\nobservations \u003c- matrix(data = rnorm(2000),nrow = 1000, ncol=2)\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 est_type = \"bivariate\")\nfor (idx in seq_len(nrow(observations))) {\n  hermite_est \u003c- hermite_est %\u003e% update_sequential(observations[idx,])\n}\n```\n\n## Merging Hermite Estimators\n\nHermite series based estimators can be consistently combined/merged in both\nthe univariate and bivariate settings. In particular, when standardize = FALSE,\nthe results obtained from combining/merging distinct hermite_estimators updated \non subsets of a data set are exactly equal to those obtained by constructing a \nsingle hermite_estimator and updating on the full data set (corresponding to the \nconcatenation of the aforementioned subsets). 
This holds true for the pdf, cdf \nand quantile results in the univariate case and the pdf, cdf\nand nonparametric correlation results in the bivariate case. When standardize = \nTRUE, the equivalence is no longer exact, but is accurate enough to be \npractically useful. Combining/merging hermite_estimators is illustrated below.\n\nFor the univariate case:\n\n```{r}\nobservations_1 \u003c- rlogis(n=1000)\nobservations_2 \u003c- rlogis(n=1000)\nhermite_est_1 \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                   observations = observations_1)\nhermite_est_2 \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                   observations = observations_2)\nhermite_est_merged \u003c- merge_hermite(list(hermite_est_1,hermite_est_2))\n```\n\nFor the bivariate case:\n\n```{r}\nobservations_1 \u003c- matrix(data = rnorm(2000),nrow = 1000, ncol=2)\nobservations_2 \u003c- matrix(data = rnorm(2000),nrow = 1000, ncol=2)\nhermite_est_1 \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 est_type = \"bivariate\", \n                                 observations = observations_1)\nhermite_est_2 \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 est_type = \"bivariate\", \n                                 observations = observations_2)\nhermite_est_merged \u003c- merge_hermite(list(hermite_est_1,hermite_est_2))\n```\n\nThe ability to combine/merge estimators is particularly useful in applications\ninvolving grouped data (see package vignette).\n\n## Estimate univariate pdf, cdf and quantile function\n\nThe central advantage of Hermite series based estimators is that they can be \nupdated in a sequential/one-pass manner as above and subsequently probability \ndensities and cumulative probabilities at arbitrary x values can be obtained, \nalong with arbitrary quantiles. 
The hermite_estimator object only maintains a \nsmall and fixed number of coefficients and thus uses minimal memory. The syntax \nto calculate probability densities, cumulative probabilities and quantiles in \nthe univariate setting is presented below.\n\n### Standard syntax\n\n```{r}\nobservations \u003c- rlogis(n=2000)\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 observations = observations)\n\nx \u003c- seq(-15,15,0.1)\npdf_est \u003c- dens(hermite_est,x)\ncdf_est \u003c- cum_prob(hermite_est,x)\n\np \u003c- seq(0.05,0.95,0.05)\nquantile_est \u003c- quant(hermite_est,p)\n```\n\n### Piped syntax\n\n```{r}\nobservations \u003c- rlogis(n=2000)\nhermite_est \u003c- hermite_estimator(N=10, standardize=TRUE, \n                                 observations = observations)\n\nx \u003c- seq(-15,15,0.1)\npdf_est \u003c- hermite_est %\u003e% dens(x)\ncdf_est \u003c- hermite_est %\u003e% cum_prob(x)\n\np \u003c- seq(0.05,0.95,0.05)\nquantile_est \u003c- hermite_est %\u003e% quant(p)\n```\n\n```{r}\nactual_pdf \u003c- dlogis(x)\nactual_cdf \u003c- plogis(x)\ndf_pdf_cdf \u003c- data.frame(x,pdf_est,cdf_est,actual_pdf,actual_cdf)\n\nactual_quantiles \u003c- qlogis(p)\ndf_quant \u003c- data.frame(p,quantile_est,actual_quantiles)\n```\n\n### Comparing Estimated versus Actual\n\n```{r}\nggplot(df_pdf_cdf,aes(x=x)) + geom_line(aes(y=pdf_est, colour=\"Estimated\")) +\n  geom_line(aes(y=actual_pdf, colour=\"Actual\")) +\n  scale_colour_manual(\"\", \n                      breaks = c(\"Estimated\", \"Actual\"),\n                      values = c(\"blue\", \"black\")) + ylab(\"Probability Density\")\n```\n![](./vignettes/pdf_static.png)\n\n```{r}\nggplot(df_pdf_cdf,aes(x=x)) + geom_line(aes(y=cdf_est, colour=\"Estimated\")) +\n  geom_line(aes(y=actual_cdf, colour=\"Actual\")) +\n  scale_colour_manual(\"\", \n                      breaks = c(\"Estimated\", \"Actual\"),\n                      values = c(\"blue\", \"black\")) +\n  
ylab(\"Cumulative Probability\")\n```\n![](./vignettes/cdf_static.png)\n\n```{r}\nggplot(df_quant,aes(x=actual_quantiles)) + geom_point(aes(y=quantile_est),\n                                                      color=\"blue\") +\n  geom_abline(slope=1,intercept = 0) +xlab(\"Theoretical Quantiles\") +\n  ylab(\"Estimated Quantiles\")\n```\n![](./vignettes/quantile_static.png)\n\n### Convenience functions\n\nNote that there are also generic methods facilitating summarizing and plotting\nunivariate densities and distribution functions as illustrated below.\n\n```{r}\nh_dens \u003c- density(hermite_est)\nprint(h_dens)\nplot(h_dens)\n```\n\n![](./vignettes/pdf_convenience_fnc.png)\n\n```{r}\nh_cdf \u003c- hcdf(hermite_est)\nprint(h_cdf)\nplot(h_cdf)\nsummary(h_cdf)\n```\n\n![](./vignettes/cdf_convenience_fnc.png)\n\nFinally there are the following convenience functions providing familiar syntax\nto the ordinary R functions.\n\n```{r}\nquantile(hermite_est)\n\nmedian(hermite_est)\n\nIQR(hermite_est)\n```\n\n\n## Estimate bivariate pdf, cdf and nonparametric correlation\n\nThe aforementioned suitability of Hermite series based estimators in sequential \nand one-pass batch estimation settings extends to the bivariate case. \nProbability densities and cumulative probabilities can be obtained at arbitrary \npoints. 
The syntax to calculate probability densities and  \ncumulative probabilities along with the Spearman and Kendall correlation \ncoefficients in the bivariate setting is presented below.\n\n### Standard syntax\n\n```{r}\n# Prepare bivariate normal data\nsig_x \u003c- 1\nsig_y \u003c- 1\nnum_obs \u003c- 4000\nrho \u003c- 0.5\nobservations_mat \u003c- mvtnorm::rmvnorm(n=num_obs,mean=rep(0,2),\n          sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), \n          nrow=2,ncol=2, byrow = TRUE))\n\nhermite_est \u003c- hermite_estimator(N = 30, standardize = TRUE, \n                                 est_type = \"bivariate\", \n                                 observations = observations_mat) \nvals \u003c- seq(-5,5,by=0.25)\nx_grid \u003c- as.matrix(expand.grid(X=vals, Y=vals))\npdf_est \u003c- dens(hermite_est,x_grid)\ncdf_est \u003c- cum_prob(hermite_est,x_grid)\nspear_est \u003c- spearmans(hermite_est)\nkendall_est \u003c- kendall(hermite_est)\n```\n\n### Piped syntax\n\n```{r}\nsig_x \u003c- 1\nsig_y \u003c- 1\nnum_obs \u003c- 4000\nrho \u003c- 0.5\nobservations_mat \u003c- mvtnorm::rmvnorm(n=num_obs,mean=rep(0,2),\n        sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), \n          nrow=2, ncol=2, byrow = TRUE))\n\nhermite_est \u003c- hermite_estimator(N = 30, standardize = TRUE, \n                                 est_type = \"bivariate\", \n                                 observations = observations_mat) \n\nvals \u003c- seq(-5,5,by=0.25)\nx_grid \u003c- as.matrix(expand.grid(X=vals, Y=vals))\npdf_est \u003c- hermite_est %\u003e% dens(x_grid, clipped = TRUE)\ncdf_est \u003c- hermite_est %\u003e% cum_prob(x_grid, clipped = TRUE)\nspear_est \u003c- hermite_est %\u003e% spearmans()\nkendall_est \u003c- hermite_est %\u003e% kendall()\n```\n\n```{r}\nactual_pdf \u003c-mvtnorm::dmvnorm(x_grid,mean=rep(0,2),\n            sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), \n                           nrow=2,ncol=2, byrow 
= TRUE))\nactual_cdf \u003c- rep(NA,nrow(x_grid))\nfor (row_idx in seq_len(nrow(x_grid))) {\n  actual_cdf[row_idx] \u003c-  mvtnorm::pmvnorm(lower = c(-Inf,-Inf),\n    upper=as.numeric(x_grid[row_idx,]),mean=rep(0,2),sigma = matrix(c(sig_x^2, \n        rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), nrow=2,ncol=2,byrow = TRUE))\n}\nactual_spearmans \u003c- cor(observations_mat,method = \"spearman\")[1,2]\nactual_kendall \u003c- cor(observations_mat,method = \"kendall\")[1,2]\ndf_pdf_cdf \u003c- data.frame(x_grid,pdf_est,cdf_est,actual_pdf,actual_cdf)\n```\n\n### Comparing Estimated versus Actual\n\n```{r}\np1 \u003c- ggplot(df_pdf_cdf) + geom_tile(aes(X, Y, fill= actual_pdf)) +\n  scale_fill_continuous_sequential(palette=\"Oslo\",\n                                   breaks=seq(0,.2,by=.05),\n                                   limits=c(0,.2))\n\np2 \u003c- ggplot(df_pdf_cdf) + geom_tile(aes(X, Y, fill= pdf_est)) +\n  scale_fill_continuous_sequential(palette=\"Oslo\",\n                                   breaks=seq(0,.2,by=.05),\n                                   limits=c(0,.2))\n\np1+ ggtitle(\"Actual PDF\")+ theme(legend.title = element_blank()) + p2 +\n  ggtitle(\"Estimated PDF\") +theme(legend.title = element_blank()) +\n  plot_layout(guides = 'collect')\n```\n\n![](./vignettes/pdf_bivar_static.png)\n\n```{r}\np1 \u003c- ggplot(df_pdf_cdf) + geom_tile(aes(X, Y, fill= actual_cdf)) +\n  scale_fill_continuous_sequential(palette=\"Oslo\",\n                       breaks=seq(0,1,by=.2),\n                       limits=c(0,1))\n\np2 \u003c- ggplot(df_pdf_cdf) + geom_tile(aes(X, Y, fill= cdf_est)) +\n  scale_fill_continuous_sequential(palette=\"Oslo\",\n                                   breaks=seq(0,1,by=.2),\n                                   limits=c(0,1))\n\np1+ ggtitle(\"Actual CDF\") + theme(legend.title = element_blank()) + p2 +\n  ggtitle(\"Estimated CDF\") + theme(legend.title = element_blank())+\n  plot_layout(guides = 
'collect')\n```\n\n![](./vignettes/cdf_bivar_static.png)\n\n\nSpearman's correlation coefficient results:\n\n|             | Spearman's Correlation |\n| ----------- | ----------- |\n| Actual      | 0.453       |\n| Estimated   | 0.447        |\n\nKendall correlation coefficient results:\n\n|             | Kendall Correlation |\n| ----------- | ----------- |\n| Actual      | 0.312       |\n| Estimated   | 0.308        |\n\n## Applying to stationary data (sequential setting)\n\n### Univariate Example\n\nAnother useful application of the hermite_estimator class is to obtain pdf, cdf \nand quantile function estimates on streaming data. The speed of estimation \nallows the pdf, cdf and quantile functions to be estimated in real time. We \nillustrate this below for cdf and quantile estimation with a sample Shiny \napplication. We reiterate that the particular usefulness is that the full pdf, \ncdf and quantile functions are updated in real time. Thus, any arbitrary \nquantile can be evaluated at any point in time. We include a stub for reading \nstreaming data that generates micro-batches of standard exponential i.i.d. \nrandom data. This stub can easily be swapped out for a method reading \nmicro-batches from a Kafka topic or similar.\n\nThe Shiny sample code below can be pasted into a single app.R file and run \ndirectly.\n\n```{r eval=FALSE}\n# Not Run. Copy and paste into app.R and run.\nlibrary(shiny)\nlibrary(hermiter)\nlibrary(ggplot2)\nlibrary(magrittr)\n\nui \u003c- fluidPage(\n    titlePanel(\"Streaming Statistics Analysis Example: Exponential \n               i.i.d. 
stream\"),\n    sidebarLayout(\n        sidebarPanel(\n            sliderInput(\"percentile\", \"Percentile:\",\n                        min = 0.01, max = 0.99,\n                        value = 0.5, step = 0.01)\n        ),\n        mainPanel(\n           plotOutput(\"plot\"),\n           textOutput(\"quantile_text\")\n        )\n    )\n)\n\nserver \u003c- function(input, output) {\n    values \u003c- reactiveValues(hermite_est = \n                                 hermite_estimator(N = 10, standardize = TRUE))\n    x \u003c- seq(-15, 15, 0.1)\n    # Note that the stub below could be replaced with code that reads streaming \n    # data from various sources, Kafka etc.  \n    read_stream_stub_micro_batch \u003c- reactive({\n        invalidateLater(1000)\n        new_observation \u003c- rexp(10)\n        return(new_observation)\n    })\n    updated_cdf_calc \u003c- reactive({\n        micro_batch \u003c- read_stream_stub_micro_batch()\n        for (idx in seq_along(micro_batch)) {\n            values[[\"hermite_est\"]] \u003c- isolate(values[[\"hermite_est\"]]) %\u003e%\n                update_sequential(micro_batch[idx])\n        }\n        cdf_est \u003c- isolate(values[[\"hermite_est\"]]) %\u003e%\n            cum_prob(x, clipped = TRUE)\n        df_cdf \u003c- data.frame(x, cdf_est)\n        return(df_cdf)\n    })\n    updated_quantile_calc \u003c- reactive({\n        values[[\"hermite_est\"]]  %\u003e% quant(input$percentile)\n    })\n    output$plot \u003c- renderPlot({\n        ggplot(updated_cdf_calc(), aes(x = x)) + geom_line(aes(y = cdf_est)) +\n            ylab(\"Cumulative Probability\")\n    }\n    )\n    output$quantile_text \u003c- renderText({ \n        return(paste(input$percentile * 100, \"th Percentile:\", \n                     round(updated_quantile_calc(), 2)))\n    })\n}\nshinyApp(ui = ui, server = server)\n```\n\n![](./vignettes/shiny_stream_example.gif)\n\n## Applying to non-stationary data (sequential setting)\n\n### Univariate Example\n\nThe 
hermite_estimator is also applicable to non-stationary data streams.\nA weighted form of the Hermite series based estimator can be applied to handle \nthis case. The estimator will adapt to the new distribution and \n\"forget\" the old distribution as illustrated in the example below. In this \nunivariate example, the  distribution from which the observations are drawn \nswitches from a Chi-square distribution to a logistic distribution and finally \nto a normal distribution. In order to use the exponentially weighted form of the \nhermite_estimator, the exp_weight_lambda argument must be set to a non-NA value.\nTypical values for this parameter are 0.01, 0.05 and 0.1. The lower the \nexponential weighting parameter, the slower the estimator adapts and vice versa \nfor higher values of the parameter. However, variance increases with higher \nvalues of exp_weight_lambda, so there is a trade-off to bear in mind.\n\n```{r}\n# Prepare Test Data\nnum_obs \u003c-2000\ntest \u003c- rchisq(num_obs,5)\ntest \u003c- c(test,rlogis(num_obs))\ntest \u003c- c(test,rnorm(num_obs))\n```\n\n```{r}\n# Calculate theoretical pdf, cdf and quantile values for comparison\nx \u003c- seq(-15,15,by=0.1)\nactual_pdf_lognorm \u003c- dchisq(x,5)\nactual_pdf_logis \u003c- dlogis(x)\nactual_pdf_norm \u003c- dnorm(x)\nactual_cdf_lognorm \u003c- pchisq(x,5)\nactual_cdf_logis \u003c- plogis(x)\nactual_cdf_norm \u003c- pnorm(x)\np \u003c- seq(0.05,0.95,by=0.05)\nactual_quantiles_lognorm \u003c- qchisq(p,5)\nactual_quantiles_logis \u003c- qlogis(p)\nactual_quantiles_norm \u003c- qnorm(p)\n```\n\n```{r}\n# Construct Hermite Estimator \nh_est \u003c- hermite_estimator(N=20,standardize = TRUE,exp_weight_lambda = 0.005)\n```\n\n```{r}\n# Loop through test data and update h_est to simulate observations arriving \n# sequentially\ncount \u003c- 1\nres \u003c- data.frame()\nres_q \u003c- data.frame()\nfor (idx in seq_along(test)) {\n  h_est \u003c- h_est %\u003e% update_sequential(test[idx])\n  if (idx %% 
100 == 0){\n    if (floor(idx/num_obs)==0){\n      actual_cdf_vals \u003c- actual_cdf_lognorm\n      actual_pdf_vals \u003c-actual_pdf_lognorm\n      actual_quantile_vals \u003c- actual_quantiles_lognorm\n    }\n    if (floor(idx/num_obs)==1){\n      actual_cdf_vals \u003c- actual_cdf_logis\n      actual_pdf_vals \u003c-actual_pdf_logis\n      actual_quantile_vals \u003c- actual_quantiles_logis\n    }\n    if (floor(idx/num_obs)==2){\n      actual_cdf_vals \u003c- actual_cdf_norm\n      actual_pdf_vals \u003c- actual_pdf_norm\n      actual_quantile_vals \u003c- actual_quantiles_norm\n    }\n    idx_vals \u003c- rep(count,length(x))\n    cdf_est_vals \u003c- h_est %\u003e% cum_prob(x, clipped=TRUE)\n    pdf_est_vals \u003c- h_est %\u003e% dens(x, clipped=TRUE)\n    quantile_est_vals \u003c- h_est %\u003e% quant(p)\n    res \u003c- rbind(res,data.frame(idx_vals,x,cdf_est_vals,actual_cdf_vals,\n                                pdf_est_vals,actual_pdf_vals))\n    res_q \u003c- rbind(res_q,data.frame(idx_vals=rep(count,length(p)),p,\n                                    quantile_est_vals,actual_quantile_vals))\n    count \u003c- count +1\n  }\n}\nres \u003c- res %\u003e% mutate(idx_vals=idx_vals*100)\nres_q \u003c- res_q %\u003e% mutate(idx_vals=idx_vals*100)\n```\n\n```{r eval=FALSE}\n# Visualize Results for PDF (Not run, requires gganimate, gifski and transformr\n# packages)\np \u003c- ggplot(res,aes(x=x)) + geom_line(aes(y=pdf_est_vals, colour=\"Estimated\")) +\ngeom_line(aes(y=actual_pdf_vals, colour=\"Actual\")) +\n  scale_colour_manual(\"\", \n                      breaks = c(\"Estimated\", \"Actual\"),\n                      values = c(\"blue\", \"black\")) + \n            ylab(\"Probability Density\") +\n            transition_states(idx_vals,transition_length = 2,state_length = 1) +\n  ggtitle('Observation index {closest_state}')\nanim_save(\"pdf.gif\",p)\n```\n\n![](./vignettes/pdf.gif)\n\n```{r eval=FALSE}\n# Visualize Results for CDF (Not run, requires 
gganimate, gifski and transformr\n# packages)\np \u003c- ggplot(res,aes(x=x)) + geom_line(aes(y=cdf_est_vals, colour=\"Estimated\")) +\ngeom_line(aes(y=actual_cdf_vals, colour=\"Actual\")) +\n  scale_colour_manual(\"\", \n                      breaks = c(\"Estimated\", \"Actual\"),\n                      values = c(\"blue\", \"black\")) +\n  ylab(\"Cumulative Probability\") + \n  transition_states(idx_vals, transition_length = 2,state_length = 1) +\n  ggtitle('Observation index {closest_state}')\nanim_save(\"cdf.gif\", p)\n```\n\n![](./vignettes/cdf.gif)\n\n```{r eval=FALSE}\n# Visualize Results for Quantiles (Not run, requires gganimate, gifski and \n# transformr packages)\np \u003c- ggplot(res_q,aes(x=actual_quantile_vals)) +\n  geom_point(aes(y=quantile_est_vals), color=\"blue\") +\n  geom_abline(slope=1,intercept = 0) +xlab(\"Theoretical Quantiles\") +\n  ylab(\"Estimated Quantiles\") + \n  transition_states(idx_vals,transition_length = 2, state_length = 1) +\n  ggtitle('Observation index {closest_state}')\nanim_save(\"quant.gif\",p)\n```\n\n![](./vignettes/quant.gif)\n\n### Bivariate Example\n\nWe illustrate tracking a non-stationary bivariate data stream with another \nsample Shiny application. The bivariate Hermite estimator leverages an \nexponential weighting scheme as described in the univariate case and does not \nneed to maintain a sliding window. We include a stub for reading streaming data \nthat generates micro-batches of bivariate normal i.i.d. random data with a \nchosen Spearman's correlation coefficient (as this is easily linked to the \nstandard correlation matrix). This stub can again be readily swapped out for a \nmethod reading micro-batches from a Kafka topic or similar.\n\nThe Shiny sample code below can be pasted into a single app.R file and run \ndirectly.\n\n```{r eval=FALSE}\n# Not Run. 
Copy and paste into app.R and run.\nlibrary(shiny)\nlibrary(hermiter)\nlibrary(ggplot2)\nlibrary(magrittr)\n\nui \u003c- fluidPage(\n  titlePanel(\"Bivariate Streaming Statistics Analysis Example\"),\n  sidebarLayout(\n    sidebarPanel(\n      sliderInput(\"spearmans\", \"True Spearman's Correlation:\",\n                  min = -0.9, max = 0.9,\n                  value = 0, step = 0.1)\n    ),\n    mainPanel(\n      plotOutput(\"plot\"),\n      textOutput(\"spearman_text\")\n    )\n  )\n)\n\nserver \u003c- function(input, output) {\n  values \u003c- reactiveValues(hermite_est = \n                             hermite_estimator(N = 10, standardize = TRUE,\n                                               exp_weight_lambda = 0.01,\n                                               est_type=\"bivariate\"))\n  # Note that the stub below could be replaced with code that reads streaming \n  # data from various sources, Kafka etc.  \n  read_stream_stub_micro_batch \u003c- reactive({\n    invalidateLater(1000)\n    sig_x \u003c- 1\n    sig_y \u003c- 1\n    num_obs \u003c- 100\n    rho \u003c- 2 *sin(pi/6 * input$spearmans)\n    observations_mat \u003c- mvtnorm::rmvnorm(n=num_obs,mean=rep(0,2), \n    sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2),\n    nrow=2,ncol=2, byrow = TRUE))\n    return(observations_mat)\n  })\n  updated_spear_calc \u003c- reactive({\n    micro_batch \u003c- read_stream_stub_micro_batch()\n    for (idx in seq_len(nrow(micro_batch))) {\n      values[[\"hermite_est\"]] \u003c- isolate(values[[\"hermite_est\"]]) %\u003e%\n        update_sequential(micro_batch[idx,])\n    }\n    spear_est \u003c- isolate(values[[\"hermite_est\"]]) %\u003e%\n      spearmans(clipped = TRUE)\n    return(spear_est)\n  })\n  output$plot \u003c- renderPlot({\n    vals \u003c- seq(-5,5,by=0.25)\n    x_grid \u003c- as.matrix(expand.grid(X=vals, Y=vals))\n    # sig_x and sig_y are local to the stream stub above, so they are\n    # redefined here with the same unit values for the true density.\n    sig_x \u003c- 1\n    sig_y \u003c- 1\n    rho \u003c- 2 *sin(pi/6 * input$spearmans)\n    actual_pdf \u003c-mvtnorm::dmvnorm(x_grid,mean=rep(0,2), \n   
 sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), \n    nrow=2,ncol=2, byrow = TRUE))\n    df_pdf \u003c- data.frame(x_grid,actual_pdf)\n    p1 \u003c- ggplot(df_pdf) + geom_tile(aes(X, Y, fill= actual_pdf)) +\n      scale_fill_gradient2(low=\"blue\", mid=\"cyan\", high=\"purple\",\n                           midpoint=.2,    \n                           breaks=seq(0,.4,by=.1), \n                           limits=c(0,.4)) +ggtitle(paste(\"True Bivariate \n                    Normal Density with matched Spearman's correlation\")) +\n       theme(legend.title = element_blank()) \n    p1\n  }\n  )\n  output$spearman_text \u003c- renderText({ \n    return(paste(\"Spearman's Correlation Estimate from Hermite Estimator:\", \n                 round(updated_spear_calc(), 1)))\n  })\n}\nshinyApp(ui = ui, server = server)\n```\n\n![](./vignettes/shiny_stream_example2.gif)\n\n## Citation Information\n\nTo cite this package, one can use the following code to generate the citation.\n\n```{r eval=FALSE}\ncitation(\"hermiter\")\n```\n\nThis yields:\n\nMichael S, Melvin V (2024). _hermiter: Efficient Sequential and Batch\nEstimation of Univariate and Bivariate Probability Density Functions and\nCumulative Distribution Functions along with Quantiles (Univariate) and\nNonparametric Correlation (Bivariate)_. R package version 2.3.1,\n\u003chttps://github.com/MikeJaredS/hermiter\u003e.\n\nMichael S, Melvin V (2023). “hermiter: R package for sequential\nnonparametric estimation.” _Computational Statistics_.\n\u003chttps://doi.org/10.1007/s00180-023-01382-0\u003e.\n","funding_links":[],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMikeJaredS%2Fhermiter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMikeJaredS%2Fhermiter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMikeJaredS%2Fhermiter/lists"}