{"id":16511353,"url":"https://github.com/pbenner/gp.regression","last_synced_at":"2025-10-28T03:32:00.780Z","repository":{"id":69897596,"uuid":"42407371","full_name":"pbenner/gp.regression","owner":"pbenner","description":"Gaussian Process Regression Library for GNU-R","archived":false,"fork":false,"pushed_at":"2018-07-12T20:01:33.000Z","size":555,"stargazers_count":6,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-01T11:24:11.865Z","etag":null,"topics":["gaussian-processes","gnu-r","r"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pbenner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2015-09-13T18:23:22.000Z","updated_at":"2021-07-29T20:42:11.000Z","dependencies_parsed_at":"2023-02-22T03:00:45.076Z","dependency_job_id":null,"html_url":"https://github.com/pbenner/gp.regression","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fgp.regression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fgp.regression/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fgp.regression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fgp.regression/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pbenner","download_url":"https://codeload.github.com/pbenner/gp.regression/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https:/
/github.com","kind":"github","repositories_count":238590596,"owners_count":19497350,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gaussian-processes","gnu-r","r"],"created_at":"2024-10-11T15:59:54.295Z","updated_at":"2025-10-28T03:32:00.228Z","avatar_url":"https://github.com/pbenner.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Installation\n\nTo install the package simply type\n\n\tmake install\n\nThe package requires *roxygen2*, *ggplot2*, *Matrix* and a couple of other libraries.\n\n#### OpenBLAS\n\nIt is recommended to use this library with a threaded version of OpenBLAS. See\n\n\thttps://cran.r-project.org/doc/manuals/r-devel/R-admin.html#BLAS\n\n## Introduction\n\nA first example:\n\n\tgp \u003c- new.gp(36.8, kernel.squared.exponential(200, 0.1))\n\nThis creates a Gaussian process with prior mean *36.8* and a squared exponential kernel. The likelihood model is a Gaussian with variance *1*. With\n\n\tgp \u003c- new.gp(36.8, kernel.squared.exponential(200, 0.1),\n\t      \t     likelihood=new.likelihood(\"normal\", 0.1))\n\nthe variance of the likelihood model is set to *0.1*. 
Samples from the prior Gaussian process can be drawn with\n\n\tdraw.sample(gp, 1:300*10, ep=0.000001)\n\n![Gaussian process samples](demo/gp1d.samples.png)\n\nAssuming we have the following observations\n\n\tlibrary(MASS)\n\n\txp \u003c- beav1$time\n\typ \u003c- beav1$temp\n\nwe may compute the posterior Gaussian process with\n\n\tgp \u003c- posterior(gp, xp, yp)\n\nThe posterior distribution can be summarized and visualized at locations *x* with\n\n\tx \u003c- 1:300*10\n\n\tsummarize(gp, x)\n\n\tplot(gp, x)\n\n![One-dimensional Gaussian process](demo/gp1d.png)\n\nData with higher-dimensional covariates can be analysed in the same way, e.g. for two dimensions\n\n\tnp \u003c- 400\n\txp \u003c- cbind(x1 = runif(np), x2 = runif(np))\n\typ \u003c- sin(pi*xp[,1]) + cos(2*pi*xp[,2]) + rnorm(np, 0, 1)\n\n\tgp \u003c- new.gp(0.5, kernel.squared.exponential(0.5, 1), dim=2)\n\tgp \u003c- posterior(gp, xp, yp, 1)\n\n\tx  \u003c- as.matrix(expand.grid(x = 1:100/100, y = 1:100/100))\n\tplot(gp, x, plot.scatter=TRUE, plot.variance=FALSE)\n\n![Two-dimensional Gaussian process](demo/gp2d.png)\n\n### Kernel functions\n\nName | Constructor | Parameters\n-----|-------------|----------\nLinear | *kernel.linear* | variance_0, variance, c (offset)\nSquared exponential | *kernel.squared.exponential* | l, variance\nGamma exponential | *kernel.gamma.exponential* | l, variance, gamma\nPeriodic | *kernel.periodic* | l, variance, p (periodicity)\nLocally periodic | *kernel.locally.periodic* | l, variance, p (periodicity)\nOrnstein-Uhlenbeck | *kernel.ornstein.uhlenbeck* | l, variance\nMatern | *kernel.matern* | l, variance, nu\nCombined kernel | *kernel.combined* | k1, k2, ..., kn (kernel functions)\n\n![Samples with different kernel functions](demo/kernel1.png)\n\n### Likelihood models and link functions\n\n#### Gamma likelihood model\n\nThe following example creates a Gaussian process with a gamma likelihood. 
Since the domain of the gamma distribution is the positive reals, we need a link function, such as the *logistic* function, to transform the process.\n\n\tgp \u003c- new.gp(1.0, kernel.squared.exponential(1.0, 5.0),\n\t\t     likelihood=new.likelihood(\"gamma\", 1.0),\n\t\t     link=new.link(\"logistic\"))\n\nThe shape of the gamma likelihood is set to *1.0*, whereas the mean is determined by the Gaussian process. Given the observations\n\n\tn  \u003c- 1000\n\txp \u003c- 10*runif(n)\n\typ \u003c- rgamma(n, 1, 2)\n\nwe obtain the posterior distribution with\n\n\t# add some tiny noise to the diagonal for numerical stability\n\tgp \u003c- posterior(gp, xp, yp, ep=0.01, verbose=TRUE)\n\tsummarize(gp, 0:10/5)\n\n\tplot(gp, 1:100/10)\n\n![Gamma likelihood](demo/gamma.png)\n\n#### Binomial likelihood\n\nThe *probit* link function can be used for binomial observations. In this case no specific likelihood model is needed.\n\n\tgp \u003c- new.gp(0.5, kernel.squared.exponential(1, 0.25),\n\t\t     likelihood=NULL,\n\t\t     link=new.link(\"probit\"))\n\nThe observations are given by\n\n\txp \u003c- c(1,2,3,4)\n\typ \u003c- matrix(0, 4, 2)\n\typ[1,] \u003c- c(2, 14)\n\typ[2,] \u003c- c(4, 12)\n\typ[3,] \u003c- c(7, 10)\n\typ[4,] \u003c- c(15, 8)\n\nwhere *xp* contains the locations of the observations and *yp* contains the count statistics (i.e. the number of heads and tails).\n\n### Heteroscedastic Gaussian process\n\nHeteroscedasticity can be modeled with a second Gaussian process for the variance of the likelihood model. An example is given by\n\n\tgp \u003c- new.gp.heteroscedastic(\n\t\tnew.gp( 0.0, kernel.squared.exponential(4, 100)),\n\t\tnew.gp(10.0, kernel.squared.exponential(4,  10),\n\t\t       likelihood=new.likelihood(\"gamma\", 1),\n\t\t       link=new.link(\"logistic\")),\n\t\ttransform     = sqrt,\n\t\ttransform.inv = function(x) x^2)\n\nwhere the second Gaussian process uses a gamma likelihood model in combination with a logistic link function. 
The empirical variances are transformed by taking the square root. Testing the model on the *mcycle* data set\n\n\tdata(\"mcycle\", package = \"MASS\")\n\n\tgp \u003c- posterior(gp, mcycle$times, mcycle$accel, 0.00001,\n\t                step = 0.1,\n\t                epsilon = 0.000001,\n\t                verbose=TRUE)\n\ngives the following result\n\n![Heteroscedastic GP](demo/mcycle.png)\n\n### Robust regression with heavy-tailed distributions\n\nHeavy-tailed distributions, such as Student's t-distribution, are useful for modeling data that include outliers. Rasmussen's mode finding (a stabilized Newton's method, originally implemented in [GPML](http://www.gaussianprocess.org/gpml/code/matlab/doc/)) can be used to perform inference with a Student's t likelihood, which enables robust regression. Let us first generate a corrupted sine wave by\n\n\tmakeCorruptedSin \u003c- function(x, basenoise, number_of_corruption, corruptionwidth){\n            y_clean = sin(x)\n            y = y_clean + rnorm(length(x),sd = basenoise)\n            corruption_points = floor(runif(min=1, max=length(x),n = number_of_corruption))\n            for (index in corruption_points){\n                y[[index]] = y[[index]] + runif(min=-corruptionwidth, max=corruptionwidth, n = 1)\n            }\n            return(y)\n\t}\n        \n\tx = seq(from=-3.14, to=3.14, by=0.1)\n\ty = makeCorruptedSin(x, basenoise = 0.1, number_of_corruption = 25, corruptionwidth = 4)\n\non which a Gaussian process with a Gaussian likelihood model \n\n\tgp_n \u003c- new.gp(0, kernel.squared.exponential(2, 2),\n\tlikelihood=new.likelihood(\"normal\", 0.1))\n\tgp_n \u003c- posterior(gp_n, x, y)\n\tplot(gp_n,x)\n\n![Heavy-tail](demo/heavytail_normal.png)\n\nperforms poorly. 
In contrast, a Gaussian process with a Student's t likelihood \n\t\n\tgp_t \u003c- new.gp(0, kernel.squared.exponential(2, 2),\n             likelihood=new.likelihood(\"t\", 2.1, 0.1))\n\tgp_t \u003c- posterior(gp_t, x, y, ep= 0.00000001, epsilon = 0.000001,\n                    verbose = TRUE, modefinding='rasmussen')\n\tplot(gp_t,x)\n\ncaptures the ground truth well. \n\n![Heavy-tail](demo/heavytail_student.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Fgp.regression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpbenner%2Fgp.regression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Fgp.regression/lists"}