{"id":32199776,"url":"https://github.com/simulatr/simrel","last_synced_at":"2026-02-21T13:01:37.875Z","repository":{"id":56934617,"uuid":"74118388","full_name":"simulatr/simrel","owner":"simulatr","description":"Simulation of Multivariate Linear Model Data","archived":false,"fork":false,"pushed_at":"2022-11-15T21:11:29.000Z","size":16620,"stargazers_count":3,"open_issues_count":6,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-09T19:17:19.905Z","etag":null,"topics":["bivariate-simulation","multivariate-simulation","relevant-predictor-components","simulated-data","simulation","univariate-simulation"],"latest_commit_sha":null,"homepage":"https://simulatr.github.io/simrel/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simulatr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-11-18T10:08:00.000Z","updated_at":"2022-11-15T21:11:36.000Z","dependencies_parsed_at":"2023-01-21T19:33:23.122Z","dependency_job_id":null,"html_url":"https://github.com/simulatr/simrel","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/simulatr/simrel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simulatr%2Fsimrel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simulatr%2Fsimrel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simulatr%2Fsimrel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simulatr%2Fsimrel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simulatr
","download_url":"https://codeload.github.com/simulatr/simrel/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simulatr%2Fsimrel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29681468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T12:30:22.644Z","status":"ssl_error","status_checked_at":"2026-02-21T12:29:55.402Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bivariate-simulation","multivariate-simulation","relevant-predictor-components","simulated-data","simulation","univariate-simulation"],"created_at":"2025-10-22T03:27:11.330Z","updated_at":"2026-02-21T13:01:37.863Z","avatar_url":"https://github.com/simulatr.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"Simulation of Multivariate Linear Model Data\n================\nRaju Rimal, Trygve Almøy \u0026 Solve Sæbø\n\n[![CRAN\nstatus](https://www.r-pkg.org/badges/version/simrel)](https://cran.r-project.org/package=simrel)\n[![Build\nStatus](https://travis-ci.org/simulatr/simrel.svg?branch=master)](https://travis-ci.org/simulatr/simrel)\n[![Codecov test\ncoverage](https://codecov.io/gh/simulatr/simrel/branch/master/graph/badge.svg)](https://codecov.io/gh/simulatr/simrel?branch=master)\n\n# Introduction\n\n`Simrel` r-package is a versatile tool for simulation of 
multivariate\nlinear model data. The package consists of four core simulation functions:\n`unisimrel`, `bisimrel`, `multisimrel` and `simrel`. It also has two\nplotting functions: one for plotting covariance and rotation matrices\nand another for plotting different properties of the simulated data. As\nthe names suggest, `unisimrel` simulates univariate linear model data,\nwhile `bisimrel` simulates bivariate linear model data where the user can\nspecify the correlation between the two responses with and without\ngiven **X**. In addition, `bisimrel` allows the responses (**y**) to\nshare common relevant components.\n\n`multisimrel` extends `bisimrel` and `unisimrel` so that the user can\nsimulate multivariate linear model data with multiple responses. In this\nsimulation, each response must have an exclusive set of relevant\npredictors and relevant predictor components. The examples below give a\nclearer picture of these functions. The fourth simulation function,\n`simrel`, is a wrapper around these functions and calls the appropriate\none according to the type of data the user is simulating. The following\nsection discusses the arguments required by these simulation functions.\n\n## Simulation Parameters\n\nThese functions are based on the tools for simulating single-response\nlinear model data discussed in Sæbø et al. (2015) and multi-response\nlinear model data discussed in Rimal et al. (2018). They require the\nfollowing arguments, which are also the parameters of the simulation.\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003en:\u003c/code\u003e Number of training samples (n)\u003c/summary\u003e An\ninteger giving the number of training samples. For example: \u003ccode\u003en =\n1000\u003c/code\u003e simulates 1000 training observations.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003ep:\u003c/code\u003e Number of predictor variables (p)\u003c/summary\u003e An\ninteger giving the number of predictor variables. For example: 
\u003ccode\u003ep = 150\u003c/code\u003e gives\ndata with 150 predictor variables.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003eq:\u003c/code\u003e Number of relevant predictors (q)\u003c/summary\u003e An\ninteger giving the number of predictor variables that are relevant for\nthe response. For example: \u003ccode\u003eq = 15\u003c/code\u003e makes 15 of the\n\u003ccode\u003ep\u003c/code\u003e predictors relevant for the response.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003erelpos:\u003c/code\u003e Position of relevant components\u003c/summary\u003e\nA vector of position indices of the relevant principal components of\n\u003cstrong\u003eX\u003c/strong\u003e. For instance, \u003ccode\u003erelpos = c(1, 2, 3, 5)\u003c/code\u003e\nwill give data with 4 relevant components at positions 1, 2, 3 and 5.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003eR2:\u003c/code\u003e Coefficient of determination\u003c/summary\u003e A\ndecimal value between 0 and 1 specifying the coefficient of\ndetermination. For example, \u003ccode\u003eR2 = 0.8\u003c/code\u003e gives data with a\ncoefficient of determination of 0.8.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003egamma:\u003c/code\u003e Decay factor for exponential decay of\neigenvalues of predictor variables\u003c/summary\u003e A numeric value greater\nthan 0 that controls the exponential decay of the eigenvalues of the\npredictor variables. For \u003ccode\u003ep\u003c/code\u003e predictors, the eigenvalues are\ncomputed as \u003ccode\u003eexp(-gamma*(i-1))\u003c/code\u003e for \u003ccode\u003ei = 1, 2, …, p\u003c/code\u003e,\nso the higher the value of \u003ccode\u003egamma\u003c/code\u003e, the steeper the\ndecay of the eigenvalues. 
Since a steeper decay of the eigenvalues\ncorresponds to higher multicollinearity in the data, \u003ccode\u003egamma\u003c/code\u003e\nalso controls the multicollinearity present in the simulated data.\n\n\u003cimg src=\"figure/gamma-decay-1.svg\" width=\"100%\" /\u003e\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003eeta:\u003c/code\u003e Decay factor for exponential decay of\neigenvalues of response variables\u003c/summary\u003e Similar to\n\u003ccode\u003egamma\u003c/code\u003e, it controls the exponential decay of the\neigenvalues of the response variables. For \u003ccode\u003em\u003c/code\u003e responses, the\neigenvalues are computed as \u003ccode\u003eexp(-eta*(j-1))\u003c/code\u003e for \u003ccode\u003ej = 1,\n2, …, m\u003c/code\u003e, so the higher the value of \u003ccode\u003eeta\u003c/code\u003e, the\nsteeper the decay of the eigenvalues.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003em:\u003c/code\u003e Number of Response Variables (Only applicable\nfor Multivariate Simulation)\u003c/summary\u003e An integer specifying the number of\nresponse variables to simulate. This is only applicable in Multivariate\nSimulation (\u003ccode\u003emultisimrel\u003c/code\u003e).\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003eypos:\u003c/code\u003e Indices of response components to\ncombine together\u003c/summary\u003e The true dimension of the response matrix\ncontaining the information can be smaller than the dimension defined by\nall the response variables. Suppose, for example, that only two response\ncomponents actually contain information that the predictors (or a subset\nof them) can explain. However, the user might want the simulated data to\nhave 5 response variables that contain the same information as these two\nlatent components of the response variables. 
Specifying the \u003ccode\u003eypos\u003c/code\u003e parameter as\n\u003ccode\u003elist(c(1, 3), c(2, 4, 5))\u003c/code\u003e mixes the information in\nresponse component 1 with the uninformative component 3, so that response\nvariables 1 and 3 both contain the information that was contained in\nresponse component 1. Similarly, the second response component is mixed\nwith the uninformative response components 4 and 5.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003ccode\u003erho:\u003c/code\u003e Correlation between two response variables\n(Only applicable for Bivariate Simulation)\u003c/summary\u003e A vector of two\nnumbers specifying the correlation between the two response variables\nin bivariate simulation. The first number is the correlation without\nknowledge of the predictors, while the second number is the\ncorrelation given the predictors. Both values must lie in the range of\na correlation, i.e. between -1 and 1.\n\n\u003c/details\u003e\n\n# Installation\n\nThe package is available on CRAN and can be installed using\n`install.packages` as,\n\n``` r\ninstall.packages(\"simrel\")\n```\n\nA more recent stable version can be downloaded from GitHub (or Bitbucket) as,\n\n``` r\n# install.packages(\"devtools\")  # if devtools is not installed yet\ndevtools::install_github(\"simulatr/simrel\")\n# or, alternatively, from Bitbucket:\n# devtools::install_bitbucket(\"simulatr/simrel\")\n```\n\n# Examples\n\n## Univariate Simulation\n\nSimulate univariate linear model data with 100 training samples and\n500 test samples, having 10 predictors (**X**) of which only 8 are\nrelevant for the variation in the response vector. The population model\nshould explain 80% of the variation present in the response. 
In\naddition, only the 1st and 3rd principal components of **X** should be\nrelevant for *y*, and the eigenvalues of **X** should decay exponentially\nwith a factor of 0.7.\n\n``` r\nlibrary(simrel)\nsim_obj \u003c-\n  simrel(\n    n      = 100,         # 100 training samples\n    p      = 10,          # 10 predictor variables\n    q      = 8,           # only 8 of them are relevant\n    R2     = 0.8,         # 80% of variation is explained by the model\n    relpos = c(1, 3),     # first and third principal components are relevant\n    gamma  = 0.7,         # decay factor of eigenvalues of X is 0.7\n    ntest  = 500,         # 500 test observations\n    type   = \"univariate\" # univariate linear model data simulation\n  )\n```\n\nHere `sim_obj` is an object of class `simrel`: a list containing the\nsimulated linear model data along with other relevant properties.\nLet's use the `plot_simrel` function to get an overview of the situation,\n\n``` r\nplot_simrel(sim_obj, which = c(1, 2, 4),\n            layout = matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))\n```\n\n\u003cimg src=\"figure/simrel1-plot-1.svg\" width=\"100%\" /\u003e\n\n## Bivariate Simulation\n\nThe wrapper function `simrel` uses `bisimrel` for simulating bivariate\nlinear model data. Let's simulate data from a bivariate distribution\nwith 100 training and 500 test samples. The response vectors\n**y**\u003csub\u003e1\u003c/sub\u003e and **y**\u003csub\u003e2\u003c/sub\u003e have a correlation of 0.8\nwithout given **X** and 0.6 with given **X**. Among the 10 predictor\nvariables, 5 are relevant for **y**\u003csub\u003e1\u003c/sub\u003e and 5 are relevant for\n**y**\u003csub\u003e2\u003c/sub\u003e, with 3 of them relevant for both responses. Let the\npredictors explain 80% and 70% of the total variation present in the\npopulation of **y**\u003csub\u003e1\u003c/sub\u003e and **y**\u003csub\u003e2\u003c/sub\u003e,\nrespectively. 
In addition, let components 1, 2 and 3 be relevant for\n**y**\u003csub\u003e1\u003c/sub\u003e and components 3 and 4 be relevant for\n**y**\u003csub\u003e2\u003c/sub\u003e. In this case, the third component is relevant for\nboth responses. Let the decay factor of the eigenvalues of **X** be 0.8.\n\n``` r\nsimrel2_obj \u003c-\n  simrel(\n    n      = 100,                       # 100 training samples\n    p      = 10,                        # 10 predictor variables\n    q      = c(5, 5, 3),                # relevant variables for y1, for y2 and common to both\n    relpos = list(c(1, 2, 3), c(3, 4)), # relevant components for y1 and y2\n    R2     = c(0.8, 0.7),               # coefficient of determination for y1 and y2\n    rho    = c(0.8, 0.6),               # correlation between y1 and y2 without and with given X\n    gamma  = 0.8,                       # decay factor of eigenvalues of X\n    ntest  = 500,                       # 500 test samples\n    type   = \"bivariate\"                # bivariate simulation\n  )\n```\n\nLet's look at the plot,\n\n``` r\nplot_simrel(simrel2_obj, which = c(1, 3, 4),\n            layout = matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))\n```\n\n\u003cimg src=\"figure/simrel2_plot-1.svg\" width=\"100%\" /\u003e\n\n## Multivariate Simulation\n\nMultivariate simulation uses the `multisimrel` function and can simulate\nmultiple responses. Let's simulate 100 training samples and 500 test\nsamples. The simulated data have 5 responses and 15 predictors. These 5\nresponses span a 5-dimensional latent space, of which only 3 components\nare related to the predictors. Let us denote the latent response\ncomponents by **w**\u003csub\u003ei\u003c/sub\u003e. Let 5, 4 and 4 predictors be relevant\nfor response components **w**\u003csub\u003e1\u003c/sub\u003e, **w**\u003csub\u003e2\u003c/sub\u003e and\n**w**\u003csub\u003e3\u003c/sub\u003e, respectively. Let predictor components 1, 2 and 3 be\nrelevant for **w**\u003csub\u003e1\u003c/sub\u003e, and components 4 and 5 for\n**w**\u003csub\u003e2\u003c/sub\u003e. 
Similarly, predictor components 6 and 8\nare relevant for **w**\u003csub\u003e3\u003c/sub\u003e.\n\nSince we need 5 response variables, we mix these 3 informative response\ncomponents with the 2 remaining uninformative components so that all\nsimulated responses contain information related to **X**. Let's combine\n**w**\u003csub\u003e1\u003c/sub\u003e with **w**\u003csub\u003e4\u003c/sub\u003e and **w**\u003csub\u003e3\u003c/sub\u003e\nwith **w**\u003csub\u003e5\u003c/sub\u003e, so that the predictors that are relevant for\nresponse component **w**\u003csub\u003e1\u003c/sub\u003e will be relevant for responses\n**y**\u003csub\u003e1\u003c/sub\u003e and **y**\u003csub\u003e4\u003c/sub\u003e, and so on.\n\nIn addition to these latent space requirements, let **X** explain 80% of\nthe variation present in **w**\u003csub\u003e1\u003c/sub\u003e, 50% in **w**\u003csub\u003e2\u003c/sub\u003e and 70%\nin **w**\u003csub\u003e3\u003c/sub\u003e. Let the eigenvalues of **X** decay by a factor of 0.8.\n\n``` r\nsimrel_m_obj \u003c-\n  simrel(\n    n      = 100,                                # 100 training samples\n    p      = 15,                                 # 15 predictor variables\n    q      = c(5, 4, 4),                         # relevant variables for w1, w2 and w3\n    relpos = list(c(1, 2, 3), c(4, 5), c(6, 8)), # relevant components for w1, w2 and w3\n    R2     = c(0.8, 0.5, 0.7),                   # coefficient of determination for w1, w2 and w3\n    ypos   = list(c(1, 4), c(2), c(3, 5)),       # combining response components together\n    m      = 5,                                  # number of response variables\n    gamma  = 0.8,                                # decay factor of eigenvalues of X\n    ntest  = 500,                                # 500 test samples\n    type   = \"multivariate\"                      # multivariate simulation\n  )\n```\n\nLet's look at the `simrel` plot,\n\n``` r\nplot_simrel(simrel_m_obj, which = 1:4,\n            layout = matrix(1:4, 2, 2, byrow = 
TRUE))\n```\n\n\u003cimg src=\"figure/simrelm_plot-1.svg\" width=\"100%\" /\u003e\n\n## RStudio Addins\n\nTo make the package easier to use, we have created a Shiny gadget as an\nRStudio addin. If you are using RStudio, you can access this app from\nTools \\\u003e Addins \\\u003e simulatr. You can also launch the app with\n`simrel::app_simulatr()`. This opens the app in a browser, where you can\nchoose all your parameters and see the true population parameters you\nwill get from the simulation. When the app is closed, it prints the\ncorresponding command to your R console.\n\n![App-Simulatr-Screenshot](figure/AppSimrel.png)\n\n## References\n\n  - Sæbø, S., Almøy, T., \u0026 Helland, I. S. (2015). simrel—A versatile\n    tool for linear model data simulation based on the concept of a\n    relevant subspace and relevant predictors. \u003cem\u003eChemometrics and\n    Intelligent Laboratory Systems\u003c/em\u003e, 146, 128-135.\n  - Rimal, R., Almøy, T., \u0026 Sæbø, S. (2018). A tool for simulating\n    multi-response linear model data. \u003cem\u003eChemometrics and Intelligent\n    Laboratory Systems\u003c/em\u003e, 176, 1-10.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimulatr%2Fsimrel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimulatr%2Fsimrel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimulatr%2Fsimrel/lists"}