{"id":19167556,"url":"https://github.com/thomasweise/regressor","last_synced_at":"2026-06-18T12:30:14.460Z","repository":{"id":90948412,"uuid":"126432344","full_name":"thomasWeise/regressoR","owner":"thomasWeise","description":"An R Package for Fitting Models to Two-Dimensional (x-y) Data","archived":false,"fork":false,"pushed_at":"2018-07-30T01:32:25.000Z","size":3501,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-03T23:12:04.615Z","etag":null,"topics":["cross-validation","modeling","parallelism","r","regression"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomasWeise.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-03-23T04:17:04.000Z","updated_at":"2018-07-30T01:32:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"d5945d48-2423-4e4d-b84b-8acf01a01d40","html_url":"https://github.com/thomasWeise/regressoR","commit_stats":{"total_commits":78,"total_committers":2,"mean_commits":39.0,"dds":"0.16666666666666663","last_synced_commit":"1d0fff6c5560dd53bd6378229bb533a7404a2dbf"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasWeise%2FregressoR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasWeise%2FregressoR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasWeise%2FregressoR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasWeise%
2FregressoR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomasWeise","download_url":"https://codeload.github.com/thomasWeise/regressoR/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240247611,"owners_count":19771335,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cross-validation","modeling","parallelism","r","regression"],"created_at":"2024-11-09T09:38:14.655Z","updated_at":"2026-06-18T12:30:14.407Z","avatar_url":"https://github.com/thomasWeise.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# An R Package for Fitting Models to 2-Dimensional (x-y) Data\n\n[\u003cimg alt=\"Travis CI Build Status\" src=\"https://img.shields.io/travis/thomasWeise/regressoR/master.svg\" height=\"20\"/\u003e](https://travis-ci.org/thomasWeise/regressoR/)\n\n## 1. 
Introduction\n\nIn many scenarios, we have two-dimensional data in the form of `(x, y)` tuples.\nWe then want to know the relationship between the `x` and the `y` coordinates.\nThe resulting model could be any type of function `f`.\nIn the ideal case, `f(x)=y` will hold, but of course if the data results from measurements, there may be some errors and deviations in it.\nThe goal is then to find the function `f` for which these deviations are as small as possible.\n\nHowever, this goal could be misleading:\nIf we have `n` points, we can always use a polynomial of degree `n-1` to go exactly through all points: a constant for one point, a linear function through two points, a quadratic function through three points, a cubic polynomial through four points, etc.\nSuch a model would not necessarily make much sense.\nActually, such a model would not give us any new information since it basically just \"encodes\" the `(x, y)`\ntuples, similar to a mathematical version of a lookup table.\nThis is called [overfitting](http://en.wikipedia.org/wiki/Overfitting), because this model fits well to the data that we know but won't fit to data that we will measure in the future from the same source.\nWhat we want are thus models that *a)* fit well to the observed data but *b)* are not overfitted,\ni.e., are likely to generalize and to fit to future observations. 
Ideally, of course, *c)* the\nmodels should have few parameters and be compact.\n\nHere, we provide a package that tries to find such models automatically.\nOur package uses a library of functional models which it applies to the measured data.\nIt can also fit multi-layer perceptrons (by treating them as simple functions, same as the other models) and splines.\nThe fitting of the functional models is done using other `R` packages, e.g., via the [Levenberg-Marquardt Algorithm](http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm) and other numerical optimization methods.\nThe package uses cross-validation to pick a model which seems to generalize well and then trains it again on the complete data set.\nOur library furthermore attempts to deal with the fact that sometimes, models fit poorly to the raw data but well to a log-scaled version of it.\nIt does so by creating multiple different representations of the data and including both these representations and the models in the learning step.\nIf two models seemingly fit similarly well to the data, then the smaller one will be selected.\n\nAll of this comes at the cost of higher runtime.\nHence, we provide a parameter `q` that can be used to tune the fitting effort from `0` (low) to `1` (maximum).\nOur package is still at a very early stage: it is not efficient and it is slow.\nBut it already somewhat works.\n\n## 2. Examples\n\n### 2.1. 
Motivating Example\n\nLet's say we have some data `dx` and `dy` of more or less unknown nature and want to find\na function `f` which can properly represent the relationship between `dx` and `dy`.\nThe easiest way to do this with our package is to simply invoke `regressoR.learn(x=dx, y=dy)`.\nThis will apply the complete palette of solvers and models to the data, internally perform cross-validation to select the best-fitting model, fit this model again on the complete data, and return it.\n\n    dx \u003c- rnorm(100);\n    dy \u003c- rnorm(n=100, mean=50*dx*dx-33);\n    plot(dx, dy)\n    result \u003c- regressoR.learn(x=dx, y=dy);\n    result@f\n    # function (x)\n    # -32.5442186855071 + (x * (0.776119279549966 + (x * 49.7907873618706)))\n    result@quality\n    # [1] 0.2455075\n    dx.sorted \u003c- sort(dx)\n    lines(dx.sorted, result@f(dx.sorted), col=\"red\")\n\nOf course, the approach is limited to the functions currently in the model library.\nBut it also automatically tries to apply the models to transformed versions of the data, such as log-scaled variants.\nThis way, even with just a linear and a quadratic model in the library, we can represent exponential relationships:\n    \n    dx \u003c- runif(100) + 0.1\n    dy.raw \u003c- 1 + 0.4* exp(5 - 3*dx + 0.6*dx*dx)\n    dy \u003c- rnorm(n=100, mean=dy.raw)\n    plot(dx, dy)\n    result \u003c- regressoR.learn(x=dx, y=dy);\n    dx.sorted \u003c- sort(dx)\n    lines(dx.sorted, result@f(dx.sorted), col=\"blue\")\n    result@f\n    # function (x) \n    # exp(x = ((1.14651435794791 + (x * (-1.4311431348238 + (x * 0.439385151660909)))) * \n    #     4.51412008649805) - 0.112652709296488)\n    \n### 2.2. 
Bigger Example\n\nHere you can find some example fitting results for different levels of fitting effort `q`, from minimal (`q=0`) to maximal (`q=1`).\nWith the current, probably inefficient version of our code, running this example takes quite some time.\n\n![Bigger Example with Functions Fitted at Different Fitting Powers](examples/fitAndPlot.png)\n\n    set.seed(4577745L);\n    \n    library(utilizeR);\n    library(plotteR);\n    library(regressoR);\n    library(parallel);\n    library(regressoR.functional);\n    library(regressoR.functional.models);\n    \n    # base R already defines log(), so only look in the current environment\n    if(!exists(\"log\", inherits=FALSE)) { log \u003c- makeLogger(TRUE); }\n    log(\"Welcome to the regression example.\");\n    log(\"We will create three example data sets and then fit them using different fitting powers.\");\n    log(\"We will utilize parallel computing if possible.\");\n    \n    # make an example\n    make.example \u003c- function(f) {\n      log(\"Now creating example for function \", functionToString(f), \".\");\n      n \u003c- 225; # make 225 points\n      x \u003c- sort(runif(n=n, min=0, max=3)); # generate x data\n      y \u003c- rnorm(n=n, mean=f(x), sd=0.1);  # noisy y\n      x \u003c- rnorm(n=n, mean=x, sd=0.1); # noisy x\n      return(list(x=x, y=y, f=f));\n    }\n    \n    # the three base functions\n    f \u003c- c(function(x) 1 - 0.2*x + 0.75*x*x - 0.3*x*x*x,\n           function(x) 0.1 * exp(3 - x),\n           function(x) 1.2 + 0.7*sin(2*x));\n    \n    # create the three example data sets\n    examples \u003c- lapply(X=f, FUN=make.example);\n    # get the minimum and maximum actual x coordinates\n    min.x    \u003c- min(vapply(X=examples, FUN=function(z) min(z$x), FUN.VALUE=-1));\n    max.x    \u003c- max(vapply(X=examples, FUN=function(z) max(z$x), FUN.VALUE=4));\n    log(\"The minimum actual x value is \", min.x, \" and the maximum value is \", max.x, \".\");\n    \n    range.x \u003c- max.x - min.x;\n    start.x \u003c- floor(10*(min.x - 0.1*range.x))/10;\n    end.x   \u003c- 
ceiling(10*(max.x + 0.1*range.x))/10;\n    log(\"We will draw the diagrams from x=\", start.x,\n        \" to x=\", end.x,\n        \" to test the generalization ability of the regression results.\");\n    \n    if(!exists(\"n\")) { n \u003c- 3L; total \u003c- n + 1L;}\n    if(!exists(\"arrangement\")) {\n      arrangement \u003c- plots.arrange(total);\n    }\n    \n    # we want to put the figures next to each other: the original data/function and\n    # the fitting results at the different quality levels\n    log(\"Setting a grid of \", arrangement[1],\n        \" rows and \", arrangement[2],\n        \" columns for the \", total, \" diagrams.\")\n    old.par \u003c- par(mfrow=arrangement);\n    \n    log(\"First, we plot the original data.\");\n    \n    # plot the original data\n    batchPlot.list(examples,\n                   names=c(\"f1\", \"f2\", \"f3\"),\n                   main=\"Original Data and Function Values for x\",\n                   legend=list(x=\"bottom\", horiz=TRUE),\n                   x.min.lower=start.x,x.min.upper=start.x,\n                   x.max.lower=end.x, x.max.upper=end.x, x.add=TRUE);\n    abline(v=min.x, col=\"gray\");\n    abline(v=max.x, col=\"gray\");\n    \n    # create the fitting tasks to be solved\n    tasks \u003c- unlist(lapply(X=seq_len(n),\n                    FUN=function(i) {\n                      lapply(X=examples, FUN=function(d) {\n                        d$q = round((i-1L)/(n-1L), 2); # add power\n                        d # return example function + fitting power\n                        })\n                    }), recursive=FALSE);\n    log(\"We defined \", length(tasks), \" tasks to be solved in parallel.\");\n    \n    # get the number of cores\n    cores \u003c- getOption(\"mc.cores\");\n    if(is.null(cores)) {\n      cores \u003c- parallel::detectCores();\n      options(mc.cores=cores);\n      log(\"Detected \", cores, \" cores for parallel computation.\");\n    } else {\n      log(\"Number of cores for parallel 
computation provided as \", cores);\n    }\n    \n    # compute the results\n    log(\"Now we apply the fitting procedure in parallel using \",\n        getOption(\"mc.cores\", 2L), \" cores.\");\n    \n    # learn in parallel\n    results \u003c- mclapply(X=tasks, FUN=function(task)\n                        regressoR.learnForExport(\n                        x=task$x, y=task$y, q=task$q));\n    log(\"Done with the fitting, now plotting.\");\n    \n    x.dummy \u003c- start.x + ((end.x - start.x) * ((0:100)/100));\n    \n    # Automatically learn models and plot the results\n    for(i in seq_len(n)) {\n      # select the right results\n      start \u003c- (i-1L)*length(examples) + 1L;\n      end   \u003c- (i * length(examples));\n      res   \u003c- results[start:end];\n    \n      # plot the regression results, while extending the models slightly beyond the original range\n      # to see how they generalize\n      batchPlot.RegressionResults(res, plotXY=TRUE, plotXF=TRUE,\n        main=paste(\"q=\", round(tasks[[start]]$q, 3), sep=\"\", collapse=\"\"),\n        # as names, use the fitting quality\n        names=vapply(res, FUN=function(r)\n                     paste(r@name, \":\", round(r@result@quality, 3),\n                           sep=\"\", collapse=\"\"),\n                     FUN.VALUE=\"\"),\n        x.min.lower=start.x, x.min.upper=start.x,\n        x.max.lower=end.x, x.max.upper=end.x, x.add=TRUE,\n        legend=list(x=\"bottomleft\", horiz=FALSE, ncol=1L,\n                    cex=0.75));\n      abline(v=min.x, col=\"gray\");\n      abline(v=max.x, col=\"gray\");\n      lines(x=x.dummy, y=f[[1]](x.dummy), col=\"gray\");\n      lines(x=x.dummy, y=f[[2]](x.dummy), col=\"gray\");\n      lines(x=x.dummy, y=f[[3]](x.dummy), col=\"gray\");\n    }\n    \n    # restore old settings\n    invisible(par(old.par));\n    \n    log(\"All done.\");\n\n## 3. 
Installation\n\nYou can install the package directly from GitHub by using the package\n[`devtools`](http://cran.r-project.org/web/packages/devtools/index.html) as\nfollows:\n\n    library(devtools)\n    install_github(\"thomasWeise/regressoR\")\n\nIf `devtools` is not yet installed on your machine, you first need to run\n\n    install.packages(\"devtools\")\n    \n## 4. License\n\nThe copyright holder of this package is Prof. Dr. Thomas Weise (see Contact).\nThe package is licensed under the GNU LESSER GENERAL PUBLIC LICENSE Version 3, 29 June 2007.\n    \n## 5. Contact\n\nIf you have any questions or suggestions, please contact\n[Prof. Dr. Thomas Weise](http://iao.hfuu.edu.cn/team/director) of the\n[Institute of Applied Optimization](http://iao.hfuu.edu.cn/) at\n[Hefei University](http://www.hfuu.edu.cn) in\nHefei, Anhui, China via\nemail to [tweise@hfuu.edu.cn](mailto:tweise@hfuu.edu.cn).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasweise%2Fregressor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomasweise%2Fregressor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasweise%2Fregressor/lists"}