{"id":21028475,"url":"https://github.com/dirkschumacher/ndarray-linear-regression","last_synced_at":"2025-05-15T10:33:21.824Z","repository":{"id":57309354,"uuid":"108644538","full_name":"dirkschumacher/ndarray-linear-regression","owner":"dirkschumacher","description":"Linear regression (with QR decomposition) with ndarrays","archived":false,"fork":false,"pushed_at":"2020-05-25T04:07:08.000Z","size":24,"stargazers_count":4,"open_issues_count":4,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-10T10:16:53.808Z","etag":null,"topics":["data-science","linear-regression","machine-learning-algorithms","ndarray","statistics"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dirkschumacher.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"license.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-28T11:43:50.000Z","updated_at":"2018-08-29T21:28:07.000Z","dependencies_parsed_at":"2022-09-09T09:11:12.653Z","dependency_job_id":null,"html_url":"https://github.com/dirkschumacher/ndarray-linear-regression","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirkschumacher%2Fndarray-linear-regression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirkschumacher%2Fndarray-linear-regression/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirkschumacher%2Fndarray-linear-regression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirkschumacher%2Fndarray-linear-regression/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dirkschumacher","download_url":"https://codeload.github.com/dirkschumacher/ndarray-linear-regression/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225346426,"owners_count":17459977,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","linear-regression","machine-learning-algorithms","ndarray","statistics"],"created_at":"2024-11-19T11:55:46.666Z","updated_at":"2024-11-19T11:55:47.339Z","avatar_url":"https://github.com/dirkschumacher.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ndarray-linear-regression\n\nFit [linear regression](https://en.wikipedia.org/wiki/Linear_regression) models using [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition) on [ndarray](https://github.com/scijs/ndarray) datastructures. 
It currently supports fitting, prediction intervals and standard errors for coefficients.\n\n[![npm version](https://img.shields.io/npm/v/ndarray-linear-regression.svg)](https://www.npmjs.com/package/ndarray-linear-regression)\n[![build status](https://img.shields.io/travis/dirkschumacher/ndarray-linear-regression.svg)](https://travis-ci.org/dirkschumacher/ndarray-linear-regression)\n![ISC-licensed](https://img.shields.io/github/license/dirkschumacher/ndarray-linear-regression.svg)\n\n## Installing\n\n```shell\nnpm install ndarray-linear-regression\n```\n\n## Usage\n\nAn example of how to fit a linear regression model to the `mtcars` dataset.\nThe model is `mpg ~ hp + cyl`. That is, can we predict miles per gallon (`mpg`) from a linear combination of `hp` and `cyl`?\n\n```js\nconst fit = require(\"ndarray-linear-regression\")\nconst mtcars = require(\"mtcars\")\nconst ndarray = require(\"ndarray\")\nconst pool = require(\"ndarray-scratch\")\n\nconst mpg = mtcars.map((x) =\u003e x.mpg)\nconst m = mpg.length\nconst n = 2\nconst hp = mtcars.map((x) =\u003e x.hp)\nconst cyl = mtcars.map((x) =\u003e x.cyl)\nconst response = ndarray(new Float64Array(mpg), [m])\n\nconst designMatrix = pool.zeros([m, n])\nconst newDataMatrix = pool.zeros([m, n])\nfor (let i = 0; i \u003c m; i++) {\n  for (let j = 0; j \u003c n; j++) {\n    const value = j === 0 ? 
hp[i] : cyl[i]\n    designMatrix.set(i, j, value)\n    newDataMatrix.set(i, j, value)\n  }\n}\n\n// fit the model\n// note: the response and designMatrix are reused (mutated) during the fitting process.\n// That means the values in those data structures should not be used by any other\n// functions\nconst model = fit(response, designMatrix)\n\n// the coefficients are here\nconst coefficients = model.coefficients\n\n// you can use the resulting model object to make predictions for new data\nconst prediction = model.predict(newDataMatrix)\n\n// you can compute the standard errors for the coefficients\nconst SEs = model.computeCoefficentSEs()\n\n// and also prediction intervals\nconst predIntervals = model.predictionInterval(0.05, newDataMatrix)\n```\n\n## API\n\n### Fit\n\nTo fit a linear regression model you need two data structures.\n\n* One is a response vector, an `ndarray` of floats of dimension `[m]`.\n* The other one is a so-called [design matrix](https://en.wikipedia.org/wiki/Design_matrix). It is encoded as an `ndarray` of\n  dimension `[m, n]`, with one row per element in your response. In machine learning,\n  the columns of that matrix are called \"features\".\n\nUsing the design matrix, you try to find a linear model that can predict the values in the response vector.\n\nThe following call shows how to fit a model:\n\n```js\nconst model = fit(response, designMatrix)\n```\n\nThe returned result is an object whose named elements are described in the following sections.\n\nIt is very important to note that both the `response` and the `designMatrix` will\nbe mutated during the fitting process. Other internal functions depend on the\ncorrectness of those values. This means that you need to make sure that the two\ndata structures are not used elsewhere. 
The trade-off is a lower memory footprint at the cost of mutable state 🙈\n\n\n### Model diagnostics, interpretation and inference\n\nThe following options are available to assess the fitted model:\n\n* `coefficients` - an `ndarray` of dimension `[n]` with the estimated coefficients of the fitted model.\n* `residuals` - an `ndarray` of dimension `[m]` holding the residuals. The residuals are the initial response vector minus the fitted values (i.e. the predictions on the training dataset).\n* `computeCoefficentSEs()` - a function that computes the standard errors of the model coefficients. It returns an `ndarray` of dimension `[n]`. These values can be used to test whether your model variables have a statistically significant effect on the response.\n* `computeVcov()` - a function that computes the variance-covariance matrix of the model coefficients.\n\n### Prediction\n\nTo make predictions, use the functions below:\n\n* `predict(newData)` - a function that takes a new design matrix and uses the fitted model to make predictions on unseen data. It returns an `ndarray` of dimension `[m]`, where `m` is the number of rows in `newData`.\n* `predictionInterval(alpha, newData)` - a function with two parameters:\n    * The first parameter `alpha`, a float between 0 and 1, is the so-called significance level. A good choice for `alpha` is `0.05` :). The smaller this value, the wider your prediction intervals.\n    * The second parameter is a new design matrix, as for the function `predict`.\n    * It returns an object with three elements: `fit`, `lowerLimit` and `upperLimit`. The first one is the expected value of your prediction and the other two are the lower and upper limits of your `(1 - alpha)` [prediction intervals](https://robjhyndman.com/hyndsight/intervals/). This is especially handy when you want to give an estimate of the uncertainty around your prediction.\n\n## Inspiration\n\nThe following links give more information and inspired the creation of this package. 
\n\n* https://www.stat.wisc.edu/courses/st849-bates/lectures/Orthogonal.pdf\n* https://stackoverflow.com/questions/38109501/how-does-predict-lm-compute-confidence-interval-and-prediction-interval\n* https://genomicsclass.github.io/book/pages/qr_and_regression.html\n\n## Contributing\n\nIf you have a question or have difficulties using `ndarray-linear-regression`, please double-check your code and setup first. If you think you have found a bug or want to propose a feature, refer to [the issues page](https://github.com/dirkschumacher/ndarray-linear-regression/issues).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdirkschumacher%2Fndarray-linear-regression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdirkschumacher%2Fndarray-linear-regression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdirkschumacher%2Fndarray-linear-regression/lists"}