{"id":19891415,"url":"https://github.com/unine-chyn/multilstsq","last_synced_at":"2025-03-01T05:17:28.828Z","repository":{"id":92506365,"uuid":"124050963","full_name":"UniNE-CHYN/multilstsq","owner":"UniNE-CHYN","description":"Python 3 module for doing simultaneous linear regression","archived":false,"fork":false,"pushed_at":"2018-05-08T14:51:56.000Z","size":159,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-11T19:36:17.928Z","etag":null,"topics":["abstract-syntax-tree","algorithm","expression-evaluator","linear-regression","python-3","python-library","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UniNE-CHYN.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-06T09:05:29.000Z","updated_at":"2018-07-10T10:17:36.000Z","dependencies_parsed_at":"2023-04-28T14:25:52.763Z","dependency_job_id":null,"html_url":"https://github.com/UniNE-CHYN/multilstsq","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniNE-CHYN%2Fmultilstsq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniNE-CHYN%2Fmultilstsq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniNE-CHYN%2Fmultilstsq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UniNE-CHYN%2Fmultilstsq/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UniNE-CHYN","download_url":"https://codeload.github.com/UniNE-CHYN/multilstsq/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241317693,"owners_count":19943203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abstract-syntax-tree","algorithm","expression-evaluator","linear-regression","python-3","python-library","python3"],"created_at":"2024-11-12T18:18:11.967Z","updated_at":"2025-03-01T05:17:28.822Z","avatar_url":"https://github.com/UniNE-CHYN.png","language":"Python","readme":"MultiLstSq\n==========\n\nLeast squares fitting is an underlying method for numerous applications, the most common one being linear regression. It consists of finding the parameter vector ``β°`` which minimizes ``‖ε‖₂`` in the equation ``y = Xβ + ε``, where `X` is the design matrix, `y` the observation vector, and `ε` the error vector.\n\nSince it is a fundamental algorithm, a number of Python 3 implementations exist, with different feature sets and performance, such as: `numpy.linalg.lstsq`, `scipy.stats.linregress`, `sklearn.linear_model.LinearRegression` and `statsmodels.api.OLS`.\n\nHowever, the currently available libraries are not designed to work on a large number of simultaneous problems, for example solving a least squares problem for each pixel of an image. Iterating over a large number of small problems is inefficient. 
Moreover, when doing linear regression, it is often tedious to build the design matrix `X`.\n\nThe goal of `multilstsq` is to work on arrays of problems, with good performance, low memory requirements, reliability and flexibility. It also provides a way to automate the construction of the relevant structures (mostly the design matrix), using a model given as a string. However, it does not strive to be a complete statistical library such as what would be provided by `statsmodels` or the `R` language.\n\nTo reach these goals, `multilstsq` uses the following techniques:\n\n- ``β°=(XᵀX)⁻¹Xᵀy`` can be computed incrementally by providing data in chunks, because ``XᵀX`` and ``Xᵀy`` are sums over the observations.\n- Inverting ``XᵀX`` is done by explicit formulas when the dimension is small. The advantage is that these are vector operations which can be applied simultaneously to all problems.\n- Masked data are handled as rows of zeros in the design matrix and the observation vector, which have no effect on the result. This allows adding different amounts of data to different subproblems.\n- For regression, an expression evaluator is implemented, which converts the input model from the user (for example `b0+b1*x0`) into the complex expression needed to build the design matrix from the vector `X` provided by the user. In that example, it is: `np.concatenate([np.ones(o_dim)[(..., np.newaxis)], ((X)[..., :, 0])[(..., np.newaxis)]])`. 
This expression evaluator may also be useful for other purposes in other libraries.\n\nAs the following figure shows, these techniques give the algorithm good performance compared to a loop over the individual problems:\n\n![Parallel performance of multilstsq, constant data size.](https://raw.githubusercontent.com/UniNE-CHYN/multilstsq/master/doc/benchmark.png)\n\nExample use\n===========\n\n```python\nimport numpy as np\nfrom multilstsq import MultiRegression\n\n# Two independent problems with different numbers of observations\nx1 = np.array([1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65, 1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83])\ny1 = np.array([52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29, 63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46])\n\nx2 = np.arange(10)\ny2 = np.arange(10)\n\n# Masked arrays: entries left masked are ignored, so the problems may differ in length\nX = np.ma.masked_all((2, max(len(x1), len(x2)), 1))\ny = np.ma.masked_all((2, max(len(x1), len(x2)), 1))\n\nX[0, :len(x1), 0] = x1\nX[1, :len(x2), 0] = x2\n\ny[0, :len(y1), 0] = y1\ny[1, :len(y2), 0] = y2\n\nmr = MultiRegression((2,), 'b0 + b1*x0 + b2*(x0**2)')\nmr.add_data(X, y)\n# The same data is added again after switching to variance mode\nmr.switch_to_variance()\nmr.add_data(X, y)\n\nprint(mr.beta)\n# Identify parameter names in the parameter vector\nprint(mr.beta_names)\n\n# Get the covariance matrix for the first problem\nprint(mr.variance[0])\n\n# Get the expression to predict for the first problem\nexpr = mr.get_expr_for_idx((0,))\n\n# Evaluate at x=1.79\nprint(expr(1.79))\n```\n\nThe nice thing about this module is that the model can be changed by editing only the line instantiating the MultiRegression object. For example, the quadratic regression used above is selected by this single line:\n\n```python\nmr = MultiRegression((2,), 'b0 + b1*x0 + b2*(x0**2)')\n```\n\nDocumentation\n=============\n\nDocumentation is available at http://multilstsq.readthedocs.io/\n\nContributing\n============\n\nPlease post issues and pull requests on GitHub. 
Alternatively, you can send patches by email.\n\nThe following tools are used to ensure good code quality:\n\nTool         | Status\n------------ | -------------\nTravis CI | [![Build Status](https://travis-ci.org/UniNE-CHYN/multilstsq.svg?branch=master)](https://travis-ci.org/UniNE-CHYN/multilstsq)\nAppVeyor | [![Build status](https://ci.appveyor.com/api/projects/status/38upk18lcu4mogot?svg=true)](https://ci.appveyor.com/project/lfasnacht/multilstsq)\nCoveralls | [![Coverage Status](https://coveralls.io/repos/github/UniNE-CHYN/multilstsq/badge.svg?branch=master)](https://coveralls.io/github/UniNE-CHYN/multilstsq?branch=master)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funine-chyn%2Fmultilstsq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funine-chyn%2Fmultilstsq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funine-chyn%2Fmultilstsq/lists"}