{"id":15063963,"url":"https://github.com/tirthajyoti/mlr","last_synced_at":"2025-06-28T23:36:00.384Z","repository":{"id":57442352,"uuid":"199944736","full_name":"tirthajyoti/mlr","owner":"tirthajyoti","description":"Multiple linear regression with statistical inference, residual analysis, direct CSV loading, and other features","archived":false,"fork":false,"pushed_at":"2019-08-12T03:11:02.000Z","size":576,"stargazers_count":33,"open_issues_count":0,"forks_count":10,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-24T10:12:21.314Z","etag":null,"topics":["analytics","data-analytics","data-science","linear-regression","machine-learning","modeling","predictive-modeling","python","regression","scikit-learn","statiscal-learning","statistical-analysis","statistics"],"latest_commit_sha":null,"homepage":"https://mlr.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tirthajyoti.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-07-31T23:55:41.000Z","updated_at":"2024-12-29T03:28:19.000Z","dependencies_parsed_at":"2022-09-26T18:00:41.638Z","dependency_job_id":null,"html_url":"https://github.com/tirthajyoti/mlr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tirthajyoti%2Fmlr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tirthajyoti%2Fmlr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tirthajyoti%2Fmlr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tirthajyoti%2Fmlr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tirthajyoti","download_url":"https://codeload.github.com/tirthajyoti/mlr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248208635,"owners_count":21065203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","data-analytics","data-science","linear-regression","machine-learning","modeling","predictive-modeling","python","regression","scikit-learn","statiscal-learning","statistical-analysis","statistics"],"created_at":"2024-09-25T00:09:25.902Z","updated_at":"2025-04-10T11:26:42.607Z","avatar_url":"https://github.com/tirthajyoti.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mlr (`pip install mlr`)\n\n![top](https://raw.githubusercontent.com/tirthajyoti/mlr/master/images/top_image_1.PNG)\n\nA lightweight, easy-to-use Python package that combines the `scikit-learn`-like simple API with the power of **statistical inference tests**, **visual residual analysis**, **outlier visualization**, **multicollinearity test**, found in packages like `statsmodels` and R language.\n\nAuthored and maintained by **Dr. Tirthajyoti Sarkar ([Website](https://tirthajyoti.github.io), [LinkedIn profile](https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7/))**\n\n### Useful regression metrics,\n* MSE, SSE, SST \n* R^2, Adjusted R^2\n* AIC (Akaike Information Criterion), and BIC (Bayesian Information Criterion)\n\n### Inferential statistics,\n* Standard errors\n* Confidence intervals\n* p-values \n* t-test values \n* F-statistic\n\n### Visual residual analysis,\n* Plots of fitted vs. features, \n* Plot of fitted vs. residuals, \n* Histogram of standardized residuals\n* Q-Q plot of standardized residuals\n\n### Outlier detection\n* Influence plot\n* Cook's distance plot\n\n### Multicollinearity\n* Pairplot\n* Variance infletion factors (VIF)\n* Covariance matrix\n* Correlation matrix\n* Correlation matrix heatmap\n\n## Requirements\n\n* numpy (`pip install numpy`)\n* pandas (`pip install pandas`)\n* matplotlib (`pip install matplotlib`)\n* seaborn (`pip install seaborn`)\n* scipy (`pip install scipy`)\n* statsmodels (`pip install statsmodels`)\n\n## Install\n\n(On Linux and Windows) You can use ``pip``\n\n```pip install mlr```\n\n(On Mac OS), first install pip,\n```\ncurl https://bootstrap.pypa.io/get-pip.py -o get-pip.py\npython get-pip.py\n```\nThen proceed as above.\n\nFor the latest additions, you can always clone this Github repo and run the setup.py.\n\n---\n\n## Quick Start\n\nImport the `MyLinearRegression` class,\n\n```\nfrom mlr.MLR import MyLinearRegression as mlr\nimport numpy as np\n```\n\nGenerate some random data\n\n```\nnum_samples=40\nnum_dim = 5\nX = 10*np.random.random(size=(num_samples,num_dim))\ncoeff = np.array([2,-3.5,1.2,4.1,-2.5])\ny = np.dot(coeff,X.T)+10*np.random.randn(num_samples)\n```\n\nMake a model instance,\n\n```\nmodel = mlr()\n```\n\nIngest the data\n\n```\nmodel.ingest_data(X,y)\n```\n\nFit,\n\n```\nmodel.fit()\n```\n---\n\n## Directly read from a Pandas DataFrame\nYou can read directly from a Pandas DataFrame. Just give the features/predictors' column names as a list and the target column name as a string to the `fit_dataframe` method.\n\nAt this point, only numerical features/targets are supported but in future releases we will support categorical variables too. \n\n```\n\u003c... obtain a Pandas DataFrame by some processing\u003e\ndf = pd.DataFrame(...)\nfeature_cols = ['X1','X2','X3']\ntarget_col = 'output'\n\nmodel = mlr()\nmodel.fit_dataframe(X=feature_cols,y = target_col,dataframe=df)\n```\n\n---\n\n## Metrics\nSo far, it looks similar to the linear regression estimator of Scikit-Learn, doesn't it?\n\u003cbr\u003eHere comes the difference,\n\n### Print all kinds of regression model metrics, one by one,\n\n```\nprint (\"R-squared: \",model.r_squared())\nprint (\"Adjusted R-squared: \",model.adj_r_squared())\nprint(\"MSE: \",model.mse())\n\n\u003e\u003e R-squared:  0.8344327025902752\n   Adjusted R-squared:  0.8100845706182569\n   MSE:  72.2107655649954\n\n```\n\n### Or, print all the metrics at once!\n\n```\nmodel.print_metrics()\n\n\u003e\u003e sse:     2888.4306\n   sst:     17445.6591\n   mse:     72.2108\n   r^2:     0.8344\n   adj_r^2: 0.8101\n   AIC:     296.6986\n   BIC:     306.8319\n```\n---\n\n## Correlation matrix, heatmap, covariance\n\nWe can build the correlation matrix right after ingesting the data. This matrix gives us an indication how much multicollinearity is present among the features/predictors.\n\n### Correlation matrix\n```\nmodel.ingest_data(X,y)\nmodel.corrcoef()\n\n\u003e\u003e array([[ 1.        ,  0.18424447, -0.00207883,  0.144186  ,  0.08678109],\n       [ 0.18424447,  1.        , -0.08098705, -0.05782733,  0.19119872],\n       [-0.00207883, -0.08098705,  1.        ,  0.03602977, -0.17560097],\n       [ 0.144186  , -0.05782733,  0.03602977,  1.        ,  0.05216212],\n       [ 0.08678109,  0.19119872, -0.17560097,  0.05216212,  1.        ]])\n```\n\n### Covariance\n\n```\nmodel.covar()\n\n\u003e\u003e array([[10.28752086,  1.51237819, -0.01770701,  1.47414685,  0.79121778],\n       [ 1.51237819,  6.54969628, -0.5504233 , -0.47174359,  1.39094876],\n       [-0.01770701, -0.5504233 ,  7.05247111,  0.30499622, -1.32560195],\n       [ 1.47414685, -0.47174359,  0.30499622, 10.16072256,  0.47264283],\n       [ 0.79121778,  1.39094876, -1.32560195,  0.47264283,  8.08036806]])\n```\n\n### Correlation heatmap\n\n```\nmodel.corrplot(cmap='inferno',annot=True)\n```\n![corrplot](https://raw.githubusercontent.com/tirthajyoti/mlr/master/images/corrplot1.PNG)\n\n## Statistical inference\n\n### Perform the F-test of overall significance\nIt retunrs the F-statistic and the p-value of the test. \n\nIf the p-value is a small number you can reject the Null hypothesis that all the regression coefficient is zero. That means a small p-value (generally \u003c 0.01) indicates that the overall regression is statistically significant.\n```\nmodel.ftest()\n\n\u003e\u003e (34.270912591948814, 2.3986657277649282e-12)\n```\n\n### How about p-values, t-test statistics, and standard errors of the coefficients?\nStandard errors and corresponding t-tests give us the p-values for each regression coefficient, which tells us whether that particular coefficient is statistically significant or not (based on the given data).\n\n```\nprint(\"P-values:\",model.pvalues())\nprint(\"t-test values:\",model.tvalues())\nprint(\"Standard errors:\",model.std_err())\n\n\u003e\u003e P-values: [8.33674608e-01 3.27039586e-03 3.80572234e-05 2.59322037e-01 9.95094748e-11 2.82226752e-06]\n   t-test values: [ 0.21161008  3.1641696  -4.73263963  1.14716519  9.18010412 -5.60342256]\n   Standard errors: [5.69360847 0.47462621 0.59980706 0.56580141 0.47081187 0.5381103 ]\n\n```\n\n### Confidence intervals\n```\nmodel.conf_int()\n\n\u003e\u003e array([[-10.36597959,  12.77562953],\n       [  0.53724132,   2.46635435],\n       [ -4.05762528,  -1.61971606],\n       [ -0.50077913,   1.79891449],\n       [  3.36529718,   5.27890687],\n       [ -4.10883113,  -1.92168771]])\n\n```\n\n## Visual analysis of the residuals\nResidual analysis is crucial to check the assumptions of a linear regression model. `mlr` helps you check those assumption easily by providing straight-forward visual analytis methods for the residuals.\n\n### Fitted vs. residuals plot\nCheck the assumption of constant variance and uncorrelated features (independence) with this plot\n```\nmodel.fitted_vs_residual()\n```\n![fit_vs_resid](https://raw.githubusercontent.com/tirthajyoti/mlr/master/images/fitted_vs_residuals.PNG)\n\n### Fitted vs features plot\nCheck the assumption of linearity with this plot\n```\nmodel.fitted_vs_features()\n```\n![fit_vs_features](https://raw.githubusercontent.com/tirthajyoti/mlr/master/images/fitted_vs_features.PNG)\n\n### Histogram and Q-Q plot of standardized residuals\nCheck the normality assumption of the error terms using these plots,\n```\nmodel.histogram_resid()\n```\n![hist_resid](https://raw.githubusercontent.com/tirthajyoti/mlr/master/images/hist_resid.PNG)\n\u003cbr\u003e\n```\nmodel.qqplot_resid()\n```\n![](https://raw.githubusercontent.com/tirthajyoti/mlr/master/images/QQ_plot_resid.PNG)\n\n## Do more\n\nDo more fun stuff with your regression model.\nMore features will be added in the future releases!\n\n* Outlier detection and plots\n* Multicollinearity checks\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftirthajyoti%2Fmlr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftirthajyoti%2Fmlr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftirthajyoti%2Fmlr/lists"}