{"id":18430155,"url":"https://github.com/friendly/mvinfluence","last_synced_at":"2025-09-18T22:42:09.881Z","repository":{"id":56937268,"uuid":"128774860","full_name":"friendly/mvinfluence","owner":"friendly","description":"Influence Measures and Diagnostic Plots for Multivariate Linear Models","archived":false,"fork":false,"pushed_at":"2025-08-08T19:39:15.000Z","size":5613,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-09-12T00:18:16.574Z","etag":null,"topics":["multivariate-analysis","multivariate-linear-regression","r","r-package","statistics","visualization"],"latest_commit_sha":null,"homepage":"https://friendly.github.io/mvinfluence/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/friendly.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-04-09T13:19:03.000Z","updated_at":"2025-08-08T19:39:19.000Z","dependencies_parsed_at":"2025-08-03T06:45:23.372Z","dependency_job_id":null,"html_url":"https://github.com/friendly/mvinfluence","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/friendly/mvinfluence","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friendly%2Fmvinfluence","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friendly%2Fmvinfluence/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friendly%2Fmvinfluence/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/friendly%2Fmvinfluence/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/friendly","download_url":"https://codeload.github.com/friendly/mvinfluence/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friendly%2Fmvinfluence/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275845028,"owners_count":25538995,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-18T02:00:09.552Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["multivariate-analysis","multivariate-linear-regression","r","r-package","statistics","visualization"],"created_at":"2024-11-06T05:19:45.796Z","updated_at":"2025-09-18T22:42:09.842Z","avatar_url":"https://github.com/friendly.png","language":"R","readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file and knit again --\u003e\n\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  warning = FALSE,   # avoid warnings and messages in the output\n  message = FALSE,\n  collapse = TRUE,\n  fig.width = 5,\n  fig.height = 5,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\"\n)\n\npar(mar=c(3,3,1,1)+.1)\noptions(digits=3)\n```\n\n```{r, echo=FALSE}\nlibrary(mvinfluence)\n```\n\n[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/mvinfluence)](https://cran.r-project.org/package=mvinfluence)\n[![R_Universe](https://friendly.r-universe.dev/badges/mvinfluence)](https://friendly.r-universe.dev/mvinfluence)\n[![Last Commit](https://img.shields.io/github/last-commit/friendly/mvinfluence)](https://github.com/friendly/mvinfluence/)\n[![](http://cranlogs.r-pkg.org/badges/grand-total/mvinfluence)](https://cran.r-project.org/package=mvinfluence)\n[![DOI](https://zenodo.org/badge/128774860.svg)](https://zenodo.org/badge/latestdoi/128774860)\n[![pkgdown](https://img.shields.io/badge/pkgdown%20site-blue)](https://friendly.github.io/mvinfluence/)\n\n\n\n# mvinfluence \u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"200px\" /\u003e\n\n**Influence Measures and Diagnostic Plots for Multivariate Linear Models**\n\n\u003c!-- Version 0.9.3 --\u003e\nVersion `r getNamespaceVersion(\"mvinfluence\")`\n\n\nFunctions in this package compute regression deletion diagnostics for multivariate linear models following\nmethods proposed by  Barrett \u0026 Ling (1992)\nand provide some associated\ndiagnostic plots.  The diagnostic measures include hat-values (leverages), generalized Cook's distance, and\ngeneralized squared 'studentized' residuals.  Several types of plots to detect influential observations are\nprovided.\n\nIn addition, the functions provide diagnostics for deletion of\nsubsets of observations of size `m\u003e1`. 
This case is theoretically interesting\nbecause pairs (`m=2`) of influential observations can sometimes mask each other,\ncan sometimes have joint influence far exceeding their individual effects,\nand can show other interesting phenomena described by Lawrence (1995).\nAssociated methods for the case\n`m\u003e1` are still under development in this package.\n\n## Documentation\nDocumentation for the package is available at [https://friendly.github.io/mvinfluence/](https://friendly.github.io/mvinfluence/).\n\n\n## Installation\n\nGet the released version from CRAN, or the development version from GitHub or [R-universe](https://friendly.r-universe.dev).\n\n|                     |                                                                                 |\n|---------------------|---------------------------------------------------------------------------------|\n| CRAN version        | `install.packages(\"mvinfluence\")`                                               |\n| R-universe          | `install.packages(\"mvinfluence\", repos = c('https://friendly.r-universe.dev'))` |\n| Development version | `remotes::install_github(\"friendly/mvinfluence\")`                               |\n\n## Goals\n\nThe design goal for this package is that, as an extension of standard methods for univariate linear models, you should be able to fit a linear model with a **multivariate** response,\n\n    mymlm \u003c- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data=mydata)\n\nand then get useful diagnostics and plots with:\n\n    influence(mymlm)\n    hatvalues(mymlm)\n    cooks.distance(mymlm)\n    influencePlot(mymlm, ...)  
\n\nAs is done in comparable univariate functions in the `car` package, *noteworthy* points are identified in printed output and graphs.\n\n\n## Examples\n\nThe `Rohwer` data come from a study of kindergarten children designed to examine how well performance on a set of paired-associate (PA)\nlearning tasks can predict performance on some measures of aptitude and achievement---\n`SAT` (a scholastic aptitude test),\n`PPVT` (Peabody Picture Vocabulary Test), and\n`Raven` (Raven Progressive Matrices Test). The PA tasks differ in how the stimulus item was presented:\n`n` (named),\n`s` (still),\n`ns` (named still),\n`na` (named action) and\n`ss` (sentence still).\n\nHere, we fit an MLM to a subset of the Rohwer data (the Low SES group).\n\n```{r rohwer1}\ndata(Rohwer, package=\"heplots\")\nRohwer2 \u003c- subset(Rohwer, subset=group==2)\nrownames(Rohwer2) \u003c- 1:nrow(Rohwer2)   # renumber cases from 1\nRohwer.mod \u003c- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer2)\n\ncar::Anova(Rohwer.mod)\n```\n\n### Influence plots\n\nThe default influence plot (`type=\"stres\"`) shows the squared studentized residual against the Hat value. The areas of the circles representing the observations are proportional to generalized Cook's distances.\n\n```{r rohwer2}\n(infl \u003c- influencePlot(Rohwer.mod, id.n=4, type = \"stres\"))\n```\n\nAs you can see above,\nthe function returns a data frame of the influence statistics for the identified points. \"Noteworthy\" points are those that are unusual on *either* the Hat value (H) or the squared studentized residual (Q), so more points can be shown than the `id.n` value. It is often more useful to sort these in descending order by one of the influence measures.\n\n```{r infl}\ninfl |\u003e dplyr::arrange(dplyr::desc(H))\n```\n\nAn alternative (`type=\"LR\"`) plots residual components against leverage components, both on log scales. 
Because influence is the product of residual $\\times$ leverage, this plot has the property that contours of constant Cook's distance fall on diagonal lines with slope = -1. Each successive dashed line represents\na **multiple** of Cook's D.\nThis plot is often easier to read than the standard version.\n\n```{r rohwer3}\ninfluencePlot(Rohwer.mod, id.n=4, type=\"LR\")\n```\n\nWe observe that case 5 has the largest leverage and is highly influential.\nCase 25 has the largest residual component and middling leverage, so it is moderately influential.\nCases 14, 29, and 27 have nearly identical residuals, and their influence increases from left to right\nwith leverage.\n\n### Index plots\nIf you wish to see how the observations fare on each of the measures (as well as Mahalanobis $D^2$ of the residuals from the origin),\nthe `infIndexPlot()` function gives you index plots.\n\nThere are extensive options for identifying and labeling \"noteworthy\"\nobservations, with various methods. These rely on `car::showLabels()`, where the default `id.method = \"y\"` labels points whose\nY coordinate is very large.\n\n```{r indexplot, fig.width=9}\ninfIndexPlot(Rohwer.mod, \n             id.n=3, id.col = \"red\", id.cex=1.5, id.location=\"ab\")\n```\n\nIn this example, note that while case 5 stands out as influential, it does not have an exceptionally large Mahalanobis\nsquared distance, $D^2$, of the residuals.\n\n## Robust MLMs\n\nInfluential cases and those with large residuals can sometimes be dealt with by fitting a **robust** version of\nthe multivariate model.  The function `heplots::robmlm()` uses a simple M-estimator that down-weights cases\nwith large residuals. 
Fitting is done by iteratively re-weighted least squares (IWLS), using weights based on the Mahalanobis squared distances of the current residuals from the origin, and a scaling (covariance) matrix calculated by `MASS::cov.trob()`.\n\n```{r robmlm1}\nRohwer.rmod \u003c- heplots::robmlm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, \n                               data=Rohwer2)\n```\n\nThe returned object has a `weights` component, the weight for each case in the final iteration. Which ones are less than 0.9 here?\n```{r rob-weights}\nwhich(Rohwer.rmod$weights \u003c .9)\n```\n\nA simple index plot makes the down-weighted observations stand out. Case 5 is not among them, but I label it anyway.\n```{r rob-index-plot, fig.width=8, fig.height=4}\npar(mar = c(4,4,1,1)+.1)\nwts \u003c- Rohwer.rmod$weights\n# cases to label: case 5, plus all cases with weight below 0.9\nidx \u003c- c(5, which(wts \u003c .9))\nplot(wts, type=\"h\",\n     xlab = \"Case index\", \n     ylab = \"Robust mlm weight\",\n     cex.lab = 1.25)\n# shade the band of weights of 0.9 and above\nrect(0, .9, 33, 1.1, \n     col=scales::alpha(\"gray\", .25), \n     border=NA)\n# larger red points mark the down-weighted cases\npoints(wts, pch = 16, \n       cex = ifelse(wts \u003c .9, 1.5, 1),\n       col = ifelse(wts \u003c .9, \"red\", \"black\"))\ntext(idx, wts[idx], labels=idx, pos=3, cex=1.2, xpd=NA)\n```\n\nWhat's up with case 5? It had the largest leverage, but its Mahalanobis $D^2$ was not large.\nThus, it was not down-weighted, even though it is an influential observation.\n\nWhat difference do these observations make in the fitted regression? The code below calculates the percentage relative difference\nbetween the coefficients in the standard `lm()` and the robust version.  There are large changes for the coefficients\nof the `ss` task, but the greatest is for `PPVT` on the `n` task.\n```{r}\n# absolute value in the denominator keeps every percentage non-negative\n100 * abs(coef(Rohwer.mod) - coef(Rohwer.rmod)) / abs(coef(Rohwer.mod))\n```\n\n## Citation\nTo cite `mvinfluence` in publications, use:\n```{r citation}\ncitation(\"mvinfluence\")\n```\n\n\n## References\n\nBarrett, B. E. and Ling, R. F. 
(1992).\nGeneral Classes of Influence Measures for Multivariate Regression.\n*Journal of the American Statistical Association*, **87**(417), 184-191.\n\nBarrett, B. E. (2003). Understanding Influence in Multivariate Regression.\n*Communications in Statistics -- Theory and Methods*, **32**(3), 667-680.\n\nLawrence, A. J. (1995). Deletion Influence and Masking in Regression.\n*Journal of the Royal Statistical Society. Series B (Methodological)*, **57**(1), 181-189.\n\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffriendly%2Fmvinfluence","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffriendly%2Fmvinfluence","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffriendly%2Fmvinfluence/lists"}