https://github.com/friendly/mvinfluence

Influence Measures and Diagnostic Plots for Multivariate Linear Models
https://github.com/friendly/mvinfluence

multivariate-analysis multivariate-linear-regression r r-package statistics visualization

Last synced: 3 months ago
JSON representation

Influence Measures and Diagnostic Plots for Multivariate Linear Models

Host: GitHub
URL: https://github.com/friendly/mvinfluence
Owner: friendly
Created: 2018-04-09T13:19:03.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2022-09-22T22:16:05.000Z (almost 3 years ago)
Last Synced: 2025-04-01T21:55:53.374Z (3 months ago)
Topics: multivariate-analysis, multivariate-linear-regression, r, r-package, statistics, visualization
Language: R
Homepage: https://friendly.github.io/mvinfluence/
Size: 4.73 MB
Stars: 2
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.Rmd

Awesome Lists containing this project

README

        ---

output: github_document

---

```{r, echo = FALSE}

knitr::opts_chunk$set(

  warning = FALSE,   # avoid warnings and messages in the output

  message = FALSE,

  collapse = TRUE,

  fig.width = 5,

  fig.height = 5,

  comment = "#>",

  fig.path = "man/figures/README-"

)

par(mar=c(3,3,1,1)+.1)

options(digits=3)

```

```{r, echo=FALSE}

library(mvinfluence)

```

[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/mvinfluence)](https://cran.r-project.org/package=mvinfluence)

[![](http://cranlogs.r-pkg.org/badges/grand-total/mvinfluence)](https://cran.r-project.org/package=mvinfluence)

[![DOI](https://zenodo.org/badge/128774860.svg)](https://zenodo.org/badge/latestdoi/128774860)

# mvinfluence 

**Influence Measures and Diagnostic Plots for Multivariate Linear Models**

Version 0.9-1

Functions in this package compute regression deletion diagnostics for multivariate linear models following

methods proposed by  Barrett & Ling (1992)

and provide some associated

diagnostic plots.  The diagnostic measures include hat-values (leverages), generalized Cook's distance, and

generalized squared 'studentized' residuals.  Several types of plots to detect influential observations are

provided.

In addition, the functions provide diagnostics for deletion of

subsets of observations of size `m>1`. This case is theoretically interesting

because sometimes pairs (`m=2`) of influential observations can mask each other,

sometimes they can have joint influence far exceeding their individual effects,

as well as other interesting phenomena described by Lawrence (1995).

Associated methods for the case

`m>1` are still under development in this package.

## Documentation

Documentation for the package is now available at [https://friendly.github.io/mvinfluence/](https://friendly.github.io/mvinfluence/).

## Installation

Get the released CRAN version or this development version:

|                     |                                                   |

|---------------------|---------------------------------------------------|

| CRAN version        | `install.packages("mvinfluence")`                 |

| Development version | `remotes::install_github("friendly/mvinfluence")` |

## Goals

The design goal for this package is that, as an extension of standard methods for univariate linear models, you should be able to fit a linear model with a **multivariate** response,

    mymlm <- lm( cbind(y1, y2, y3) ~ x1 + x2 + x3, data=mydata)

and then get useful diagnostics and plots with:

    influence(mymlm)

    hatvalues(mymlm)

    cooks.distance(mymlm)

    influencePlot(mymlm, ...)  

As is done in comparable univariate functions in the `car` package, *noteworthy* points are identified in printed output and graphs.

## Examples

The `Rohwer` data contains data on kindergarten children designed to examine how well performance on a set of paired-associate (PA) 

learning tasks can predict performance on some measures of aptitude and achievement---

`SAT` (a scholastic aptitude test),

`PPVT` (Peabody Picture Vocabulary Test), and 

`Raven` ( Raven Progressive Matrices Test). The PA tasks differ in how the stimulus item was presented:

`n` (named), 

`s` (still), 

`ns` (named still), 

`na` (named action) and 

`ss` (sentence still).

Here, we fit a MLM to a subset of the Rohwer data (the Low SES group). 

```{r rohwer1}

data(Rohwer, package="heplots")

Rohwer2 <- subset(Rohwer, subset=group==2)

rownames(Rohwer2)<- 1:nrow(Rohwer2)

Rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer2)

car::Anova(Rohwer.mod)

```

### Influence plots

The default influence plot (`type="stres"`) shows the squared standardized residual against the Hat value. The areas of the circles representing the observations are proportional to generalized Cook's distances.

```{r rohwer2}

(infl <-influencePlot(Rohwer.mod, id.n=4, type = "stres"))

```

As you can see above,

the function returns a data frame of the influence statistics for the identified points. "Noteworthy" points are those that are unusual on *either* Hat value (H) or the squared studentized residual (Q), so more points will be shown than the `id.n` value. It is often more useful to sort these in descending order by one of the influence measures.

```{r infl}

infl |> dplyr::arrange(desc(H))

```

An alternative (`type="LR"`) plots residual components against leverage components, both on log scales. Because influence is a product of residual $\times$ Leverage, this plot had the property that contours of constant Cook's distance fall on diagonal lines with slope = -1. Each successive dashed line represents 

a **multiple** of Cook's D.

This plot is often easier to read than the standard version. 

```{r rohwer3}

influencePlot(Rohwer.mod, id.n=4, type="LR")

```

We observe that case 5 has the largest leverage and it is highly influential.

Case 25 has the largest residual component and middling leverage, so it is moderately influential.

Cases 14, 29, 27 have nearly identical residuals, and their influence increases from left to right

with leverage.

### Index plots

If you wish to see how the observations fare on each of the the measures (as well as Mahalanobis $D^2$ of the residuals from the origin), 

the `inflIndexPlot()` function gives you index plots. 

There are extensive options for identifying and labeling "noteworthy"

observations, with various methods. These rely on `car::showLabels()`, where the default `id.method = "y"` label points whose

Y coordinate is very large.

```{r indexplot, fig.width=9}

infIndexPlot(Rohwer.mod, 

             id.n=3, id.col = "red", id.cex=1.5, id.location="ab")

```

In this example, note that while case 5 stands out as influential, it does not have an exceptionally large Mahalanobis

squared distance, $D^2$ of the residuals.

# Robust MLMs

Influential cases and those with large residuals can sometimes be dealt with by fitting a **robust** version of

the multivariate model.  The function `heplots::robmlm()` uses a simple M-estimator that down-weights cases

with large residuals. Fitting is done by iterated re-weighted least squares (IWLS), using weights based on the Mahalanobis squared distances of the current residuals from the origin, and a scaling (covariance) matrix calculated by `MASS::cov.trob()`.

```{r robmlm1}

Rohwer.rmod <- heplots::robmlm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, 

                               data=Rohwer2)

```

The returned object has a `weights` component, the weight for each case in the final iteration. Which ones are less than 0.9 here?

```{r rob-weights}

which(Rohwer.rmod$weights < .9)

```

A simple index plot makes the down-weighted observations stand out. Case 5 is not among them, but I label it anyway.

```{r rob-index-plot, fig.width=8, fig.height=4}

par(mar = c(4,4,1,1)+.1)

wts <- Rohwer.rmod$weights

idx <- c(5, which(wts < .9))

plot(wts, type="h",

     xlab = "Case index", 

     ylab = "Robust mlm weight",

     cex.lab = 1.25)

rect(0, .9, 33, 1.1, 

     col=scales::alpha("gray", .25), 

     border=NA)

points(wts, pch = 16, 

       cex = ifelse(wts < .9, 1.5, 1),

       col = ifelse(wts < .9, "red", "black"))

text(idx, wts[idx], label=idx, pos=3, cex=1.2, xpd=NA )

```

What's up with case 5? It had the largest leverage, but it's Mahalanobis $D^2$ was not large.

Thus, it was not down-weighted, even though it is an influential observation.

What difference do these observations make in the fitted regression? This calculates the percentage relative difference

between the coefficients in the standard `lm()` and the robust version.  The largest changes are for the coefficients

of the `ss` task, but there is an even greater one for `PPVT` on the `n` task.

```{r}

100 * abs(coef(Rohwer.mod) - coef(Rohwer.rmod)) / abs(coef(Rohwer.mod))

```

## Citation

To cite `mvinfluence` in publications, use:

```{r citation}

citation("mvinfluence")

```

## References

Barrett, B. E. and Ling, R. F. (1992).

General Classes of Influence Measures for Multivariate Regression.

*Journal of the American Statistical Association*, **87**(417), 184-191.

Barrett, B. E. (2003). Understanding Influence in Multivariate Regression.

*Communications in Statistics -- Theory and Methods*, **32**, 3, 667-680.

Lawrence, A. J. (1995). Deletion Influence and Masking in Regression.

*Journal of the Royal Statistical Society. Series B (Methodological)* , **57**, No. 1, pp. 181-189.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/friendly/mvinfluence

Awesome Lists containing this project

README