https://github.com/btskinner/rmargins

Stata-like margins in R by hand

Host: GitHub
URL: https://github.com/btskinner/rmargins
Owner: btskinner
Created: 2019-10-29T15:07:05.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-10-29T15:41:03.000Z (over 6 years ago)
Last Synced: 2025-01-17T15:44:51.646Z (about 1 year ago)
Topics: margins, r, stata
Language: R
Size: 31.3 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


Notes
=====

The main file, `margins.R`, shows how to compute Stata-like margins by
hand in R for a logistic regression. It’s mostly meant to show the
intuition underlying Stata’s `-margins-` command, but you can also use
the results to make nice margins figures with ggplot. Output from R can
be checked in Stata with `margins_check.do`, and `fake_data.csv` can be
recreated with `make_fake_data.R`.

For a more complete suite of ready-to-go commands, there’s the
[`margins`](https://cran.r-project.org/web/packages/margins/vignettes/Introduction.html)
R package.

Steps
-----

### Run logistic regression

    ## read in fake data
    df <- read.csv('./fake_data.csv')

    ## run logit
    mod <- glm(y ~ x1 + x2 + x3 + x4, data = df, family = binomial(link = 'logit'))
    summary(mod)

    ## 
    ## Call:
    ## glm(formula = y ~ x1 + x2 + x3 + x4, family = binomial(link = "logit"), 
    ##     data = df)
    ## 
    ## Deviance Residuals: 
    ##      Min        1Q    Median        3Q       Max  
    ## -2.54441  -0.28561  -0.05508   0.15145   2.72244  
    ## 
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept)   0.7359     0.2081   3.536 0.000406 ***
    ## x1            1.7795     0.1706  10.432  < 2e-16 ***
    ## x2           -3.5876     0.2732 -13.131  < 2e-16 ***
    ## x3           -4.6119     0.3689 -12.501  < 2e-16 ***
    ## x4            1.5467     0.2983   5.185 2.16e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## (Dispersion parameter for binomial family taken to be 1)
    ## 
    ##     Null deviance: 1275.33  on 999  degrees of freedom
    ## Residual deviance:  479.14  on 995  degrees of freedom
    ## AIC: 489.14
    ## 
    ## Number of Fisher Scoring iterations: 7
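
Since everything below leans on `predict()`, it may help to see what `type = 'response'` does under the hood. The sketch below is not part of `margins.R`: it simulates a small data set as a stand-in for `fake_data.csv` (the seed and coefficients are arbitrary assumptions), then rebuilds the fitted probabilities by hand as the inverse logit of the linear predictor.

```r
## simulated stand-in for fake_data.csv (assumption: coefficients are arbitrary)
set.seed(1)
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                 x3 = rbinom(n, 1, 0.7), x4 = rbinom(n, 1, 0.2))
df$y <- rbinom(n, 1, plogis(0.5 + df$x1 - df$x2 - 1.5 * df$x3 + df$x4))
mod <- glm(y ~ x1 + x2 + x3 + x4, data = df, family = binomial(link = 'logit'))

## linear predictor (log-odds) by hand: design matrix times coefficients
X <- model.matrix(mod)
xb <- drop(X %*% coef(mod))

## inverse logit maps log-odds to probabilities
p_hand <- 1 / (1 + exp(-xb))

## same quantity from predict()
p_pred <- predict(mod, type = 'response')
all.equal(unname(p_hand), unname(p_pred))
```

A margin is this same calculation applied to a made-up row of covariates rather than to the observed rows.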

### Margins for unit change in binary variable (`x3`)

    ## (1) get model matrix from glm() object
    mm <- model.matrix(mod)
    head(mm)

    ##   (Intercept)         x1         x2 x3 x4
    ## 1           1  0.2839260  1.2895032  1  0
    ## 2           1  1.3495918 -2.0475880  0  0
    ## 3           1  0.4017083  0.8771911  1  0
    ## 4           1 -2.0652666  0.7446761  1  0
    ## 5           1  0.5624508  0.2748494  1  0
    ## 6           1 -0.1020731 -1.6143429  0  1

    ## (2) drop intercept column of ones b/c we don't need it
    mm <- mm[,-1]
    head(mm)

    ##           x1         x2 x3 x4
    ## 1  0.2839260  1.2895032  1  0
    ## 2  1.3495918 -2.0475880  0  0
    ## 3  0.4017083  0.8771911  1  0
    ## 4 -2.0652666  0.7446761  1  0
    ## 5  0.5624508  0.2748494  1  0
    ## 6 -0.1020731 -1.6143429  0  1

    ## (3) convert to data.frame to make life easier
    df_mm <- as.data.frame(mm)

### VERSION 1: all other variables `-atmeans-`

**NB: this should be equivalent to Stata `margins x3, atmeans`**

    ## (4) make "new data" where # rows == # margins for key var, averages elsewhere
    new_df <- data.frame(x1 = mean(df_mm$x1),
                         x2 = mean(df_mm$x2),
                         x3 = c(0,1),       # two margins, 0/1, for x3
                         x4 = mean(df_mm$x4))
    new_df

    ##           x1          x2 x3    x4
    ## 1 0.05914387 -0.03310865  0 0.193
    ## 2 0.05914387 -0.03310865  1 0.193

    ## (5) use predict() with new data, setting type to get probs
    pp <- predict(mod, newdata = new_df, se.fit = TRUE, type = 'response')
    pp

    ## $fit
    ##          1          2 
    ## 0.77876250 0.03378329 
    ## 
    ## $se.fit
    ##          1          2 
    ## 0.03568211 0.00822396 
    ## 
    ## $residual.scale
    ## [1] 1

    ## check difference (Stata: -margins, dydx(x3) atmeans-)
    pp$fit[2] - pp$fit[1]

    ##          2 
    ## -0.7449792

### VERSION 2: `x4 == 1`, others `-atmeans-`

**NB: this should be equivalent to Stata
`margins x3, at(x4 = 1) atmeans`**

    ## (4) make "new data" where # rows == # margins for key var, averages elsewhere
    new_df <- data.frame(x1 = mean(df_mm$x1),
                         x2 = mean(df_mm$x2),
                         x3 = c(0,1),       # two margins, 0/1, for x3
                         x4 = 1)            # x4 == 1
    new_df

    ##           x1          x2 x3 x4
    ## 1 0.05914387 -0.03310865  0  1
    ## 2 0.05914387 -0.03310865  1  1

    ## (5) use predict() with new data, setting type to get probs
    pp <- predict(mod, newdata = new_df, se.fit = TRUE, type = 'response')
    pp

    ## $fit
    ##         1         2 
    ## 0.9246054 0.1085866 
    ## 
    ## $se.fit
    ##          1          2 
    ## 0.02277638 0.02804731 
    ## 
    ## $residual.scale
    ## [1] 1
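
Versions 1 and 2 differ only in which variables are pinned to a value instead of their mean, so the pattern can be wrapped in a small function. The helper below, `margins_atmeans()`, is hypothetical and not part of `margins.R`. It assumes every model term is a plain numeric column (no factors or interactions), and it runs on simulated data as a stand-in for `fake_data.csv`.

```r
## simulated stand-in for fake_data.csv (assumption: coefficients are arbitrary)
set.seed(1)
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                 x3 = rbinom(n, 1, 0.7), x4 = rbinom(n, 1, 0.2))
df$y <- rbinom(n, 1, plogis(0.5 + df$x1 - df$x2 - 1.5 * df$x3 + df$x4))
mod <- glm(y ~ x1 + x2 + x3 + x4, data = df, family = binomial(link = 'logit'))

## hypothetical helper: margins for `var` at `values`, other variables at
## their means unless fixed through `at = list(name = value)`
margins_atmeans <- function(mod, var, values, at = list()) {
    ## column means of the model matrix, intercept dropped
    mm <- model.matrix(mod)[, -1, drop = FALSE]
    means <- as.data.frame(as.list(colMeans(mm)))
    ## one row per requested value of the key variable
    new_df <- means[rep(1, length(values)), , drop = FALSE]
    new_df[[var]] <- values
    ## override anything fixed via at
    for (nm in names(at)) new_df[[nm]] <- at[[nm]]
    pp <- predict(mod, newdata = new_df, se.fit = TRUE, type = 'response')
    data.frame(value = values,
               margin = unname(pp$fit),
               se = unname(pp$se.fit))
}

## version 1: x3 = 0/1, everything else at means
v1 <- margins_atmeans(mod, 'x3', c(0, 1))
## version 2: x3 = 0/1, x4 == 1, everything else at means
v2 <- margins_atmeans(mod, 'x3', c(0, 1), at = list(x4 = 1))
v1
v2
```

The same call with `var = 'x1'` and `values = -4:4` gives a grid of margins over a continuous variable.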

### Margins for unit change in continuous variable (`x1`)

**NB: this should be equivalent to Stata
`margins, at(x1 = (-4(1)4)) atmeans`**

    ## get idea of range
    summary(df$x1)

    ##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    ## -2.69344 -0.57129  0.04494  0.05914  0.70844  3.21878

    ## (4) make "new data" where # rows == # margins for key var, averages elsewhere
    new_df <- data.frame(x1 = seq(from = -4, to = 4, by = 1),
                         x2 = mean(df_mm$x2),
                         x3 = mean(df_mm$x3),
                         x4 = mean(df_mm$x4))
    new_df

    ##   x1          x2    x3    x4
    ## 1 -4 -0.03310865 0.714 0.193
    ## 2 -3 -0.03310865 0.714 0.193
    ## 3 -2 -0.03310865 0.714 0.193
    ## 4 -1 -0.03310865 0.714 0.193
    ## 5  0 -0.03310865 0.714 0.193
    ## 6  1 -0.03310865 0.714 0.193
    ## 7  2 -0.03310865 0.714 0.193
    ## 8  3 -0.03310865 0.714 0.193
    ## 9  4 -0.03310865 0.714 0.193

    ## (5) use predict() with new data, setting type to get probs
    pp <- predict(mod, newdata = new_df, se.fit = TRUE, type = 'response')
    pp

    ## $fit
    ##            1            2            3            4            5 
    ## 9.538101e-05 5.650291e-04 3.339462e-03 1.947163e-02 1.053009e-01 
    ##            6            7            8            9 
    ## 4.109116e-01 8.052238e-01 9.607867e-01 9.931607e-01 
    ## 
    ## $se.fit
    ##            1            2            3            4            5 
    ## 7.674447e-05 3.599678e-04 1.572884e-03 5.992871e-03 1.683050e-02 
    ##            6            7            8            9 
    ## 3.710458e-02 4.245440e-02 1.603947e-02 4.003603e-03 
    ## 
    ## $residual.scale
    ## [1] 1
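
To turn a grid of margins like this into a figure, one option is to collect the fits and a symmetric interval (fit ± 1.96 × SE) in a data frame and hand it to ggplot. The sketch below is not part of `margins.R`. It uses simulated data as a stand-in for `fake_data.csv`, the names `plot_df`, `lo`, and `hi` are made up for illustration, and the plotting step only runs if `ggplot2` is installed. Because the interval is built on the probability scale, it can stray slightly outside [0, 1] near the extremes.

```r
## simulated stand-in for fake_data.csv (assumption: coefficients are arbitrary)
set.seed(1)
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                 x3 = rbinom(n, 1, 0.7), x4 = rbinom(n, 1, 0.2))
df$y <- rbinom(n, 1, plogis(0.5 + df$x1 - df$x2 - 1.5 * df$x3 + df$x4))
mod <- glm(y ~ x1 + x2 + x3 + x4, data = df, family = binomial(link = 'logit'))

## margins over a grid of x1, other variables at their means
new_df <- data.frame(x1 = seq(from = -4, to = 4, by = 1),
                     x2 = mean(df$x2),
                     x3 = mean(df$x3),
                     x4 = mean(df$x4))
pp <- predict(mod, newdata = new_df, se.fit = TRUE, type = 'response')

## one row per margin, with a normal-approximation 95% interval
plot_df <- data.frame(x1 = new_df$x1,
                      margin = unname(pp$fit),
                      lo = unname(pp$fit - 1.96 * pp$se.fit),
                      hi = unname(pp$fit + 1.96 * pp$se.fit))
plot_df

## build the figure only if ggplot2 is available
if (requireNamespace('ggplot2', quietly = TRUE)) {
    library(ggplot2)
    g <- ggplot(plot_df, aes(x = x1, y = margin)) +
        geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.2) +
        geom_line() +
        geom_point() +
        labs(x = 'x1', y = 'Pr(y = 1)')
    ## print(g) to draw the figure
}
```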
