Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/evamaerey/ggsmoothfit

extending stat_smooth to return fitted values and residuals
https://github.com/evamaerey/ggsmoothfit

Last synced: 11 days ago
JSON representation

extending stat_smooth to return fitted values and residuals

Awesome Lists containing this project

README

        

---
output:
github_document:
toc: TRUE
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(tidyverse, warn.conflicts = F)
ggplot2::theme_set(theme_gray(base_size = 18))
```

# ggsmoothfit

{ggsmoothfit} lets you visualize model fitted values and residuals easily!

```{r, eval = T}
library(tidyverse, warn.conflicts = F)
library(ggsmoothfit)
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth() +
ggsmoothfit:::geom_fit() +
ggsmoothfit:::geom_residuals()
```

```{r}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth() +
ggsmoothfit:::geom_smooth_predict(xseq = 2:3, size = 5)

last_plot() +
ggsmoothfit:::geom_smooth_step(xseq = 2:3) +
NULL
```

# intercept

```{r}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth(method = lm) +
ggsmoothfit:::geom_smooth_predict(xseq = 0, size = 5, method = lm)
```

# Curious about implementation? Details about building these functions

# Step 00. Create alias of stat_smooth(geom = "point", xseq = ?)

Make it a bit easier for the user to stat_smooth(geom = "point", xseq = ?)

```{r geom_smooth_predict}
geom_smooth_predict <- function(xseq, mapping = NULL, data = NULL, ..., method = NULL, formula = NULL, se = TRUE, method.args = list(), na.rm = FALSE, orientation = NA, show.legend = NA, inherit.aes = TRUE, color = "blue"){

stat_smooth( mapping = mapping, data = data, geom = "point", position = "identity", xseq = xseq, ..., method = method, formula = formula, se = se, method.args = list(), na.rm = na.rm, orientation = orientation, show.legend = show.legend, inherit.aes = inherit.aes, color = color
)

}
```

## test it out...

```{r}
library(tidyverse)
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth() +
geom_smooth_predict(xseq = 2:3, size = 5)

mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth(method = lm) +
geom_smooth_predict(xseq = 0,
method = lm,
color = "red",
size = 4)
```

# Step 0. Examine ggplot2::StatSmooth$compute_group, and a dataframe that it returns

Key take away: this function allows you to set values of x with the *xseq argument*. Although the default is to create an evenly spaced sequence.

```{r}
ggplot2::StatSmooth$compute_group %>% capture.output() %>% .[1:10]

library(dplyr)
mtcars %>%
rename(x = wt, y = mpg, cat = am) %>%
ggplot2::StatSmooth$compute_group(method = lm,
formula = y ~ x, n = 7)
```

# Step 1. create compute_group_smooth_fit

Here we'll piggy back on StatSmooth$compute_group, to create a function, compute_group_smooth_fit. We ask that function to compute predictions at the values of x observed in our data set. We also preserve the values of y (as yend) so that we can draw in the residual error.

xend and yend are computed to draw the segments visualizing the error.

```{r compute_group_smooth_fit}
compute_group_smooth_fit <- function(data, scales, method = NULL, formula = NULL,
se = TRUE, n = 80, span = 0.75, fullrange = FALSE,
level = 0.95, method.args = list(),
na.rm = FALSE, flipped_aes = NA){


out <- ggplot2::StatSmooth$compute_group(data = data, scales = scales,
method = method, formula = formula,
se = FALSE, n= n, span = span, fullrange = fullrange,
xseq = data$x,
level = .95, method.args = method.args,
na.rm = na.rm, flipped_aes = flipped_aes)

out$x_obs <- data$x
out$y_obs <- data$y

out$xend <- out$x_obs
out$yend <- out$y_obs

out

}
```

We'll also create compute_group_smooth_sq_error, further piggybacking, this time on the function we just build. This creates the ymin, ymax, xmin and xmax columns needed to show the *squared* error. Initially, I'd included this computation above, but the plot results can be bad, as the 'flags' that come off of the residuals effect the plot spacing even when they aren't used. Preferring to avoid this side-effect, we create two functions (and later two ggproto objects). Note too that xmax is computed in the units of y, and initial plotting can yield squares that do not look like squares. Standardizing both variables, with coord_equal will get us to squares.

```{r compute_group_smooth_sq_error}
compute_group_smooth_sq_error <- function(data, scales, method = NULL, formula = NULL,
se = TRUE, n = 80, span = 0.75, fullrange = FALSE,
level = 0.95, method.args = list(),
na.rm = FALSE, flipped_aes = NA){

compute_group_smooth_fit(data = data, scales = scales,
method = method, formula = formula,
se = FALSE, n= n, span = span, fullrange = fullrange,
level = .95, method.args = method.args,
na.rm = na.rm, flipped_aes = flipped_aes) %>%
dplyr::mutate(ymin = y,
xmin = x,
ymax = yend,
xmax = x + (ymax - ymin))

}
```

# Step 1.1 test compute group

```{r}
mtcars %>%
slice(1:10) %>%
rename(x = wt, y = mpg) %>%
compute_group_smooth_fit(method = lm, formula = y ~ x)

mtcars %>%
slice(1:10) %>%
rename(x = wt, y = mpg) %>%
compute_group_smooth_sq_error(method = lm, formula = y ~ x)
```

# Step 2. Pass to ggproto

```{r ggproto_objects}
StatSmoothFit <- ggplot2::ggproto("StatSmoothFit", ggplot2::Stat,
setup_params = ggplot2::StatSmooth$setup_params,
extra_params = c("na.rm", "orientation"),
compute_group = compute_group_smooth_fit,
dropped_aes = c("weight"),
required_aes = c("x", "y"),
default_aes = ggplot2::aes(xend = after_stat(x_obs), yend = after_stat(y_obs))

)

StatSmoothErrorSq <- ggplot2::ggproto("StatSmoothErrorSq", ggplot2::Stat,
setup_params = ggplot2::StatSmooth$setup_params,
extra_params = c("na.rm", "orientation"),
compute_group = compute_group_smooth_sq_error,
dropped_aes = c("weight"),
required_aes = c("x", "y")
)
```

# Step 3. Pass to stat_*/ geom_ functions

```{r stat_fit}
#' Title
#'
#' @param mapping
#' @param data
#' @param geom
#' @param position
#' @param ...
#' @param method
#' @param formula
#' @param se
#' @param n
#' @param span
#' @param fullrange
#' @param level
#' @param method.args
#' @param na.rm
#' @param orientation
#' @param show.legend
#' @param inherit.aes
#'
#' @return
#' @export
#'
#' @examples
stat_fit <- function(mapping = NULL, data = NULL,
geom = "point", position = "identity",
...,
method = NULL,
formula = NULL,
se = TRUE,
n = 80,
span = 0.75,
fullrange = FALSE,
level = 0.95,
method.args = list(),
na.rm = FALSE,
orientation = NA,
show.legend = NA,
inherit.aes = TRUE) {
ggplot2::layer(
data = data,
mapping = mapping,
stat = StatSmoothFit,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = rlang::list2(
method = method,
formula = formula,
se = se,
n = n,
fullrange = fullrange,
level = level,
na.rm = na.rm,
orientation = orientation,
method.args = method.args,
span = span,
...
)
)
}

geom_fit <- function(...){stat_fit(color = "blue", ...)}
geom_residuals <- function(...){stat_fit(geom = "segment", color = "darkred", ...)}
```

```{r stat_errorsq}
stat_errorsq <- function(mapping = NULL, data = NULL,
geom = "rect", position = "identity",
...,
method = NULL,
formula = NULL,
se = TRUE,
n = 80,
span = 0.75,
fullrange = FALSE,
level = 0.95,
method.args = list(),
na.rm = FALSE,
orientation = NA,
show.legend = NA,
inherit.aes = TRUE) {
ggplot2::layer(
data = data,
mapping = mapping,
stat = StatSmoothErrorSq,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = rlang::list2(
method = method,
formula = formula,
se = se,
n = n,
fullrange = fullrange,
level = level,
na.rm = na.rm,
orientation = orientation,
method.args = method.args,
span = span,
...
)
)
}

```

# Step 4. Test in ggplot2 pipeline and enjoy!

```{r test}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth(alpha = .2, se = FALSE) +
stat_fit(color = "blue") + # wrap as geom_smooth_fit()
stat_fit(geom = "segment") # geom_smooth_error()

```

## Squared residuals

For best results, use standardized x, y and coord_equal() as shown below

```{r squaring}
stdz <- function(x){

var_mean <- mean(x)
var_sd <- sd(x)

(x-var_mean)/var_sd

}

last_plot() +
stat_errorsq(geom = "rect", alpha = .1) + # geom_smooth_error_sq() +
aes(stdz(wt), stdz(mpg)) +
coord_equal()
```

# Test with lm

```{r}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth(alpha = .2, se = FALSE, method = lm) +
stat_fit(geom = "point", color = "blue", method = lm) + # wrap as geom_smooth_fit()
stat_fit(geom = "segment", method = lm)
```

# Contrast to an empty model...

- show mean of y, residuals and squares (variance)

```{r}
mtcars %>%
ggplot() +
aes(stdz(wt), stdz(mpg)) +
geom_point() +
geom_smooth(alpha = .2, se = FALSE, method = lm, formula = y ~ 1) +
stat_fit(geom = "point", color = "blue", method = lm, formula = y ~ 1) + # wrap as geom_smooth_fit()
stat_fit(geom = "segment", method = lm, formula = y ~ 1) +
stat_errorsq(geom = "rect", alpha = .1, method = lm, formula = y ~ 1) +
coord_equal()

```

# Step 4.b Create geom alliases and wrappers, try it out, and enjoy! Wait not working, how do I need to do this?

```{r, eval = F}
geom_smooth_fit <- function(...){stat_fit(color = "blue", ...)} # wrap as geom_smooth_fit()

geom_smooth_residual <- function(...){stat_fit(geom = "segment", color = "darkred", ...)} # wrap as geom_smooth_fit()

mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth(alpha = .2, se = FALSE) +
geom_smooth_fit(color = "blue") + # wrap as geom_smooth_fit()
geom_smooth_residual()
```

# Don't want to use ggsmoothfit? Here are some ways to get it done with base ggplot2!

## Option 1. Verbal description and move on...

"image a line that drops down from the observation to the model line" use vanilla geom_smooth

```{r stat-smooth}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth()
```

## Option 2: precalculate and plot

[stack overflow example goes here.]

## Option 3: little known xseq argument and geom = "point"

First a bit of under-the-hood thinking about geom_smooth/stat_smooth.

```{r}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_smooth(n = 12) +
stat_smooth(geom = "point",
color = "blue",
n = 12)
```

Specify xseq... Almost surely new to you (and probably more interesting to stats instructors): predicting at observed values of x.. Warning, this is off label..

xseq is not advertised, but possibly of interest.. https://ggplot2.tidyverse.org/reference/geom_smooth.html

```{r, warning= F, message=F}
# fit where the support is in the data...
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth() +
stat_smooth(geom = "point", color = "blue", # fitted values
xseq = mtcars$wt) +
stat_smooth(geom = "segment", color = "darkred", # residuals
xseq = mtcars$wt,
xend = mtcars$wt,
yend = mtcars$mpg)
```

# Rise over run viz bonus...

```{r}
mtcars |>
ggplot(aes(wt, mpg)) +
geom_point() +
geom_smooth(method = lm) +
stat_smooth(method = lm,
geom = 'point',
xseq = c(2,3),
size = 3,
color = "blue") +
stat_smooth(method = lm,
geom = "segment", # draw fitted values as points
color = "darkred",
xseq = c(2,3), # 'from', 'to' value pair
aes(yend = after_stat(y[1]), # 'from' value of y
xend = 3), # 'to' value of x
arrow = arrow(ends = c("last", "first"),
length = unit(.1, "in"))) +
labs(title = "For each 1000 lbs increase in weight...")

```

```{r geom_smooth_step}
geom_smooth_step <- function(method = NULL, formula = y ~ x,
color = "darkred", xseq = 0:1){

stat_smooth(method = method,
formula = formula,
geom = "segment", # draw fitted values as points
color = color,
xseq = xseq, # 'from', 'to' value pair
aes(yend = after_stat(y[1]), # 'from' value of y
xend = xseq[2]), # 'to' value of x
arrow = arrow(ends = c("last", "first"),
length = unit(.1, "in")))
}
```

```{r}
mtcars |>
ggplot(aes(wt, mpg)) +
geom_point() +
geom_smooth(method = lm) +
geom_smooth_step(xseq = 2:3)

```

```{r, eval = F, echo = F}
knitr::knit_exit()
```

# Part 2. Packaging and documentation 🚧 ✅

## minimal requirements for github package. Have you:

### Created files for package archetecture with `devtools::create("./ggbarlabs")` ✅

### Moved functions R folder? ✅

```{r}
knitr::knit_code$get() |> names()
```

```{r}
readme2pkg::chunk_to_r(c("geom_smooth_predict",
"geom_smooth_step",
"compute_group_smooth_fit",
"compute_group_smooth_sq_error",
"ggproto_objects",
"stat_fit",
"stat_errorsq"))
```

### Added roxygen skeleton? ✅

for auto documentation and making sure proposed functions are *exported*

### Managed dependencies ? ✅

package dependencies managed, i.e. `depend::function()` in proposed functions and declared in the DESCRIPTION

### Chosen a license? ✅

```{r, eval = F}
usethis::use_package("ggplot2")
usethis::use_mit_license()
```

### Run `devtools::check()` and addressed errors? ✅

## Listen 🚧

### Consulted with technical experts 🚧
### Consulted with potential users 🚧

Getting started with that!

## Polish. Have you.

### Settled on examples and put them in the roxygen skeleton? 🚧

### Written formal tests of functions? 🚧

### Sent tests in this readme to package via readme2pkg 🚧

That would look like this...

```
chunk_to_tests_testthat("test_geom_barlab_count")
```

### Have you worked added a description and author information in the DESCRIPTION file? 🚧

### Addressed *all* notes, warnings and errors. 🚧

## Promote

### Package website built? 🚧

### Package website deployed? 🚧

## Harden

### Submit to CRAN? 🚧

# Reports, Environment

## Description file extract

```{r}

```

## Environment

Here I just want to print the packages and the versions

```{r}
all <- sessionInfo() |> print() |> capture.output()
all[11:17]

```

## `devtools::check()` report

```{r, error = T, eval = F}
# rm(list = c("geom_barlab_count", "geom_barlab_count_percent"))
devtools::check(pkg = ".")
```