https://github.com/evamaerey/ggsmoothfit
extending stat_smooth to return fitted values and residuals
https://github.com/evamaerey/ggsmoothfit
Last synced: 4 months ago
JSON representation
extending stat_smooth to return fitted values and residuals
- Host: GitHub
- URL: https://github.com/evamaerey/ggsmoothfit
- Owner: EvaMaeRey
- License: other
- Created: 2023-09-11T19:46:49.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-18T18:34:54.000Z (4 months ago)
- Last Synced: 2025-03-18T19:44:28.192Z (4 months ago)
- Language: R
- Homepage:
- Size: 1.68 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
README
---
output:
github_document:
toc: TRUE
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(tidyverse, warn.conflicts = F)
ggplot2::theme_set(theme_gray(base_size = 18))
```# ggsmoothfit
The goal of {ggsmoothfit} is to let you visualize model fitted values and residuals easily!
``` r
library(tidyverse, warn.conflicts = F)
library(ggsmoothfit)
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth() +
ggsmoothfit:::geom_fit() +
ggsmoothfit:::geom_residuals() +
ggsmoothfit:::geom_smooth_fit(xseq = 2:3, size = 5) +
ggsmoothfit:::geom_smooth_step(xseq = 2:3) +
ggsmoothfit:::geom_smooth_fit(xseq = 0, size = 5, method = lm)
```# Let's build this functionality
# Step 0. Examine ggplot2::StatSmooth$compute_group, and a dataframe that it returns
Key take away: this function allows you to set values of x with the *xseq argument*. Although the default is to create an evenly spaced sequence.
```{r}
ggplot2::StatSmooth$compute_group %>% capture.output() %>% .[1:10]library(dplyr)
mtcars %>%
rename(x = wt, y = mpg, cat = am) %>%
ggplot2::StatSmooth$compute_group(method = lm,
formula = y ~ x, n = 7)
```# Step 1. create compute_group_smooth_fit
Here we'll piggy back on StatSmooth$compute_group, to create a function, compute_group_smooth_fit. We ask that function to compute predictions at the values of x observed in our data set. We also preserve the values of y (as yend) so that we can draw in the residual error.
xend and yend are computed to draw the segments visualizing the error.
```{r compute_group_smooth_fit}
compute_group_smooth_fit <- function(data, scales, method = NULL, formula = NULL,
xseq = NULL,
level = 0.95, method.args = list(),
na.rm = FALSE, flipped_aes = NA){
if(is.null(xseq)){ # predictions based on observationsggplot2::StatSmooth$compute_group(data = data, scales = scales,
method = method, formula = formula,
se = FALSE, n= 80, span = 0.75, fullrange = FALSE,
xseq = data$x,
level = .95, method.args = method.args,
na.rm = na.rm, flipped_aes = flipped_aes) |>
dplyr::mutate(xend = data$x,
yend = data$y)
}else{ # predict specific input values
ggplot2::StatSmooth$compute_group(data = data, scales = scales,
method = method, formula = formula,
se = FALSE, n= 80, span = 0.75, fullrange = FALSE,
xseq = xseq,
level = .95, method.args = method.args,
na.rm = na.rm, flipped_aes = flipped_aes)
}
}
``````{r}
mtcars %>%
slice(1:10) %>%
rename(x = wt, y = mpg) %>%
compute_group_smooth_fit(method = lm, formula = y ~ x)
```We'll also create compute_group_smooth_sq_error, further piggybacking, this time on the function we just build. This creates the ymin, ymax, xmin and xmax columns needed to show the *squared* error. Initially, I'd included this computation above, but the plot results can be bad, as the 'flags' that come off of the residuals effect the plot spacing even when they aren't used. Preferring to avoid this side-effect, we create two functions (and later two ggproto objects). Note too that xmax is computed in the units of y, and initial plotting can yield squares that do not look like squares. Standardizing both variables, with coord_equal will get us to squares.
```{r compute_group_smooth_sq_error}
compute_group_smooth_sq_error <- function(data, scales, method = NULL,
formula = NULL,
level = 0.95, method.args = list(),
na.rm = FALSE, flipped_aes = NA){
compute_group_smooth_fit(data = data, scales = scales,
method = method, formula = formula,
level = .95, method.args = method.args,
na.rm = na.rm, flipped_aes = flipped_aes) %>%
dplyr::mutate(ymin = y,
xmin = x,
ymax = yend,
xmax = x + (ymax - ymin))
}
```# Step 1.1 test compute group
```{r}
mtcars %>%
slice(1:10) %>%
rename(x = wt, y = mpg) %>%
compute_group_smooth_fit(method = lm, formula = y ~ x)mtcars %>%
slice(1:10) %>%
rename(x = wt, y = mpg) %>%
compute_group_smooth_sq_error(method = lm, formula = y ~ x)
```# Step 2. Pass to ggproto
```{r ggproto_objects}
StatSmoothFit <- ggplot2::ggproto("StatSmoothFit",
ggplot2::StatSmooth,
compute_group = compute_group_smooth_fit,
required_aes = c("x", "y"))StatSmoothErrorSq <- ggplot2::ggproto("StatSmoothErrorSq",
ggplot2::StatSmooth,
compute_group = compute_group_smooth_sq_error,
required_aes = c("x", "y"))
```# Try Out Stat
```{r}
mtcars %>%
ggplot() +
aes(x = wt, y = mpg) +
geom_point() +
geom_smooth() +
geom_point(stat = StatSmoothFit, color = "blue") +
geom_segment(stat = StatSmoothFit, color = "blue")
```# Step 3. Pass to stat_*/ geom_ functions
```{r stat_fit}
library(statexpress)stat_smooth_fit <- function(geom = "point", ...){
qlayer(geom = geom,
stat = StatSmoothFit, ...)
}geom_smooth_fit <- function(...){
qlayer(geom = qproto_update(GeomPoint, aes(colour = from_theme(accent))),
stat = StatSmoothFit, ...)
}geom_smooth_residuals <- function(...){
qlayer(geom = qproto_update(GeomSegment, aes(colour = from_theme(accent))),
stat = StatSmoothFit, ...)
}mtcars %>%
ggplot() +
aes(x = wt, y = mpg) +
geom_point() +
geom_smooth() +
geom_smooth_fit() +
geom_smooth_residuals() +
geom_smooth_fit(xseq = 2:3, size = 8)```
```{r stat_errorsq}
geom_smooth_residuals_squared <- function(...){
qlayer(geom = qproto_update(GeomRect,
aes(fill = from_theme(accent),
alpha = .2,
color = from_theme(accent),
linewidth = from_theme(linewidth*.2))),
stat = StatSmoothErrorSq,
...)
}standardize <- function(x){
var_mean <- mean(x)
var_sd <- sd(x)
(x-var_mean)/var_sd
}```
For best results, use standardized x, y and coord_equal() as shown below
```{r}
mtcars %>%
ggplot() +
aes(x = wt, y = mpg) +
geom_point() +
geom_smooth() +
geom_smooth_fit() +
geom_smooth_residuals() +
geom_smooth_residuals_squared()last_plot() +
coord_equal()last_plot() +
aes(standardize(wt), standardize(mpg))```
# And with lm
```{r}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth(alpha = .2, se = FALSE, method = lm) +
geom_smooth_fit(method = lm) + # wrap as geom_smooth_fit()
geom_smooth_residuals(method = lm) +
geom_smooth_fit(xseq = 0, method = lm)
```# Contrast to an empty model...
- show mean of y, residuals and squares (variance)
```{r}
mtcars %>%
ggplot() +
aes(standardize(wt), standardize(mpg)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ 1) +
geom_smooth_fit(method = lm, formula = y ~ 1) + # wrap as geom_smooth_fit()
geom_smooth_residuals(method = lm, formula = y ~ 1) +
geom_smooth_residuals_squared(method = lm, formula = y ~ 1) +
coord_equal()```
```{r geom_smooth_step}
geom_smooth_step <- function(method = NULL, formula = y ~ x,
color = "darkred", xseq = 0:1){
stat_smooth(method = method,
formula = formula,
geom = "segment", # draw fitted values as points
color = color,
xseq = xseq, # 'from', 'to' value pair
aes(yend = after_stat(y[1]), # 'from' value of y
xend = xseq[2]), # 'to' value of x
arrow = arrow(ends = c("last", "first"),
length = unit(.1, "in")))
}
``````{r}
mtcars |>
ggplot(aes(wt, mpg)) +
geom_point() +
geom_smooth(method = lm) +
geom_smooth_step(method = lm, xseq = 2:3)```
# Via @friendly [ggplot2 extenders ggsprings discussion]( https://github.com/ggplot2-extenders/ggplot-extension-club/discussions/83) and [springs extension case study](https://ggplot2-book.org/ext-springs.html)
```{r}
library(ggplot2)create_spring <- function(x,
y,
xend,
yend,
diameter = 1,
tension = 0.75,
n = 50) {
# Validate the input arguments
if (tension <= 0) {
rlang::abort("`tension` must be larger than zero.")
}
if (diameter == 0) {
rlang::abort("`diameter` can not be zero.")
}
if (n == 0) {
rlang::abort("`n` must be greater than zero.")
}
# Calculate the direct length of the spring path
length <- sqrt((x - xend)^2 + (y - yend)^2)
# Calculate the number of revolutions and points we need
n_revolutions <- length / (diameter * tension)
n_points <- n * n_revolutions
# Calculate the sequence of radians and the x and y offset values
radians <- seq(0, n_revolutions * 2 * pi, length.out = n_points)
x <- seq(x, xend, length.out = n_points)
y <- seq(y, yend, length.out = n_points)
# Create and return the transformed data frame
data.frame(
x = cos(radians) * diameter/2 + x,
y = sin(radians) * diameter/2 + y
)
}GeomSmoothSpring <- ggproto("GeomSmoothSpring", Geom,
# Ensure that each row has a unique group id
setup_data = function(data, params) {
if (is.null(data$group)) {
data$group <- seq_len(nrow(data))
}
if (anyDuplicated(data$group)) {
data$group <- paste(data$group, seq_len(nrow(data)), sep = "-")
}
data
},
# Transform the data inside the draw_panel() method
draw_panel = function(data,
panel_params,
coord,
n = 50,
arrow = NULL,
lineend = "butt",
linejoin = "round",
linemitre = 10,
na.rm = FALSE) {
# Transform the input data to specify the spring paths
cols_to_keep <- setdiff(names(data), c("x", "y", "xend", "yend"))
data$diameter <- data$diameter %||% (.025 * abs(min(data$x)-max(data$x)))
data$springlength <- sqrt((data$x-data$xend)^2 + (data$y-data$yend)^2)
data$tension <- data$tension %||% (1 * data$springlength)
springs <- lapply(seq_len(nrow(data)), function(i) {
spring_path <- create_spring(
data$x[i],
data$y[i],
data$xend[i],
data$yend[i],
data$diameter[i],
data$tension[i],
n
)
cbind(spring_path, unclass(data[i, cols_to_keep]))
})
springs <- do.call(rbind, springs)
# Use the draw_panel() method from GeomPath to do the drawing
GeomPath$draw_panel(
data = springs,
panel_params = panel_params,
coord = coord,
arrow = arrow,
lineend = lineend,
linejoin = linejoin,
linemitre = linemitre,
na.rm = na.rm
)
},
# Specify the default and required aesthetics
required_aes = c("x", "y", "xend", "yend"),
default_aes = aes(
colour = from_theme(accent),
linewidth = 0.5,
linetype = 1L,
alpha = NA
)
)anscombe |>
ggplot() +
aes(x = x1, y = y1) +
geom_point() +
geom_smooth(method = lm) +
stat_smooth_fit(geom = GeomSmoothSpring, method = lm) +
ggchalkboard:::theme_blackboard()last_plot() +
aes(x = x2, y = y2)last_plot() +
aes(tension = 1)last_plot() +
aes(x = x3, y = y3) +
aes(tension = NULL)last_plot() +
aes(x = x4, y = y4)anscombe |>
ggplot() +
aes(x = x2, y = y2) +
geom_point() +
geom_smooth(method = lm, formula = y ~ 1) +
stat_smooth_fit(geom = GeomSmoothSpring, method = lm, formula = y ~ 1) +
ggchalkboard:::theme_blackboard()last_plot() +
aes(tension = 1)```
# Part 2. Packaging and documentation 🚧 ✅
## minimal requirements for github package. Have you:
### Created files for package archetecture with `devtools::create("./ggbarlabs")` ✅
### Moved functions R folder? ✅
```{r}
knitr::knit_code$get() |> names()
``````{r}
knitrExtra::chunk_to_dir(c(
"geom_smooth_step",
"compute_group_smooth_fit",
"compute_group_smooth_sq_error",
"ggproto_objects",
"stat_fit",
"stat_errorsq"))
```### Added roxygen skeleton? ✅
for auto documentation and making sure proposed functions are *exported*
### Managed dependencies ? ✅
package dependencies managed, i.e. `depend::function()` in proposed functions and declared in the DESCRIPTION
### Chosen a license? ✅
```{r, eval = F}
usethis::use_package("ggplot2")
usethis::use_mit_license()
```## `devtools::check()` report
```{r, error = T, eval = F}
# rm(list = c("geom_barlab_count", "geom_barlab_count_percent"))
devtools::check(pkg = ".")
```---
# Don't want to use ggsmoothfit? Here are some ways to get it done with base ggplot2!
## Option 1. Verbal description and move on...
"image a line that drops down from the observation to the model line" use vanilla geom_smooth
```{r stat-smooth}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth()
```## Option 2: precalculate and plot
[stack overflow example goes here.]
## Option 3: little known xseq argument and geom = "point"
First a bit of under-the-hood thinking about geom_smooth/stat_smooth.
```{r}
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_smooth(n = 12) +
stat_smooth(geom = "point",
color = "blue",
n = 12)
```Specify xseq... Almost surely new to you (and probably more interesting to stats instructors): predicting at observed values of x..
xseq has only recently been advertised, but possibly of interest.. https://ggplot2.tidyverse.org/reference/geom_smooth.html
```{r, warning= F, message=F}
# fit where the support is in the data...
mtcars %>%
ggplot() +
aes(wt, mpg) +
geom_point() +
geom_smooth() +
stat_smooth(geom = "point", color = "blue", # fitted values
xseq = mtcars$wt) +
stat_smooth(geom = "segment", color = "darkred", # residuals
xseq = mtcars$wt,
xend = mtcars$wt,
yend = mtcars$mpg)
```