https://github.com/evamaerey/ggsmoothfit

extending stat_smooth to return fitted values and residuals
https://github.com/evamaerey/ggsmoothfit
Last synced: 4 months ago
JSON representation
extending stat_smooth to return fitted values and residuals
Host: GitHub
URL: https://github.com/evamaerey/ggsmoothfit
Owner: EvaMaeRey
License: other
Created: 2023-09-11T19:46:49.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-03-18T18:34:54.000Z (4 months ago)
Last Synced: 2025-03-18T19:44:28.192Z (4 months ago)
Language: R
Homepage:
Size: 1.68 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project

README

        ---

output: 

  github_document:

    toc: TRUE

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>"

)

library(tidyverse, warn.conflicts = F)

ggplot2::theme_set(theme_gray(base_size = 18))

```

# ggsmoothfit 

The goal of {ggsmoothfit} is to let you visualize model fitted values and residuals easily!

``` r

library(tidyverse, warn.conflicts = F)

library(ggsmoothfit)

mtcars %>% 

  ggplot() + 

  aes(wt, mpg) +

  geom_point() +

  geom_smooth() +

  ggsmoothfit:::geom_fit() + 

  ggsmoothfit:::geom_residuals() + 

  ggsmoothfit:::geom_smooth_fit(xseq = 2:3, size = 5) +

  ggsmoothfit:::geom_smooth_step(xseq = 2:3) + 

  ggsmoothfit:::geom_smooth_fit(xseq = 0, size = 5, method = lm)

```

# Let's build this functionality

# Step 0. Examine ggplot2::StatSmooth$compute_group, and a dataframe that it returns

Key take away: this function allows you to set values of x with the *xseq argument*.  Although the default is to create an evenly spaced sequence. 

```{r}

ggplot2::StatSmooth$compute_group %>% capture.output() %>% .[1:10]

library(dplyr)

mtcars %>%

  rename(x = wt, y = mpg, cat = am) %>%

  ggplot2::StatSmooth$compute_group(method = lm, 

                           formula = y ~ x, n = 7)

```

# Step 1. create compute_group_smooth_fit 

Here we'll piggy back on StatSmooth$compute_group, to create a function, compute_group_smooth_fit.  We ask that function to compute predictions at the values of x observed in our data set.  We also preserve the values of y (as yend) so that we can draw in the residual error.

xend and yend are computed to draw the segments visualizing the error.

```{r compute_group_smooth_fit}

compute_group_smooth_fit <- function(data, scales, method = NULL, formula = NULL,

                           xseq = NULL,

                           level = 0.95, method.args = list(),

                           na.rm = FALSE, flipped_aes = NA){

  

  if(is.null(xseq)){ # predictions based on observations 

  ggplot2::StatSmooth$compute_group(data = data, scales = scales, 

                       method = method, formula = formula, 

                       se = FALSE, n= 80, span = 0.75, fullrange = FALSE,

                       xseq = data$x, 

                       level = .95, method.args = method.args, 

                       na.rm = na.rm, flipped_aes = flipped_aes) |>

      dplyr::mutate(xend = data$x,

                    yend = data$y)

  

  }else{  # predict specific input values

    

  ggplot2::StatSmooth$compute_group(data = data, scales = scales, 

                       method = method, formula = formula, 

                       se = FALSE, n= 80, span = 0.75, fullrange = FALSE,

                       xseq = xseq, 

                       level = .95, method.args = method.args, 

                       na.rm = na.rm, flipped_aes = flipped_aes)   

    

  }

  

}

```

```{r}

mtcars %>% 

  slice(1:10) %>% 

  rename(x = wt, y = mpg) %>% 

  compute_group_smooth_fit(method = lm, formula = y ~ x)

```

We'll also create compute_group_smooth_sq_error, further piggybacking, this time on the function we just build.  This creates the ymin, ymax, xmin and xmax columns needed to show the *squared* error.  Initially, I'd included this computation above, but the plot results can be bad, as the 'flags' that come off of the residuals effect the plot spacing even when they aren't used.  Preferring to avoid this side-effect, we create two functions (and later two ggproto objects).  Note too that xmax is computed in the units of y, and initial plotting can yield squares that do not look like squares. Standardizing both variables, with coord_equal will get us to squares. 

```{r compute_group_smooth_sq_error}

compute_group_smooth_sq_error <- function(data, scales, method = NULL, 

                                          formula = NULL,

                          

                           level = 0.95, method.args = list(),

                           na.rm = FALSE, flipped_aes = NA){

  

 compute_group_smooth_fit(data = data, scales = scales, 

                       method = method, formula = formula, 

                       level = .95, method.args = method.args, 

                       na.rm = na.rm, flipped_aes = flipped_aes) %>% 

    dplyr::mutate(ymin = y,

           xmin = x,

           ymax = yend,

           xmax = x + (ymax - ymin))

  

}

```

# Step 1.1 test compute group 

```{r}

mtcars %>% 

  slice(1:10) %>% 

  rename(x = wt, y = mpg) %>% 

  compute_group_smooth_fit(method = lm, formula = y ~ x)

mtcars %>% 

  slice(1:10) %>% 

  rename(x = wt, y = mpg) %>% 

  compute_group_smooth_sq_error(method = lm, formula = y ~ x)

```

# Step 2. Pass to ggproto

```{r ggproto_objects}

StatSmoothFit <- ggplot2::ggproto("StatSmoothFit", 

                                  ggplot2::StatSmooth,

                                  compute_group = compute_group_smooth_fit,

                                  required_aes = c("x", "y"))

StatSmoothErrorSq <- ggplot2::ggproto("StatSmoothErrorSq", 

                                      ggplot2::StatSmooth,

                                      compute_group = compute_group_smooth_sq_error,

                                      required_aes = c("x", "y"))

```

# Try Out Stat

```{r}

mtcars %>% 

  ggplot() + 

  aes(x = wt, y = mpg) + 

  geom_point() + 

  geom_smooth() + 

  geom_point(stat = StatSmoothFit, color = "blue") + 

  geom_segment(stat = StatSmoothFit, color = "blue")

```

# Step 3. Pass to stat_*/ geom_ functions

```{r stat_fit}

library(statexpress)

stat_smooth_fit <- function(geom = "point", ...){

  

  qlayer(geom = geom,

         stat = StatSmoothFit, ...)

  

}

geom_smooth_fit <- function(...){

  

  qlayer(geom = qproto_update(GeomPoint, aes(colour = from_theme(accent))),

         stat = StatSmoothFit, ...)

  

}

geom_smooth_residuals <- function(...){

  

  qlayer(geom = qproto_update(GeomSegment, aes(colour = from_theme(accent))),

         stat = StatSmoothFit, ...)

  

}

mtcars %>% 

  ggplot() + 

  aes(x = wt, y = mpg) + 

  geom_point() + 

  geom_smooth() + 

  geom_smooth_fit() + 

  geom_smooth_residuals() + 

  geom_smooth_fit(xseq = 2:3, size = 8)

```

```{r stat_errorsq}

geom_smooth_residuals_squared <- function(...){

  

  qlayer(geom = qproto_update(GeomRect, 

                              aes(fill = from_theme(accent), 

                                  alpha = .2,

                                  color = from_theme(accent),

                                  linewidth = from_theme(linewidth*.2))),

         stat = StatSmoothErrorSq,

         ...)

  

}

standardize <- function(x){

  

  var_mean <- mean(x) 

  var_sd <- sd(x)

  

  (x-var_mean)/var_sd

  

}

```

For best results, use standardized x, y and coord_equal() as shown below

```{r}

mtcars %>% 

  ggplot() + 

  aes(x = wt, y = mpg) + 

  geom_point() + 

  geom_smooth() + 

  geom_smooth_fit() + 

  geom_smooth_residuals() +

  geom_smooth_residuals_squared()

last_plot() + 

  coord_equal()

last_plot() +

  aes(standardize(wt), standardize(mpg)) 

```

# And with lm

```{r}

mtcars %>% 

  ggplot() +

  aes(wt, mpg) + 

  geom_point() +

  geom_smooth(alpha = .2, se = FALSE, method = lm) + 

  geom_smooth_fit(method = lm) + # wrap as geom_smooth_fit()

  geom_smooth_residuals(method = lm) + 

  geom_smooth_fit(xseq = 0, method = lm)

```

# Contrast to an empty model...

- show mean of y, residuals and squares (variance)

```{r}

mtcars %>% 

  ggplot() +

  aes(standardize(wt), standardize(mpg)) + 

  geom_point() +

  geom_smooth(method = lm, formula = y ~ 1) + 

  geom_smooth_fit(method = lm, formula = y ~ 1) + # wrap as geom_smooth_fit()

  geom_smooth_residuals(method = lm, formula = y ~ 1) + 

  geom_smooth_residuals_squared(method = lm, formula = y ~ 1) + 

  coord_equal()

```

```{r geom_smooth_step}

geom_smooth_step <- function(method = NULL, formula = y ~ x,

                              color = "darkred", xseq = 0:1){

  

  stat_smooth(method = method,

              formula = formula,

              geom = "segment", # draw fitted values as points

              color = color,

              xseq = xseq, # 'from', 'to' value pair

              aes(yend = after_stat(y[1]), # 'from' value of y

                  xend = xseq[2]), # 'to' value of x

              arrow = arrow(ends = c("last", "first"), 

                            length = unit(.1, "in")))

} 

```

```{r}

mtcars |>

  ggplot(aes(wt, mpg)) +

  geom_point() +

  geom_smooth(method = lm) +

  geom_smooth_step(method = lm, xseq = 2:3)

```

# Via @friendly [ggplot2 extenders ggsprings discussion]( https://github.com/ggplot2-extenders/ggplot-extension-club/discussions/83) and [springs extension case study](https://ggplot2-book.org/ext-springs.html)

```{r}

library(ggplot2)

create_spring <- function(x, 

                          y, 

                          xend, 

                          yend, 

                          diameter = 1, 

                          tension = 0.75, 

                          n = 50) {

  

  # Validate the input arguments

  if (tension <= 0) {

    rlang::abort("`tension` must be larger than zero.")

  }

  if (diameter == 0) {

    rlang::abort("`diameter` can not be zero.")

  }

  if (n == 0) {

    rlang::abort("`n` must be greater than zero.")

  }

  

  # Calculate the direct length of the spring path

  length <- sqrt((x - xend)^2 + (y - yend)^2)

  

  # Calculate the number of revolutions and points we need

  n_revolutions <- length / (diameter * tension)

  n_points <- n * n_revolutions

  

  # Calculate the sequence of radians and the x and y offset values

  radians <- seq(0, n_revolutions * 2 * pi, length.out = n_points)

  x <- seq(x, xend, length.out = n_points)

  y <- seq(y, yend, length.out = n_points)

  

  # Create and return the transformed data frame

  data.frame(

    x = cos(radians) * diameter/2 + x,

    y = sin(radians) * diameter/2 + y

  )

}

GeomSmoothSpring <- ggproto("GeomSmoothSpring", Geom,

  

  # Ensure that each row has a unique group id

  setup_data = function(data, params) {

    if (is.null(data$group)) {

      data$group <- seq_len(nrow(data))

    }

    if (anyDuplicated(data$group)) {

      data$group <- paste(data$group, seq_len(nrow(data)), sep = "-")

    }

    data

  },

  

  # Transform the data inside the draw_panel() method

  draw_panel = function(data, 

                        panel_params, 

                        coord, 

                        n = 50, 

                        arrow = NULL,

                        lineend = "butt", 

                        linejoin = "round", 

                        linemitre = 10,

                        na.rm = FALSE) {

    

    # Transform the input data to specify the spring paths

    cols_to_keep <- setdiff(names(data), c("x", "y", "xend", "yend"))

    

    data$diameter <- data$diameter %||%  (.025 * abs(min(data$x)-max(data$x)))

    data$springlength <- sqrt((data$x-data$xend)^2 + (data$y-data$yend)^2)

    data$tension <- data$tension %||% (1 * data$springlength)

    

    springs <- lapply(seq_len(nrow(data)), function(i) {

      spring_path <- create_spring(

        data$x[i], 

        data$y[i], 

        data$xend[i], 

        data$yend[i], 

        data$diameter[i], 

        data$tension[i], 

        n

      )

      cbind(spring_path, unclass(data[i, cols_to_keep]))

    })

    springs <- do.call(rbind, springs)

    

    # Use the draw_panel() method from GeomPath to do the drawing

    GeomPath$draw_panel(

      data = springs, 

      panel_params = panel_params, 

      coord = coord, 

      arrow = arrow, 

      lineend = lineend, 

      linejoin = linejoin, 

      linemitre = linemitre, 

      na.rm = na.rm

    )

  },

  

  # Specify the default and required aesthetics

  required_aes = c("x", "y", "xend", "yend"),

  default_aes = aes(

    colour = from_theme(accent), 

    linewidth = 0.5, 

    linetype = 1L, 

    alpha = NA

  )

  

)

anscombe |> 

  ggplot() + 

  aes(x = x1, y = y1) + 

  geom_point() + 

  geom_smooth(method = lm) +

  stat_smooth_fit(geom = GeomSmoothSpring, method = lm) + 

  ggchalkboard:::theme_blackboard()

last_plot() + 

  aes(x = x2, y = y2)

last_plot() + 

  aes(tension = 1)

last_plot() + 

  aes(x = x3, y = y3) + 

  aes(tension = NULL)

last_plot() + 

  aes(x = x4, y = y4)

anscombe |> 

  ggplot() + 

  aes(x = x2, y = y2) + 

  geom_point() + 

  geom_smooth(method = lm, formula = y ~ 1) +

  stat_smooth_fit(geom = GeomSmoothSpring, method = lm, formula = y ~ 1) + 

  ggchalkboard:::theme_blackboard()

last_plot() + 

  aes(tension = 1)

```

# Part 2. Packaging and documentation  🚧 ✅ 

## minimal requirements for github package.  Have you:

### Created files for package archetecture with `devtools::create("./ggbarlabs")` ✅ 

### Moved functions R folder? ✅  

```{r}

knitr::knit_code$get() |> names()

```

```{r}

knitrExtra::chunk_to_dir(c(

                         "geom_smooth_step",

                         "compute_group_smooth_fit", 

                         "compute_group_smooth_sq_error",

                         "ggproto_objects",

                         "stat_fit", 

                         "stat_errorsq"))

```

### Added roxygen skeleton? ✅ 

 for auto documentation and making sure proposed functions are *exported*

### Managed dependencies ? ✅ 

package dependencies managed, i.e. `depend::function()` in proposed functions and declared in the DESCRIPTION

### Chosen a license? ✅ 

```{r, eval = F}

usethis::use_package("ggplot2")

usethis::use_mit_license()

```

## `devtools::check()` report

```{r, error = T, eval = F}

# rm(list = c("geom_barlab_count", "geom_barlab_count_percent"))

devtools::check(pkg = ".")

```

---

# Don't want to use ggsmoothfit?  Here are some ways to get it done with base ggplot2!

## Option 1. Verbal description and move on...

"image a line that drops down from the observation to the model line" use vanilla geom_smooth

```{r stat-smooth}

mtcars %>% 

  ggplot() +

  aes(wt, mpg) + 

  geom_point() + 

  geom_smooth()

```

## Option 2: precalculate and plot 

[stack overflow example goes here.]

## Option 3: little known xseq argument and geom = "point"

First a bit of under-the-hood thinking about geom_smooth/stat_smooth.

```{r}

mtcars %>% 

  ggplot() +

  aes(wt, mpg) + 

  geom_smooth(n = 12) +

  stat_smooth(geom = "point", 

              color = "blue", 

              n = 12)

```

Specify xseq... Almost surely new to you (and probably more interesting to stats instructors): predicting at observed values of x..

xseq has only recently been advertised, but possibly of interest.. https://ggplot2.tidyverse.org/reference/geom_smooth.html

```{r, warning= F, message=F}

# fit where the support is in the data... 

mtcars %>% 

  ggplot() +

  aes(wt, mpg) + 

  geom_point() +

  geom_smooth() + 

  stat_smooth(geom = "point",  color = "blue", # fitted values

              xseq = mtcars$wt) +

  stat_smooth(geom = "segment", color = "darkred", # residuals

              xseq = mtcars$wt,

              xend = mtcars$wt,

              yend = mtcars$mpg)

```
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/evamaerey/ggsmoothfit

Awesome Lists containing this project

README