Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/evamaerey/ggols
visual expositions of simpler linear models
https://github.com/evamaerey/ggols
Last synced: 11 days ago
JSON representation
visual expositions of simpler linear models
- Host: GitHub
- URL: https://github.com/evamaerey/ggols
- Owner: EvaMaeRey
- License: other
- Created: 2022-11-01T01:40:44.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2023-09-01T19:43:58.000Z (about 1 year ago)
- Last Synced: 2024-07-18T07:50:54.317Z (4 months ago)
- Language: CSS
- Homepage: https://evamaerey.github.io/ggols
- Size: 1.87 MB
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 2
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
warning = F
)
```# meta goals
- learn about ggplot2 extension strategies and extension workflow
- learn about how teaching strategies can inform development
- learn more about robust package practices
- practice collaborating on package, git and github
- publication w/ collaborators# ggols goals (abstract)
Statistical modeling is an important technique to understanding relationships in data. In statistics classes, introductions to modeling often start by modeling one continuous 'response' variable, y, by one continuous 'independent' variable, x. After discussing statistical features of this simple modeling set-up, students are then often introduced to multivariate thinking in which a second explanatory variable is used in modeling; most commonly, this second variable is an cat or categorical variable, where conditions are discrete and therefore easier to reason about.
Using the popular and powerful ggplot2 library, visualizations of models with one explanatory variable is relatively straightforward via the ggplot2::geom_smooth() function. However, it is much more difficult to use the ggplot2 package to visualize models with a second, categorical prediction variable. geom_smooth will indicate different predictions for different categories when an categorical variable is included in the aesthetic specification via for example aes(linetype = my_cat) or aes(color = my_cat). However, the model estimation is performed independently for each category; geom_smooth is instructed to break each dataset apart and compute group-wise. Compelling, 2-D visuals of the models are possible, but challenging to choreograph with ggplot2.
The ggplot2 extension package {{ggols}} (from OLS) aims to provide new geom_*'s that will allow for easy visualization of models of the form y ~ x + z where x and y are continuous and z is discrete. The extension strategy that is used is to specify that model computation happen at the panel level (or maybe layer); and not the group level. In addition to providing functionality like geom smooth, which allows for easy visualization of a smear of model predictions and confidence intervals, {{ggols}} also seeks to provide functions that allow for easy visualization of other statistical quantities (like residuals, prediction interval, intercepts, etc.). Many of the examples will be OLS, but we aim to build something that can be used in a more generic way as well.
The goal of ggols is to ...
## Installation
You can install the released version of ggols from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("ggols")
```And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("EvaMaeRey/ggols")
```## ggxmean gives us all the geom_lm_* functions
```{r example}
library(tidyverse)
library(ggxmean)ggplot(cars) +
aes(x = speed, y = dist,) +
geom_point() +
geom_lm() +
geom_lm_intercept(color = "blue") +
geom_lm_intercept_label(hjust = -.2) +
geom_lm_conf_int() +
geom_lm_residuals() +
geom_lm_fitted() +
geom_lm_formula() +
ggols:::geom_lm_rsquared() #should be moved to ggxmean
```# Visualization, adding cat var
```{r cat}
library(ggols)ggplot(cars) +
aes(x = speed, y = dist, cat = dist > 30,
color = dist > 30) +
geom_point() +
geom_lm_cat() +
geom_lm_cat_intercept(color = "blue") +
geom_lm_cat_intercept_label(hjust = -.2) +
geom_lm_cat_conf_int() +
geom_lm_cat_residuals() +
geom_lm_cat_fitted() +
geom_lm_cat_formula() +
geom_lm_cat_rsquared()
```# Visualization, adding cat interaction
```{r interaction}
ggplot(cars) +
aes(x = speed, y = dist, cat = dist > 30,
color = dist > 30) +
geom_point() +
geom_lm_interaction() +
geom_lm_interaction_intercept(color = "blue") +
geom_lm_interaction_intercept_label(hjust = -.2) +
geom_lm_interaction_conf_int() +
geom_lm_interaction_residuals() +
geom_lm_interaction_fitted() +
geom_lm_interaction_formula() +
geom_lm_interaction_rsquared()
```# Visualization, and two cat variables
```{r int2}
ggplot(palmerpenguins::penguins) +
aes(x = bill_length_mm, y = bill_depth_mm,
cat = species, cat2 = sex,
color = species, shape = sex) +
geom_point() +
geom_lm_cat2() +
geom_lm_cat2_conf_int() +
geom_lm_cat2_residuals() +
geom_lm_cat2_fitted() +
geom_lm_cat2_formula() +
geom_lm_cat2_rsquared() +
NULLlast_plot() +
geom_lm_cat2_intercept(color = "blue") +
geom_lm_cat2_intercept_label(hjust = -.2)
``````{r}
ggplot(palmerpenguins::penguins) +
aes(x = bill_length_mm, y = bill_depth_mm,
cat = species,
color = species, shape = sex) +
geom_point() +
geom_lm_cat(formula = y ~ I(x^3) + I(x^2) + x + cat)# the implementation should change; curves expose problem. The non-smoothness shows prediction not made along parts not supported with data
```