https://github.com/tjmahr/tjmisc

TJ's R Miscellany
https://github.com/tjmahr/tjmisc
Last synced: 21 days ago
JSON representation
TJ's R Miscellany
Host: GitHub
URL: https://github.com/tjmahr/tjmisc
Owner: tjmahr
License: other
Created: 2017-11-02T13:04:42.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2023-02-07T15:01:47.000Z (over 2 years ago)
Last Synced: 2025-04-26T14:05:37.616Z (25 days ago)
Language: R
Homepage:
Size: 196 KB
Stars: 10
Watchers: 4
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project

README

        ---

output: github_document

---

```{r setup, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-"

)

```

# tjmisc 

[![R-CMD-check](https://github.com/tjmahr/tjmisc/workflows/R-CMD-check/badge.svg)](https://github.com/tjmahr/tjmisc/actions)

The goal of tjmisc is to gather miscellaneous helper functions, mostly for use 

in [my dissertation](https://github.com/tjmahr/dissertation). 

Apologies in advance. I think "misc" packages are kind of bad because packages

should be focused on specific problems: for example, my helper packages for

[working on polynomials](https://github.com/tjmahr/polypoly),

[printing numbers](https://github.com/tjmahr/printy) or 

[tidying MCMC samples](https://github.com/tjmahr/tristan). Having modular code

snapping together like Lego blocks is better than a grab-bag of functions, it's

true, but using `library(helpers)` is much, much better than using

`source("helpers.R")`. So here we are... in the grab-bag.

## Installation

You can install the tjmisc from github with:

```{r, eval = FALSE}

# install.packages("devtools")

devtools::install_github("tjmahr/tjmisc")

```

## Examples

### Sample groups of data

`sample_n_of()` is like dplyr's `sample_n()` but it samples groups. 

```{r example}

library(dplyr, warn.conflicts = FALSE)

library(tjmisc)

set.seed(11022017)

data <- tibble::tibble(

  day = 1:10 %>% rep(10) %>% sort(),

  id  = 1:10 %>% rep(10),

  block = letters[1:5] %>% rep(10) %>% sort() %>% rep(2),

  value = rnorm(100) %>% round(2)

)

# data from 3 days

sample_n_of(data, 3, day)

# data from 1 id

sample_n_of(data, 1, id)

# data from 2 block-id pairs

sample_n_of(data, 2, block, id)

```

### Tidy quantiles

`tidy_quantile()` returns a dataframe with quantiles for a given variable. I

like to use it to select values for plotting model predictions.

```{r}

penguins <- palmerpenguins::penguins

penguins %>% 

  tidy_quantile(bill_length_mm)

penguins %>% 

  group_by(species) %>% 

  tidy_quantile(bill_length_mm)

```

### Tidy correlations

`tidy_correlation()` calculates correlations between pairs of selected dataframe

columns. It accepts `dplyr::select()` selection semantics, and it respects

grouped dataframes.

```{r}

penguins %>% 

  tidy_correlation(bill_length_mm, bill_depth_mm, flipper_length_mm)

penguins %>%

  dplyr::group_by(species) %>%

  tidy_correlation(dplyr::ends_with("mm"))

```

### Pairwise comparisons

`compare_pairs()` compares all pairs of values among levels of a categorical

variable. Hmmm, that sounds confusing. Here's an example. We compute the

difference in average score between each pair of workers.

```{r}

to_compare <- nlme::Machines %>%

  group_by(Worker) %>%

  summarise(avg_score = mean(score)) %>%

  print()

to_compare %>%

  compare_pairs(Worker, avg_score) %>%

  rename(difference = value) %>%

  mutate(

    across(where(is.numeric), round, 1)

  )

```

### Plotting a matrix

`ggmatplot()` plots the columns of a matrix as individual lines, much like 

`matplot()` in base R.

Here we plot a spline basis matrix for penguin bill length. By default it plots

the columns with unique row number as the x-axis.

```{r matplot, fig.width = 4, fig.height = 2.5, fig.retina = 2}

# Create a 10-column natural spline bases

sorted_lengths <- sort(penguins$bill_length_mm)

length_ns <- splines::ns(sorted_lengths, df = 10)

ggmatplot(length_ns)

```

Alternatively, you can supply a column number and make it the *x* axis. In this

example, we bind on the original data and use it as the *x*-axis column.

This makes the lines much smoother because the spline basis was built on the

bill lengths, not on row numbers.

```{r matplot1, fig.width = 4, fig.height = 2.5, fig.retina = 2}

ggmatplot(cbind(sorted_lengths, length_ns), x_axis_column = 1)

```

By default, duplicated rows are removed. We can choose to keep them. The little

flat steps along the curve are the repeated rows. We can also change the number

of colors to use. The package also provides `annotate_label_grey()` 

for making labels on ggplot2's default grey background.

```{r matplot2, fig.width = 4, fig.height = 2.5, fig.retina = 2}

ggmatplot(length_ns, unique_rows = FALSE, n_colors = 1) + 

  annotate_label_grey("splines!", 20, .65, size = 5)

```

### Et cetera

`ggpreview()` is like ggplot2's `ggsave()` but it saves an image to a temporary

file and then opens it in the system viewer. If you've ever found yourself in

a loop of saving a plot, leaving RStudio to doubleclick the file, sighing, going

back to RStudio, tweaking the height or width or plot theme, ever so slowly

spiraling in on your desired plot, then `ggpreview()` is for you.

`seq_along_rows()` saves a few keystrokes in for-loops that iterate over

dataframe rows.

```{r}

cars %>% head(5) %>% seq_along_rows()

cars %>% head(0) %>% seq_along_rows()

```

`is_same_as_last` and `replace_if_same_as_last()` are helpers for formatting

tables. I use them to replace repeating values in a text column with blanks.

```{r}

mtcars %>% 

  tibble::rownames_to_column("name") %>% 

  slice(1:10) %>% 

  select(cyl, name, mpg) %>% 

  arrange(cyl, mpg) %>% 

  mutate_at(c("cyl"), replace_if_same_as_last, "") %>% 

  knitr::kable()

```

`fct_add_counts()` adds counts to a factor's labels.

```{r}

# Create a factor with some random counts

set.seed(20190124)

random_penguins <- penguins %>% 

  dplyr::sample_n(250, replace = TRUE)

table(random_penguins$species)

# Updated factors

random_penguins$species %>% levels()

random_penguins$species %>% fct_add_counts() %>% levels()

```

You can tweak the format for the first label. I like to use this for plotting by

stating the unit next to the first count. 

```{r}

random_penguins$species %>% 

  fct_add_counts(first_fmt = "{levels} ({counts} penguins)") %>% 

  levels()

```

Behind the scenes, `fct_add_counts()` uses the function `fct_glue_labels()` to

construct labels using a [glue]-templating string. Therefore, `fct_glue_labels()`

would be a more appropriate function for generic relabeling using glue:

```{r}

random_penguins$species %>% 

  fct_glue_labels(

    fmt = "{tolower(levels)}", 

    first_fmt = "Species: {tolower(levels)}"

  ) %>% 

  levels()

```

#### Comparing two sets

When I need to merge two datasets together, I have to go through a little dance

to figure out which elements are in `your_data` and which are in `my_data`.

`compare_sets()` performs all of R's set operations so I can skim over the

differences.

```{r}

your_data <- c(1, 2, 3, 3, 4, 5)

my_data <- c(4, 4, 4, 5, 6, 7, 8)

str(compare_sets(your_data, my_data))

```

#### Jekyll helpers

I also include functions I use to create and maintain my website.

`jekyll_create_rmd_draft()` creates a post in the `_R/_drafts` folder.

```{r}

withr::with_dir(tempdir(), {

  dir.create("_R")

  dir.create("_R/_drafts")

  

  # Basic use

  jekyll_create_rmd_draft(slug = "today-i-learned")

  

  # Accepts a date

  jekyll_create_rmd_draft(

    slug = "yesterday-i-learned", 

    date = Sys.Date() - 1

  )

  # Filler text used if slug is not provided

  jekyll_create_rmd_draft()

})

```

## More involved demos

These are things that I would have used in the demo above but cut and moved

down here to keep that overview succinct.

### Comparing pairs of values over a posterior distribution

I wrote `compare_pairs()` to compute posterior differences in Bayesian models.

For the sake of example, let's fit a Bayesian model of average bill length for

each species in `penguins`. We could get these estimates more directly using the

default dummy-coding of factors, but let's ignore that for now.

```{r, results = "hide"}

library(rstanarm)

m <- stan_glm(

  bill_length_mm ~ species - 1,

  penguins,

  family = gaussian

)

```

Now, we have a posterior distribution of species means.

```{r}

newdata <- data.frame(species = unique(penguins$species))

p_means <- posterior_linpred(m, newdata = newdata) %>%

  as.data.frame() %>%

  tibble::as_tibble() %>%

  setNames(newdata$species) %>%

  tibble::rowid_to_column("draw") %>%

  tidyr::gather(species, mean, -draw) %>%

  print()

```

For each posterior sample, we can compute pairwise differences of means with

`compare_means()`.

```{r pairs, fig.width = 4, fig.height = 2.5, fig.retina = 2}

pair_diffs <- compare_pairs(p_means, species, mean) %>%

  print()

library(ggplot2)

ggplot(pair_diffs) +

  aes(x = pair, y = value) +

  stat_summary(fun.data = median_hilow, geom = "linerange") +

  stat_summary(fun.data = median_hilow, fun.args = list(conf.int = .8),

               size = 2, geom = "linerange") +

  stat_summary(fun.y = median, size = 5, shape = 3, geom = "point") +

  labs(x = NULL, y = "Difference in posterior means") +

  coord_flip()

```

...which should look like the effect ranges in the dummy-coded models.

```{r, results = "hide"}

m2 <- update(m, bill_length_mm ~ species)

m3 <- update(

  m, 

  bill_length_mm ~ species, 

  data = penguins %>% mutate(species = forcats::fct_rev(species))

)

```

Give or take a few decimals of precision and give or take changes in signs

because of changes in who was subtracted from whom.

```{r}

# Adelie versus others

m2 %>% 

  posterior_interval(regex_pars = "species") %>% 

  round(2)

# Gentoo versus others

m3 %>% 

  rstanarm::posterior_interval(regex_pars = "species") %>% 

  round(2)

# differences from compare_pairs()

pair_diffs %>% 

  tidyr::spread(pair, value) %>% 

  select(-draw) %>% 

  as.matrix() %>% 

  posterior_interval() %>% 

  round(2)

```

[glue]: https://glue.tidyverse.org/index.html "glue: Interpreted String Literals"
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tjmahr/tjmisc

Awesome Lists containing this project

README