https://github.com/hadley/table-shapes


Host: GitHub
URL: https://github.com/hadley/table-shapes
Owner: hadley
Created: 2019-03-24T13:53:09.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-03-26T01:04:12.000Z (over 6 years ago)
Last Synced: 2025-04-14T08:12:55.851Z (3 months ago)
Size: 118 KB
Stars: 34
Watchers: 2
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.Rmd

Awesome Lists containing this project

jimsghstars - hadley/table-shapes - (Others)

README

---
output: github_document
---

# Pivot function names

On 2019-03-22, I [tweeted about](https://twitter.com/hadleywickham/status/1109132826631421952) a [survey](https://forms.gle/vvYgBw1EwHK69gA17) to help me pick names for the [new pivot functions](https://tidyr.tidyverse.org/dev/articles/pivot.html) in the dev version of tidyr.

In the survey, I showed a picture of two tables containing the same data, and asked participants to describe their relative shapes. This document describes the results. 

![Table A has four columns (id, x, y, z) and two rows. Table B has three columns (id, n, x) and six rows](table.png)

```{r, include=FALSE}
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
```

```{r setup, message = FALSE}
library(googlesheets4)
library(tidyverse)

# This googlesheet is public if you want to do your own analysis;
# gs4_deauth() lets read_sheet() access it without logging in
gs4_deauth()
results <- read_sheet("1Do5R1k5sEZrwU0N1KmIjKaapHDrf7eYLdIlcGNx-MsI")
names(results) <- c("timestamp", "table_a", "table_b")

head(results)
nrow(results)

# Capture for posterity
write_csv(results, "results.csv")
```

## Table A -> Table B

Wider is the clear winner with ~80% of responses.

```{r}
# Keep the three most common answers, lumping the rest into "Other"
table_a <- results %>%
  filter(!is.na(table_a)) %>%
  mutate(top3 = table_a %>% fct_lump(3) %>% fct_infreq() %>% fct_rev()) %>%
  count(top3) %>%
  mutate(prop = n / sum(n))

table_a %>%
  ggplot(aes(top3, prop)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = NULL,
    y = "Percent of responses"
  ) +
  coord_flip()
```

There was a wide range of write-in responses. The most popular included concise, compact, condense, and denser.

```{r}
# Strip the fixed question text so only the write-in word remains
results %>%
  mutate(
    table_a = table_a %>%
      str_remove("Table A is ") %>%
      str_remove(" than Table B") %>%
      str_trunc(50)
  ) %>%
  count(table_a, sort = TRUE) %>%
  print(n = Inf)
```

## Table B -> Table A

Longer is the clear winner with ~70% of responses. Given the number of people who suggested taller to me, I had expected taller to rank much higher. Interestingly, narrower is much less common here than its equivalent, shorter, was above.

```{r}
# Keep the three most common answers, lumping the rest into "Other"
table_b <- results %>%
  filter(!is.na(table_b)) %>%
  mutate(top3 = table_b %>% fct_lump(3) %>% fct_infreq() %>% fct_rev()) %>%
  count(top3) %>%
  mutate(prop = n / sum(n))

table_b %>%
  ggplot(aes(top3, prop)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = NULL,
    y = "Percent of responses"
  ) +
  coord_flip()
```

There was a wide range of write-in responses. The most popular included expanded and skinnier.

```{r}
# Strip the fixed question text so only the write-in word remains
results %>%
  mutate(
    table_b = table_b %>%
      str_remove("Table B is ") %>%
      str_remove(" than Table A") %>%
      str_trunc(50)
  ) %>%
  count(table_b, sort = TRUE) %>%
  print(n = Inf)
```

## Conclusion

The new functions will be called `pivot_wider()` and `pivot_longer()`: these are not the most natural names for everyone, but they are the most popular by a large margin. I like pivot because it suggests the form of the underlying operation (a pivoting or rotation), and it is evocative for Excel users.
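To make the names concrete, here is a minimal sketch of the two functions moving between the shapes in the survey picture. It assumes tidyr 1.0.0 or later, where `pivot_longer()` and `pivot_wider()` were released; the dev-version interface at the time of the survey may have differed. The data values are hypothetical, and the long table uses tidyr's default column names, `name` and `value`, rather than the `n` and `x` labels in the figure.

```{r}
library(tibble)
library(tidyr)

# Hypothetical data in the shape of Table A: an id column plus three
# value columns x, y, z (2 rows, 4 columns)
wide <- tibble(
  id = 1:2,
  x = c(1, 4),
  y = c(2, 5),
  z = c(3, 6)
)

# Table A -> Table B: each value column becomes its own row,
# so 2 rows x 3 value columns = 6 rows of name/value pairs
long <- pivot_longer(wide, cols = c(x, y, z), names_to = "name", values_to = "value")
long

# Table B -> Table A: spread the name/value pairs back out into columns
wide_again <- pivot_wider(long, names_from = name, values_from = value)
wide_again
```

In this example, going longer and then wider recovers the original table, which is why the two names are chosen as a matched pair.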

A few alternatives that were suggested, considered, and rejected:

* `VERB_long()`/`VERB_wide()`: not obvious whether they take long/wide
  data or return long/wide data.

* `VERB_to_long()`/`VERB_to_wide()`: implies that long and wide are absolute
  terms. I don't think it makes sense to talk about long or wide form data;
  you can only say one form is longer or wider than another form.

* `to_long()`/`to_wide()`: isn't a verb, and implies that there's only one
  operation that makes data longer/wider. The next version of tidyr will also
  contain functions that unnest list-columns of vectors, and that verb (name
  TBA) also needs directional suffixes.

* `reshape_SHAPE()`: too much potential for confusion with the existing
  `stats::reshape()`.

* `gather()`/`spread()`: while some people clearly liked these functions, they
  were not memorable to a large number of people I talked to.

I appreciate the enthusiasm that people have for naming functions!
