https://github.com/njtierney/brolgar

BRowse Over Longitudinal Data Graphically and Analytically in R
https://github.com/njtierney/brolgar
Last synced: 7 months ago
JSON representation
BRowse Over Longitudinal Data Graphically and Analytically in R
Host: GitHub
URL: https://github.com/njtierney/brolgar
Owner: njtierney
License: other
Created: 2019-04-04T04:30:00.000Z (over 6 years ago)
Default Branch: main
Last Pushed: 2025-04-03T04:58:27.000Z (8 months ago)
Last Synced: 2025-05-10T21:54:45.152Z (7 months ago)
Language: R
Homepage: http://brolgar.njtierney.com/
Size: 196 MB
Stars: 110
Watchers: 6
Forks: 10
Open Issues: 47
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
Awesome Lists containing this project

jimsghstars - njtierney/brolgar - BRowse Over Longitudinal Data Graphically and Analytically in R (R)
README

          ---

output: github_document

---

```{r setup, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  fig.align = "center",

  out.width = "75%"

)

```

# brolgar 

[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)

[![R-CMD-check](https://github.com/njtierney/brolgar/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/njtierney/brolgar/actions/workflows/R-CMD-check.yaml)

[![CRAN status](https://www.r-pkg.org/badges/version/brolgar)](https://CRAN.R-project.org/package=brolgar)

[![Codecov test coverage](https://codecov.io/gh/njtierney/brolgar/branch/main/graph/badge.svg)](https://app.codecov.io/gh/njtierney/brolgar?branch=main)

[![Codecov test coverage](https://codecov.io/gh/njtierney/brolgar/graph/badge.svg)](https://app.codecov.io/gh/njtierney/brolgar)

`brolgar` helps you **br**owse **o**ver **l**ongitudinal **d**ata **g**raphically and **a**nalytically in **R**, by providing tools to:

-   Efficiently explore raw longitudinal data

-   Calculate features (summaries) for individuals

-   Evaluate diagnostics of statistical models

This helps you go from the "plate of spaghetti" plot on the left, to "interesting observations" plot on the right.

```{r show-spaghetti, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 3, fig.width = 6}

library(brolgar)

library(ggplot2)

library(patchwork)

suppressPackageStartupMessages(library(gghighlight))

suppressPackageStartupMessages(library(ggplot2))

suppressPackageStartupMessages(library(dplyr))

p1 <- ggplot(wages, 

       aes(x = xp, 

             y = ln_wages, 

             group = id)) + 

  geom_line()

p2 <- 

wages %>%

  features(ln_wages, feat_monotonic) %>%

  left_join(wages, by = "id") %>%

  ggplot(aes(x = xp,

             y = ln_wages,

             group = id)) +

  geom_line() + 

  gghighlight(increase)

p1 + p2

```

## Installation

Install from [GitHub](https://github.com/) with:

```r

# install.packages("remotes")

remotes::install_github("njtierney/brolgar")

```

Or from the [R Universe](https://njtierney.r-universe.dev) with:

```r

# Enable this universe

options(repos = c(

    njtierney = 'https://njtierney.r-universe.dev',

    CRAN = 'https://cloud.r-project.org')

    )

# Install some packages

install.packages('brolgar')

```

# Using `brolgar`: We need to talk about data

There are many ways to describe longitudinal data - from panel data, cross-sectional data, and time series. We define longitudinal data as:

> individuals repeatedly measured through time.

The tools and workflows in `brolgar` are designed to work with a special tidy time series data frame called a `tsibble`. We can define our longitudinal data in terms of a time series to gain access to some really useful tools. To do so, we need to identify three components:

1.  The **key** variable in your data is the **identifier** of your individual.

2.  The **index** variable is the **time** component of your data.

3.  The **regularity** of the time interval (index). Longitudinal data typically has irregular time periods between measurements, but can have regular measurements.

Together, time **index** and **key** uniquely identify an observation.

The term `key` is used a lot in brolgar, so it is an important idea to internalise:

> **The key is the identifier of your individuals or series**

Identifying the key, index, and regularity of the data can be a challenge. You can learn more about specifying this in the vignette, ["Longitudinal Data Structures"](https://brolgar.njtierney.com/articles/longitudinal-data-structures.html).

## The wages data

The `wages` data is an example dataset provided with brolgar. It looks like this:

```{r print-wages}

wages

```

And under the hood, it was created with the following setup:

```{r setup-wages-ts, eval = FALSE}

wages <- as_tsibble(x = wages,

                    key = id,

                    index = xp,

                    regular = FALSE)

```

Here `as_tsibble()` takes wages, and a `key`, and `index`, and we state the `regular = FALSE` (since there are not regular time periods between measurements). This turns the data into a `tsibble` object - a powerful data abstraction made available in the [`tsibble`](https://tsibble.tidyverts.org/) package by [Earo Wang](https://earo.me/), if you would like to learn more about `tsibble`, see the [official package documentation](https://tsibble.tidyverts.org/) or read [the paper](https://pdf.earo.me/tsibble.pdf).

# Efficiently exploring longitudinal data

Exploring longitudinal data can be challenging when there are many individuals. It is difficult to look at all of them!

You often get a "plate of spaghetti" plot, with many lines plotted on top of each other. You can avoid the spaghetti by looking at a random subset of the data using tools in `brolgar`.

## `sample_n_keys()`

In `dplyr`, you can use `sample_n()` to sample `n` observations, or `sample_frac()` to look at a `frac`tion of observations.

`brolgar` builds on this providing `sample_n_keys()` and `sample_frac_keys()`. This allows you to take a random sample of `n` keys using `sample_n_keys()`. For example:

```{r plot-sample-n-keys}

set.seed(2019-7-15-1300)

wages %>%

  sample_n_keys(size = 5) %>%

  ggplot(aes(x = xp,

             y = ln_wages,

             group = id)) + 

  geom_line()

```

And what if you want to create many of these plots?

## Clever facets: `facet_sample()`

`facet_sample()` allows you to specify the number of keys per facet, and the number of facets with `n_per_facet` and `n_facets`.

By default, it splits the data into 12 facets with 5 per facet:

```{r facet-sample}

set.seed(2019-07-23-1937)

ggplot(wages,

       aes(x = xp,

           y = ln_wages,

           group = id)) +

  geom_line() +

  facet_sample()

```

Under the hood, `facet_sample()` is powered by `sample_n_keys()` and `stratify_keys()`.

You can see more facets (e.g., `facet_strata()`) and data visualisations you can make in brolgar in the [Visualisation Gallery](https://brolgar.njtierney.com/articles/visualisation-gallery.html).

## Finding features in longitudinal data

Sometimes you want to know what the range or a summary of a variable for each individual. We call these summaries `features` of the data, and they can be extracted using the `features` function, from [`fabletools`](https://fabletools.tidyverts.org/).

For example, if you want to answer the question "What is the summary of wages for each individual?". You can use `features()` to find the five number summary (min, max, q1, q3, and median) of `ln_wages` with `feat_five_num`:

```{r features-five-num}

wages %>%

  features(ln_wages,

           feat_five_num)

  

```

This returns the id, and then the features.

There are many features in brolgar - these features all begin with `feat_`. You can, for example, find those whose `ln_wages` values only increase or decrease with `feat_monotonic`:

```{r features-monotonic}

wages %>%

  features(ln_wages, feat_monotonic)

```

You can read more about creating and using features in the [Finding Features](https://brolgar.njtierney.com/articles/finding-features.html) vignette. You can also see other features for time series in the [`feasts` package](https://feasts.tidyverts.org).

## Linking individuals back to the data

Once you have created these features, you can join them back to the data with a `left_join`, like so:

```{r features-left-join}

wages %>%

  features(ln_wages, feat_monotonic) %>%

  left_join(wages, by = "id") %>%

  ggplot(aes(x = xp,

             y = ln_wages,

             group = id)) +

  geom_line() + 

  gghighlight(increase)

```

# Other helper functions

## `n_obs()`

Return the number of observations total with `n_obs()`:

```{r example-n-obs}

n_obs(wages)

```

## `n_keys()`

And the number of keys in the data using `n_keys()`:

```{r example-n-keys}

n_keys(wages)

```

## Finding the number of observations per `key`.

You can also use `n_obs()` inside features to return the number of observations for each key:

```{r n-obs-features}

wages %>%

  features(ln_wages, n_obs)

```

This returns a dataframe, with one row per key, and the number of observations for each key.

This could be further summarised to get a sense of the patterns of the number of observations:

```{r summarise-n-obs}

library(ggplot2)

wages %>%

  features(ln_wages, n_obs) %>%

  ggplot(aes(x = n_obs)) + 

    geom_bar()

wages %>%

  features(ln_wages, n_obs) %>%

  summary()

```

# Further Reading

`brolgar` provides other useful functions to explore your data, which you can read about in the [exploratory modelling](https://brolgar.njtierney.com/articles/exploratory-modelling.html) and [Identify Interesting Observations](https://brolgar.njtierney.com/articles/id-interesting-obs.html) vignettes. As a taster, here are some of the figures you can produce:

```{r show-wages-lg, echo = FALSE,  fig.height = 3, fig.width = 6}

suppressMessages(library(dplyr))

suppressMessages(library(gghighlight))

wages_slope <- key_slope(wages,ln_wages ~ xp) %>%

  left_join(wages, by = "id") 

p3 <- wages_slope %>% 

  as_tibble() %>% # workaround for gghighlight + tsibble

  ggplot(aes(x = xp, 

             y = ln_wages, 

             group = id)) + 

  geom_line() +

  gghighlight(.slope_xp < 0)

p4 <- wages_slope %>%

  keys_near(key = id,

            var = .slope_xp) %>%

  left_join(wages, by = "id") %>%

  ggplot(aes(x = xp,

             y = ln_wages,

             group = id,

             colour = stat)) + 

  geom_line()

gridExtra::grid.arrange(p3, p4, ncol = 2)

```

# Related work

One of the sources of inspiration for this work was the [`lasangar` R package by Bryan Swihart](https://github.com/swihart/lasagnar) (and [paper](https://doi.org/10.1097%2FEDE.0b013e3181e5b06a)).

For even more expansive time series summarisation, make sure you check out the [`feasts` package](https://github.com/tidyverts/feasts) (and [talk!](https://slides.mitchelloharawild.com/user2019/#1)).

# Contributing

Please note that the `brolgar` project is released with a [Contributor Code of Conduct](https://github.com/njtierney/brolgar/blob/main/.github/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.

# A Note on the API

This version of brolgar was been forked from [tprvan/brolgar](https://github.com/tprvan/brolgar), and has undergone breaking changes to the API.

# Acknowledgements

Thank you to [Mitchell O'Hara-Wild](https://mitchelloharawild.com/blog.html) and [Earo Wang](https://earo.me/) for many useful discussions on the implementation of brolgar, as it was heavily inspired by the [`feasts`](https://github.com/tidyverts/feasts) package from the [`tidyverts`](https://tidyverts.org/). I would also like to thank [Tania Prvan](https://researchers.mq.edu.au/en/persons/tania-prvan) for her valuable early contributions to the project, as well as [Stuart Lee](https://stuartlee.org/) for helpful discussions. Thanks also to [Ursula Laa](https://uschilaa.github.io/) for her feedback on the package structure and documentation. Thank you to [Di Cook](https://www.dicook.org/) for making the hex sticker - which is taken from an illustration by [John Gould](https://en.wikipedia.org/wiki/John_Gould), drawn in 1865, and is in the public domain as the drawing is over 100 years old.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/njtierney/brolgar

Awesome Lists containing this project

README