https://github.com/jacob-long/panelr

Regression models and utilities for repeated measures and panel data
r r-package social-science statistics

Host: GitHub
URL: https://github.com/jacob-long/panelr
Owner: jacob-long
License: other
Created: 2018-01-09T03:27:58.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2024-01-11T02:32:35.000Z (over 1 year ago)
Last Synced: 2024-07-31T19:26:20.524Z (12 months ago)
Topics: r, r-package, social-science, statistics
Language: R
Homepage:
Size: 7.97 MB
Stars: 96
Watchers: 8
Forks: 21
Open Issues: 26
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

README

---

output: github_document

---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "",
  fig.path = "README-",
  message = FALSE
)
options("panelr.table.format" = "multiline")
```

[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-ago/panelr)](https://cran.r-project.org/package=panelr) [![Total Downloads](https://cranlogs.r-pkg.org/badges/grand-total/panelr)](https://cran.r-project.org/package=panelr)

 [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/jacob-long/panelr?branch=master&svg=true)](https://ci.appveyor.com/project/jacob-long/panelr) [![Coverage Status](https://img.shields.io/codecov/c/github/jacob-long/panelr/master.svg)](https://app.codecov.io/github/jacob-long/panelr?branch=master)

# panelr

`panelr` is an R package designed to aid in the analysis of panel data, that is,
data from designs in which the same group of respondents/entities is
contacted/measured multiple times. It provides some useful infrastructure, like
a `panel_data` object class, and automates some emerging methods for analyzing
these data.

`wbm()` automates the "within-between" (also known as "between-within" and
"hybrid") specification, which combines the desirable aspects of both fixed
effects and random effects econometric models, and fits these models using the
`lme4` package in the backend. Bayesian estimation of these models is supported
through the `brms` package (`wbm_stan()`), and GEE estimation through `geepack`
(`wbgee()`).

It also automates the fairly new "asymmetric effects" specification described
by [Allison (2019)](https://journals.sagepub.com/doi/10.1177/2378023119826441)
and supports estimation via GLS for linear asymmetric effects models (`asym()`)
and via GEE for non-Gaussian models (`asym_gee()`).

## Installation

`panelr` is now available via CRAN.

```{r eval = FALSE}
install.packages("panelr")
```

## Usage

### `panel_data` frames

While not strictly required, the best way to start is to declare your data
as panel data. I'll load the example data `WageData` to demonstrate.

```{r, message = FALSE}
library(panelr)
data("WageData")
colnames(WageData)
```

The two key variables here are `t` and `id`. `t` identifies the survey wave
that each row belongs to, while `id` identifies the survey respondent. This is
a perfectly balanced data set, so there are 7 observations for each of the 595
respondents. We will use those two pieces of information to create a
`panel_data` object.

```{r}
wages <- panel_data(WageData, id = id, wave = t)
wages
```

We have to tell `panel_data()` which column refers to the unique identifiers
for respondents/entities (the latter when you have something like countries
or companies instead of people) and which column refers to the period/wave of
data collection.

Note that the resulting `panel_data` object will remember which of the columns
is the ID column and which is the wave column. It will also fight you a bit
when you do things that might have the side effect of dropping those columns
or putting them out of time order.

`panel_data` frames are modified tibbles
([`tibble` package](https://tibble.tidyverse.org/)) that are grouped by entity
(i.e., the ID column).
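
As a quick check of the metadata described above, the sketch below uses
`get_id()` and `get_wave()`, which to my knowledge `panelr` exports for
retrieving the stored column names. Treat the exact return values as an
assumption and check the package documentation if your version differs.

```r
library(panelr)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

# Names of the columns the object treats as the entity ID and the wave
# (assumed accessor functions; see the lead-in)
get_id(wages)
get_wave(wages)

# panel_data frames are still tibbles, so tibble-aware tools apply
inherits(wages, "tbl_df")
```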

`panel_data` frames are meant to play nice with the
[`tidyverse`](https://www.tidyverse.org). Here's a quick sample of how a tidy
workflow with `panelr` can work:

```{r}
library(dplyr)
data("WageData")

# Create `panel_data` object
wages <- panel_data(WageData, id = id, wave = t) %>%
  # Pass to mutate, which will calculate statistics groupwise when appropriate
  mutate(
    wage = exp(lwage), # reverse transform the log wage variable
    mean_wage_individual = mean(wage), # means calculated separately by entity
    lag_wage = lag(wage) # mutate() will calculate lagged values correctly
  ) %>%
  # Use `panelr`'s complete_data() to filter for entities that have
  # enough observations
  complete_data(wage, union, min.waves = 5) %>% # drop if there aren't 5 completions
  # You can use unpanel() if you need to do rowwise or columnwise operations
  unpanel() %>%
  mutate(
    mean_wage_grand = mean(wage)
  ) %>%
  # You'll need to convert back to panel_data if you want to keep using panelr functions
  panel_data(id = id, wave = t)
```

### `wbm()` --- the within-between model

Anyone can fit a within-between model without this package, as it is just a
particular specification of a multilevel model. With that said, doing so
requires some programming and can be rather prone to error. Even in the best
case, it is cumbersome and inefficient to create the necessary variables.
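
To make that bookkeeping concrete, here is a minimal sketch of the by-hand
approach for a single time-varying predictor, assuming `lme4` is installed.
The column names `wks_mean` and `wks_within` are made up for illustration, and
this is not necessarily how `wbm()` constructs its variables internally.

```r
library(lme4)

data("WageData", package = "panelr")

# Between component: each respondent's mean of weeks worked across waves
WageData$wks_mean <- ave(WageData$wks, WageData$id, FUN = mean)

# Within component: each wave's deviation from that respondent's own mean
WageData$wks_within <- WageData$wks - WageData$wks_mean

# Random intercept per respondent; within and between effects of `wks`
# get separate coefficients, alongside the time-invariant `blk`
manual_fit <- lmer(
  lwage ~ wks_within + wks_mean + blk + (1 | id),
  data = WageData
)
fixef(manual_fit)
```

Every additional time-varying predictor, lag, or interaction needs the same
two derived columns, which is where the manual approach gets tedious.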

`wbm()` is the primary model-fitting function that you'll use from this package.
It fits within-between models for you, using
[`lme4`](https://cran.r-project.org/package=lme4) as a backend for estimation.

A three-part model syntax is used that goes like this:

`dv ~ varying_variables | invariant_variables | cross_level_interactions/random effects`

It works like a typical formula otherwise. The bars just tell `panelr` how to
treat the variables. Note also that you can specify random slopes using
`lme4`-style syntax in the third part of the formula. A random intercept for
the ID variable is included by default and doesn't need to be specified in the
formula.

Lagged variables are supported as well through the `lag()` function. Unlike a
naive lag that simply shifts the stacked rows, `panelr` lags the variables
correctly: wave 1 observations will have NA values for the lagged variable
rather than taking the final wave value of the previous entity.
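
The difference is easy to see on a toy data set. The sketch below uses only
base R and a made-up data frame (`toy`, with hypothetical columns `id`, `t`,
and `x`); it illustrates the idea of lagging within entity rather than
`panelr`'s own implementation.

```r
# Two entities observed over three waves each, stacked in long format
toy <- data.frame(
  id = rep(c("a", "b"), each = 3),
  t  = rep(1:3, times = 2),
  x  = c(1, 2, 3, 10, 20, 30)
)

# Naive lag: shift the whole column down by one row, ignoring entities
toy$lag_naive <- c(NA, head(toy$x, -1))

# Panel-aware lag: shift within each entity, so wave 1 is always NA
toy$lag_by_id <- ave(toy$x, toy$id, FUN = function(v) c(NA, head(v, -1)))

toy
```

In the naive column, entity `b`'s first wave inherits entity `a`'s last value;
in the panel-aware column it is `NA`.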

Here we will specify a model using the `wages` data. We will predict logged
wages (`lwage`) using two time-varying variables, lagged union membership
(`union`) and contemporaneous weeks worked (`wks`), along with a time-invariant
predictor, a binary indicator for black race (`blk`). For demonstrative
purposes, we'll fit a random slope for `lag(union)` and a cross-level
interaction between `blk` and `wks`.

```{r message = FALSE}
model <- wbm(lwage ~ lag(union) + wks | blk | blk * wks + (lag(union) | id), data = wages)
summary(model)
```

Note that `imean()` is an internal function that calculates the individual-level
mean, which represents the between-subjects effects of the time-varying
predictors. The within effects are the time-varying predictors at the occasion
level with the individual-level mean subtracted. If you want the model specified
such that the occasion-level predictors do not have the mean subtracted, use
the `model = "contextual"` argument. The "contextual" label refers to the way
these terms are normally interpreted when the model is specified that way.

You may also use `model = "between"` to fit what econometricians call the
random effects model, which does not disaggregate the within- and between-entity
variation.
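
As a rough sketch of how those alternatives are requested, the calls below
refit a simplified version of the model with each `model` value named above.
The simplified formula and the object names are my own choices for
illustration.

```r
library(panelr)

data("WageData")
wages <- panel_data(WageData, id = id, wave = t)

# Contextual specification: occasion-level predictors are not de-meaned
contextual <- wbm(lwage ~ wks + union | blk, data = wages,
                  model = "contextual")

# Econometric random effects specification: no within/between split
between <- wbm(lwage ~ wks + union | blk, data = wages,
               model = "between")

summary(contextual)
summary(between)
```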

### `widen_panel()` and `long_panel()`

Two functions that should cover your bases for the tricky business of
**reshaping** panel data are included. Sometimes, like for doing SEM-based
analyses, you need your data in wide format (i.e., one row per entity).
`widen_panel()` makes that easy and should require minimal trial and error or
thinking.

Perhaps more often, your raw data are already in wide format and you need
to get them into long format to do cool stuff like use `wbm()`. That can be very
tricky, but `long_panel()` (I didn't think `lengthen_panel()` or `longen_panel()`
quite worked as names) should cover most situations. You tell it what the
labels for periods are (e.g., do they range from `1` to `5`, `"A"` to `"E"`,
or something else?), where they are located (before or after the variable's
name?), and what kinds of formatting go before/after them. Check out the
vignette for more details and some worked examples.
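
For a rough idea of what a round trip looks like, here is a small sketch on a
made-up wide data frame. The argument names (`prefix`, `begin`, `end`,
`label_location`, `separator`) are written as I understand them from the
package documentation, so treat them as assumptions and confirm against
`?long_panel` and `?widen_panel`.

```r
library(panelr)

# Hypothetical wide data: one row per entity, waves labeled _W1 to _W3
wide <- data.frame(
  id    = 1:3,
  Q1_W1 = c(1, 2, 3),
  Q1_W2 = c(2, 3, 4),
  Q1_W3 = c(3, 4, 5),
  race  = c("white", "black", "white")
)

# Period labels 1 to 3 sit at the end of each name, preceded by "_W"
long <- long_panel(wide, prefix = "_W", begin = 1, end = 3,
                   label_location = "end")
long

# Back to one row per entity, joining variable name and wave with "_"
wide_again <- widen_panel(long, separator = "_")
wide_again
```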

## Contributing

I'm happy to receive bug reports, suggestions, questions, and (most of all)
contributions to fix problems and add features. I prefer that you use the GitHub
issues system over trying to reach out to me in other ways. Pull requests for
contributions are encouraged.

Please note that this project is released with a
[Contributor Code of Conduct](https://github.com/jacob-long/panelr/blob/master/CONDUCT.md).
By participating in this project you agree to abide by its terms.

## License

The source code of this package is licensed under the
[MIT License](https://opensource.org/license/mit/).
