https://github.com/lauken13/mrpkit

Tools and tutorials for multi-level regression and post-stratification of survey data

Host: GitHub
URL: https://github.com/lauken13/mrpkit
Owner: lauken13
License: other
Created: 2020-05-21T17:15:16.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2024-03-05T15:24:01.000Z (almost 2 years ago)
Last Synced: 2025-04-06T03:24:17.995Z (8 months ago)
Language: R
Size: 17.7 MB
Stars: 10
Watchers: 3
Forks: 0
Open Issues: 30
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE

Awesome Lists containing this project

jimsghstars - lauken13/mrpkit - Tools and tutorials for multi-level regression and post-stratification of survey data (R)

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# mrpkit

[![CRAN status](https://www.r-pkg.org/badges/version/mrpkit)](https://CRAN.R-project.org/package=mrpkit)

[![R-CMD-check](https://github.com/lauken13/mrpkit/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/lauken13/mrpkit/actions/workflows/R-CMD-check.yaml)

[![Codecov test coverage](https://codecov.io/gh/lauken13/mrpkit/branch/master/graph/badge.svg)](https://app.codecov.io/gh/lauken13/mrpkit?branch=master)

**NOTE: This package is still a work in progress and is not yet released or officially supported.**

For students and researchers who are comfortable with at least `glm()` and want
to conduct multilevel regression with post-stratification (MRP), the `mrpkit` R
package provides a reproducible, opinionated, and highly structured workflow.
Unlike writing all the code yourself, using `mrpkit` proactively addresses many
common issues, and makes it possible for people who are new to MRP to quickly
conduct their first analysis.

The package first assists in setting up the survey data and relationships
between different variables in the sample and the population. From there, a
substantial amount of data cleaning is automated, saving time and reducing the
risk of coding errors. `mrpkit` has native support for multilevel binomial and
Bernoulli models fit with lme4 and Stan (via brms and rstanarm) and also allows
for the use of custom modeling functions. After model fitting, `mrpkit` handles
the post-stratification step, producing population and sub-population estimates.
Summary statistics and simple visualizations of the resulting MRP estimates are
provided.
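
To make the post-stratification step concrete, here is a minimal base-R sketch
of the arithmetic. It does not use `mrpkit` and is not a description of its
internals. All object names (`sample_df`, `popn_df`, `age_map`) and all numbers
are made up for illustration, and raw cell means stand in for the multilevel
model predictions that a real MRP analysis would use.

```r
# Hypothetical sample: fine-grained age levels and a binary outcome
sample_df <- data.frame(
  age = c("18-25", "26-35", "26-35", "36-45", "46-55", "46-55"),
  y   = c(1, 0, 1, 0, 0, 1)
)

# Hypothetical population counts on coarser age levels
popn_df <- data.frame(
  age_group = c("18-35", "36-55"),
  N         = c(600, 400)
)

# Map sample levels onto population levels (named-vector lookup)
age_map <- c("18-25" = "18-35", "26-35" = "18-35",
             "36-45" = "36-55", "46-55" = "36-55")
sample_df$age_group <- unname(age_map[as.character(sample_df$age)])

# Cell-level estimates: raw means here, model predictions in real MRP
cell_means <- tapply(sample_df$y, sample_df$age_group, mean)

# Poststratify: weight each cell estimate by its population count
theta <- cell_means[as.character(popn_df$age_group)]
popn_estimate <- sum(popn_df$N * theta) / sum(popn_df$N)
popn_estimate
```

The estimate differs from the unweighted sample mean whenever the sample's mix
of cells differs from the population's, which is the imbalance that
post-stratification is meant to correct.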

## Installation

You can install the development version of `mrpkit` from GitHub:

``` r

# install.packages("remotes")

remotes::install_github("lauken13/mrpkit")

```

### License 

`mrpkit` is licensed under an MIT license. See the `LICENSE.md` file.

## Example

```{r example}
library(mrpkit)

# Some fake survey data for demonstration
head(shape_survey)

# Create SurveyData object for the sample
box_prefs <- SurveyData$new(
  data = shape_survey,
  questions = list(
    age = "Please identify your age group",
    gender = "Please select your gender",
    vote_for = "Which party did you vote for in the 2018 election?",
    y = "If today is the election day, would you vote for the Box Party?"
  ),
  responses = list(
    age = levels(shape_survey$age),
    gender = levels(shape_survey$gender),
    # Here we use a data frame for the responses because the levels in the
    # data are abridged versions of the actual responses. This can be useful
    # when surveys have brief or non-descriptive responses.
    vote_for = data.frame(
      data = levels(shape_survey$vote_for),
      asked = c("Box Party Faction A", "Box Party Faction B",
                "Circle Party Coalition", "Circle Party")
    ),
    y = c("no", "yes")
  ),
  weights = "wt",
  design = list(ids = ~1)
)

box_prefs$print()
box_prefs$n_questions()

# Some fake population data for demonstration
head(approx_voters_popn)

# Create SurveyData object for the population
popn_obj <- SurveyData$new(
  data = approx_voters_popn,
  questions = list(
    age_group = "Which age group are you?",
    gender = "Gender?",
    vote_pref = "Which party do you prefer to vote for?"
  ),
  # order doesn't matter (gender before age here) because
  # the list has the names of the variables
  responses = list(
    gender = levels(approx_voters_popn$gender),
    age_group = levels(approx_voters_popn$age_group),
    vote_pref = levels(approx_voters_popn$vote_pref)
  ),
  weights = "wt"
)

popn_obj$print()

# Create the QuestionMap objects mapping each question between the
# survey and population dataset
q_age <- QuestionMap$new(
  name = "age",
  col_names = c("age", "age_group"),
  values_map = list(
    "18-25" = "18-35", "26-35" = "18-35", "36-45" = "36-55",
    "46-55" = "36-55", "56-65" = "56-65", "66-75" = "66+", "76-90" = "66+"
  )
)

print(q_age)

q_party_pref <- QuestionMap$new(
  name = "party_pref",
  col_names = c("vote_for", "vote_pref"),
  values_map = list("Box Party" = "BP", "BP" = "BP", "Circle Party" = "CP", "CP" = "CP")
)

q_gender <- QuestionMap$new(
  name = "gender",
  col_names = c("gender", "gender"),
  values_map = list("male" = "m", "female" = "f", "nonbinary" = "nb")
)

# Create SurveyMap object adding all questions at once
ex_map <- SurveyMap$new(
  sample = box_prefs,
  population = popn_obj,
  q_age,
  q_party_pref,
  q_gender
)

print(ex_map) # or ex_map$print()

# Or can add questions incrementally
ex_map <- SurveyMap$new(sample = box_prefs, population = popn_obj)
print(ex_map)

ex_map$add(q_age, q_party_pref)
print(ex_map)

ex_map$add(q_gender)
print(ex_map)

# Create the mapping between sample and population
ex_map$mapping()

# Create the poststratification data frame using all variables in the mapping
# (alternatively, can specify particular variables, e.g. tabulate("age"))
ex_map$tabulate()

# Take a peek at the poststrat data frame
head(ex_map$poststrat_data())

# Fit regression model using rstanarm (returns a SurveyFit object)
fit_1 <- ex_map$fit(
  fun = rstanarm::stan_glmer,
  formula = y ~ (1|age) + (1|gender),
  family = "binomial",
  seed = 1111,
  chains = 1, # just to keep the example fast and small
  refresh = 0 # suppress printed sampling iteration updates
)

# To use lme4 or brms instead of rstanarm you would use:

# Example lme4 usage
# fit_2 <- ex_map$fit(
#   fun = lme4::glmer,
#   formula = y ~ (1|age) + (1|gender),
#   family = "binomial"
# )

# Example brms usage
# fit_3 <- ex_map$fit(
#   fun = brms::brm,
#   formula = y ~ (1|age) + (1|gender),
#   family = "bernoulli",
#   seed = 1111
# )

# Predicted probabilities
# returns matrix with rows for poststrat cells, cols for posterior draws
poststrat_estimates <- fit_1$population_predict()

# Compute and summarize estimates by age level and party preference
estimates_by_age <- fit_1$aggregate(poststrat_estimates, by = "age")
estimates_by_party <- fit_1$aggregate(poststrat_estimates, by = "party_pref")

fit_1$summary(estimates_by_age)
fit_1$summary(estimates_by_party)

# Plot estimates
fit_1$plot(estimates_by_party)
fit_1$plot(estimates_by_age)
fit_1$plot(estimates_by_age, additional_stats = "none")
fit_1$plot(estimates_by_age, additional_stats = "wtd")
fit_1$plot(estimates_by_age, additional_stats = "raw")
fit_1$plot(estimates_by_age, additional_stats = c("wtd", "raw", "mrp"))

# Compute and summarize the population estimate
estimates_popn <- fit_1$aggregate(poststrat_estimates)
fit_1$summary(estimates_popn)

# Plot population estimate
fit_1$plot(estimates_popn)
fit_1$plot(estimates_popn, additional_stats = "none")
fit_1$plot(estimates_popn, additional_stats = "wtd")
fit_1$plot(estimates_popn, additional_stats = "raw")
fit_1$plot(estimates_popn, additional_stats = c("wtd", "raw", "mrp"))
```
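
The example describes the output of `population_predict()` as a matrix with one
row per poststratification cell and one column per posterior draw. The base-R
sketch below shows what aggregating such a matrix involves conceptually: a
population-weighted average computed separately for each draw. It is an
illustration under assumed inputs, not the code that `aggregate()` runs. The
names `cells`, `draws`, and `weighted_by_draw()` are hypothetical, and the
values are simulated.

```r
set.seed(1)

# Hypothetical poststratification table: 4 cells with population counts
cells <- data.frame(
  age    = c("18-35", "18-35", "36-55", "36-55"),
  gender = c("f", "m", "f", "m"),
  N      = c(300, 250, 200, 250)
)

# Simulated stand-in for predicted probabilities: 4 cells x 100 draws
draws <- matrix(runif(4 * 100), nrow = 4, ncol = 100)

# Population-weighted average of the cell predictions, one value per draw.
# `draws * N` recycles N down each column, so row i is scaled by N[i].
weighted_by_draw <- function(draws, N) {
  colSums(draws * N) / sum(N)
}

# Overall population estimate: a vector with one value per draw
popn_draws <- weighted_by_draw(draws, cells$N)

# Estimates by age: apply the same weighting within each age level
cell_index <- split(seq_len(nrow(cells)), as.character(cells$age))
by_age <- sapply(cell_index, function(idx) {
  weighted_by_draw(draws[idx, , drop = FALSE], cells$N[idx])
})

# Summarize each age level with a mean and standard deviation across draws
data.frame(
  age  = colnames(by_age),
  mean = colMeans(by_age),
  sd   = apply(by_age, 2, sd)
)
```

Keeping the draws separate until the final summary carries the posterior
uncertainty from the model through to the group-level estimates.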
