Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
Awesome Lists | Featured Topics | Projects
https://github.com/gergness/srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data
https://github.com/gergness/srvyr
r survey
Last synced: about 3 hours ago
JSON representation
R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data
Host: GitHub
URL: https://github.com/gergness/srvyr
Owner: gergness
Created: 2015-09-26T20:24:01.000Z (over 9 years ago)
Default Branch: main
Last Pushed: 2024-10-05T17:06:52.000Z (3 months ago)
Last Synced: 2025-01-03T16:05:22.298Z (7 days ago)
Topics: r, survey
Language: R
Size: 5.01 MB
Stars: 214
Watchers: 10
Forks: 29
Open Issues: 17
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project

README

        ---

output:

  github_document

---

```{r, echo = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "README-"

)

```

# srvyr 

[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/srvyr)](https://CRAN.R-project.org/package=srvyr)

[![R build status](https://github.com/gergness/srvyr/workflows/R-CMD-check/badge.svg)](https://github.com/gergness/srvyr/actions)

[![Codecov test coverage](https://codecov.io/gh/gergness/srvyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/gergness/srvyr?branch=main)

[![Documentation via pkgdown](tools/pkgdownshield.svg)](http://gdfe.co/srvyr/)

srvyr brings parts of [dplyr's](https://github.com/tidyverse/dplyr/) syntax to survey

analysis, using the [survey](https://CRAN.R-project.org/package=survey) 

package.

srvyr focuses on calculating summary statistics from survey data, such as the

mean, total or quantile. It allows for the use of many dplyr verbs, such as

`summarize`, `group_by`, and `mutate`, the convenience of pipe-able functions,

rlang's style of non-standard evaluation and more consistent return types

than the survey package.

You can try it out:

```R

install.packages("srvyr")

# or for development version

# remotes::install_github("gergness/srvyr")

```

## Example usage

First, describe the variables that define the survey's structure with the function

`as_survey()`with the bare column names of the names that you would use in functions

from the survey package like `survey::svydesign()`, `survey::svrepdesign()` or 

`survey::twophase()`.

```{r}

library(srvyr, warn.conflicts = FALSE)

data(api, package = "survey")

dstrata <- apistrat %>%

   as_survey_design(strata = stype, weights = pw)

```

Now many of the dplyr verbs are available.

* `mutate()` adds or modifies a variable.

```{r}

dstrata <- dstrata %>%

  mutate(api_diff = api00 - api99)

```

* `summarise()` calculates summary statistics such as mean, total, quantile or ratio.

```{r}

dstrata %>% 

  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))

```

* `group_by()` and then `summarise()` creates summaries by groups.

```{r}

dstrata %>% 

  group_by(stype) %>%

  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))

```

* Functions from the survey package are still available:

```{r}

my_model <- survey::svyglm(api99 ~ stype, dstrata)

summary(my_model)

```

## Learning more

Here are some free resources put together by the community about srvyr:

- **"How-to"s & examples of using srvyr**

  - srvyr's included vignette ["srvyr vs survey"](http://gdfe.co/srvyr/articles/srvyr-vs-survey.html) and the rest of the [pkgdown website](http://gdfe.co/srvyr/)

  - Stephanie Zimmer, Rebecca Powell and Isabella Velásquez's book [Exploring Complex Survey Data Analysis Using R](https://www.routledge.com/Exploring-Complex-Survey-Data-Analysis-Using-R-A-Tidy-Introduction-with-srvyr-and-survey/Zimmer-Powell-Velasquez/p/book/9781032302867?srsltid=AfmBOordog836itDOABXbcZM2BAE1WdJ6muu8sjgAIpO7WFu-x00D6HQ) (releasing in November 2024). See also their [2021 AAPOR Workshop "Tidy Survey Analysis in R using the srvyr Package"](https://github.com/szimmer/tidy-survey-aapor-2021)

  - "The Epidemiologist R Handbook", by Neale Batra et al. has a [chapter on survey analysis](https://epirhandbook.com/en/) with srvyr and survey package examples

  - Kieran Healy's book ["Data Visualization: A Practical Introduction"](https://socviz.co/modeling.html#plots-from-complex-surveys) has a section on using srvyr to visualize the ESS.

  - The IPUMS PMA team's blog had a series showing examples of using the [PMA COVID survey panel with weights](https://tech.popdata.org/pma-data-hub/index.html)

  - ["Open Case Studies: Vaping Behaviors in American Youth"](https://www.opencasestudies.org/ocs-bp-vaping-case-study/) by Carrie Wright, Michael Ontiveros, Leah Jager, Margaret Taub, and Stephanie Hicks is a detailed case study that includes using srvyr to analyze the National Youth Tobacco Survey.

  - ["How to plot Likert scales with a weighted survey in a dplyr friendly way"](https://towardsdatascience.com/how-to-plot-likert-scales-with-a-weighted-survey-in-a-dplyr-friendly-way-68df600881a) by Francisco Suárez Salas

  - The tidycensus package vignette ["Working with Census microdata"](https://walker-data.com/tidycensus/articles/pums-data.html) includes information about using the weights from the ACS retrieved from the census API.

  - ["The Joy of Calculating the Direct Standard Error for PUMS Estimates"](https://ldaly.github.io/giveinandblogit/) by GitHub user @ldaly

- **About survey statistics**

  - Thomas Lumley's book ["Complex Surveys: a guide to analysis using R"](http://r-survey.r-forge.r-project.org/svybook/)

  - [Chris Skinner. Jon Wakefield. "Introduction to the Design and Analysis of Complex Survey Data." Statist. Sci. 32 (2) 165 - 175, May 2017. 10.1214/17-STS614](https://projecteuclid.org/accountAjax/Download?downloadType=journal%20article&urlId=10.1214%2F17-STS614&isResultClick=True)

  - Sharon Lohr's textbook "Sampling: Design and Analysis". [Second ](https://www.sharonlohr.com/sampling-design-and-analysis-2e) or [Third ](https://www.sharonlohr.com/sampling-design-and-analysis-3e) Editions

  - "Survey weighting is a mess" is the opening to Andrew Gelman's ["Struggles with Survey Weighting and Regression Modeling"](http://www.stat.columbia.edu/~gelman/research/published/STS226.pdf)

  - Anthony Damico's website ["Analyze Survey Data for Free"](https://asdfree.com) has the weight specifications for a wide variety of public use survey datasets.

- **Working programmatically and/or on multiple columns at once (eg `dplyr::across` and `rlang`'s "curly curly" `{{}}`)**

  - dplyr's included package vignettes ["Column-wise operations"](https://dplyr.tidyverse.org/articles/colwise.html) & ["Programming with dplyr"](https://dplyr.tidyverse.org/articles/programming.html)

- **Non-English resources**

  - *Em português:* ["Análise de Dados Amostrais Complexos"](https://djalmapessoa.github.io/adac/) by Djalma Pessoa and Pedro Nascimento Silva

  - *En español:* ["Usando R para jugar con los microdatos del INEGI"](https://medium.com/tacosdedatos/usando-r-para-sacar-información-de-los-microdatos-del-inegi-b21b6946cf4f) by Claudio Daniel Pacheco Castro

  - *Tiếng Việt:* ["Dịch tễ học ứng dụng và y tế công cộng với R"](https://epirhandbook.com/vn/survey-analysis.html)

  - *På norsk:* [Data med vekter i R](https://oyvindsolheim.com/code/vekter%20i%20r/) by Øyvind Bugge Solheim

- **Other cool stuff that uses srvyr**

  - A (free) graphical interface allowing exploratory data analysis of survey data without writing code: [iNZight](https://inzight.nz/) (and [survey data instructions](https://inzight.nz/docs/survey-specification.html))

  - ["serosurvey: Serological Survey Analysis For Prevalence Estimation Under Misclassification"](https://avallecam.github.io/serosurvey/) by Andree Valle Campos

  - Several packages on CRAN depend on srvyr, you can see them by looking at the [reverse Imports/Suggestions on CRAN](https://cran.r-project.org/package=srvyr).

**Still need help?**

I think the best way to get help is to form a specific question and ask it in some place like [posit's community website](https://forum.posit.co/) (known for it's friendly community) or [stackoverflow.com](https://stackoverflow.com) (maybe not known for being quite as friendly, but probably has more people). If you think you've found a bug in srvyr's code, please file an [issue on GitHub](https://github.com/gergness/srvyr/issues/new), but note that I'm not a great resource for helping specific issue, both because I have limited capacity but also because I do not consider myself an expert in the statistical methods behind survey analysis.

**Have something to add?**

These resources were mostly found via vanity searches on twitter & github. If you know of anything I missed, or have written something yourself, [please let me know in this GitHub issue]()!

## What people are saying about srvyr

> minimal changes to my #r #dplyr script to incorporate survey weights, thanks to the amazing #srvyr and #survey packages. Thanks to @gregfreedman & @tslumley. Integrates soooo nicely into tidyverse

>

> --Brian Guay ([\@BrianMGuay on Jun 16, 2021](https://twitter.com/brianmguay/status/1405224564196622338))

> Spending my afternoon using `srvyr` for tidy analysis of weighted survey data in #rstats and it's so elegant. Vignette here: https://CRAN.R-project.org/package=srvyr/vignettes/srvyr-vs-survey.html

> 

> --Chris Skovron ([\@cskovron on Nov 20, 2018](https://twitter.com/cskovron/status/1065015904784842752))

> 1. Yay!

>

> --Thomas Lumley, [in the Biased and Inefficient blog](http://notstatschat.tumblr.com/post/161225885311/pipeable-survey-analysis-in-r)

## Contributing

I do appreciate bug reports, suggestions and pull requests! I started this as a

way to learn about R package development, and am still learning, so you'll have

to bear with me. Please review the [Contributor Code of

Conduct](https://github.com/gergness/srvyr/blob/main/CODE_OF_CONDUCT.md), as all participants are required to abide by its

terms.

If you're unfamiliar with contributing to an R package, I recommend the guides

provided by Rstudio's tidyverse team, such as Jim Hester's [blog

post](https://www.tidyverse.org/blog/2017/08/contributing/) or Hadley

Wickham's [R packages book](https://r-pkgs.org/).