{"id":24775657,"url":"https://github.com/inseefr/gustave","last_synced_at":"2025-10-12T00:31:28.449Z","repository":{"id":56934409,"uuid":"102273910","full_name":"InseeFr/gustave","owner":"InseeFr","description":"Gustave: a User-oriented Statistical Toolkit for Analytical Variance Estimation","archived":false,"fork":false,"pushed_at":"2024-10-18T15:11:13.000Z","size":5073,"stargazers_count":9,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-11T11:10:02.519Z","etag":null,"topics":["official-statistics","r","sampling","survey-sampling","variance-estimation"],"latest_commit_sha":null,"homepage":"https://CRAN.R-project.org/package=gustave","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InseeFr.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2017-09-03T15:24:29.000Z","updated_at":"2024-12-31T16:52:54.000Z","dependencies_parsed_at":"2024-01-15T18:41:30.562Z","dependency_job_id":"5de96ed6-8342-4115-b395-42da65876e34","html_url":"https://github.com/InseeFr/gustave","commit_stats":{"total_commits":386,"total_committers":8,"mean_commits":48.25,"dds":0.1295336787564767,"last_synced_commit":"ee16dfa954fe76374168ac0ed30a3b22866018e5"},"previous_names":["inseefr/gustave","martinchevalier/gustave"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/InseeFr/gustave","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InseeFr%
2Fgustave","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InseeFr%2Fgustave/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InseeFr%2Fgustave/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InseeFr%2Fgustave/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/InseeFr","download_url":"https://codeload.github.com/InseeFr/gustave/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InseeFr%2Fgustave/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279009509,"owners_count":26084609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["official-statistics","r","sampling","survey-sampling","variance-estimation"],"created_at":"2025-01-29T06:55:10.626Z","updated_at":"2025-10-12T00:31:24.489Z","avatar_url":"https://github.com/InseeFr.png","language":"R","readme":"gustave [![R-CMD-check](https://github.com/InseeFr/gustave/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/InseeFr/gustave/actions/workflows/R-CMD-check.yaml) [![CRAN_Status](http://www.r-pkg.org/badges/version/gustave)](https://cran.r-project.org/package=gustave) [![Mentioned in Awesome Official Statistics 
](https://awesome.re/mentioned-badge.svg)](https://github.com/SNStatComp/awesome-official-statistics-software) \n=======\n\nGustave (Gustave: a User-oriented Statistical Toolkit for Analytical Variance Estimation) is an R package that provides a **toolkit for analytical variance estimation in survey sampling**. \n\nApart from the implementation of standard variance estimators (Sen-Yates-Grundy, Deville-Tillé), its main feature is to **help the methodologist produce easy-to-use variance estimation *wrappers***, where systematic operations (statistic linearization, domain estimation) are handled in a consistent and transparent way. \n\nThe **ready-to-use variance estimation wrapper `qvar()`**, adapted for common cases (e.g. stratified simple random sampling, non-response correction through reweighting in homogeneous response groups, calibration), is also included. The core functions of the package (e.g. `define_variance_wrapper()`) are to be used for more complex cases.\n\n## Install\n\ngustave is available on CRAN and can therefore be installed with the `install.packages()` function:\n\n```\ninstall.packages(\"gustave\")\n```\n\nHowever, if you wish to install the latest version of gustave, you can use `devtools::install_github()` to install it directly from the [github.com repository](https://github.com/InseeFr/gustave):\n\n```\ninstall.packages(\"devtools\")\ndevtools::install_github(\"InseeFr/gustave\")\n```\n\n## Example\n\nIn this example, we aim to estimate the variance of estimators computed using simulated data inspired by the Information and communication technology (ICT) survey. 
This survey has the following characteristics:\n\n- stratified one-stage sampling design;\n- non-response correction through reweighting in homogeneous response groups based on economic sub-sector and turnover;\n- calibration on margins (number of firms and turnover broken down by economic sub-sector).\n\nThe ICT simulated data files are shipped with the gustave package:\n\n```\nlibrary(gustave)\ndata(package = \"gustave\")\n? ict_survey\n```\n\n### Methodological description of the survey\n\nA variance estimation can be performed in a single call of `qvar()`:\n\n```\nqvar(\n\n  # Sample file\n  data = ict_sample,\n  \n  # Dissemination and identification information\n  dissemination_dummy = \"dissemination\",\n  dissemination_weight = \"w_calib\",\n  id = \"firm_id\",\n  \n  # Scope\n  scope_dummy = \"scope\",\n  \n  # Sampling design\n  sampling_weight = \"w_sample\", \n  strata = \"strata\",\n  \n  # Non-response correction\n  nrc_weight = \"w_nrc\", \n  response_dummy = \"resp\", \n  hrg = \"hrg\",\n  \n  # Calibration\n  calibration_weight = \"w_calib\",\n  calibration_var = c(paste0(\"N_\", 58:63), paste0(\"turnover_\", 58:63)),\n  \n  # Statistic(s) and variable(s) of interest\n  mean(employees)\n \n)\n```\n\nThe survey methodology description is, however, cumbersome to repeat when several variance estimations are to be conducted. As it does not change from one estimation to another, it can be defined once and for all and then reused for all variance estimations. 
`qvar()` allows for this by defining a so-called variance *wrapper*, that is, an easy-to-use function that implements the variance estimation methodology for the given survey and embeds all the technical data needed to do so.\n\n```\n# Definition of the variance estimation wrapper precision_ict\nprecision_ict \u003c- qvar(\n\n  # As before\n  data = ict_sample,\n  dissemination_dummy = \"dissemination\",\n  dissemination_weight = \"w_calib\",\n  id = \"firm_id\",\n  scope_dummy = \"scope\",\n  sampling_weight = \"w_sample\", \n  strata = \"strata\",\n  nrc_weight = \"w_nrc\", \n  response_dummy = \"resp\", \n  hrg = \"hrg\",\n  calibration_weight = \"w_calib\",\n  calibration_var = c(paste0(\"N_\", 58:63), paste0(\"turnover_\", 58:63)),\n  \n  # Replacing the variables of interest by define = TRUE\n  define = TRUE\n  \n)\n\n# Use of the variance estimation wrapper\nprecision_ict(ict_sample, mean(employees))\n\n# The variance estimation wrapper can also be used on the survey file\nprecision_ict(ict_survey, mean(speed_quanti))\n```\n\n### Features of the variance estimation wrapper\n\nThe variance estimation *wrapper* is much easier to use than a standard variance estimation function: \n\n- several statistics in one call (with optional labels): \n\n    ```\n    precision_ict(ict_survey, \n      \"Mean internet speed in Mbps\" = mean(speed_quanti), \n      \"Turnover per employee\" = ratio(turnover, employees)\n    )\n    ```\n    \n- domain estimation with the `where` and `by` arguments:\n\n    ```\n    precision_ict(ict_survey, \n      mean(speed_quanti), \n      where = employees \u003e= 50\n    )\n    precision_ict(ict_survey, \n      mean(speed_quanti), \n      by = division\n    )\n    \n    # Domain may differ from one estimator to another\n    precision_ict(ict_survey, \n      \"Mean turnover, firms with 50 employees or more\" = mean(turnover, where = employees \u003e= 50),\n      \"Mean turnover, firms with 100 employees or more\" = mean(turnover, where = 
employees \u003e= 100)\n    )\n    ```\n\n- handy variable evaluation\n\n    ```\n    # On-the-fly evaluation (e.g. discretization)\n    precision_ict(ict_survey, mean(speed_quanti \u003e 100))\n    \n    # Automatic discretization for qualitative (character or factor) variables\n    precision_ict(ict_survey, mean(speed_quali))\n    \n    # Standard evaluation capabilities\n    variables_of_interest \u003c- c(\"speed_quanti\", \"speed_quali\")\n    precision_ict(ict_survey, mean(variables_of_interest))\n    ```\n    \n- Integration with %\u003e% and dplyr\n\n    ```\n    library(dplyr)\n    ict_survey %\u003e% \n      precision_ict(\"Internet speed above 100 Mbps\" = mean(speed_quanti \u003e 100)) %\u003e% \n      select(label, est, lower, upper)\n    ```\n\n## Colophon\n\nThis software is an [R](https://cran.r-project.org/) package developed with the [RStudio IDE](https://posit.co/) and the [devtools](https://CRAN.R-project.org/package=devtools), [roxygen2](https://CRAN.R-project.org/package=roxygen2) and [testthat](https://CRAN.R-project.org/package=testthat) packages. Much help was found in [R packages](https://r-pkgs.org/) and [Advanced R](https://adv-r.hadley.nz/) both written by [Hadley Wickham](https://hadley.nz/).\n\nFrom the methodological point of view, this package is related to the [Poulpe SAS macro (in French)](http://jms-insee.fr/jms1998_programme/#1513415199356-a8a1bdde-becd) developed at the French statistical institute. From the implementation point of view, some inspiration was found in the [ggplot2](https://CRAN.R-project.org/package=ggplot2) package. 
The idea of developing an R package on this specific topic was stimulated by the [icarus](https://CRAN.R-project.org/package=icarus) package and its author.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finseefr%2Fgustave","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finseefr%2Fgustave","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finseefr%2Fgustave/lists"}