{"id":13857482,"url":"https://github.com/gdemin/expss","last_synced_at":"2025-04-09T08:09:47.466Z","repository":{"id":27782899,"uuid":"31271628","full_name":"gdemin/expss","owner":"gdemin","description":"expss: Tables and Labels in R","archived":false,"fork":false,"pushed_at":"2024-04-09T15:38:46.000Z","size":16790,"stargazers_count":84,"open_issues_count":6,"forks_count":16,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-02T05:08:11.144Z","etag":null,"topics":["excel","labels","labels-support","msexcel","pivot-tables","r","recode","spss","spss-statistics","tables","variable-labels","vlookup"],"latest_commit_sha":null,"homepage":"https://cran.r-project.org/web/packages/expss/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gdemin.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":null,"patreon":"gdemin","open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":["bit.ly/2D9E3vZ","paypal.me/gdemin77"]}},"created_at":"2015-02-24T17:16:42.000Z","updated_at":"2024-11-14T21:41:11.000Z","dependencies_parsed_at":"2022-07-08T22:45:21.547Z","dependency_job_id":"c6f205d6-2631-4f34-8359-1ddaa6b86493","html_url":"https://github.com/gdemin/expss","commit_stats":{"total_commits":1376,"total_committers":6,"mean_commits":"229.33333333333334","dds":"0.10828488372093026","last_synced_commit":"668d7bace676b555cb34d5e0d633fad516c0f19b"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"reposito
ry_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gdemin%2Fexpss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gdemin%2Fexpss/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gdemin%2Fexpss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gdemin%2Fexpss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gdemin","download_url":"https://codeload.github.com/gdemin/expss/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247999861,"owners_count":21031046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["excel","labels","labels-support","msexcel","pivot-tables","r","recode","spss","spss-statistics","tables","variable-labels","vlookup"],"created_at":"2024-08-05T03:01:38.500Z","updated_at":"2025-04-09T08:09:47.448Z","avatar_url":"https://github.com/gdemin.png","language":"R","funding_links":["https://patreon.com/gdemin","bit.ly/2D9E3vZ","paypal.me/gdemin77"],"categories":["R"],"sub_categories":[],"readme":"# expss\n\n[![CRAN\\_Status\\_Badge](http://www.r-pkg.org/badges/version/expss)](https://cran.r-project.org/package=expss)\n[![](https://cranlogs.r-pkg.org/badges/expss)](https://cran.rstudio.com/web/packages/expss/index.html)\n[![](https://cranlogs.r-pkg.org/badges/grand-total/expss)](https://cran.rstudio.com/web/packages/expss/index.html)\n[![Coverage 
Status](https://img.shields.io/codecov/c/github/gdemin/expss/master.svg)](https://codecov.io/github/gdemin/expss?branch=master)\n\n## Introduction\n\n`expss` computes and displays tables with support for 'SPSS'-style labels, multiple / nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', R notebooks, 'Shiny' and 'Jupyter' notebooks. Proper methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package offers useful functions for data processing in marketing research / social surveys - popular data transformation functions from 'SPSS' Statistics and 'Excel' ('RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP', etc.). The package is intended to help people move data processing from 'Excel'/'SPSS' to R. See examples below. You can get help about any function by typing `?function_name` in the R console.\n\n### Links\n\n- [Online introduction](http://gdemin.github.io/expss/)\n- [expss on CRAN](https://cran.r-project.org/package=expss)\n- [expss on Github](https://github.com/gdemin/expss)\n- [expss on Stackoverflow](https://stackoverflow.com/questions/tagged/expss)\n- [Issues](https://github.com/gdemin/expss/issues)\n\n## Installation\n\n`expss` is on CRAN, so to install it you can type in the console\n`install.packages(\"expss\")`.\n\n## Cross-tabulation examples\n\nFor demonstration we will use the well-known `mtcars` dataset. Let's start with adding labels to the dataset. 
Then we can continue with table creation.\n\n```R\nlibrary(expss)\ndata(mtcars)\nmtcars = apply_labels(mtcars,\n                      mpg = \"Miles/(US) gallon\",\n                      cyl = \"Number of cylinders\",\n                      disp = \"Displacement (cu.in.)\",\n                      hp = \"Gross horsepower\",\n                      drat = \"Rear axle ratio\",\n                      wt = \"Weight (1000 lbs)\",\n                      qsec = \"1/4 mile time\",\n                      vs = \"Engine\",\n                      vs = c(\"V-engine\" = 0,\n                             \"Straight engine\" = 1),\n                      am = \"Transmission\",\n                      am = c(\"Automatic\" = 0,\n                             \"Manual\"=1),\n                      gear = \"Number of forward gears\",\n                      carb = \"Number of carburetors\"\n)\n\n```\n\nFor quick cross-tabulation there are the `fre` and `cross` families of functions. For simplicity we mostly demonstrate `cross_cpct`, which calculates column percentages. Documentation for other functions, such as `cross_cases` for counts, `cross_rpct` for row percentages, `cross_tpct` for table percentages and `cross_fun` for custom summary functions, can be seen by typing `?cross_cpct` and `?cross_fun` in the console. \n\n```R\n# 'cross_*' examples\n# just simple crosstabulation, similar to base R 'table' function\ncross_cases(mtcars, am, vs)\n\n# Table column % with multiple banners\ncross_cpct(mtcars, cyl, list(total(), am, vs))\n\n# magrittr pipe usage and nested banners\nmtcars %\u003e% \n    cross_cpct(cyl, list(total(), am %nest% vs))      \n\n```\nThere is a more sophisticated interface for table construction with `magrittr` piping. Table construction consists of at least three functions chained with the pipe operator: `%\u003e%`. First, we specify the variables for which statistics will be computed with `tab_cells`. Second, we calculate statistics with one of the `tab_stat_*` functions. 
Finally, we complete table creation with `tab_pivot`, e.g.: `dataset %\u003e% tab_cells(variable) %\u003e% tab_stat_cases() %\u003e% tab_pivot()`. After that we can optionally sort the table with `tab_sort_asc`, drop empty rows/columns with `drop_rc` and transpose it with `tab_transpose`. The resulting table is just a `data.frame`, so we can use the usual R operations on it. Detailed documentation for table creation can be seen via `?tables`. For significance testing see `?significance`.\nGenerally, tables are automatically translated to HTML for output in knitr or Jupyter notebooks. However, if we want HTML output in R notebooks or in the RStudio viewer, we need to set an option for that: `expss_output_rnotebook()` or `expss_output_viewer()`. \n\n```R\n# simple example\nmtcars %\u003e% \n    tab_cells(cyl) %\u003e% \n    tab_cols(total(), am) %\u003e% \n    tab_stat_cpct() %\u003e% \n    tab_pivot()\n\n# table with caption\nmtcars %\u003e% \n    tab_cells(mpg, disp, hp, wt, qsec) %\u003e%\n    tab_cols(total(), am) %\u003e% \n    tab_stat_mean_sd_n() %\u003e%\n    tab_last_sig_means(subtable_marks = \"both\") %\u003e% \n    tab_pivot() %\u003e% \n    set_caption(\"Table with summary statistics and significance marks.\")\n\n# Table with the same summary statistics. Statistics labels in columns.\nmtcars %\u003e% \n    tab_cells(mpg, disp, hp, wt, qsec) %\u003e%\n    tab_cols(total(label = \"#Total| |\"), am) %\u003e% \n    tab_stat_fun(Mean = w_mean, \"Std. 
dev.\" = w_sd, \"Valid N\" = w_n, method = list) %\u003e%\n    tab_pivot()\n\n# Different statistics for different variables.\nmtcars %\u003e%\n    tab_cols(total(), vs) %\u003e%\n    tab_cells(mpg) %\u003e% \n    tab_stat_mean() %\u003e% \n    tab_stat_valid_n() %\u003e% \n    tab_cells(am) %\u003e%\n    tab_stat_cpct(total_row_position = \"none\", label = \"col %\") %\u003e%\n    tab_stat_rpct(total_row_position = \"none\", label = \"row %\") %\u003e%\n    tab_stat_tpct(total_row_position = \"none\", label = \"table %\") %\u003e%\n    tab_pivot(stat_position = \"inside_rows\") \n\n# Table with split by rows and with custom totals.\nmtcars %\u003e% \n    tab_cells(cyl) %\u003e% \n    tab_cols(total(), vs) %\u003e% \n    tab_rows(am) %\u003e% \n    tab_stat_cpct(total_row_position = \"above\",\n                  total_label = c(\"number of cases\", \"row %\"),\n                  total_statistic = c(\"u_cases\", \"u_rpct\")) %\u003e% \n    tab_pivot()\n\n# Linear regression by groups.\nmtcars %\u003e% \n    tab_cells(sheet(mpg, disp, hp, wt, qsec)) %\u003e% \n    tab_cols(total(label = \"#Total| |\"), am) %\u003e% \n    tab_stat_fun_df(\n        function(x){\n            frm = reformulate(\".\", response = as.name(names(x)[1]))\n            model = lm(frm, data = x)\n            sheet('Coef.' = coef(model), \n                  confint(model)\n            )\n        }    \n    ) %\u003e% \n    tab_pivot() \n```\n\n## Example of data processing with multiple-response variables\n\nHere we use truncated dataset with data from product test of two samples of\nchocolate sweets. 150 respondents tested two kinds of sweets (codenames:\nVSX123 and SDF546). Sample was divided into two groups (cells) of 75\nrespondents in each group. In cell 1 product VSX123 was presented first and\nthen SDF546. In cell 2 sweets were presented in reversed order. Questions\nabout respondent impressions about first product are in the block A (and\nabout second tested product in the block B). 
At the end of the questionnaire \nthere was a question about the preferences between sweets.\n\nList of variables:\n\n- `id` Respondent Id\n- `cell` First tested product (cell number)\n- `s2a` Age\n- `a1_1-a1_6` What did you like in these sweets? Multiple response. First tested product\n- `a22` Overall quality. First tested product\n- `b1_1-b1_6` What did you like in these sweets? Multiple response. Second tested product\n- `b22` Overall quality. Second tested product\n- `c1` Preferences\n\n```R\n\ndata(product_test)\n\nw = product_test # shorter name to save some keystrokes\n\n# here we recode variables from first/second tested product to separate variables for each product according to their cells\n# 'h' variables - VSX123 sample, 'p' variables - 'SDF456' sample\n# also we recode preferences from first/second product to true names\n# for first cell there are no changes, for second cell we should exchange 1 and 2.\nw = w %\u003e% \n    let_if(cell == 1, \n        h1_1 %to% h1_6 := recode(a1_1 %to% a1_6, other ~ copy),\n        p1_1 %to% p1_6 := recode(b1_1 %to% b1_6, other ~ copy),\n        h22 := recode(a22, other ~ copy), \n        p22 := recode(b22, other ~ copy),\n        c1r = c1\n    ) %\u003e% \n    let_if(cell == 2, \n        p1_1 %to% p1_6 := recode(a1_1 %to% a1_6, other ~ copy), \n        h1_1 %to% h1_6 := recode(b1_1 %to% b1_6, other ~ copy),\n        p22 := recode(a22, other ~ copy),\n        h22 := recode(b22, other ~ copy), \n        c1r := recode(c1, 1 ~ 2, 2 ~ 1, other ~ copy) \n    ) %\u003e% \n    let(\n        # recode age by groups\n        age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2),\n        # count number of likes\n        # codes 2 and 99 are ignored.\n        h_likes = count_row_if(1 | 3 %thru% 98, h1_1 %to% h1_6), \n        p_likes = count_row_if(1 | 3 %thru% 98, p1_1 %to% p1_6) \n    )\n\n# here we prepare labels for future usage\ncodeframe_likes = num_lab(\"\n    1 Liked everything\n    2 Disliked everything\n    3 
Chocolate\n    4 Appearance\n    5 Taste\n    6 Stuffing\n    7 Nuts\n    8 Consistency\n    98 Other\n    99 Hard to answer\n\")\n\noverall_liking_scale = num_lab(\"\n    1 Extremely poor \n    2 Very poor\n    3 Quite poor\n    4 Neither good, nor poor\n    5 Quite good\n    6 Very good\n    7 Excellent\n\")\n\nw = apply_labels(w, \n    c1r = \"Preferences\",\n    c1r = num_lab(\"\n        1 VSX123 \n        2 SDF456\n        3 Hard to say\n    \"),\n    \n    age_cat = \"Age\",\n    age_cat = c(\"18 - 25\" = 1, \"26 - 35\" = 2),\n    \n    h1_1 = \"Likes. VSX123\",\n    p1_1 = \"Likes. SDF456\",\n    h1_1 = codeframe_likes,\n    p1_1 = codeframe_likes,\n    \n    h_likes = \"Number of likes. VSX123\",\n    p_likes = \"Number of likes. SDF456\",\n    \n    h22 = \"Overall quality. VSX123\",\n    p22 = \"Overall quality. SDF456\",\n    h22 = overall_liking_scale,\n    p22 = overall_liking_scale\n)\n\n```\nAre there any significant differences between preferences? Yes, the difference is significant.\n```R\n# 'tab_mis_val(3)' removes 'hard to say' from the vector \nw %\u003e% tab_cols(total(), age_cat) %\u003e% \n      tab_cells(c1r) %\u003e% \n      tab_mis_val(3) %\u003e% \n      tab_stat_cases() %\u003e% \n      tab_last_sig_cases() %\u003e% \n      tab_pivot()\n    \n```\nNext, we calculate the distribution of answers to the survey questions. \n```R\n# let's specify repeated parts of table creation chains\nbanner = w %\u003e% tab_cols(total(), age_cat, c1r) \n# column percent with significance\ntab_cpct_sig = . %\u003e% tab_stat_cpct() %\u003e% \n                    tab_last_sig_cpct(sig_labels = paste0(\"\u003cb\u003e\",LETTERS, \"\u003c/b\u003e\"))\n\n# means with significance\ntab_means_sig = . 
%\u003e% tab_stat_mean_sd_n(labels = c(\"\u003cb\u003e\u003cu\u003eMean\u003c/u\u003e\u003c/b\u003e\", \"sd\", \"N\")) %\u003e% \n                      tab_last_sig_means(\n                          sig_labels = paste0(\"\u003cb\u003e\",LETTERS, \"\u003c/b\u003e\"),   \n                          keep = \"means\")\n\n# Preferences\nbanner %\u003e% \n    tab_cells(c1r) %\u003e% \n    tab_cpct_sig() %\u003e% \n    tab_pivot() \n\n# Overall liking\nbanner %\u003e%  \n    tab_cells(h22) %\u003e% \n    tab_means_sig() %\u003e% \n    tab_cpct_sig() %\u003e%  \n    tab_cells(p22) %\u003e% \n    tab_means_sig() %\u003e% \n    tab_cpct_sig() %\u003e%\n    tab_pivot() \n\n# Likes\nbanner %\u003e% \n    tab_cells(h_likes) %\u003e% \n    tab_means_sig() %\u003e% \n    tab_cells(mrset(h1_1 %to% h1_6)) %\u003e% \n    tab_cpct_sig() %\u003e% \n    tab_cells(p_likes) %\u003e% \n    tab_means_sig() %\u003e% \n    tab_cells(mrset(p1_1 %to% p1_6)) %\u003e% \n    tab_cpct_sig() %\u003e%\n    tab_pivot() \n\n# below is a more complicated table where we compare likes side by side\n# Likes - side by side comparison\nw %\u003e% \n    tab_cols(total(label = \"#Total| |\"), c1r) %\u003e% \n    tab_cells(list(unvr(mrset(h1_1 %to% h1_6)))) %\u003e% \n    tab_stat_cpct(label = var_lab(h1_1)) %\u003e% \n    tab_cells(list(unvr(mrset(p1_1 %to% p1_6)))) %\u003e% \n    tab_stat_cpct(label = var_lab(p1_1)) %\u003e% \n    tab_pivot(stat_position = \"inside_columns\") \n\n```\n\nWe can save the labelled dataset as a *.csv file with accompanying R code for labelling.\n\n```R\nwrite_labelled_csv(w, filename = \"product_test.csv\")\n```\n\nOr, we can save the dataset as a *.csv file with SPSS syntax to read the data and apply labels.\n\n```R\nwrite_labelled_spss(w, filename = \"product_test.csv\")\n```\n\n## Export to Microsoft Excel\n\nTo export `expss` tables to *.xlsx you need to install the excellent `openxlsx` package. To install it, just type `install.packages(\"openxlsx\")` in the console. 
\n\n### Examples\n\nFirst we apply labels on the mtcars dataset and build simple table with caption.\n```R\nlibrary(expss)\nlibrary(openxlsx)\ndata(mtcars)\nmtcars = apply_labels(mtcars,\n                      mpg = \"Miles/(US) gallon\",\n                      cyl = \"Number of cylinders\",\n                      disp = \"Displacement (cu.in.)\",\n                      hp = \"Gross horsepower\",\n                      drat = \"Rear axle ratio\",\n                      wt = \"Weight (lb/1000)\",\n                      qsec = \"1/4 mile time\",\n                      vs = \"Engine\",\n                      vs = c(\"V-engine\" = 0,\n                             \"Straight engine\" = 1),\n                      am = \"Transmission\",\n                      am = c(\"Automatic\" = 0,\n                             \"Manual\"=1),\n                      gear = \"Number of forward gears\",\n                      carb = \"Number of carburetors\"\n)\n\nmtcars_table = mtcars %\u003e% \n    cross_cpct(\n        cell_vars = list(cyl, gear),\n        col_vars = list(total(), am, vs)\n    ) %\u003e% \n    set_caption(\"Table 1\")\n\nmtcars_table\n```\n\nThen we create workbook and add worksheet to it.\n```R\nwb = createWorkbook()\nsh = addWorksheet(wb, \"Tables\")\n```\nExport - we should specify workbook and worksheet. \n```R\nxl_write(mtcars_table, wb, sh)\n```\nAnd, finally, we save workbook with table to the xlsx file.\n```R\nsaveWorkbook(wb, \"table1.xlsx\", overwrite = TRUE)\n```\nScreenshot of the exported table:\n![table1.xlsx](vignettes/screen_xlsx1.png)\n\n### Automation of the report generation \n\nFirst of all, we create banner which we will use for all our tables. \n```R\nbanner = with(mtcars, list(total(), am, vs))\n```\nThen we generate list with all tables. If variables have small number of discrete values we create column percent table. In other cases we calculate table with means. 
For both types of tables we mark significant differences between groups.\n```R\nlist_of_tables = lapply(mtcars, function(variable) {\n    if(length(unique(variable))\u003c7){\n        cro_cpct(variable, banner) %\u003e% significance_cpct()\n    } else {\n        # with seven or more unique values we calculate means\n        cro_mean_sd_n(variable, banner) %\u003e% significance_means()\n        \n    }\n    \n})\n```\nCreate workbook:\n```R\nwb = createWorkbook()\nsh = addWorksheet(wb, \"Tables\")\n```\nHere we export our list of tables with additional formatting. We remove the '#' sign from totals and mark the total column in bold. You can read about formatting options in the manual for `xl_write` (`?xl_write` in the console).\n```R\nxl_write(list_of_tables, wb, sh, \n         # remove '#' sign from totals \n         col_symbols_to_remove = \"#\",\n         row_symbols_to_remove = \"#\",\n         # format total column as bold\n         other_col_labels_formats = list(\"#\" = createStyle(textDecoration = \"bold\")),\n         other_cols_formats = list(\"#\" = createStyle(textDecoration = \"bold\"))\n         )\n```\nSave workbook:\n```R\nsaveWorkbook(wb, \"report.xlsx\", overwrite = TRUE)\n```\nScreenshot of the generated report:\n![report.xlsx](vignettes/screen_xlsx2.png)\n\n## Labels support for base R\n\nA variable label is a human-readable description of the variable. R supports rather long variable names, and these names can even contain spaces and punctuation, but short variable names make coding easier. A variable label can give a nice, long description of the variable. With this description it is easier to remember what those variable names refer to.\nValue labels are similar to variable labels, but value labels are descriptions of the values a variable can take. Labeling values means we don’t have to remember if 1=Extremely poor and 7=Excellent or vice-versa. 
We can easily get a dataset description and variable summary with the `info` function.\n\nThe usual way to connect numeric data to labels in R is factor variables. However, factors lack important features which value labels provide. Factors only allow integers to be mapped to a text label, these integers have to be a count starting at 1, and every value needs to be labelled. Also, we can’t calculate means or other numeric statistics on factors. \n\nWith labels we can manipulate short variable names and codes when we analyze our data, but in the resulting tables and graphs we will see human-readable text. \n\nIt is easy to store labels as variable attributes in R, but most R functions cannot use them or even drop them. The `expss` package integrates value labels support into base R functions and into functions from other packages. Every function which internally converts a variable to a factor will utilize labels. Labels will be preserved during variable subsetting and concatenation. Additionally, there is a function (`use_labels`) which greatly simplifies variable labels usage. 
See examples below.\n\n### Getting and setting variable and value labels\n\nFirst, apply value and variable labels to the dataset:\n```R\nlibrary(expss)\ndata(mtcars)\nmtcars = apply_labels(mtcars,\n                      mpg = \"Miles/(US) gallon\",\n                      cyl = \"Number of cylinders\",\n                      disp = \"Displacement (cu.in.)\",\n                      hp = \"Gross horsepower\",\n                      drat = \"Rear axle ratio\",\n                      wt = \"Weight (1000 lbs)\",\n                      qsec = \"1/4 mile time\",\n                      vs = \"Engine\",\n                      vs = c(\"V-engine\" = 0,\n                             \"Straight engine\" = 1),\n                      am = \"Transmission\",\n                      am = c(\"Automatic\" = 0,\n                             \"Manual\"=1),\n                      gear = \"Number of forward gears\",\n                      carb = \"Number of carburetors\"\n)\n\n```\nIn addition to `apply_labels` we have SPSS-style `var_lab` and `val_lab` functions:\n```R\nnps = c(-1, 0, 1, 1, 0, 1, 1, -1)\nvar_lab(nps) = \"Net promoter score\"\nval_lab(nps) = num_lab(\"\n            -1 Detractors\n             0 Neutralists    \n             1 Promoters    \n\")\n\n```\nWe can read, add or remove existing labels:\n```R\nvar_lab(nps) # get variable label\nval_lab(nps) # get value labels\n\n# add new labels\nadd_val_lab(nps) = num_lab(\"\n                           98 Other    \n                           99 Hard to say\n                           \")\n\n# remove label by value\n# %d% - diff, %n_d% - names diff \nval_lab(nps) = val_lab(nps) %d% 98\n# or, remove label by name\nval_lab(nps) = val_lab(nps) %n_d% \"Other\"\n```\nAdditionally, there are some utility functions. 
They can be applied to a single variable as well as to an entire dataset.\n```R\ndrop_val_labs(nps)\ndrop_var_labs(nps)\nunlab(nps)\ndrop_unused_labels(nps)\nprepend_values(nps)\n```\nThere is also a `prepend_names` function, but it can be applied only to a data.frame.\n\n### Labels with base R and ggplot2 functions \n\nBase `table` and plotting with value labels:\n```R\nwith(mtcars, table(am, vs))\nwith(mtcars, \n     barplot(\n         table(am, vs), \n         beside = TRUE, \n         legend = TRUE)\n     )\n```\n\nThere is a special function for variable labels support - `use_labels`. Currently, variable labels support is available only for expressions which are evaluated inside a data.frame.\n```R\n# table with dimension names\nuse_labels(mtcars, table(am, vs)) \n\n# linear regression\nuse_labels(mtcars, lm(mpg ~ wt + hp + qsec)) %\u003e% summary\n\n# boxplot with variable labels\nuse_labels(mtcars, boxplot(mpg ~ am))\n```\n\nAnd, finally, `ggplot2` graphics with variable and value labels. Note that with ggplot2 version 3.2.0 and higher you need to explicitly convert labelled variables to factors in the `facet_grid` formula:\n```R\nlibrary(ggplot2, warn.conflicts = FALSE)\n\nuse_labels(mtcars, {\n    # '..data' is a shortcut for the whole 'mtcars' data.frame inside the expression \n    ggplot(..data) +\n        geom_point(aes(y = mpg, x = wt, color = qsec)) +\n        facet_grid(factor(am) ~ factor(vs))\n}) \n```\n\n### Extreme value labels support\n\nWe have an option for extreme value labels support: `expss_enable_value_labels_support_extreme()`. With this option `factor`/`as.factor` will take empty levels into account. However, `unique` will give a weird result for labelled variables: labels without values will be added to the unique values. That's why it is recommended to turn off this option immediately after usage. See examples. 
\n\nWe have the label 'Hard to say', for which there are no values in `nps`:\n```R\nnps = c(-1, 0, 1, 1, 0, 1, 1, -1)\nvar_lab(nps) = \"Net promoter score\"\nval_lab(nps) = num_lab(\"\n            -1 Detractors\n             0 Neutralists    \n             1 Promoters\n             99 Hard to say\n\")\n```\nHere we disable labels support and get results without labels:\n```R\nexpss_disable_value_labels_support()\ntable(nps) # there are no labels in the result\nunique(nps)\n```\nResults with default value labels support - three labels are here but \"Hard to say\" is absent.\n```R\nexpss_enable_value_labels_support()\n# table with labels but there is no label \"Hard to say\"\ntable(nps)\nunique(nps)\n```\nAnd now extreme value labels support - we see \"Hard to say\" with zero counts. Note the weird `unique` result.\n```R\nexpss_enable_value_labels_support_extreme()\n# now we see \"Hard to say\" with zero counts\ntable(nps) \n# weird 'unique'! There is a value 99 which is absent in 'nps'\nunique(nps) \n\n```\nReturn immediately to defaults to avoid issues:\n```R\nexpss_enable_value_labels_support()\n```\n\n### Labels are preserved during common operations on the data\n\nThere are special methods for subsetting and concatenating labelled variables. These methods preserve labels during common operations. We don't need to restore labels on a subsetted or sorted data.frame. \n\n`mtcars` with labels:\n```R\nstr(mtcars)\n```\nMake a subset of the data.frame:\n```R\nmtcars_subset = mtcars[1:10, ]\n```\nLabels are here, nothing is lost:\n```R\nstr(mtcars_subset)\n```\n\n### Interaction with 'haven'\n\nTo use `expss` with `haven` you need to load `expss` strictly after `haven` (or another package that implements the 'labelled' class) to avoid conflicts. It is also better to use `read_spss` with explicit package specification: `haven::read_spss`. See example below. \nThe `haven` package doesn't set the 'labelled' class for variables which have a variable label but don't have value labels. 
This leads to labels being lost during subsetting and other operations. We have a special function to fix this: `add_labelled_class`. Apply it to a dataset loaded by `haven`.\n\n```R\n# we need to load packages strictly in this order to avoid conflicts\nlibrary(haven)\nlibrary(expss)\nspss_data = haven::read_spss(\"spss_file.sav\")\n# add missing 'labelled' class\nspss_data = add_labelled_class(spss_data) \n```\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgdemin%2Fexpss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgdemin%2Fexpss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgdemin%2Fexpss/lists"}