{"id":13399280,"url":"https://github.com/ropensci/skimr","last_synced_at":"2025-05-13T21:04:59.640Z","repository":{"id":21344840,"uuid":"92435362","full_name":"ropensci/skimr","owner":"ropensci","description":"A frictionless, pipeable approach to dealing with summary statistics","archived":false,"fork":false,"pushed_at":"2025-01-28T22:58:29.000Z","size":3948,"stargazers_count":1125,"open_issues_count":32,"forks_count":79,"subscribers_count":32,"default_branch":"main","last_synced_at":"2025-04-19T08:51:29.383Z","etag":null,"topics":["peer-reviewed","r","r-package","ropensci","rstats","summary-statistics","unconf","unconf17"],"latest_commit_sha":null,"homepage":"https://docs.ropensci.org/skimr","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ropensci.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2017-05-25T19:04:26.000Z","updated_at":"2025-04-17T00:17:28.000Z","dependencies_parsed_at":"2024-12-12T14:00:32.936Z","dependency_job_id":"524dfd8b-90d8-4b5e-88ed-fccc3ba1d8f5","html_url":"https://github.com/ropensci/skimr","commit_stats":{"total_commits":1002,"total_committers":43,"mean_commits":"23.302325581395348","dds":0.5668662674650699,"last_synced_commit":"d5126aa020e703f37740af7ee56a4acb5830fd08"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fskimr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fskimr/tags","releases
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fskimr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2Fskimr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ropensci","download_url":"https://codeload.github.com/ropensci/skimr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251311332,"owners_count":21569009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["peer-reviewed","r","r-package","ropensci","rstats","summary-statistics","unconf","unconf17"],"created_at":"2024-07-30T19:00:35.941Z","updated_at":"2025-04-28T12:11:03.209Z","avatar_url":"https://github.com/ropensci.png","language":"HTML","funding_links":[],"categories":["HTML","其他_机器学习与深度学习"],"sub_categories":[],"readme":"---\noutput: md_document\n---\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n# skimr \u003ca href='https://docs.ropensci.org/skimr/'\u003e\n\u003cimg src='https://docs.ropensci.org/skimr/reference/figures/logo.png'\nalign=\"right\" height=\"139\" /\u003e\u003c/a\u003e\n\n```{r set-options, echo=FALSE, message=FALSE}\nlibrary(skimr)\noptions(pillar.width = Inf)\noptions(width = 100)\n```\n\n\u003c!-- badges: start --\u003e\n[![Project Status: Active – The project has reached a stable, usable\nstate and is being actively\ndeveloped.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/)\n[![R-CMD-check](https://github.com/ropensci/skimr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ropensci/skimr/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/ropensci/skimr/graph/badge.svg)](https://app.codecov.io/gh/ropensci/skimr)\n[![This is an ROpenSci Peer reviewed\npackage](https://badges.ropensci.org/175_status.svg)](https://github.com/ropensci/software-review/issues/175)\n[![CRAN\\_Status\\_Badge](https://www.r-pkg.org/badges/version/skimr)](https://cran.r-project.org/package=skimr)\n[![cran\nchecks](https://badges.cranchecks.info/worst/skimr.svg)](https://badges.cranchecks.info/worst/skimr.svg)\n\u003c!-- badges: end --\u003e\n\n\n`skimr` provides a frictionless approach to summary statistics which conforms\nto the [principle of least\nsurprise](https://en.wikipedia.org/wiki/Principle_of_least_astonishment),\ndisplaying summary statistics the user can skim quickly to understand their\ndata. It handles different data types and returns a `skim_df` object which can\nbe included in a pipeline or displayed nicely for the human reader.\n\n**Note: `skimr` version 2 has major changes when skimr is used programmatically.\nUpgraders should review this document, the release notes and vignettes\ncarefully.**\n\n## Installation\n\nThe current released version of `skimr` can be installed from CRAN. 
If you wish\nto install the current build of the next release you can do so using the\nfollowing:\n\n```{r, eval = FALSE}\n# install.packages(\"devtools\")\ndevtools::install_github(\"ropensci/skimr\")\n```\n\nThe APIs for this branch should be considered reasonably stable but still\nsubject to change if an issue is discovered.\n\nTo install the version with the most recent changes that have not yet been\nincorporated in the main branch (and may not be):\n\n```{r, eval = FALSE}\ndevtools::install_github(\"ropensci/skimr\", ref = \"develop\")\n```\n\nDo not rely on APIs from the develop branch, as they are likely to change.\n\n## Skim statistics in the console\n\n`skimr`:\n\n- provides a larger set of statistics than `summary()`, including missing,\n  complete, n, and sd.\n- reports each data type separately\n- handles dates, logicals, and a variety of other types\n- supports spark-bar and spark-line based on the\n  [pillar package](https://github.com/r-lib/pillar).\n\n### Separates variables by class:\n\n```{r, render = knitr::normal_print}\nskim(chickwts)\n```\n\n### Presentation is in a compact horizontal format:\n\n```{r, render = knitr::normal_print}\nskim(iris)\n```\n\n### Built-in support for strings, lists and other column classes\n\n```{r, render = knitr::normal_print}\nskim(dplyr::starwars)\n```\n\n### Has a useful summary function\n\n```{r, render = knitr::normal_print}\nskim(iris) %\u003e%\n  summary()\n```\n\n### Individual columns can be selected using tidyverse-style selectors\n\n```{r, render = knitr::normal_print}\nskim(iris, Sepal.Length, Petal.Length)\n```\n\n### Handles grouped data\n\n`skim()` can handle data that has been grouped using `dplyr::group_by()`.\n\n```{r, render = knitr::normal_print}\niris %\u003e%\n  dplyr::group_by(Species) %\u003e%\n  skim()\n```\n\n### Behaves nicely in pipelines\n\n```{r, render = knitr::normal_print}\niris %\u003e%\n  skim() %\u003e%\n  dplyr::filter(numeric.sd \u003e 1)\n```\n\n## Knitted results\n\nSimply 
skimming a data frame will produce the horizontal print\nlayout shown above. We provide a `knit_print` method for the types of objects\nin this package so that similar results are produced in documents. To use this,\nmake sure the `skimmed` object is the last item in your code chunk.\n\n```{r}\nfaithful %\u003e%\n  skim()\n```\n\n## Customizing skimr\n\nAlthough skimr provides opinionated defaults, it is highly customizable.\nUsers can specify their own statistics, change the formatting of results,\ncreate statistics for new classes and develop skimmers for data structures\nthat are not data frames.\n\n### Specify your own statistics and classes\n\nUsers can specify their own statistics using a list combined with the\n`skim_with()` function factory. `skim_with()` returns a new `skim` function that\ncan be called on your data. You can use this factory to produce summaries for\nany type of column within your data.\n\nAssignment within a call to `skim_with()` relies on a helper function, `sfl` or\n`skimr` function list. By default, functions in the `sfl` call are appended to\nthe default skimmers, and names are automatically generated as well.\n\n```{}\nmy_skim \u003c- skim_with(numeric = sfl(mad))\nmy_skim(iris, Sepal.Length)\n```\n\nBut you can also use helpers from the `tidyverse` to create new anonymous functions\nthat set particular function arguments. The behavior is the same as in `purrr`\nor `dplyr`, with both `.` and `.x` as acceptable pronouns. 
Setting the\n`append = FALSE` argument uses only those functions that you've provided.\n\n```{}\nmy_skim \u003c- skim_with(\n  numeric = sfl(\n    iqr = IQR,\n    p01 = ~ quantile(.x, probs = .01),\n    p99 = ~ quantile(., probs = .99)\n  ),\n  append = FALSE\n)\nmy_skim(iris, Sepal.Length)\n```\n\nAnd you can remove default skimmers by setting them to `NULL`.\n\n```{}\nmy_skim \u003c- skim_with(numeric = sfl(hist = NULL))\nmy_skim(iris, Sepal.Length)\n```\n\n### Skimming other objects\n\n`skimr` has summary functions for the following types of data by default:\n\n* `numeric` (which includes both `double` and `integer`)\n* `character`\n* `factor`\n* `logical`\n* `complex`\n* `Date`\n* `POSIXct`\n* `ts`\n* `AsIs`\n\n`skimr` also provides a small API for writing packages that provide their own\ndefault summary functions for data types not covered above. It relies on\nR S3 methods for the `get_skimmers` function. This function should return\nan `sfl`, similar to customization within `skim_with()`, but you should also\nprovide a value for the `skim_type` argument. Here's an example.\n\n```{r}\nget_skimmers.my_data_type \u003c- function(column) {\n  sfl(\n    skim_type = \"my_data_type\",\n    p99 = ~ quantile(., probs = .99)\n  )\n}\n```\n\n## Limitations of current version\n\nWe are aware that there are issues with rendering the inline histograms and\nline charts in various contexts, some of which are described below.\n\n### Support for spark histograms\n\nWith versions of R before 4.2.1, there are known issues with\nprinting the spark-histogram characters when\nprinting a data frame. For example, `\"▂▅▇\"` is printed as\n`\"\u003cU+2582\u003e\u003cU+2585\u003e\u003cU+2587\u003e\"`. This longstanding problem [originates in\nthe low-level\ncode](https://stat.ethz.ch/pipermail/r-devel/2015-May/071250.html)\nfor printing data frames.\nWhile some cases have been addressed, there are, for example, reports of this\nissue in Emacs ESS. 
While this is a deep issue, there is [ongoing\nwork to address it in base R](https://blog.r-project.org/2020/05/02/utf-8-support-on-windows/). \nWe recommend upgrading to at least R 4.2.1 to address this issue.\n\nThis means that while `skimr` can render the histograms to the console and in\nRMarkdown documents, it cannot in other circumstances. This includes:\n\n* converting a `skimr` data frame to a vanilla R data frame, but tibbles render\n  correctly\n* in the context of rendering to a pdf using an engine that does not support\n  utf-8.\n\nOne workaround for showing these characters in Windows is to set the CTYPE part\nof your locale to Chinese/Japanese/Korean with `Sys.setlocale(\"LC_CTYPE\",\n\"Chinese\")`. The helper function `fix_windows_histograms()` does this for you.\n\nAnd last but not least, we provide `skim_without_charts()` as a fallback.\nThis makes it easy to still get summaries of your data, even if unicode issues\ncontinue.\n\n### Printing spark histograms and line graphs in knitted documents\n\nSpark-bar and spark-line work in the console, but may not work when you knit\nthem to a specific document format. The same session that produces a correctly\nrendered HTML document may produce an incorrectly rendered PDF, for example.\nThis issue can generally be addressed by changing fonts to one with good\nbuilding block (for histograms) and Braille support (for line graphs). For\nexample, the open font \"DejaVu Sans\" from the `extrafont` package supports\nthese. You may also want to try wrapping your results in `knitr::kable()`.\nPlease see the vignette on using fonts for details.\n\nDisplays in documents of different types will vary. 
For example, one user found\nthat the font \"Yu Gothic UI Semilight\" produced consistent results for\nMicrosoft Word and LibreOffice Writer.\n\n## Inspirations\n\n* [TextPlots](https://github.com/sunetos/TextPlots.jl) for use of Braille\n  characters\n\n* [spark](https://github.com/holman/spark) for use of block characters.\n\nThe earliest use of unicode characters to generate sparklines appears to be [from 2009](https://blog.jonudell.net/2009/01/13/fuel-prices-and-pageviews/).\n\nExercising these ideas to their fullest requires a font with good support for block drawing characters. [PragmataPro](https://fsd.it/shop/fonts/pragmatapro/) is one such font.\n\n## Contributing\n\nWe welcome issue reports and pull requests, including potentially adding\nsupport for commonly used variable classes. However, in general, we encourage\nusers to take advantage of skimr's flexibility to add their own customized\nclasses. Please see the\n[contributing](https://docs.ropensci.org/skimr/CONTRIBUTING.html) and\n[conduct](https://ropensci.org/code-of-conduct/) documents.\n\n[![ropenci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Fskimr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fropensci%2Fskimr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Fskimr/lists"}