{"id":18430173,"url":"https://github.com/friendly/histdata","last_synced_at":"2025-05-07T15:21:06.907Z","repository":{"id":48676764,"uuid":"106572219","full_name":"friendly/HistData","owner":"friendly","description":"Data Sets from the History of Statistics and Data Visualization","archived":false,"fork":false,"pushed_at":"2025-04-29T03:28:25.000Z","size":41208,"stargazers_count":63,"open_issues_count":3,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-06T13:07:52.245Z","etag":null,"topics":["graphics","historical-data"],"latest_commit_sha":null,"homepage":"https://friendly.github.io/HistData","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/friendly.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-11T15:25:26.000Z","updated_at":"2025-04-29T03:28:29.000Z","dependencies_parsed_at":"2024-05-04T03:24:10.667Z","dependency_job_id":"0ccf1b34-f191-48fb-a57b-683fa0df8c74","html_url":"https://github.com/friendly/HistData","commit_stats":{"total_commits":209,"total_committers":4,"mean_commits":52.25,"dds":0.4593301435406698,"last_synced_commit":"70c369e1eecca17b16dd928d978c90d52e16081a"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friendly%2FHistData","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friendly%2FHistData/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friendly%2FHistData/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/friendly%2FHistData/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/friendly","download_url":"https://codeload.github.com/friendly/HistData/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252902718,"owners_count":21822288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graphics","historical-data"],"created_at":"2024-11-06T05:19:47.858Z","updated_at":"2025-05-07T15:21:06.900Z","avatar_url":"https://github.com/friendly.png","language":"R","readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r setup, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  warning = FALSE,\n  comment = \"##\",\n  fig.path = \"man/figures/README-\",\n  fig.height = 5,\n  fig.width = 5\n#  out.width = \"100%\"\n)\n\nlibrary(HistData)\n```\n\n\u003c!-- badges: start --\u003e\n[![Project Status: Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) \n[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\n[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/HistData)](https://cran.r-project.org/package=HistData)\n[![HistData status badge](https://friendly.r-universe.dev/badges/HistData)](https://friendly.r-universe.dev/HistData)\n[![cranlog](http://cranlogs.r-pkg.org/badges/grand-total/HistData)](https://cran.r-project.org/package=HistData)\n[![DOI](https://zenodo.org/badge/106572219.svg)](https://zenodo.org/badge/latestdoi/106572219)\n[![docs](https://img.shields.io/badge/documentation-blue)](https://friendly.github.io/HistData/)\n\n\u003c!-- badges: end --\u003e\n\n\n# HistData  \u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"200px\" /\u003e\n**Data Sets from the History of Statistics and Data Visualization**\n\nDev. Version: 0.9-4\n\nThe `HistData` package provides a collection of small data sets\nthat are interesting and important in the history of statistics and data\nvisualization. The goal of the package is to make these available, both for\ninstructional use (as examples, problem sets or projects) and for historical research\n(extending or criticizing a previous analysis).\nSome of these present interesting challenges, or opportunities to \"show off\",\nwith graphics or analysis in R. \n\nMany of the data sets have examples which reproduce an historical graph or analysis.\nThese are meant mainly as starters for more extensive re-analysis or graphical\nelaboration. 
If you are interested in any of these problems or data sets, I've purposely left\nlots of room to do better!\n\nThey are part of a program of research called *statistical historiography*\n(Friendly, 2007; Friendly \u0026 Denis, 2001; Friendly et al., 2016)\nmeaning the use of statistical methods to study problems and questions in the\nhistory of statistics and graphics. A main aspect of this is the increased\nunderstanding of historical problems in science and data analysis\nthrough the process of trying to reproduce a graph or analysis using\nmodern methods. I call this \"Re-visioning\", meaning _to see again, hopefully in a new light_.\n\nThey are also used in our book, (Friendly \u0026 Wainer, 2021),\n_A History of Data Visualization \u0026 Graphic Communication_, https://www.hup.harvard.edu/books/9780674975231. \nSee also the [companion website for this book](https://friendly.github.io/HistDataVis/).\n\nIf you are looking more widely for datasets to use for examples, teaching or research, check out Vincent Arel-Bundock's\n[Rdatasets](https://vincentarelbundock.github.io/Rdatasets/) package, with over 2200 datasets from various\nR packages, with this list of [Available datasets](https://vincentarelbundock.github.io/Rdatasets/articles/data.html).\n\n### Data science\n\nThere is another R aspect that should be noted here: \nA great deal of \"data sciency\" work was involved in constructing this package,\nalas (for teaching) not captured in the resulting CRAN-friendly package.\n\n* In some cases, data had to be **extracted** from historical documents, using a variety of techniques (web scraping, OCR of PDF files followed by conversion to a data set), each problem with its own toolbox, in R or outside. In many cases, transcription errors had to be corrected \nwith code or manually;\n* **digitization** of data from an image;\n* **conversion** of text-based data sets to a CSV file and then to an `.RData` file with proper column names. 
Ever seen a Unix `.shar` (shell archive) file? Well, I have.\n* **cleaning** variable names, e.g., using `janitor::clean_names()`, or, in some cases, manually editing an Excel file.\n* Applying **type-conversion**, e.g., `chr` to `factor` or `ordered`; constructing appropriate contrasts for factors to facilitate re-analysis.\n* **tidying** data.frames: long \u003c--\u003e wide, abbreviations of character string labels, ...\n* **documentation**: The thankless task? No -- considerable effort was made to give detailed descriptions, notes on methods, executable examples, references to original sources and\nanalyses, ...\n\n\n\n## Installation\n\nGet the released version from CRAN \n\n    install.packages(\"HistData\")\n\nThe development version can be installed to your R library directly from GitHub or my [R-universe](https://friendly.r-universe.dev/) via:\n\n    install.packages('HistData', repos = 'https://friendly.r-universe.dev')\n    remotes::install_github(\"friendly/HistData\")\n\n## Data sets\nHere are the data sets in the package, with links to their documentation. Some topics are represented by two or more\ndata sets.\n\n```{r datasets, results='asis'}\n# link dataset to pkgdown doc\nrefurl \u003c- \"https://friendly.github.io/HistData/reference/\"\n\ndsets \u003c- vcdExtra::datasets(\"HistData\") |\u003e \n  dplyr::select(Item, Title) |\u003e \n  dplyr::mutate(Item = glue::glue(\"[{Item}]({refurl}{Item}.html)\")) \n\n#knitr::kable(dsets)\n\nlibrary(tinytable)\n# tt(dsets) |\u003e\n#   format_tt(j = 1, markdown = TRUE) |\u003e\n#   style_tt(j = 1, bootstrap_css = \"width: 30%;\") |\u003e\n#   style_tt(j = 2, bootstrap_css = \"width: 70%;\")\ntt(dsets, width = c(.2, .8)) |\u003e \n    format_tt(j = 1, markdown = TRUE) \n#    save_tt(\"html\") |\u003e\n#    knitr::asis_output()\n```\n\n## Contributors\nPlease note that the `HistData` project is released with a [Contributor Code of Conduct](https://friendly.github.io/HistData/CODE_OF_CONDUCT.html). 
By contributing to this project, you agree to abide by its terms.\n\nOver the years, many people have contributed new data sets, offered corrections,\nsuggestions, or documentation examples. They are appreciatively listed below:\n\nDavid Bellhouse,\nBrian Clair,\nStephane Dray, \nLuiz Droubi,\nAntoine de Falguerolles,\nMonique Graf,\nJames Hanley, \nPeter Li, \nDennis Murphy, \nJim Oeppen,\nJames Riley,\nNeville Verlander, \nHadley Wickham. \n\n## References\n\nFriendly, M. (2007). A Brief History of Data Visualization.\nIn Chen, C., Hardle, W. \u0026 Unwin, A. (eds.)  \n*Handbook of Computational Statistics: Data Visualization*, Springer-Verlag, III, Ch. 1, 1-34.\n[Preprint](http://datavis.ca/papers/hbook.pdf)\n\nFriendly, M. \u0026 Denis, D. (2001).\nMilestones in the history of thematic cartography, statistical graphics, and data visualization.\nWeb site: [http://datavis.ca/milestones/](http://datavis.ca/milestones/)\n\nFriendly, M., Sigal, M. \u0026 Harnanansingh, D. (2016).\n\"The Milestones Project: A Database for the History of Data Visualization,\"\nIn Kostelnick, C. \u0026 Kimball, M. (eds.), *Visible Numbers: The History of Data Visualization*, Ashgate Press, Chapter 10. [Preprint](https://www.datavis.ca/papers/MilestonesProject.pdf)\n\nFriendly, M. \u0026 Wainer, H. (2021). _A History of Data Visualization and Graphic Communication_,\nhttps://www.hup.harvard.edu/books/9780674975231,\nHarvard University Press. Companion [web site](https://friendly.github.io/HistDataVis/)\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffriendly%2Fhistdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffriendly%2Fhistdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffriendly%2Fhistdata/lists"}