{"id":15051453,"url":"https://github.com/lydialucchesi/smallsets","last_synced_at":"2025-06-24T19:45:43.932Z","repository":{"id":40389942,"uuid":"305591994","full_name":"lydialucchesi/smallsets","owner":"lydialucchesi","description":"Visual documentation for data preprocessing in R and Python","archived":false,"fork":false,"pushed_at":"2025-01-23T18:31:56.000Z","size":16830,"stargazers_count":14,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-17T05:05:40.820Z","etag":null,"topics":["data-science","data-visualization","documentation-tool","machine-learning","preprocessing","python","r","r-package","visualization-tools"],"latest_commit_sha":null,"homepage":"https://lydialucchesi.github.io/smallsets/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lydialucchesi.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-20T04:42:32.000Z","updated_at":"2025-02-04T19:24:43.000Z","dependencies_parsed_at":"2023-12-21T07:03:41.590Z","dependency_job_id":"20f66978-6b9e-4649-826f-21d017c49eb1","html_url":"https://github.com/lydialucchesi/smallsets","commit_stats":{"total_commits":226,"total_committers":4,"mean_commits":56.5,"dds":"0.026548672566371723","last_synced_commit":"e66a5c1321e90b09f7bf1dce6fca94e60026f075"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lydialucchesi%2Fsmallsets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/lydialucchesi%2Fsmallsets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lydialucchesi%2Fsmallsets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lydialucchesi%2Fsmallsets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lydialucchesi","download_url":"https://codeload.github.com/lydialucchesi/smallsets/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248144960,"owners_count":21055017,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","data-visualization","documentation-tool","machine-learning","preprocessing","python","r","r-package","visualization-tools"],"created_at":"2024-09-24T21:35:21.366Z","updated_at":"2025-04-10T02:41:49.982Z","avatar_url":"https://github.com/lydialucchesi.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n```{r, echo=FALSE, out.width=\"17%\", fig.align=\"right\", out.extra='style=\"float:right; padding:15px\"'}\nknitr::include_graphics(\"man/figures/hex_sticker.png\")\n```\n\n# smallsets: Visual Documentation for Data Preprocessing in R and Python\n\n[![CRAN status](https://www.r-pkg.org/badges/version/smallsets)](https://CRAN.R-project.org/package=smallsets)\n![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/smallsets)\n\n**`smallsets` website: [lydialucchesi.github.io/smallsets/](https://lydialucchesi.github.io/smallsets/)**\n\nDo you use R or Python to preprocess datasets 
for analyses? `smallsets` is an R package (https://CRAN.R-project.org/package=smallsets) that transforms the preprocessing code in your R, R Markdown, Python, or Jupyter Notebook file into a Smallset Timeline. A Smallset Timeline is a static, compact visualisation composed of small data snapshots of different preprocessing steps. A full description of the Smallset Timeline can be found in the paper [**Smallset Timelines: A Visual Representation of Data Preprocessing Decisions**](https://doi.org/10.1145/3531146.3533175) in the proceedings of ACM FAccT '22.\n\nThe `smallsets` user guide is available [here](https://lydialucchesi.github.io/smallsets/articles/smallsets.html) and in the package in `vignette(\"smallsets\")`. If you have questions or would like help building a Smallset Timeline, please [email Lydia](mailto:lydia.lucchesi@anu.edu.au).\n\n**[Download the smallsets cheatsheet (1-page PDF)](https://lydialucchesi.github.io/smallsets_cheatsheet/smallsets_cheatsheet.pdf)**\n\n## Install from CRAN\n\n```{r, eval=FALSE}\ninstall.packages(\"smallsets\")\n```\n\n## Quick start example\n\nRun this snippet of code to build your first Smallset Timeline! It's based on the synthetic dataset s_data, with 100 observations and eight variables (C1-C8), and the preprocessing script s_data_preprocess.R, discussed in the following section.\n\n```{r quick-start-example, eval=FALSE}\nlibrary(smallsets)\n\nset.seed(145)\n\nSmallset_Timeline(data = s_data,\n                  code = system.file(\"s_data_preprocess.R\", package = \"smallsets\"))\n```\n\n![](man/figures/quick_start_figure.png)\n\n## Structured comments\n\nThe Smallset Timeline above is based on the R preprocessing script below, s_data_preprocess.R. 
Structured comments were added to it to tell `smallsets` what to do.\n\n```{r, code=readLines(system.file(\"s_data_preprocess.R\", package=\"smallsets\")), eval=FALSE, class.source=\"view-only\"}\n```\n\n## Citing `smallsets`\n\nIf you use the `smallsets` software, please cite the Smallset Timeline paper.\n\nLydia R. Lucchesi, Petra M. Kuhnert, Jenny L. Davis, and Lexing Xie. 2022. Smallset Timelines: A Visual Representation of Data Preprocessing Decisions. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22). Association for Computing Machinery, New York, NY, USA, 1136–1153. https://doi.org/10.1145/3531146.3533175\n\n```\n@inproceedings{SmallsetTimelines, \n  author = {Lucchesi, Lydia R. and Kuhnert, Petra M. and Davis, Jenny L. and Xie, Lexing}, \n  title = {Smallset Timelines: A Visual Representation of Data Preprocessing Decisions}, \n  year = {2022}, \n  isbn = {9781450393522}, \n  publisher = {Association for Computing Machinery}, \n  address = {New York, NY, USA}, \n  url = {https://doi.org/10.1145/3531146.3533175}, \n  doi = {10.1145/3531146.3533175}, \n  location = {Seoul, Republic of Korea}, \n  series = {FAccT '22}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flydialucchesi%2Fsmallsets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flydialucchesi%2Fsmallsets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flydialucchesi%2Fsmallsets/lists"}