{"id":18612377,"url":"https://github.com/winvector/cdata","last_synced_at":"2025-04-10T23:31:22.090Z","repository":{"id":56934845,"uuid":"86544301","full_name":"WinVector/cdata","owner":"WinVector","description":"Higher order fluid or coordinatized data transforms in R. Distributed under choice of GPL-2 or GPL-3 license.","archived":false,"fork":false,"pushed_at":"2023-08-19T23:38:50.000Z","size":9111,"stargazers_count":43,"open_issues_count":0,"forks_count":8,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-04-25T03:43:57.504Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://winvector.github.io/cdata/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WinVector.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-03-29T06:04:37.000Z","updated_at":"2024-02-06T13:44:59.000Z","dependencies_parsed_at":"2023-10-21T01:00:21.932Z","dependency_job_id":null,"html_url":"https://github.com/WinVector/cdata","commit_stats":{"total_commits":521,"total_committers":2,"mean_commits":260.5,"dds":"0.013435700575815779","last_synced_commit":"eee8759f2e5c4f512d7175e969d48364a61245a8"},"previous_names":[],"tags_count":34,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2Fcdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2Fcdata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2Fcdata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2Fcdata/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WinVector","download_url":"https://codeload.github.com/WinVector/cdata/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248316042,"owners_count":21083369,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T03:16:51.100Z","updated_at":"2025-04-10T23:31:17.070Z","avatar_url":"https://github.com/WinVector.png","language":"R","readme":"---\noutput: github_document\n---\n\n[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/cdata)](https://cran.r-project.org/package=cdata)\n[![status](https://tinyverse.netlify.com/badge/cdata)](https://CRAN.R-project.org/package=cdata)\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n\n```{r, echo = FALSE, warning=FALSE, message=FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \" # \",\n  fig.path = \"tools/README-\"\n)\n```\n\n\n[`cdata`](https://CRAN.R-project.org/package=cdata) is a general data re-shaper that has the great virtue of adhering to Raymond's \"Rule of Representation\", and using Codd's \"Guaranteed Access Rule\".\n\n\u003e Fold knowledge into data, so program logic can be stupid and robust.\n\u003e\n\u003e [*The Art of Unix Programming*, Eric S. 
Raymond, Addison-Wesley, 2003](http://www.catb.org/esr/writings/taoup/html/ch01s06.html#id2878263)\n\n\u003e Rule 2: The guaranteed access rule.\n\u003e\n\u003e Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.\n\u003e\n\u003e [Edgar F. Codd](https://en.wikipedia.org/wiki/Codd%27s_12_rules)\n\nThe point being: it is much easier to reason about data than to try to reason about code, so using data to control your code is often a very good trade-off.  `cdata` also has a `Python` implementation that it can inter-operate with in the [`data_algebra` package](https://github.com/WinVector/data_algebra) (example [here](https://github.com/WinVector/data_algebra/blob/master/Examples/cdata/cdata.ipynb)).\n\n\n![](https://raw.githubusercontent.com/WinVector/cdata/master/tools/cdata.png)\n\nBriefly: `cdata` supplies data transform operators that:\n\n * Work on local data or with any `DBI` data source.\n * Are powerful generalizations of the operations commonly called `pivot` and `un-pivot`.\n * Allow for example-driven graphical specification of data transforms or data layout control.\n * Work in-memory or with `SQL` databases.\n\nA quick example: plot iris petal and sepal dimensions in a faceted graph.\n\n\n```{r ex0}\niris \u003c- data.frame(iris)\niris$iris_id \u003c- seq_len(nrow(iris))\n\n# show the data\nhead(iris)\n\nlibrary(\"ggplot2\")\nlibrary(\"cdata\")\n\n#\n# build a control table with a \"key column\" flower_part\n# and \"value columns\" Length and Width\n#\ncontrolTable \u003c- wrapr::qchar_frame(\n  \"flower_part\", \"Length\"     , \"Width\"     |\n    \"Petal\"    , Petal.Length , Petal.Width |\n    \"Sepal\"    , Sepal.Length , Sepal.Width )\n\ntransform \u003c- rowrecs_to_blocks_spec(\n  controlTable,\n  recordKeys = c(\"iris_id\", \"Species\"))\n\n# do the unpivot to convert the row records to block records\niris_aug \u003c- 
iris %.\u003e% transform\n\n# show the transformed data\nhead(iris_aug)\n\n# plot the graph\nggplot(iris_aug, aes(x=Length, y=Width)) +\n  geom_point(aes(color=Species, shape=Species)) + \n  facet_wrap(~flower_part, labeller = label_both, scale = \"free\") +\n  ggtitle(\"Iris dimensions\") +  scale_color_brewer(palette = \"Dark2\")\n\n# show the transform\nprint(transform)\n\n# show the representation of the transform\nunclass(transform)\n```\n\nMore details on the above example can be found [here](https://win-vector.com/2018/10/21/faceted-graphs-with-cdata-and-ggplot2/). A tutorial on how to design a `controlTable` can be found [here](https://winvector.github.io/cdata/articles/design.html).  And some discussion of the nature of records in `cdata` can be found [here](https://winvector.github.io/cdata/articles/blocksrecs.html).\n\n----\n\nA more detailed video tutorial is available [here](https://github.com/WinVector/cdata/blob/master/Examples/OrderedGrouping/OrderedGrouping.md).\n\n----\n\nWe can also exhibit a larger example of using `cdata` to create a scatter-plot matrix, or pair plot:\n\n```{r ex0_1}\n\niris \u003c- data.frame(iris)\niris$iris_id \u003c- seq_len(nrow(iris))\n\nlibrary(\"ggplot2\")\nlibrary(\"cdata\")\n\n# declare our columns of interest\nmeas_vars \u003c- qc(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)\ncategory_variable \u003c- \"Species\"\n\n# build a control table with all pairs of variables as value columns\n# and v1, v2 as the key columns\ncontrolTable \u003c- data.frame(expand.grid(meas_vars, meas_vars, \n                                       stringsAsFactors = FALSE))\n# the first copy of the columns holds the coordinate names, the second copy holds the values\ncontrolTable \u003c- cbind(controlTable, controlTable)\n# name the key columns v1 and v2, and the value columns value1 and value2\ncolnames(controlTable) \u003c- qc(v1, v2, value1, value2)\ntransform \u003c- rowrecs_to_blocks_spec(\n  controlTable,\n  recordKeys = c(\"iris_id\", \"Species\"),\n  controlTableKeys = qc(v1, v2),\n  
checkKeys = FALSE)\n\n# do the unpivot to convert the row records to multiple block records\niris_aug \u003c- iris %.\u003e% transform\n# alternate notation: layout_by(transform, iris)\n\n\nggplot(iris_aug, aes(x=value1, y=value2)) +\n  geom_point(aes_string(color=category_variable, shape=category_variable)) + \n  facet_grid(v2~v1, labeller = label_both, scale = \"free\") +\n  ggtitle(\"Iris dimensions\") +\n  scale_color_brewer(palette = \"Dark2\") +\n  ylab(NULL) + \n  xlab(NULL)\n\n# show transform\nprint(transform)\n```\n\n\nThe above is now wrapped into a [one-line command in `WVPlots`](https://winvector.github.io/WVPlots/reference/PairPlot.html).\n\n\n----\n\nThe `cdata` package develops the idea of the [\"coordinatized data\" theory](https://winvector.github.io/FluidData/RowsAndColumns.html) and includes an implementation of the [\"fluid data\" methodology](https://winvector.github.io/FluidData/FluidData.html).   \n\nThe main `cdata` interfaces are given by the following set of methods:\n\n  * [`rowrecs_to_blocks_spec()`](https://winvector.github.io/cdata/reference/rowrecs_to_blocks_spec.html), for specifying how single row records map to general multi-row (or block) records.\n  * [`blocks_to_rowrecs_spec()`](https://winvector.github.io/cdata/reference/blocks_to_rowrecs_spec.html), for specifying how multi-row block records map to single-row records.\n  * [`layout_specification()`](https://winvector.github.io/cdata/reference/layout_specification.html), for specifying transforms from multi-row records to other multi-row records.\n  * [`layout_by()`](https://winvector.github.io/cdata/reference/layout_by.html) or the [wrapr dot arrow pipe](https://winvector.github.io/wrapr/reference/dot_arrow.html) for applying a layout to re-arrange data.\n  * `t()` (transpose/adjoint) to invert or reverse layout specifications.\n\nSome convenience functions include:\n\n  * [`pivot_to_rowrecs()`](https://winvector.github.io/cdata/reference/pivot_to_rowrecs.html), for moving 
data from multi-row block records with one value per row (a single column of values) to single-row records (analogous to `spread` or `dcast`).\n  * [`pivot_to_blocks()`/`unpivot_to_blocks()`](https://winvector.github.io/cdata/reference/unpivot_to_blocks.html), for moving data from single-row records to possibly multi-row block records with one row per value (a single column of values) (analogous to `gather` or `melt`).\n  * [`wrapr::qchar_frame()`](https://winvector.github.io/wrapr/reference/qchar_frame.html), a helper function for specifying record control table layout specifications.\n  * [`wrapr::build_frame()`](https://winvector.github.io/wrapr/reference/build_frame.html), a helper function for specifying data frames.\n  \nThe package vignettes can be found in the \"Articles\" tab of [the `cdata` documentation site](https://winvector.github.io/cdata/).\n\nThe (older) recommended tutorial is: [Fluid data reshaping with cdata](https://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html). We also have an (older) [short free cdata screencast](https://youtu.be/4cYbP3kbc0k) (and another example can be found [here](https://winvector.github.io/FluidData/DataWranglingAtScale.html)).  These concepts were later adapted from `cdata` by the `tidyr` package.\n\n----\n\nInstall via CRAN:\n\n```{r, eval=FALSE}\ninstall.packages(\"cdata\")\n```\n\n----\n\n\n\nNote: `cdata` is targeted at data with \"tame column names\" (column names that are valid both in databases and as unquoted `R` variable names) and basic types (column values that are simple `R` types such as `character`, `numeric`, `logical`, and so on). \n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwinvector%2Fcdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwinvector%2Fcdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwinvector%2Fcdata/lists"}