{"id":17606644,"url":"https://github.com/corybrunson/ordr","last_synced_at":"2025-04-12T17:12:53.929Z","repository":{"id":39601096,"uuid":"148827439","full_name":"corybrunson/ordr","owner":"corybrunson","description":"manage ordinations and render biplots in a tidyverse workflow","archived":false,"fork":false,"pushed_at":"2025-03-25T15:08:37.000Z","size":102365,"stargazers_count":23,"open_issues_count":20,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T17:12:47.121Z","etag":null,"topics":["biplot","data-visualization","dimension-reduction","geometric-data-analysis","grammar-of-graphics","log-ratio-analysis","multivariate-analysis","multivariate-statistics","ordination","tidymodels","tidyverse"],"latest_commit_sha":null,"homepage":"https://corybrunson.github.io/ordr/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/corybrunson.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-14T18:30:40.000Z","updated_at":"2025-03-22T08:13:14.000Z","dependencies_parsed_at":"2024-09-13T02:21:06.601Z","dependency_job_id":"985ec0dc-3c9f-4af0-a5aa-1984d3a9832f","html_url":"https://github.com/corybrunson/ordr","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corybrunson%2Fordr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corybrunson%2Fordr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/corybrunson%2Fordr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corybrunson%2Fordr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/corybrunson","download_url":"https://codeload.github.com/corybrunson/ordr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248602312,"owners_count":21131616,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biplot","data-visualization","dimension-reduction","geometric-data-analysis","grammar-of-graphics","log-ratio-analysis","multivariate-analysis","multivariate-statistics","ordination","tidymodels","tidyverse"],"created_at":"2024-10-22T15:51:42.771Z","updated_at":"2025-04-12T17:12:53.909Z","avatar_url":"https://github.com/corybrunson.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003c!-- edit README.rmd --\u003e\n\n# ordr\n\n\u003c!-- badges: start --\u003e\n\n[![Lifecycle:\nexperimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n[![CRAN](http://www.r-pkg.org/badges/version/ordr)](https://cran.r-project.org/package=ordr)\n\u003c!-- badges: end --\u003e\n\n**ordr** integrates ordination analysis and biplot visualization into\n[**tidyverse**](https://github.com/tidyverse/tidyverse) workflows.\n\n## motivation\n\n\u003e Wherever there is an SVD, there is a biplot.[^1]\n\n### ordination and biplots\n\n*Ordination* is a catch-all term for a variety of statistical 
techniques\nthat introduce an artificial coordinate system for a data set in such a\nway that a few coordinates capture a large amount of the data structure\n[^2]. The branch of mathematical statistics called [geometric data\nanalysis](https://link.springer.com/book/10.1007/1-4020-2236-0) (GDA)\nprovides the theoretical basis for (most of) these techniques.\nOrdination overlaps with regression and with dimension reduction, which\ncan be [contrasted to clustering and\nclassification](https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d)\nin that they assign continuous rather than categorical values to data\nelements [^3].\n\nMost ordination techniques decompose a numeric rectangular data set into\nthe product of two matrices, often using singular value decomposition\n(SVD). The coordinates of the shared dimensions of these matrices (over\nwhich they are multiplied) are the artificial coordinates. In some\ncases, such as principal components analysis, the decomposition is\nexact; in others, such as non-negative matrix factorization, it is\napproximate. Some techniques, such as correspondence analysis, transform\nthe data before decomposition. Ordination techniques may be supervised,\nlike linear discriminant analysis, or unsupervised, like\nmultidimensional scaling.\n\nAnalysis pipelines that use these techniques may use the artificial\ncoordinates directly, in place of natural coordinates, to arrange and\ncompare data elements or to predict responses. This is possible because\nboth the rows and the columns of the original table can be located, or\npositioned, along these shared coordinates. The number of artificial\ncoordinates used in an application, such as regression or visualization,\nis called the *rank* of the ordination [^4]. 
A common application is the\n*biplot*, which positions the rows and columns of the original table in\na scatterplot in 1, 2, or 3 artificial coordinates, usually those that\nexplain the most variation in the data.\n\n### implementations in R\n\nAn extensive range of ordination techniques are implemented in R, from\nclassical multidimensional scaling (`stats::cmdscale()`) and principal\ncomponents analysis (`stats::prcomp()` and `stats::princomp()`) in the\n**stats** package distributed with base R, across widely-used\nimplementations of linear discriminant analysis (`MASS::lda()`) and\ncorrespondence analysis (`ca::ca()`) in general-use statistical\npackages, to highly specialized packages that implement cutting-edge\ntechniques or adapt conventional techniques to challenging settings.\nThese implementations come with their own conventions, tailored to the\nresearch communities that produced them, and it would be impractical\n(and probably unhelpful) to try to consolidate them.\n\nInstead, **ordr** provides a streamlined process by which the models\noutput by these methods—in particular, the matrix factors into which the\noriginal data are approximately decomposed and the artificial\ncoordinates they share—can be inspected, annotated, tabulated,\nsummarized, and visualized. On this last point, most biplot\nimplementations in R provide limited customizability. **ordr** adopts\nthe grammar of graphics paradigm from\n[**ggplot2**](https://github.com/tidyverse/ggplot2) to modularize and\nstandardize biplot elements [^5]. 
Overall, the package is designed to\nfollow the broader syntactic conventions of the **tidyverse**, so that\nusers familiar with this workflow can more easily and quickly\nintegrate ordination models into practice.\n\n## usage\n\n### installation\n\n**ordr** is now on CRAN and can be installed using base R:\n\n``` r\ninstall.packages(\"ordr\")\n```\n\nThe development version can be installed from the (default) `main`\nbranch using [**remotes**](https://github.com/r-lib/remotes):\n\n``` r\nremotes::install_github(\"corybrunson/ordr\")\n```\n\n### example\n\n\u003e Morphologically, *Iris versicolor* is much closer to *Iris virginica*\n\u003e than to *Iris setosa*, though in every character by which it differs\n\u003e from *Iris virginica* it departs in the direction of *Iris\n\u003e setosa*.[^6]\n\nA very common illustration of ordination in R applies principal\ncomponents analysis (PCA) to Anderson’s iris measurements. These data\nconsist of lengths and widths of the petals and surrounding sepals from\n50 each of three species of iris:\n\n``` r\nhead(iris)\n#\u003e   Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n#\u003e 1          5.1         3.5          1.4         0.2  setosa\n#\u003e 2          4.9         3.0          1.4         0.2  setosa\n#\u003e 3          4.7         3.2          1.3         0.2  setosa\n#\u003e 4          4.6         3.1          1.5         0.2  setosa\n#\u003e 5          5.0         3.6          1.4         0.2  setosa\n#\u003e 6          5.4         3.9          1.7         0.4  setosa\nsummary(iris)\n#\u003e   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   \n#\u003e  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   
:0.100  \n#\u003e  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  \n#\u003e  Median :5.800   Median :3.000   Median :4.350   Median :1.300  \n#\u003e  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  \n#\u003e  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  \n#\u003e  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  \n#\u003e        Species  \n#\u003e  setosa    :50  \n#\u003e  versicolor:50  \n#\u003e  virginica :50  \n#\u003e                 \n#\u003e                 \n#\u003e \n```\n\n**ordr** provides a convenience function to send a subset of columns to\nan ordination function, wrap the resulting model in the\n[**tibble**](https://github.com/tidyverse/tibble)-derived ‘tbl_ord’\nclass, and append both model diagnostics and other original data columns\nas annotations to the appropriate matrix factors:[^7]\n\n``` r\n(iris_pca \u003c- ordinate(iris, cols = 1:4, model = ~ prcomp(., scale. = TRUE)))\n#\u003e # A tbl_ord of class 'prcomp': (150 x 4) x (4 x 4)'\n#\u003e # 4 coordinates: PC1, PC2, ..., PC4\n#\u003e # \n#\u003e # Rows (principal): [ 150 x 4 | 1 ]\n#\u003e     PC1    PC2     PC3 ... |   Species\n#\u003e                            |   \u003cfct\u003e  \n#\u003e 1 -2.26 -0.478  0.127      | 1 setosa \n#\u003e 2 -2.07  0.672  0.234  ... | 2 setosa \n#\u003e 3 -2.36  0.341 -0.0441     | 3 setosa \n#\u003e 4 -2.29  0.595 -0.0910     | 4 setosa \n#\u003e 5 -2.38 -0.645 -0.0157     | 5 setosa \n#\u003e # ℹ 145 more rows\n#\u003e # \n#\u003e # Columns (standard): [ 4 x 4 | 3 ]\n#\u003e      PC1     PC2    PC3 ... |   name         center scale\n#\u003e                             |   \u003cchr\u003e         \u003cdbl\u003e \u003cdbl\u003e\n#\u003e 1  0.521 -0.377   0.720     | 1 Sepal.Length   5.84 0.828\n#\u003e 2 -0.269 -0.923  -0.244 ... 
| 2 Sepal.Width    3.06 0.436\n#\u003e 3  0.580 -0.0245 -0.142     | 3 Petal.Length   3.76 1.77 \n#\u003e 4  0.565 -0.0669 -0.634     | 4 Petal.Width    1.20 0.762\n```\n\nAdditional annotations can be added using several row- and\ncolumn-specific **dplyr**-style verbs:\n\n``` r\niris_meta \u003c- data.frame(\n  Species = c(\"setosa\", \"versicolor\", \"virginica\"),\n  Colony = c(1L, 1L, 2L),\n  Cytotype = c(\"diploid\", \"hexaploid\", \"tetraploid\")\n)\n(iris_pca \u003c- left_join_rows(iris_pca, iris_meta, by = \"Species\"))\n#\u003e # A tbl_ord of class 'prcomp': (150 x 4) x (4 x 4)'\n#\u003e # 4 coordinates: PC1, PC2, ..., PC4\n#\u003e # \n#\u003e # Rows (principal): [ 150 x 4 | 3 ]\n#\u003e     PC1    PC2     PC3 ... |   Species Colony Cytotype\n#\u003e                            |   \u003cchr\u003e    \u003cint\u003e \u003cchr\u003e   \n#\u003e 1 -2.26 -0.478  0.127      | 1 setosa       1 diploid \n#\u003e 2 -2.07  0.672  0.234  ... | 2 setosa       1 diploid \n#\u003e 3 -2.36  0.341 -0.0441     | 3 setosa       1 diploid \n#\u003e 4 -2.29  0.595 -0.0910     | 4 setosa       1 diploid \n#\u003e 5 -2.38 -0.645 -0.0157     | 5 setosa       1 diploid \n#\u003e # ℹ 145 more rows\n#\u003e # \n#\u003e # Columns (standard): [ 4 x 4 | 3 ]\n#\u003e      PC1     PC2    PC3 ... |   name         center scale\n#\u003e                             |   \u003cchr\u003e         \u003cdbl\u003e \u003cdbl\u003e\n#\u003e 1  0.521 -0.377   0.720     | 1 Sepal.Length   5.84 0.828\n#\u003e 2 -0.269 -0.923  -0.244 ... 
| 2 Sepal.Width    3.06 0.436\n#\u003e 3  0.580 -0.0245 -0.142     | 3 Petal.Length   3.76 1.77 \n#\u003e 4  0.565 -0.0669 -0.634     | 4 Petal.Width    1.20 0.762\n```\n\nFollowing the [**broom**](https://github.com/tidymodels/broom) package,\nthe `tidy()` method produces a tibble describing the model components,\nin this case the principal coordinates, which is suitable for scree\nplotting:\n\n``` r\ntidy(iris_pca) %T\u003e% print() %\u003e%\n  ggplot(aes(x = name, y = prop_var)) +\n  geom_col() +\n  labs(x = \"\", y = \"Proportion of inertia\") +\n  ggtitle(\"PCA of Anderson's iris measurements\",\n          \"Distribution of inertia\")\n#\u003e # A tibble: 4 × 5\n#\u003e   name   sdev inertia prop_var quality\n#\u003e   \u003cfct\u003e \u003cdbl\u003e   \u003cdbl\u003e    \u003cdbl\u003e   \u003cdbl\u003e\n#\u003e 1 PC1   1.71   435.    0.730     0.730\n#\u003e 2 PC2   0.956  136.    0.229     0.958\n#\u003e 3 PC3   0.383   21.9   0.0367    0.995\n#\u003e 4 PC4   0.144    3.09  0.00518   1\n```\n\n![](man/figures/README-model%20components%20and%20scree%20plot-1.png)\u003c!-- --\u003e\n\nFollowing **ggplot2**, the `fortify()` method row-binds the factor\ntibbles with an additional `.matrix` column. 
This is used by\n`ggbiplot()` to redirect row- and column-specific plot layers to the\nappropriate subsets:[^8]\n\n``` r\nggbiplot(iris_pca, sec.axes = \"cols\", scale.factor = 2) +\n  geom_rows_point(aes(color = Species, shape = Species)) +\n  stat_rows_ellipse(aes(color = Species), alpha = .5, level = .99) +\n  geom_cols_vector(aes(label = name)) +\n  expand_limits(y = c(-3.5, NA)) +\n  ggtitle(\"PCA of Anderson's iris measurements\",\n          \"99% confidence ellipses; variables use top \u0026 right axes\")\n```\n\n![](man/figures/README-interpolative%20biplot-1.png)\u003c!-- --\u003e\n\nWhen variables are represented in standard coordinates, as typically in\nPCA, their rules can be rescaled to yield a predictive biplot.[^9] For\nlegibility, the axes are limited to the data range and offset from the\norigin:\n\n``` r\nggbiplot(iris_pca, axis.type = \"predictive\", axis.percents = FALSE) +\n  theme_scaffold() +\n  geom_rows_point(aes(color = Species, shape = Species)) +\n  stat_rows_center(\n    aes(color = Species, shape = Species),\n    size = 5, alpha = .5, fun.data = mean_se\n  ) +\n  stat_cols_rule(aes(label = name, center = center, scale = scale)) +\n  ggtitle(\"Predictive biplot of Anderson's iris measurements\",\n          \"Project a marker onto an axis to approximate its measurement\")\n```\n\n![](man/figures/README-predictive%20biplot-1.png)\u003c!-- --\u003e\n\n``` r\naggregate(iris[, 1:4], by = iris[, \"Species\", drop = FALSE], FUN = mean)\n#\u003e      Species Sepal.Length Sepal.Width Petal.Length Petal.Width\n#\u003e 1     setosa        5.006       3.428        1.462       0.246\n#\u003e 2 versicolor        5.936       2.770        4.260       1.326\n#\u003e 3  virginica        6.588       2.974        5.552       2.026\n```\n\n### more methods\n\nThe auxiliary package\n[**ordr.extra**](https://github.com/corybrunson/ordr.extra) provides\nrecovery methods for several additional ordination models—and has room\nfor several more!\n\n## 
acknowledgments\n\n### contribute\n\nAny feedback on the package is very welcome! If you encounter confusion\nor errors, do create an issue, with a [minimal reproducible\nexample](https://stackoverflow.com/help/minimal-reproducible-example) if\nfeasible. If you have requests, suggestions, or your own implementations\nfor new features, feel free to create an issue or submit a pull request.\nMethods for additional ordination classes (see the `methods-*.r` scripts\nin the `R` folder) are especially welcome, as are new plot layers.\nPlease try to follow the [contributing\nguidelines](https://github.com/corybrunson/ordr/blob/main/CONTRIBUTING.md)\nand respect the [Code of\nConduct](https://github.com/corybrunson/ordr/blob/main/CODE_OF_CONDUCT.md).\n\n### inspirations\n\nThis package was originally inspired by the **ggbiplot** extension\ndeveloped by [Vincent Q. Vu](https://github.com/vqv/ggbiplot), [Richard\nJ Telford](https://github.com/richardjtelford/ggbiplot), and [Vilmantas\nGegzna](https://github.com/forked-packages/ggbiplot), among others. It\nprobably first brought biplots into the **tidyverse** framework. The\nmotivation to unify a variety of ordination methods came from several\nbooks and articles by [Michael\nGreenacre](https://www.fbbva.es/microsite/multivariate-statistics/resources.html),\nin particular [*Biplots in\nPractice*](https://www.fbbva.es/microsite/multivariate-statistics/resources.html#biplots).\nSeveral answers at CrossValidated, in particular by\n[amoeba](https://stats.stackexchange.com/users/28666/amoeba) and\n[ttnphns](https://stats.stackexchange.com/users/3277/ttnphns), provided\ntheoretical insights and informed design choices. 
Thomas Lin Pedersen’s\n[**tidygraph**](https://github.com/thomasp85/tidygraph) prequel to\n**ggraph** finally induced the shift from the downstream generation of\nscatterplots to the upstream handling and manipulating of models.\nAdditional design elements and features have been informed by the\nmonograph\n[*Biplots*](https://www.google.com/books/edition/Biplots/lTxiedIxRpgC)\nand the textbook [*Understanding\nBiplots*](https://www.wiley.com/en-us/Understanding+Biplots-p-9781119972907)\nby John C. Gower, David J. Hand, Sugnet Gardner–Lubbe, and Niël J. Le\nRoux, and by the volume [*Principal Components\nAnalysis*](https://link.springer.com/book/10.1007/b98835) by I. T.\nJolliffe.\n\n### exposition\n\nThis work was presented ([slideshow\nPDF](https://raw.githubusercontent.com/corybrunson/tidy-factor/main/tidy-factor-x/tidy-factor-x.pdf))\nat an invited panel on [New Developments in Graphing Multivariate\nData](https://ww2.amstat.org/meetings/jsm/2022/onlineprogram/ActivityDetails.cfm?SessionID=222053)\nat the [Joint Statistical\nMeetings](https://ww2.amstat.org/meetings/jsm/2022/), on 2022 August 8\nin Washington DC. I’m grateful to Joyce Robbins for the invitation and\nfor organizing such a fun first experience, to Naomi Robbins for\nchairing the event, and to my co-panelists Ursula Laa and Hengrui Luo\nfor sharing and sparking such exciting ideas and conversations. An\nupdate was presented to the [ggplot2\nextenders](https://teunbrand.github.io/ggplot-extension-club/), which\nelicited additional valuable feedback.\n\n### resources\n\nDevelopment of this package benefitted from the use of equipment and the\nsupport of colleagues at [UConn Health](https://health.uconn.edu/) and\nat [UF Health](https://ufhealth.org/).\n\n[^1]: Greenacre MJ (2010) *Biplots in Practice*. 
Fundacion BBVA, ISBN:\n    978-84-923846.\n    \u003chttps://www.fbbva.es/microsite/multivariate-statistics/biplots.html\u003e\n\n[^2]: The term *ordination* is most prevalent among ecologists; no\n    catch-all term seems to be in common use outside ecology.\n\n[^3]: This is not a hard rule: PCA is often used to compress data before\n    clustering, and LDA uses dimension reduction to perform\n    classification tasks.\n\n[^4]: Regression and clustering models, like classical [linear\n    regression](https://www.fbbva.es/microsite/multivariate-statistics/)\n    and\n    [*k*-means](http://joelcadwell.blogspot.com/2015/08/matrix-factorization-comes-in-many.html),\n    can also be understood as matrix decomposition approximations and\n    even visualized in biplots. Their shared coordinates, which are\n    pre-defined rather than artificial, are the predictor coefficients\n    and the cluster assignments, respectively. Methods for `stats::lm()`\n    and `stats::kmeans()`, for example, are implemented for the sake of\n    novelty and instruction, but are not widely used in practice.\n\n[^5]: Biplot elements must be chosen with care, and it is useful and\n    appropriate that many model-specific biplot methods have limited\n    flexibility. This package adopts the trade-off articulated in\n    [Wilkinson’s *The Grammar of\n    Graphics*](https://www.google.com/books/edition/_/iI1kcgAACAAJ)\n    (p. 15): “This system is capable of producing some hideous graphics.\n    There is nothing in its design to prevent its misuse. … This system\n    cannot produce a meaningless graphic, however.”\n\n[^6]: Anderson E (1936) “The Species Problem in Iris”. *Annals of the\n    Missouri Botanical Garden* **23**(3),\n    457-469+471-483+485-501+503-509. \u003chttps://doi.org/10.2307/2394164\u003e\n\n[^7]: The data must be in the form of a data frame that can be\n    understood by the modeling function. 
Step-by-step methods also exist\n    to build and annotate a ‘tbl_ord’ from a fitted ordination model.\n\n[^8]: The radiating text geom, like several other features, is adapted\n    from the **ggbiplot** package.\n\n[^9]: This is an experimental feature only available for linear methods,\n    namely eigendecomposition, singular value decomposition, and\n    principal components analysis.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcorybrunson%2Fordr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcorybrunson%2Fordr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcorybrunson%2Fordr/lists"}