{"id":13712064,"url":"https://github.com/tdaverse/ggtda","last_synced_at":"2025-05-06T21:33:24.519Z","repository":{"id":38814718,"uuid":"143955038","full_name":"tdaverse/ggtda","owner":"tdaverse","description":"ggplot2 extension to visualize persistent homology","archived":false,"fork":false,"pushed_at":"2025-04-24T20:57:39.000Z","size":47646,"stargazers_count":23,"open_issues_count":9,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-24T21:50:58.345Z","etag":null,"topics":["ggplot-extension","ggplot2","persistence-data","persistent-homology","r","rstats","simplicial-complex","tda","tidyverse","topological-data-analysis","visualization"],"latest_commit_sha":null,"homepage":"https://tdaverse.github.io/ggtda/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tdaverse.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-08T03:15:13.000Z","updated_at":"2025-03-04T10:08:44.000Z","dependencies_parsed_at":"2024-05-02T19:54:11.841Z","dependency_job_id":null,"html_url":"https://github.com/tdaverse/ggtda","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tdaverse%2Fggtda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tdaverse%2Fggtda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tdaverse%2Fggtda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/tdaverse%2Fggtda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tdaverse","download_url":"https://codeload.github.com/tdaverse/ggtda/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252772401,"owners_count":21801919,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ggplot-extension","ggplot2","persistence-data","persistent-homology","r","rstats","simplicial-complex","tda","tidyverse","topological-data-analysis","visualization"],"created_at":"2024-08-02T23:01:14.528Z","updated_at":"2025-05-06T21:33:24.502Z","avatar_url":"https://github.com/tdaverse.png","language":"R","funding_links":[],"categories":["Plot layers","Frameworks and Libs"],"sub_categories":["R"],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\",\n  dev = \"png\", dpi = 300\n)\n```\n\n# ggtda\n\n[![Coverage Status](https://img.shields.io/codecov/c/github/tdaverse/ggtda/main.svg)](https://codecov.io/github/tdaverse/ggtda?branch=main)\n[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n\n[![CRAN version](http://www.r-pkg.org/badges/version/ggtda)](https://CRAN.R-project.org/package=ggtda)\n[![CRAN Downloads](http://cranlogs.r-pkg.org/badges/grand-total/ggtda)](https://CRAN.R-project.org/package=ggtda)\n\n## Overview\n\nThe **ggtda** package provides **ggplot2** layers for the visualization of constructions and statistics arising from topological data analysis.\n\n## Installation\n\nThe development version can be installed using the **remotes** package:\n\n```r\n# install from GitHub\nremotes::install_github(\"tdaverse/ggtda\", vignettes = TRUE)\n```\n\nFor an introduction to package functionality, read the vignettes:\n\n```r\n# read vignettes\nvignette(topic = \"visualize-persistence\", package = \"ggtda\")\nvignette(topic = \"illustrate-constructions\", package = \"ggtda\")\nvignette(topic = \"grouped-list-data\", package = \"ggtda\")\n```\n\nWe aim to submit to [CRAN](https://CRAN.R-project.org) in Spring 2024!\n\n## Example\n\n```{r}\n# attach {ggtda}\nlibrary(ggtda)\n```\n\n### Sample data set\n\nThis example illustrates **ggtda** features using an artificial point cloud $X$ sampled with noise from a circle:\n\n```{r, fig.height=3}\n# generate a noisy circle\nn \u003c- 36\nset.seed(0)\nt \u003c- stats::runif(n = n, min = 0, max = 2*pi)\nd \u003c- data.frame(\n  x = cos(t) + stats::rnorm(n = n, mean = 0, sd = .2),\n  y = sin(t) + stats::rnorm(n = n, mean = 0, sd = .2)\n)\n# plot the data\nggplot(d, aes(x, y)) + geom_point() + coord_equal() + 
theme_bw()\n```\n\n### Topological constructions\n\n**ggtda** provides stat and geom layers for common TDA constructions.\nTo illustrate, pick a proximity, or threshold, to consider points in the cloud to be neighbors:\n\n```{r}\n# choose a proximity threshold\nprox \u003c- 2/3\n```\n\nThe homology $H_k(X)$ of a point cloud is uninteresting ($H_0(X) = \\lvert X \\rvert$ and $H_k(X) = 0$ for $k \u003e 0$). The most basic space of interest to the topological data analyst is the union of a _ball cover_ $B_r(X)$ of $X$---a ball of common radius $r$ around each point. The common radius will be $r =$ `prox / 2`.\n\nThe figure below compares the ball cover (left) with the _Vietoris_ (or _Rips_) _complex_ ${VR}_r(X)$ constructed using the same proximity (right).\nThe complex comprises a simplex at each subset of points having diameter at most `prox`---that is, each pair of which are within `prox` of each other.\nA key result in TDA is that the homology of the ball union is \"very close\" to that of the complex.\n\n```{r, fig.height=3}\n# visualize disks of fixed radii and the Vietoris complex for this proximity\np_d \u003c- ggplot(d, aes(x = x, y = y)) +\n  coord_fixed() +\n  geom_disk(radius = prox/2, fill = \"aquamarine3\") +\n  geom_point() +\n  theme_bw()\np_sc \u003c- ggplot(d, aes(x = x, y = y)) +\n  coord_fixed() +\n  stat_simplicial_complex(diameter = prox, fill = \"darkgoldenrod\") +\n  theme_bw() +\n  theme(legend.position = \"none\")\n# combine the plots\ngridExtra::grid.arrange(\n  p_d, p_sc,\n  layout_matrix = matrix(c(1, 2), nrow = 1)\n)\n```\n\nThis cover and simplex clearly contain a non-trivial 1-cycle (loop), which makes $H_1(B_r(X)) = H_1({VR}_r(X)) = 1$.\nBut detecting this feature depended crucially on the choice of `prox`, and there's no guarantee with new data that this choice will be correct or even that a single best choice exists.\nInstead, we tend to be interested in considering those features that persist across many values of `prox`.\nThe GIF 
below[^TDAvis] illustrates this point: Observe how features appear and disappear as the disk covers grow:\n\n[^TDAvis]: The GIF and many features of **ggtda** were originally developed in the separate package [**TDAvis**](https://github.com/jamesotto852/TDAvis).\n\n\u003cimg src=\"man/figures/Small-Clouds.gif\" width=\"100%\" /\u003e\n\n### Persistent homology\n\nPersistent homology (PH) encodes the homology group ranks across the full range $0 \\leq r \u003c \\infty$, corresponding to the full filtration of simplicial complexes constructed on the point cloud.\nWe use [**ripserr**](https://cran.r-project.org/package=ripserr) to compute the PH of the point cloud $X$:\n\n```{r}\n# compute the persistent homology\nph \u003c- ripserr::vietoris_rips(as.matrix(d), dim = 1)\nprint(ph)\n```\n\nThe loop is detected, though we do not yet know whether its persistence stands out from that of other features.\nTo prepare for `ggplot()`, we convert the result to a data frame and its numeric `dimension` column to a factor:\n\n```{r}\npd \u003c- as.data.frame(ph)\npd \u003c- transform(pd, dimension = as.factor(dimension))\nhead(pd)\ntail(pd)\n```\n\n### Persistence plots\n\n**ggtda** also provides stat and geom layers for common visualizations of persistence data.\nWe visualize these data using a barcode (left) and a persistence diagram (right).\nIn the barcode, the dashed line indicates the cutoff at the proximity `prox`; in the persistence diagram, the fundamental box contains the features that are detectable at this cutoff.\n\n```{r, fig.height=3}\n# visualize the persistence data, indicating cutoffs at this proximity\np_bc \u003c- ggplot(pd, aes(start = birth, end = death)) +\n  geom_barcode(linewidth = 1, aes(color = dimension, linetype = dimension)) +\n  labs(x = \"Diameter\", y = \"Homological features\",\n       color = \"Dimension\", linetype = \"Dimension\") +\n  geom_vline(xintercept = prox, color = \"darkgoldenrod\", linetype = \"dotted\") +\n  theme_barcode()\nmax_prox 
\u003c- max(pd$death)\np_pd \u003c- ggplot(pd) +\n  coord_fixed() +\n  stat_persistence(aes(start = birth, end = death,\n                       colour = dimension, shape = dimension)) +\n  geom_abline(slope = 1) +\n  labs(x = \"Birth\", y = \"Death\", color = \"Dimension\", shape = \"Dimension\") +\n  lims(x = c(0, max_prox), y = c(0, max_prox)) +\n  geom_fundamental_box(\n    t = prox,\n    fill = \"darkgoldenrod\", color = \"transparent\"\n  ) +\n  theme_persist()\n# combine the plots\ngridExtra::grid.arrange(\n  p_bc, p_pd,\n  layout_matrix = matrix(c(1, 2), nrow = 1)\n)\n```\n\nThe barcode lines are color- and linetype-coded by feature dimension: the 0-dimensional features, i.e. the gaps between connected components, versus the 1-dimensional feature, i.e. the loop.\nThese groups of lines do not overlap, which means that the loop exists only in the persistence domain where all the data points are part of the same connected component. Our choice of `prox` is between the birth and death of the loop, which is why the complex above recovers it.\n\nThe persistence diagram shows that the loop persists for longer than any of the gaps. 
This is consistent with the gaps being artifacts of the sampling procedure but the loop being an intrinsic property of the underlying space.\n\n### Multiple data sets\n\nTDA usually involves comparisons of topological data between spaces.\nTo illustrate such a comparison, we construct a larger sample and examine the persistence of its cumulative subsets:\n\n```{r}\n# larger point cloud sampled from a noisy circle\nset.seed(0)\nn \u003c- 180\nt \u003c- stats::runif(n = n, min = 0, max = 2*pi)\nd \u003c- data.frame(\n  x = cos(t) + stats::rnorm(n = n, mean = 0, sd = .2),\n  y = sin(t) + stats::rnorm(n = n, mean = 0, sd = .2)\n)\n# list of cumulative point clouds\nns \u003c- c(12, 36, 60, 180)\ndl \u003c- lapply(ns, function(n) d[seq(n), ])\n```\n\nFirst we construct a nested data frame containing these subsets and plot their Vietoris complexes.\n(We specify the [**simplextree**](https://github.com/peekxc/simplextree) engine and restrict to 2-simplices to reduce runtime.)\n\n```{r, fig.height=5}\n# formatted as grouped data\ndg \u003c- do.call(rbind, dl)\ndg$n \u003c- rep(ns, vapply(dl, nrow, 0L))\n# faceted plots of cumulative simplicial complexes\nggplot(dg, aes(x, y)) +\n  coord_fixed() +\n  facet_wrap(facets = vars(n), labeller = label_both) +\n  stat_simplicial_complex(\n    diameter = prox, dimension_max = 2L,\n    engine = \"simplextree\",\n    fill = \"darkgoldenrod\"\n  ) +\n  theme_bw() +\n  theme(legend.position = \"none\")\n```\n\nThe Vietoris complexes on these subsets for the fixed proximity are not a filtration; instead they show us how increasing the sample affects the detection of homology at that threshold.\nNotice that, while a cycle exists at $n = 36$, the \"true\" cycle is only detected at $n = 60$.\n\nWe can also conveniently plot the persistence diagrams from all four cumulative subsets, this time using a list-column of data sets passed to the `dataset` aesthetic:\n\n```{r, fig.height=5}\n# nested data frame of samples of different cumulative 
sizes\nds \u003c- data.frame(n = ns, d = I(dl))\nprint(ds)\n# faceted plot of persistence diagrams\nggplot(ds, aes(dataset = d)) +\n  coord_fixed() +\n  facet_wrap(facets = vars(n), labeller = label_both) +\n  stat_persistence(aes(colour = after_stat(factor(dimension)),\n                       shape = after_stat(factor(dimension)))) +\n  geom_abline(slope = 1) +\n  labs(x = \"Birth\", y = \"Death\", color = \"Dimension\", shape = \"Dimension\") +\n  lims(x = c(0, max_prox), y = c(0, max_prox)) +\n  theme_persist()\n```\n\nThe diagrams reveal that a certain sample size is necessary to distinguish bona fide features from noise, as only occurs here at $n = 36$.\nWhile the true feature retains about the same persistence (death value less birth value) from diagram to diagram, the persistence of the noise gradually decreases.\n\n## Contribute\n\nTo contribute to **ggtda**, you can create issues for any bugs you find or any suggestions you have on the [issues page](https://github.com/tdaverse/ggtda/issues).\n\nIf you have a feature in mind you think will be useful for others, you can also [fork this repository](https://help.github.com/en/articles/fork-a-repo) and [create a pull request](https://help.github.com/en/articles/creating-a-pull-request).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftdaverse%2Fggtda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftdaverse%2Fggtda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftdaverse%2Fggtda/lists"}