{"id":13788584,"url":"https://github.com/const-ae/ggupset","last_synced_at":"2025-05-15T01:06:40.512Z","repository":{"id":45006733,"uuid":"171710862","full_name":"const-ae/ggupset","owner":"const-ae","description":"Combination matrix axis for 'ggplot2' to create 'UpSet' plots","archived":false,"fork":false,"pushed_at":"2025-02-11T11:11:53.000Z","size":2768,"stargazers_count":362,"open_issues_count":18,"forks_count":25,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-13T22:39:38.651Z","etag":null,"topics":["ggplot","ggplot-extension","r","upset"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/const-ae.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-20T16:39:47.000Z","updated_at":"2025-03-31T09:20:40.000Z","dependencies_parsed_at":"2025-02-28T15:12:35.489Z","dependency_job_id":"ee41751d-c982-4c08-a2b4-5b6395f17f2c","html_url":"https://github.com/const-ae/ggupset","commit_stats":{"total_commits":64,"total_committers":5,"mean_commits":12.8,"dds":0.125,"last_synced_commit":"2a03c58d46f35625189910ae24cd8926aac12fd4"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Fggupset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Fggupset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Fggupset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/const-ae%2Fggupset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/const-ae","download_url":"https://codeload.github.com/const-ae/ggupset/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254254041,"owners_count":22039792,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ggplot","ggplot-extension","r","upset"],"created_at":"2024-08-03T21:00:50.579Z","updated_at":"2025-05-15T01:06:35.461Z","avatar_url":"https://github.com/const-ae.png","language":"R","funding_links":[],"categories":["R","Presentation, composition and scales","ggplot"],"sub_categories":["Additional Plot Types"],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r setup, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"70%\",\n  dpi = 150\n  \n)\nset.seed(1)\n```\n# ggupset\n\nPlot a combination matrix instead of the standard x-axis and create UpSet plots with ggplot2.\n\n\u003cimg src=\"man/figures/README-violinexample-1.png\" width=\"70%\" /\u003e\n\n## Installation\n\nYou can install the released version of ggupset from [CRAN](https://cran.r-project.org/package=ggupset) with:\n\n``` r\n# Download package from CRAN\ninstall.packages(\"ggupset\")\n\n# Or get the latest version directly from GitHub\ndevtools::install_github(\"const-ae/ggupset\")\n```\n\n## Example\n\nThis is a basic example which shows you how to solve a common problem:\n\n```{r example}\n# Load helper packages\nlibrary(ggplot2)\nlibrary(tidyverse, warn.conflicts = FALSE)\n\n# Load my package\nlibrary(ggupset)\n```\n\nIn the following I will work with a tidy version of the movies dataset from\nggplot. It contains a list of all movies in IMDB, their release date and other\ngeneral information on the movie. It also includes a `list` column that\nrecords which genres a movie belongs to (Action, Drama, Romance, etc.).\n\n```{r}\ntidy_movies\n```\n\n\n\n\n`ggupset` makes it easy to get an immediate impression of how many movies are in each\ngenre and in each combination of genres. For example, there are slightly more than 1200 Dramas\nin the set, more than 1000 movies which don't belong to any genre, and ~170 that are both Comedy\nand Drama.\n\n```{r}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    scale_x_upset(n_intersections = 20)\n```\n\n## Adding Numbers on top\n\nThe best feature about `ggupset` is that it plays well with existing tricks from `ggplot2`. 
For example, you can easily add the counts on top of the bars with this trick from [Stack Overflow](https://stackoverflow.com/a/26556180/604854):\n\n```{r}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    geom_text(stat='count', aes(label=after_stat(count)), vjust=-1) +\n    scale_x_upset(n_intersections = 20) +\n    scale_y_continuous(breaks = NULL, lim = c(0, 1350), name = \"\")\n```\n\n\n\n\n## Reshaping quadratic data\n\nOften enough the raw data you are starting with is not in such a neat tidy\nshape. But a tidy shape is a prerequisite for making `ggupset` plots, so how can you get\nfrom a wide dataset to a tidy one? And how do you actually create a `list`-column, anyway?\n\nImagine we measured, for a set of genes, whether each is a member of a certain pathway. \nA gene can be a member of multiple pathways and we want to see which pathways\nhave a large overlap. Unfortunately, we didn't record the data in a tidy format\nbut as a simple matrix.\n\nA fictional dataset of this type is provided as the `gene_pathway_membership` variable:\n\n```{r}\ndata(\"gene_pathway_membership\")\ngene_pathway_membership[, 1:7]\n```\n\n\nWe will first turn this matrix into a tidy tibble and then plot it:\n\n```{r}\ntidy_pathway_member \u003c- gene_pathway_membership %\u003e%\n  as_tibble(rownames = \"Pathway\") %\u003e%\n  gather(Gene, Member, -Pathway) %\u003e%\n  filter(Member) %\u003e%\n  select(- Member)\n\ntidy_pathway_member\n```\n\n`tidy_pathway_member` is already a very good starting point for plotting with \n`ggplot`. 
But we care about the genes that are members of multiple pathways so\nwe will aggregate the data by `Gene` and create a `list`-column with the `Pathway`\ninformation.\n\n```{r}\ntidy_pathway_member %\u003e%\n  group_by(Gene) %\u003e%\n  summarize(Pathways = list(Pathway))\n```\n\n\n```{r}\ntidy_pathway_member %\u003e%\n  group_by(Gene) %\u003e%\n  summarize(Pathways = list(Pathway)) %\u003e%\n  ggplot(aes(x = Pathways)) +\n    geom_bar() +\n    scale_x_upset()\n```\n\n\n\n## What if I need more flexibility?\n\nThe first important idea is to realize that a list column is just as good as a character\nvector with the list elements collapsed\n\n```{r}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  mutate(Genres_collapsed = sapply(Genres, function(x) paste0(sort(x), collapse = \"-\"))) %\u003e%\n  select(title, Genres, Genres_collapsed)\n```\n\nWe can easily make a plot using the strings as categorical axis labels\n\n```{r}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  mutate(Genres_collapsed = sapply(Genres, function(x) paste0(sort(x), collapse = \"-\"))) %\u003e%\n  ggplot(aes(x=Genres_collapsed)) +\n    geom_bar() +\n    theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5))\n```\n\nBecause the process of collapsing list columns into delimited strings is fairly generic,\nI provide a new scale that does this automatically (`scale_x_mergelist()`).\n\n```{R}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    scale_x_mergelist(sep = \"-\") +\n    theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5))\n```\n\nBut the problem is that it can be difficult to read those labels.\nInstead I provide a third function that replaces the axis labels\nwith a combination matrix (`axis_combmatrix()`).\n\n```{R}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n  
  geom_bar() +\n    scale_x_mergelist(sep = \"-\") +\n    axis_combmatrix(sep = \"-\")\n```\n\n\nOne thing that is only possible with the `scale_x_upset()` function is to automatically order\nthe categories and genres by `freq` or by `degree`.\n\n```{R}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    scale_x_upset(order_by = \"degree\")\n```\n\n\n## Styling\n\nTo make publication-ready plots, you often want to have complete control\nover how each part of a plot looks. This is why I provide an easy way to style\nthe combination matrix. Simply add a `theme_combmatrix()` to the plot.\n\n```{r}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    scale_x_upset(order_by = \"degree\") +\n    theme_combmatrix(combmatrix.panel.point.color.fill = \"green\",\n                     combmatrix.panel.line.size = 0,\n                     combmatrix.label.make_space = FALSE)\n```\n\n## Maximum Flexibility\n\nSometimes the limited styling options using `combmatrix.panel.point.color.fill` are not enough. 
To fully customize the combination matrix plot, `axis_combmatrix` has an `override_plotting_function` parameter, that allows us to plot anything in place of the combination matrix.\n\nLet us first reproduce the standard combination plot, but use the `override_plotting_function` parameter to see how it works:\n\n```{r}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    scale_x_mergelist(sep = \"-\") +\n    axis_combmatrix(sep = \"-\", override_plotting_function = function(df){\n      ggplot(df, aes(x= at, y= single_label)) +\n        geom_rect(aes(fill= index %% 2 == 0), ymin=df$index-0.5, ymax=df$index+0.5, xmin=0, xmax=1) +\n        geom_point(aes(color= observed), size = 3) +\n        geom_line(data= function(dat) dat[dat$observed, ,drop=FALSE], aes(group = labels), linewidth= 1.2) +\n        ylab(\"\") + xlab(\"\") +\n        scale_x_continuous(limits = c(0, 1), expand = c(0, 0)) +\n        scale_fill_manual(values= c(`TRUE` = \"white\", `FALSE` = \"#F7F7F7\")) +\n        scale_color_manual(values= c(`TRUE` = \"black\", `FALSE` = \"#E0E0E0\")) +\n        guides(color=\"none\", fill=\"none\") +\n        theme(\n          panel.background = element_blank(),\n          axis.text.x = element_blank(),\n          axis.ticks.y = element_blank(),\n          axis.ticks.length = unit(0, \"pt\"),\n          axis.title.y = element_blank(),\n          axis.title.x = element_blank(),\n          axis.line = element_blank(),\n          panel.border = element_blank()\n        )\n    })\n```\n\nWe can use the above template, to specifically highlight for example all sets that include the _Action_ category.\n\n```{r}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    scale_x_mergelist(sep = \"-\") +\n    axis_combmatrix(sep = \"-\", override_plotting_function = function(df){\n      print(class(df))\n      print(df)\n      df 
%\u003e%\n        mutate(action_movie = case_when(\n          ! observed ~ \"not observed\",\n          map_lgl(labels_split, ~ \"Action\" %in% .x) ~ \"Action\",\n          observed ~ \"Non-Action\"\n        )) %\u003e%\n        ggplot(aes(x = at, y = single_label)) +\n          geom_rect(aes(fill = index %% 2 == 0), ymin=df$index-0.5, ymax=df$index+0.5, xmin=0, xmax=1) +\n          geom_point(aes(color = action_movie), size = 3) +\n          geom_line(data= function(dat) dat[dat$observed, ,drop=FALSE], aes(group = labels, color = action_movie), linewidth= 1.2) +\n          ylab(\"\") + xlab(\"\") +\n          scale_x_continuous(limits = c(0, 1), expand = c(0, 0)) +\n          scale_fill_manual(values= c(`TRUE` = \"white\", `FALSE` = \"#F7F7F7\")) +\n          scale_color_manual(values= c(\"Action\" = \"red\", \"Non-Action\" = \"black\", \"not observed\" = \"lightgrey\")) +\n          guides(fill=\"none\") +\n          theme(\n            legend.position = \"bottom\",\n            panel.background = element_blank(),\n            axis.text.x = element_blank(),\n            axis.ticks.y = element_blank(),\n            axis.ticks.length = unit(0, \"pt\"),\n            axis.title.y = element_blank(),\n            axis.title.x = element_blank(),\n            axis.line = element_blank(),\n            panel.border = element_blank()\n          )\n    }) +\n    theme(combmatrix.label.total_extra_spacing = unit(30, \"pt\"))\n```\n\nThe `override_plotting_function` is incredibly powerful, but also an advanced feature that comes with pitfalls. 
Use at your own risk.\n\n\n## Alternative Packages\n\nThere is already a package called `UpSetR` ([GitHub](https://github.com/hms-dbmi/UpSetR),\n[CRAN](https://cran.r-project.org/package=UpSetR)) that provides very similar functionality\nand that heavily inspired me to write this package.\nIt produces a similar plot with an additional view that shows the overall size\nof each genre.\n\n```{r}\n\n# UpSetR\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  unnest(cols = Genres) %\u003e%\n  mutate(GenreMember=1) %\u003e%\n  pivot_wider(names_from = Genres, values_from = GenreMember, values_fill = list(GenreMember = 0)) %\u003e%\n  as.data.frame() %\u003e%\n  UpSetR::upset(sets = c(\"Action\", \"Romance\", \"Short\", \"Comedy\", \"Drama\"), keep.order = TRUE)\n\n# ggupset\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    scale_x_upset(order_by = \"degree\", n_sets = 5)\n```\n\nThe `UpSetR` package provides a lot of convenient helpers around this kind of plot; the main\nadvantage of my package is that it can be combined with any kind of ggplot\nthat uses a categorical x-axis. This additional flexibility can be useful if\nyou want to create non-standard plots. The following plot, for example, shows\nwhen movies of a certain genre were released.\n\n```{r violinexample}\ntidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres, y=year)) +\n    geom_violin() +\n    scale_x_upset(order_by = \"freq\", n_intersections = 12)\n```\n\n\n# Advanced examples\n\n#### 1. 
Complex experimental design\n\nThe combination matrix axis can be used to show complex experimental designs,\nwhere each sample got a combination of different treatments.\n\n```{r}\ndf_complex_conditions\n\ndf_complex_conditions %\u003e%\n  mutate(Label = pmap(list(KO, DrugA, Timepoint), function(KO, DrugA, Timepoint){\n    c(if(KO) \"KO\" else \"WT\", if(DrugA == \"Yes\") \"Drug\", paste0(Timepoint, \"h\"))\n  })) %\u003e%\n  ggplot(aes(x=Label, y=response)) +\n    geom_boxplot() +\n    geom_jitter(aes(color=KO), width=0.1) +\n    geom_smooth(method = \"lm\", aes(group = paste0(KO, \"-\", DrugA))) +\n    scale_x_upset(order_by = \"degree\",\n                  sets = c(\"KO\", \"WT\", \"Drug\", \"8h\", \"24h\", \"48h\"),\n                  position=\"top\", name = \"\") +\n    theme_combmatrix(combmatrix.label.text = element_text(size=12),\n                     combmatrix.label.extra_spacing = 5)\n```\n\n\n#### 2. Aggregation of information\n\n`dplyr` currently does not support list columns\nas grouping variables. 
In that case it makes\nsense to collapse it manually and use the\n`axis_combmatrix()` function to get a good looking\nplot.\n\n```{r}\n# Percentage of votes for n stars for top 12 genres\navg_rating \u003c- tidy_movies %\u003e%\n  mutate(Genres_collapsed = sapply(Genres, function(x) paste0(sort(x), collapse=\"-\"))) %\u003e%\n  mutate(Genres_collapsed = fct_lump(fct_infreq(as.factor(Genres_collapsed)), n=12)) %\u003e%\n  group_by(stars, Genres_collapsed) %\u003e%\n  summarize(percent_rating = sum(votes * percent_rating)) %\u003e%\n  group_by(Genres_collapsed) %\u003e%\n  mutate(percent_rating = percent_rating / sum(percent_rating)) %\u003e%\n  arrange(Genres_collapsed)\n\navg_rating\n\n# Plot using the combination matrix axis\n# the red lines indicate the average rating per genre\nggplot(avg_rating, aes(x=Genres_collapsed, y=stars)) +\n    geom_tile(aes(fill=percent_rating)) +\n    stat_summary_bin(aes(y=percent_rating * stars), fun = sum,  geom=\"point\", \n                     shape=\"—\", color=\"red\", size=6) +\n    axis_combmatrix(sep = \"-\", levels = c(\"Drama\", \"Comedy\", \"Short\", \n                    \"Documentary\", \"Action\", \"Romance\", \"Animation\", \"Other\")) +\n    scale_fill_viridis_c()\n\n```\n\n\n## Saving Plots\n\nThere is an important pitfall when trying to save a plot with a combination matrix.\nWhen you use `ggsave()`, ggplot2 automatically saves the last plot that was created.\nHowever, here `last_plot()` refers to only the combination matrix. 
To store the full\nplot, you need to explicitly assign it to a variable and save that.\n```{r warning=FALSE}\npl \u003c- tidy_movies %\u003e%\n  distinct(title, year, length, .keep_all=TRUE) %\u003e%\n  ggplot(aes(x=Genres)) +\n    geom_bar() +\n    scale_x_upset(n_intersections = 20)\nggsave(\"/tmp/movie_genre_barchart.png\", plot = pl)\n```\n\n\n## Session Info\n\n```{r}\nsessionInfo()\n```\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconst-ae%2Fggupset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconst-ae%2Fggupset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconst-ae%2Fggupset/lists"}