{"id":13737317,"url":"https://github.com/cxli233/customized_upset_plots","last_synced_at":"2025-10-24T18:32:35.553Z","repository":{"id":45456294,"uuid":"240962726","full_name":"cxli233/customized_upset_plots","owner":"cxli233","description":"Make customized upset plots: from raw data to plots ","archived":false,"fork":false,"pushed_at":"2024-07-11T01:58:47.000Z","size":13255,"stargazers_count":61,"open_issues_count":0,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-30T13:32:02.223Z","etag":null,"topics":["data-visualization","r"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cxli233.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-16T20:42:28.000Z","updated_at":"2025-04-04T18:35:32.000Z","dependencies_parsed_at":"2024-11-15T05:42:10.545Z","dependency_job_id":null,"html_url":"https://github.com/cxli233/customized_upset_plots","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxli233%2Fcustomized_upset_plots","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxli233%2Fcustomized_upset_plots/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxli233%2Fcustomized_upset_plots/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxli233%2Fcustomized_upset_plots/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/cxli233","download_url":"https://codeload.github.com/cxli233/customized_upset_plots/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251712928,"owners_count":21631464,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","r"],"created_at":"2024-08-03T03:01:41.408Z","updated_at":"2025-10-24T18:32:35.546Z","avatar_url":"https://github.com/cxli233.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# Customized Upset Plots\n\n[![DOI](https://zenodo.org/badge/240962726.svg)](https://zenodo.org/badge/latestdoi/240962726)\n\nThis repository contains scripts to produce customized upset plots, an alternative to Venn diagrams. \n\n * Author: Chenxin Li, Ph.D., Assistant Professor in the Department of Plant Biology, Michigan State University.  \n * Contact: lichen27@msu.edu | [@chenxinli2.bsky.social](https://bsky.app/profile/chenxinli2.bsky.social)\n\nThe `Scripts/` directory contains `.Rmd` files that generate the graphics shown below. \nRunning them requires R, RStudio, and the rmarkdown package. \n\n* R: [R Download](https://cran.r-project.org/bin/)\n* RStudio: [RStudio Download](https://www.rstudio.com/products/rstudio/download/)\n* rmarkdown: install it from the Install Packages dialog in RStudio, or with `install.packages(\"rmarkdown\")`\n\n# Use bar lengths to present set or subset sizes\nIn Venn diagrams, we use area to represent set or subset sizes. \nHowever, I have found it much easier to discern differences in length than differences in area. 
\n\n![Example with 4 sets](https://github.com/cxli233/customized_upset_plots/blob/master/Results/upset_full.svg)\n\nThis is a workflow for set/intersect visualization using [UpSet plots](https://github.com/hms-dbmi/UpSetR). \nThe upstream segment of the workflow (intersect size determination) is based on the re-implementation of `UpSetR` by [ComplexHeatmap](https://jokergoo.github.io/ComplexHeatmap-reference/book/). \nList, data frame, and plot handling is provided by the [tidyverse](https://www.tidyverse.org/). \nLastly, construction of composite plots is provided by [patchwork](https://cran.r-project.org/web/packages/patchwork/vignettes/patchwork.html).\n\nIn traditional [upset plots](https://upset.app/), intersects/subsets are indicated by dots. \nWhen two dots are connected by a line, the line represents the distinct intersect between those two sets. \nSet and intersect sizes are then represented by bars. \n\nThis workflow produces a customized upset plot in which intersects/subsets are indicated by a heatmap.\nThe customized upset plot has 4 parts: \n\n* The upper left shows the total set sizes.\n* The upper right is the legend/color scheme.\n* The lower left is a matrix showing subsets. E.g., when Set 1 and Set 2 are colored, the row represents the *intersection of Set 1 \u0026 Set 2, excluding elements found in any other set.*\n* The lower right shows the sizes of subsets.   \n\n# Subsetting which intersects to show\nWith an upset plot, you can choose which intersects to show. \nE.g., if I only want to show intersects involving Set 3, I can do that. \n\n![Example only involving Set 3](https://github.com/cxli233/customized_upset_plots/blob/master/Results/upset_3only.svg)\n\n# Extending the upset plot to visualize other variables \nIn addition, upset plots can be extended. 
\nMean separation plots (e.g., box plot, bar plot) and annotations (heatmaps) can be added to the sides of the upset plot using `patchwork`.\n\n![Example with extended lower right corner](https://github.com/cxli233/customized_upset_plots/blob/master/Results/upset_extended.svg) \n\n# Try it out with real data!\nI also provided some example data, from [Li et al., 2020, Genome Research](https://www.ncbi.nlm.nih.gov/pubmed/31896557).\n\n![Example with real data](https://github.com/cxli233/customized_upset_plots/blob/master/Results/real_data_upset_extended.svg)\n\n# Dependencies \n```{r}\nlibrary(tidyverse) \nlibrary(patchwork) \nlibrary(ComplexHeatmap)\n\nlibrary(RVenn) # Only required if you want Venn diagrams \nlibrary(RColorBrewer) # Supplies brewer.pal() used below; only needed if you keep these color palettes\n```\n\nAuxiliary dependencies: \n\n* For 2-3 sets, Venn diagrams can be made readily using the [RVenn package](https://cran.r-project.org/web/packages/RVenn/vignettes/vignette.html). The `ggvenn()` function from `RVenn` produces a Venn diagram as a ggplot object. \n* `ComplexHeatmap` is distributed through Bioconductor and can be installed with `BiocManager::install(\"ComplexHeatmap\")`. The development version can be installed from GitHub via `devtools::install_github(\"jokergoo/ComplexHeatmap\")`, which requires the `devtools` package.\n* For mean separation plots, a suggested package is [ggbeeswarm](https://github.com/eclarke/ggbeeswarm), which plots the individual data points in a violin-like shape. \n* For color palettes, the `viridis` and `RColorBrewer` packages are suggested. \n* To save plots as .svg files, you may need the R package `svglite`. On a Mac, you may also need to install [XQuartz](https://www.xquartz.org/).\n\n# Getting started\nHere are example scripts for 3 sets. \nThe workflow scales to more sets, because intersect sizes are calculated automatically (by `ComplexHeatmap`). \nHowever, the number of possible subsets grows exponentially with the number of sets (n sets can yield up to 2^n - 1 non-empty subsets), so filtering for the subsets of interest becomes important. 
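\n\nTo make that growth concrete, and to show what `make_comb_mat()` counts in its default \"distinct\" mode, here is a minimal base-R sketch. It is not part of the original scripts: `distinct_comb_sizes()` and `toy_list` are hypothetical names chosen for this illustration, and the workflow itself relies on `ComplexHeatmap` for this step. Each element gets one 0/1 digit per set, so `\"110\"` means \"in Set 1 and Set 2, but not in Set 3\"; this is the same style of combination code that the overlap-matrix step splits into columns.\n\n```{r}\n# Hypothetical helper (not part of ComplexHeatmap): counts the elements that\n# belong to exactly the sets marked \"1\" in each combination code\ndistinct_comb_sizes \u003c- function(sets) {\n  universe \u003c- unique(unlist(sets))\n  # one 0/1 digit per set, in list order, e.g. \"110\" = Set 1 and Set 2 only\n  codes \u003c- vapply(universe, function(el) {\n    in_set \u003c- vapply(sets, function(s) el %in% s, logical(1))\n    paste(as.integer(in_set), collapse = \"\")\n  }, character(1))\n  counts \u003c- table(codes)\n  setNames(as.integer(counts), names(counts))\n}\n\n# Same values as my_list in the Data section\ntoy_list \u003c- list(\n  data1 = letters[1:10],\n  data2 = letters[3:13],\n  data3 = letters[6:18])\n\nsizes \u003c- distinct_comb_sizes(toy_list)\nsizes  # 001 = 5, 011 = 3, 100 = 2, 110 = 3, 111 = 5\n\n# possible non-empty subsets for 2 to 8 sets: 3 7 15 31 63 127 255\n2^(2:8) - 1\n\n# filtering: subsets shared by at least 2 sets, then subsets involving Set 3\nsizes[nchar(gsub(\"0\", \"\", names(sizes))) \u003e= 2]\nsizes[substr(names(sizes), 3, 3) == \"1\"]\n```\n\nWith 3 sets, 2^3 - 1 = 7 combinations are possible, but only 5 of them contain any elements here. In the actual workflow, the same kind of filtering is applied to `comb_mat` with `ComplexHeatmap` (see the .Rmd file linked in the ComplexHeatmap section).\n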
\nThe easiest way to use this workflow is to copy the code from this README file, or to download one of the .Rmd files from the [Scripts/ folder](https://github.com/cxli233/customized_upset_plots/tree/master/Scripts). \nThen modify the code to suit your data and taste. \n\n# Data \n```{r}\nmy_list \u003c- list(\n  data1 = letters[1:10], \n  data2 = letters[3:13], \n  data3 = letters[6:18])\n```\nThe required input is a list of vectors, one per set; the names of the list become the set names. \n\n# If you want a Venn diagram\n```{r}\nmy_object \u003c- RVenn::Venn(my_list)\n\nggvenn(\n  my_object, slice = 1:3, \n  thickness = 0.5,\n  alpha = 0.5, \n  fill = brewer.pal(8, \"Set2\")\n) +\n  theme_void() +\n  theme(\n    legend.position = \"none\"\n  )\n\nggsave(\"../Results/VennDiagram_quick_start.svg\", height = 4, width = 4, bg = \"white\")\nggsave(\"../Results/VennDiagram_quick_start.png\", height = 4, width = 4, bg = \"white\")\n```\n![example venn diagram](https://github.com/cxli233/customized_upset_plots/blob/master/Results/VennDiagram_quick_start.svg)\n\n`ggvenn()` handles at most 3 sets. For more sets, an upset plot is the better choice. \n\n# ComplexHeatmap for heavy lifting \n```{r}\ncomb_mat \u003c- make_comb_mat(my_list)\nmy_names \u003c- set_name(comb_mat)\n```\n`make_comb_mat()` from `ComplexHeatmap` calculates the intersect/subset sizes. \nIt returns a combination matrix object built from the list of vectors, and this object can be filtered for the intersects/subsets of interest.  \nFor examples, see [this .Rmd file](https://github.com/cxli233/customized_upset_plots/blob/master/Scripts/customized_upset_plot_2023_01_19_downstream_of_complexheatmap.Rmd) at section \"Subsetting the intersects\".  \n\n**The rest of the code produces the 4 pieces that make up a customized upset plot. 
Since every step along the way is customizable, the result can be tailored closely to the needs and taste of the user.**\n\n## Total set size \n```{r}\nmy_set_sizes \u003c- set_size(comb_mat) %\u003e% \n  as.data.frame() %\u003e%  # the column is named \".\" when called inside a %\u003e% pipe\n  rename(sizes = \".\") %\u003e% \n  mutate(Set = row.names(.)) \n\np1 \u003c- my_set_sizes %\u003e% \n  mutate(Set = reorder(Set, sizes)) %\u003e% \n  ggplot(aes(x = Set, y = sizes)) +\n  geom_bar(stat = \"identity\", aes(fill = Set), alpha = 0.8, width = 0.7) +\n  geom_text(aes(label = sizes), \n            size = 5, angle = 90, hjust = 0, y = 1) +\n  scale_fill_manual(values = brewer.pal(4, \"Set2\"),  # feel free to use some other colors  \n                     limits = my_names) + \n  labs(x = NULL,\n       y = \"Set size\",\n       fill = NULL) +\n  theme_classic() +\n  theme(legend.position = \"right\",\n        text = element_text(size = 14),\n        axis.ticks.y = element_blank(),\n        axis.text = element_blank()\n        ) \n```\n\n## Legend \nExtracting the legend from a ggplot object is not straightforward,\nbut a small helper function can do it.  
\n```{r}\nget_legend \u003c- function(p) {\n  tmp \u003c- ggplot_gtable(ggplot_build(p))\n  # The legend is stored as \"guide-box\" (ggplot2 \u003c 3.5.0) or as \"guide-box-right\",\n  # \"guide-box-bottom\", etc. (ggplot2 \u003e= 3.5.0), where unused positions hold an empty zeroGrob\n  leg \u003c- grep(\"guide-box\", tmp$layout$name)\n  leg \u003c- leg[!vapply(tmp$grobs[leg], inherits, logical(1), what = \"zeroGrob\")]\n  tmp$grobs[[leg[1]]]\n}\n\np2 \u003c- get_legend(p1)\n```\n\n## Overlap sizes\n```{r}\nmy_overlap_sizes \u003c- comb_size(comb_mat) %\u003e% \n  as.data.frame() %\u003e% \n  rename(overlap_sizes = \".\") %\u003e% \n  mutate(category = row.names(.))\n\np3 \u003c- my_overlap_sizes %\u003e% \n  mutate(category = reorder(category, -overlap_sizes)) %\u003e% \n  ggplot(aes(x = category, y = overlap_sizes)) +\n  geom_bar(stat = \"identity\", fill = \"grey80\", color = NA, alpha = 0.8, width = 0.7) +\n  geom_text(aes(label = overlap_sizes, y = 0), \n            size = 5, hjust = 0, vjust = 0.5) +\n  labs(y = \"Intersect sizes\",\n       x = NULL) +\n  theme_classic() +\n  theme(text = element_text(size = 14, color = \"black\"),\n        axis.text = element_blank(),\n        axis.ticks.x = element_blank(),\n        axis.title.x = element_text(hjust = 0)\n        ) +\n  coord_flip()\n```\n\n## Overlap matrix \n```{r}\n# Split each combination code (e.g. \"110\") into one 0/1 column per set\nmy_overlap_matrix \u003c- str_split(string = my_overlap_sizes$category, pattern = \"\", simplify = TRUE) %\u003e% \n  as.data.frame() \n\ncolnames(my_overlap_matrix) \u003c- my_names\n\nmy_overlap_matrix_tidy \u003c- my_overlap_matrix %\u003e% \n  cbind(category = my_overlap_sizes$category) %\u003e% \n  pivot_longer(cols = !category, names_to = \"Set\", values_to = \"value\") %\u003e% \n  full_join(my_overlap_sizes, by = \"category\") %\u003e% \n  full_join(my_set_sizes, by = \"Set\")\n\np4 \u003c- my_overlap_matrix_tidy %\u003e% \n  mutate(category = reorder(category, -overlap_sizes)) %\u003e%  \n  mutate(Set = reorder(Set, sizes)) %\u003e%  \n  ggplot(aes(x = Set, y = category)) +\n  # linewidth requires ggplot2 \u003e= 3.4.0; on older versions use size = 1\n  geom_tile(aes(fill = Set, alpha = value), color = \"grey30\", linewidth = 1) +\n  scale_fill_manual(values = brewer.pal(4, \"Set2\"), # feel free to use other colors \n                    limits = 
my_names) +\n  scale_alpha_manual(values = c(0.8, 0),  # color the grid for 1, don't color for 0. \n                     limits = c(\"1\", \"0\")) +\n  labs(x = \"Sets\",  \n       y = \"Overlap\") +\n  theme_minimal() +\n  theme(legend.position = \"none\",\n        text = element_text(color = \"black\", size = 14),\n        panel.grid = element_blank(),\n        axis.text = element_blank()\n        )\n```\n\n# Put them together \n```{r}\nwrap_plots(p1, p2, p4, p3, \n           nrow = 2, \n           ncol = 2,\n           heights = c(1, 2), # the more rows in the lower part, the taller it should be\n           widths = c(1, 0.8),\n           guides = \"collect\") \u0026\n  theme(legend.position = \"none\")\n\nggsave(\"../Results/quick_start.svg\", height = 3.5, width = 3, bg = \"white\") \n# this should be a tall \u0026 skinny plot \n# I prefer .svg, but you can also save as pdf or png \n# I open the .svg file and manually adjust the size until it looks right\n# check that nothing is cut off from the plot \n# png is for twitter posting \nggsave(\"../Results/quick_start.png\", height = 3.5, width = 3, bg = \"white\")\n```\n![quick start](https://github.com/cxli233/customized_upset_plots/blob/master/Results/quick_start.svg)\n\n# Conclusions\nI hope you like it and find it pretty. \nIf you use this code for a publication, I'd greatly appreciate it if you could cite or acknowledge this repository. \nDOI: 10.5281/zenodo.7555525  \n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcxli233%2Fcustomized_upset_plots","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcxli233%2Fcustomized_upset_plots","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcxli233%2Fcustomized_upset_plots/lists"}