{"id":20431934,"url":"https://github.com/cxli233/ggpathway","last_synced_at":"2025-04-12T20:52:02.794Z","repository":{"id":150948413,"uuid":"574269912","full_name":"cxli233/ggpathway","owner":"cxli233","description":"A tutorial for pathway visualization using tidyverse, igraph, and ggraph ","archived":false,"fork":false,"pushed_at":"2024-10-26T03:55:18.000Z","size":483,"stargazers_count":56,"open_issues_count":0,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T20:51:51.231Z","etag":null,"topics":["data-visualization","pathway-analysis","r"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cxli233.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-12-04T23:25:07.000Z","updated_at":"2025-03-19T13:45:33.000Z","dependencies_parsed_at":"2023-05-06T19:31:18.400Z","dependency_job_id":null,"html_url":"https://github.com/cxli233/ggpathway","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxli233%2Fggpathway","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxli233%2Fggpathway/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxli233%2Fggpathway/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cxli233%2Fggpathway/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cxli233","download_url":"https://codeload.github.com/cxli233/ggpathway/tar.gz/refs/heads/m
ain","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248631728,"owners_count":21136560,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","pathway-analysis","r"],"created_at":"2024-11-15T08:13:25.519Z","updated_at":"2025-04-12T20:52:02.773Z","avatar_url":"https://github.com/cxli233.png","language":null,"funding_links":[],"categories":["Miscellaneous"],"sub_categories":[],"readme":"# ggpathway\nA tutorial for pathway visualization using tidyverse, igraph, and ggraph. \n\n![Krebs cycle](https://github.com/cxli233/ggpathway/blob/main/Results/TCA_2.svg) \n\n[![DOI](https://zenodo.org/badge/574269912.svg)](https://zenodo.org/badge/latestdoi/574269912)\n\n\n# Table of contents\n\n1. [Introduction](https://github.com/cxli233/ggpathway#introduction)\n     - [Dependencies](https://github.com/cxli233/ggpathway#dependencies)\n     - [The theory behind this workflow](https://github.com/cxli233/ggpathway#the-theory-behind-this-workflow)\n     - [Required input](https://github.com/cxli233/ggpathway#required-input)\n2. [Example 1: simple linear pathway](https://github.com/cxli233/ggpathway#example-1-simple-linear-pathway) \n3. [Example 2: more complex pathway](https://github.com/cxli233/ggpathway#example-2-more-complex-pathway)\n4. [Example 3: circular pathway](https://github.com/cxli233/ggpathway#example-3-circular-pathway) \n5. [Subsetting pathway](https://github.com/cxli233/ggpathway#subsetting-pathway)\n6. [Combining pathways](https://github.com/cxli233/ggpathway#combining-pathways)\n7. 
[Other examples](https://github.com/cxli233/ggpathway#other-examples)\n   - [Pipeline/workflow visualized as network](https://github.com/cxli233/ggpathway#pipelineworkflow-visualized-as-network)\n   - [Signaling pathway with inhibitory edges](https://github.com/cxli233/ggpathway#signaling-pathway-with-inhibitory-edges)\n\n# Introduction \n\nThis markdown page describes how to make pathway diagrams using ggplot compatible functions. \nIt requires: \n\n* [R](https://cran.r-project.org/)\n* [RStudio](https://posit.co/downloads/)\n* Rmarkdown, can be downloaded using `install.packages(\"rmarkdown\")` in R.\n\n## Dependencies \n\nThe workflow is built upon [tidyverse](https://www.tidyverse.org/) and [igraph](https://igraph.org/).\nInteractions between `ggplot` \u0026 `igraph` functions are achieved via [ggraph](https://ggraph.data-imaginist.com/). \n\nIf you want to read in excel files, you will need the `readxl` package. \n\n```r\nlibrary(tidyverse)\nlibrary(igraph)\nlibrary(ggraph)\n\nlibrary(readxl)\n\nlibrary(viridis)\nlibrary(RColorBrewer)\nlibrary(rcartocolor)\n```\n\nThe rest of the loaded packages are for data visualization only (some nice colors in graphs). \n\n## The theory behind this workflow\n\nTo plot a pathway, we can model the pathway as a network, or a \"graph\" in graph theory. \nIn mathematics, a graph is a structure that models the relationship between objects. \nA network can be constructed by: \n\n* an edge table \n* a node table \n\nFor example, we want to visualize a metabolic pathway. \nIn this context, each metabolite is a node; each enzyme is an edge that connects the metabolites.\nThis concept can be applied to signaling pathways as well, with modifications. \n\nWe will use `tidyverse` functions to handle tabular data operations regarding the edge and node tables.\nWe will then use `igraph` functions to produce a network object from edge and node tables. \nFinally, we will use `ggraph`, a `ggplot` extension of `igraph` to make pretty plots. 
\n\n## Required input\n\n* Edge table - each row is an edge, with the following columns: \n     - from: where the edge starts, e.g., name of metabolite (**required**).\n     - to: where the edge ends, e.g., name of metabolite (**required**).\n     - label: if you want the edge to be labeled, e.g., name of the enzyme.\n     - other information as different columns, e.g., condition, tissue, cell types... \n\n* Node table - each row is a node, with the following columns: \n     - name: name of the node, e.g., name of the metabolite (**required**).\n     - x: x coordinate of the node on the graph.\n     - y: y coordinate of the node on the graph.\n     - other information as different columns, e.g., molecular weight, localization... \n\nNote that the edge and node tables are tidy data frames. \nEach row is an observation, and each column is a variable. \nAlso note that the union of the `from` and `to` columns in the edge table should be identical to the `name` column of the node table. \n\nHopefully the above explanation will become more straightforward when we work through an example. \nExample input files can be found in the [Data](https://github.com/cxli233/ggpathway/tree/main/Data) folder. \n\n# Example 1: simple linear pathway \n\nWe will start with a simple example, a linear pathway with 3 steps and 4 metabolites. \nWe will use the oxidative segment of the [pentose phosphate pathway](https://en.wikipedia.org/wiki/Pentose_phosphate_pathway) as an example.  \n\nThis is a very short pathway, so we can actually write the tables in R by hand. 
\nWe can write the tables row-by-row using the `tribble()` function in `tidyverse`.\n\n## Edge table\n```r\nexample1_edge_table \u003c- tribble(\n  ~from, ~to,  ~label,\n  \"Glc6P\", \"6P-gluconolactone\",  \"Glc6PHD\",\n  \"6P-gluconolactone\", \"6P-gluconate\",  \"6P-gluconolactonase\",\n  \"6P-gluconate\", \"Ru5P\", \"6P-gluconateDH\"\n)\n\nhead(example1_edge_table)\n```\n## Node table\n```r\nexample1_nodes_table \u003c- tribble(\n  ~name, ~x,  ~y,\n  \"Glc6P\", 1, 0,\n  \"6P-gluconolactone\", 2, 0,  \n  \"6P-gluconate\", 3, 0,\n  \"Ru5P\", 4, 0\n)\n\nhead(example1_nodes_table)\n```\nNotice here I provided a manual layout; each node is given an x and y coordinate. \nFor example, Glc6P will show up at (1, 0) on the graph and so on. \n\n## Make network object and graph \nOnce the node and edge tables are written, we can combine them into a network object. \nWe use the `graph_from_data_frame()` function from `igraph`. \n\n```r\nexample1_network \u003c- graph_from_data_frame(\n  d = example1_edge_table,\n  vertices = example1_nodes_table,\n  directed = T\n)\n```\n\nNote that the `directed` argument is set to `TRUE`, so each edge points from its `from` node to its `to` node. 
\n\nOnce the network object is made, we can visualize it using `ggraph()`.\n\n```r\nggraph(example1_network, layout = \"manual\", \n      x = x, y = y) +\n  geom_node_text(aes(label = name), hjust = 0.5) +\n  geom_edge_link(aes(label = example1_edge_table$label), \n                   angle_calc = 'along',\n                   label_dodge = unit(2, 'lines'),\n                   arrow = arrow(length = unit(0.5, 'lines')), \n                   start_cap = circle(4, 'lines'),\n                   end_cap = circle(4, 'lines')) +\n  theme_void()  \n\nggsave(\"../Results/Pentose_1.svg\", height = 2, width = 6.5, bg = \"white\")\nggsave(\"../Results/Pentose_1.png\", height = 2, width = 6.5, bg = \"white\")\n```\n![OPPP_short](https://github.com/cxli233/ggpathway/blob/main/Results/Pentose_1.svg)\n\nAnd there it is!\nNot very sophisticated, but now we have the framework to build more complex pathways.  \n\n# Example 2: more complex pathway\n\nFor the 2nd example, let's do a more complex pathway.\nBy more complex I mean more edges and more nodes, as well as branches. \nWe will use the rest of the pentose phosphate pathway. \n\nOnce the pathway gets complex enough, it's better to prepare edge \u0026 node tables in Excel. \nOnce they are written, you can load them into R. \n\n```r\nexample2_edges \u003c- read_excel(\"../Data/OPPP_edges.xlsx\")\nexample2_nodes \u003c- read_excel(\"../Data/OPPP_nodes.xlsx\")\n\nhead(example2_edges)\nhead(example2_nodes)\n```\n\n**Important!** If a compound appears multiple times in the pathway at different locations, each instance *must* have a different name. \n\nIn this example, Xu5P, Glyceral-3P, and Frc-6P all appear twice. \nSo I named them `{name}_1` or `{name}_2`. \nFor aesthetic purposes, we can make a new column in the node table called \"label\",\nsuch that different nodes can have the same label, but they must have unique names. 
\n\n```r\nexample2_nodes \u003c- example2_nodes %\u003e% \n  mutate(label = str_remove(name, \"_\\\\d\"))\n\n\nhead(example2_nodes)\n```\n\nI think we are all good to go. \n```r\nexample2_network \u003c- graph_from_data_frame(\n  d = example2_edges,\n  vertices = example2_nodes,\n  directed = T\n)\n```\n\nFor a complex pathway with multiple branch points, instead of a manual layout, we can also use the layout methods provided by `igraph` and `ggraph`. \nRead more about layouts [here](https://www.data-imaginist.com/2017/ggraph-introduction-layouts/).\n\n```r\nggraph(example2_network, layout = \"kk\") +\n  geom_node_point(size = 3, aes(fill = as.factor(carbons)), \n                  alpha = 0.8, shape = 21, color = \"grey20\") +\n  geom_node_text(aes(label = label), hjust = 0.5, repel = T) +\n  geom_edge_link(#aes(label = example2_edges$label), \n                   #angle_calc = 'along',\n                   label_dodge = unit(2, 'lines'),\n                   arrow = arrow(length = unit(0.4, 'lines')), \n                   start_cap = circle(1, 'lines'),\n                   end_cap = circle(2, 'lines')) +\n  scale_fill_manual(values = carto_pal(7, \"Vivid\")) +\n  labs(fill = \"Carbons\") +\n  theme_void()  \n\nggsave(\"../Results/Pentose_2.svg\", height = 5, width = 4, bg = \"white\")\nggsave(\"../Results/Pentose_2.png\", height = 5, width = 4, bg = \"white\")\n```\n![OPPP_2](https://github.com/cxli233/ggpathway/blob/main/Results/Pentose_2.svg)\n\nThat looks fine to me. \nI turned off the edge labels, because it's too much text to look at. \nWe can incorporate other info on the graph, such as the number of carbons each metabolite has. \nA purpose of the pentose phosphate pathway is to toggle between 6 or 3 carbon molecules for glycolysis and 5 carbon molecules for nucleotide biosynthesis. \n\n\n# Example 3: circular pathway \nFor the next example, let's do a circular pathway. \nAn archetypal example is the TCA cycle, aka the Krebs cycle. \nLet's read in the nodes and edges. 
\n\n```r\nexample3_edges \u003c- read_excel(\"../Data/TCA_cycle_edges.xlsx\")\nexample3_nodes \u003c- read_excel(\"../Data/TCA_cycle_nodes.xlsx\")\n\nhead(example3_edges)\nhead(example3_nodes)\n```\nIn this example, I also included co-factors (Co-enzymeA, NAD+/NADH, ATP...).\nAgain, when a molecule appears multiple times, each instance *must* have a unique name. \nFor aesthetics only, let's make a label column. \n\n```r\nexample3_nodes \u003c- example3_nodes %\u003e% \n  mutate(label = str_remove(name, \"_\\\\d\"))\n\n\nhead(example3_nodes)\n```\n\nI did some high school math to lay out the pathway around a circle. \n\n```r\nexample3_network \u003c- graph_from_data_frame(\n  d = example3_edges,\n  vertices = example3_nodes,\n  directed = T\n)\n```\n\n```r\nggraph(example3_network, layout = \"manual\",\n       x = x, y = y) +\n  geom_node_point(size = 3, aes(fill = as.factor(carbons)), \n                  alpha = 0.8, shape = 21, color = \"grey20\") +\n  geom_edge_link(arrow = arrow(length = unit(0.4, 'lines')), \n                   start_cap = circle(0.5, 'lines'),\n                   end_cap = circle(0.5, 'lines'), \n                 width = 1.1, alpha = 0.5) +\n  geom_node_text(aes(label = label), hjust = 0.5, repel = T) +\n  annotate(geom = \"text\", label = \"TCA Cycle\", \n           x = 0, y = 0, size = 5, fontface = \"bold\") +\n  scale_fill_manual(values = carto_pal(7, \"Vivid\")) +\n  labs(fill = \"Carbons\") +\n  theme_void() +\n  coord_fixed()\n\nggsave(\"../Results/TCA_1.svg\", height = 4, width = 5, bg = \"white\")\nggsave(\"../Results/TCA_1.png\", height = 4, width = 5, bg = \"white\")\n```\n![TCA1](https://github.com/cxli233/ggpathway/blob/main/Results/TCA_1.svg)\n\nThis looks fine to me. \nI had to play around with the line and arrow size. \nMaybe I was too ambitious to put all the cofactors on this. \n\n# Subsetting pathway\n\nWe can subset a pathway by removing nodes and edges. 
\n```r\nexample3_nodes_trim \u003c- example3_nodes %\u003e% \n  filter(carbons != \"cofactor\")\n\nexample3_edges_trim \u003c- example3_edges %\u003e% \n  filter(from %in% example3_nodes_trim$name \u0026\n           to %in% example3_nodes_trim$name)\n```\n\nNow re-make the network object\n```r\nexample3_network_trim \u003c- graph_from_data_frame(\n  d = example3_edges_trim,\n  vertices = example3_nodes_trim,\n  directed = T\n)\n```\n\n```r\nggraph(example3_network_trim, layout = \"manual\",\n       x = x, y = y) +\n  geom_node_point(size = 3, aes(fill = as.factor(carbons)), \n                  alpha = 0.8, shape = 21, color = \"grey20\") +\n  geom_edge_link(arrow = arrow(length = unit(0.4, 'lines')), \n                   start_cap = circle(0.5, 'lines'),\n                   end_cap = circle(1, 'lines'), \n                 width = 1.1, alpha = 0.5) +\n  geom_node_text(aes(label = label), hjust = 0.5, repel = T) +\n  annotate(geom = \"text\", label = \"TCA Cycle\", \n           x = 0, y = 0, size = 5, fontface = \"bold\") +\n  scale_fill_manual(values = carto_pal(7, \"Vivid\")) +\n  labs(fill = \"Carbons\") +\n  theme_void() +\n  coord_fixed()\n\nggsave(\"../Results/TCA_2.svg\", height = 4, width = 5, bg = \"white\")\nggsave(\"../Results/TCA_2.png\", height = 4, width = 5, bg = \"white\")\n```\n![TCA_2](https://github.com/cxli233/ggpathway/blob/main/Results/TCA_2.svg)\n \nThat's it! \n\n# Combining pathways\nWhen we need to combine two pathways, the new edge and node tables are the unions of edges and nodes, respectively. 
\nThis can be achieved by binding the tables as rows and then removing redundant rows using `distinct(..., .keep_all = T)`.\n\nFirst read in data.\n```r\ncalvin_edges \u003c- read_excel(\"../Data/Calvin_cycle_edges.xlsx\")\ncalvin_nodes \u003c- read_excel(\"../Data/Calvin_cycle_nodes.xlsx\")\n\nPR_edges \u003c- read_excel(\"../Data/Photorespiration_edges.xlsx\")\nPR_nodes \u003c- read_excel(\"../Data/Photorespiration_nodes.xlsx\")\n```\n\nThen combine edges and remove redundant ones.\n```r\ncombined_edges \u003c- rbind(\n  calvin_edges, \n  PR_edges\n) %\u003e% \n  distinct(from, to, .keep_all = T)\n```\n\nThen combine nodes and remove redundant ones.\n```r\ncombined_nodes \u003c- rbind(\n  calvin_nodes %\u003e% \n    select(-carbon),\n  PR_nodes\n) %\u003e% \n  distinct( .keep_all = T)\n```\n\nThen make graph object. \n```r\ncombined_network \u003c- graph_from_data_frame(\n  d = combined_edges,\n  vertices = combined_nodes,\n  directed = T\n)\n```\n\nFinally, plot! \n```r\nggraph(combined_network, layout = \"kk\") +\n  geom_node_point(size = 3, aes(fill = localization), \n                  alpha = 0.8, shape = 21, color = \"grey70\") +\n  geom_edge_link(label_dodge = unit(2, 'lines'),\n                   arrow = arrow(length = unit(0.4, 'lines')), \n                   start_cap = circle(0.75, 'lines'),\n                   end_cap = circle(0.75, 'lines'),\n                 alpha = 0.5, width = 1.1, color = \"grey30\") +\n  geom_node_text(aes(label = name), hjust = 0.5, repel = T) +\n  scale_fill_manual(values = carto_pal(7, \"Vivid\")[c(4, 2, 5)],\n                    limits = c(\"chloroplast\", \"peroxisome\", \"mitochondria\")) +\n  labs(fill = \"Localization\",\n       title = \"Calvin cycle \u0026 photorespiration\") +\n  theme_void() +\n  theme(\n    legend.position = c(0.8, 0.2)\n  ) +\n  scale_y_reverse()\n\nggsave(\"../Results/Calvin_PS_comb.svg\", height = 4.5, width = 5.5, bg = \"white\")\nggsave(\"../Results/Calvin_PS_comb.png\", height = 4.5, width = 
5.5, bg = \"white\")\n```\n\n![combined_pathway](https://github.com/cxli233/ggpathway/blob/main/Results/Calvin_PS_comb.svg) \n\nDone!\nAn example script covering the Calvin cycle, photorespiration, and the combined pathway can be found [here](https://github.com/cxli233/ggpathway/blob/main/Scripts/calvin_cycle.Rmd). \n\n# Other examples\n## Pipeline/workflow visualized as network\nPipelines and workflows can be visualized as a network using ggraph. \n\n![Example pipeline](https://github.com/cxli233/ggpathway/blob/main/Results/Pipeline.png)\n\nExample script for this pipeline visualization can be found [here](https://github.com/cxli233/ggpathway/blob/main/Scripts/Pipeline_graph.Rmd). \n\n## Signaling pathway with inhibitory edges\nA signaling pathway with inhibitory edges requires additional customization, as activating and repressive interactions require distinct edge shapes. Activating interactions are usually represented by arrows (`-\u003e`), and repressive interactions are usually represented by bars (`-|`). We can use the [ggarrow](https://github.com/teunbrand/ggarrow) package to customize arrow shapes. However, it requires additional tinkering, and for small pathways, I am not sure this is more effective than making the diagram in PowerPoint. Here is an example:  \n\n![Ethylene signaling pathway](https://github.com/cxli233/ggpathway/blob/main/Results/Ethylene_Signaling_pathway.png) \n\nExample script for this signaling pathway visualization can be found [here](https://github.com/cxli233/ggpathway/blob/main/Scripts/Inhibitory_edges.Rmd). \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcxli233%2Fggpathway","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcxli233%2Fggpathway","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcxli233%2Fggpathway/lists"}