# ggtraces - A tidyverse and grammar of graphics powered line traces visualizer

v.1.0.0 release:
[![DOI](https://zenodo.org/badge/516867891.svg)](https://zenodo.org/badge/latestdoi/516867891)

Author: Chenxin Li, Ph.D., Assistant Research Scientist at Department of Crop & Soil Sciences and Center for Applied Genetic Technologies, University of Georgia.

Contact: [[email protected]]([email protected])

The main goal of this repository is to empower R users to produce publication-quality chromatograms in R.
Examples and explanations are below.

The `Scripts/` directory contains `.Rmd` files that generate the graphics shown below.
Running them requires R, RStudio, and the rmarkdown package.

* R: [R Download](https://cran.r-project.org/bin/)
* RStudio: [RStudio Download](https://www.rstudio.com/products/rstudio/download/)
* rmarkdown can be installed using the Install Packages interface in RStudio, or by running `install.packages("rmarkdown")` in the R console

# Table of contents

1. [Dependencies](https://github.com/cxli233/ggtraces#dependencies)
2. [Required input](https://github.com/cxli233/ggtraces#required-input)
3. [Functions defined by the workflow](https://github.com/cxli233/ggtraces#functions-defined-by-the-workflow)
4. [Example output](https://github.com/cxli233/ggtraces#example-output)
5. [Real datasets](https://github.com/cxli233/ggtraces#real-datasets)
   - [LC-MS example](https://github.com/cxli233/ggtraces#lc-ms-data)
   - [Metagene example](https://github.com/cxli233/ggtraces#metagene-data)
6. [Getting started](https://github.com/cxli233/ggtraces#getting-started)
7. [Example script](https://github.com/cxli233/ggtraces#example-script)
   - [Load data](https://github.com/cxli233/ggtraces#load-data)
   - [Rename columns](https://github.com/cxli233/ggtraces#rename-columns)
   - [Run ggtraces functions](https://github.com/cxli233/ggtraces#run-ggtrace-functions-one-by-one)
   - [Final touches](https://github.com/cxli233/ggtraces#final-touches)
8. [Comparison of perspectives](https://github.com/cxli233/ggtraces#comparison-of-perspectives)
9. [Additional features](https://github.com/cxli233/ggtraces#additional-features)
   - [Facet plot](https://github.com/cxli233/ggtraces#facet-plot)
   - [Pherogram](https://github.com/cxli233/ggtraces#pherogram)

# Dependencies
```{r}
library(tidyverse)
```
This is a tidyverse based workflow.

# Required input
The workflow requires the input data to be in the tidy format (each row is an observation, and each column is a variable).

It requires the following 3 columns:

1. A column named `x`, which will be the x axis
2. A column named `y`, which will be the y axis
3. A column named `sample`, which indicates the sample ID of each trace

Additional required values:

1. A vector of sample IDs
2. `x_offset`, default = 0.2
3. `y_offset`, default = 0.4
4. The number of traces to plot
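
As a minimal sketch of what such an input could look like, the block below builds a tidy data frame for two sine waves in base R, together with the additional values listed above. The sample names (`wave_1`, `wave_2`), the phase shift, and the variable names are assumptions made for illustration; only the column names `x`, `y`, and `sample` and the two default offsets come from this README.

```r
# Build a tidy data frame with the three required columns: x, y, and sample.
x_values <- seq(0, 2 * pi, length.out = 100)

sine_waves <- rbind(
  data.frame(x = x_values, y = sin(x_values), sample = "wave_1"),
  data.frame(x = x_values, y = sin(x_values + pi / 2), sample = "wave_2")
)

# The additional values the workflow asks for (variable names are hypothetical).
sample_ids <- unique(sine_waves$sample)  # vector of sample IDs
number_of_traces <- length(sample_ids)   # number of traces to plot
x_offset <- 0.2                          # default stated in this README
y_offset <- 0.4                          # default stated in this README

str(sine_waves)
```

Each row is one observation (one point on one trace), which is the tidy layout the workflow expects.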

# Functions defined by the workflow
This workflow defines six functions, which are called in this order:

1. `find_xy_ranges()` takes the tidy input data frame and finds xmin, xmax, ymin, and ymax.
2. `make_grid_table()` takes the ranges produced by `find_xy_ranges()` and produces a data frame that will be used to make the coordinate system. Additionally, it requires `x_offset`, `y_offset`, and the number of traces (`number_traces` in the example script).
3. `make_axis_table()` takes the ranges produced by `find_xy_ranges()` and produces a data frame that will be used to make the coordinate system.
4. `make_coord()` takes the output of `find_xy_ranges()`, `make_grid_table()`, and `make_axis_table()` to make a ggplot object that is a blank coordinate system. It also requires `x_offset`, `y_offset`, and the number of traces (`number_traces` in the example script).
5. `map_sample_to_trace()` takes a vector of sample IDs and produces a data frame that maps sample IDs to traces (a column of 1 to n).
6. `plot_traces()` takes the output of all the above and produces a ggplot object.
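
To make the first and fifth steps more concrete, here is a rough base R sketch of what range-finding and sample-to-trace mapping could look like. These are hypothetical stand-ins (`sketch_xy_ranges()` and `sketch_sample_to_trace()`), written only from the descriptions above; the functions actually defined in `ggtraces.Rmd` may be implemented differently and may return differently shaped objects.

```r
# Hypothetical stand-in for find_xy_ranges(): not the repository's code.
sketch_xy_ranges <- function(data) {
  data.frame(
    xmin = min(data$x), xmax = max(data$x),
    ymin = min(data$y), ymax = max(data$y)
  )
}

# Hypothetical stand-in for map_sample_to_trace(): sample IDs -> 1..n.
sketch_sample_to_trace <- function(sample_ids) {
  data.frame(sample = sample_ids, trace = seq_along(sample_ids))
}

# A small toy data set in the required tidy format.
toy <- data.frame(
  x = c(0, 1, 2, 0, 1, 2),
  y = c(0, 5, 1, 2, 3, 9),
  sample = rep(c("a", "b"), each = 3)
)

toy_ranges <- sketch_xy_ranges(toy)
toy_mapping <- sketch_sample_to_trace(c("a", "b"))

toy_ranges
toy_mapping
```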

# Example output
As an example, let's visualize two sine waves.

The workflow first generates a blank coordinate system, which is a ggplot object.

* The coordinate system is defined by the x and y value ranges, as well as the number of traces to graph.
* The perspective of the coordinate system is defined by `x_offset` and `y_offset`.
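
One way to picture the role of the two offsets is to assume that each successive trace is shifted by one `x_offset` to the right and one `y_offset` upward, in data units. The sketch below works through that arithmetic. The helper `shift_trace()` and the shifting rule itself are assumptions for illustration, not the code used by `make_coord()` or `plot_traces()`.

```r
# Assumption: trace k is shifted by (k - 1) offsets along each axis.
shift_trace <- function(x, y, trace, x_offset = 0.2, y_offset = 0.4) {
  data.frame(
    x_shifted = x + (trace - 1) * x_offset,
    y_shifted = y + (trace - 1) * y_offset
  )
}

x <- c(0, 1, 2)
y <- c(0, 1, 0)

first_trace <- shift_trace(x, y, trace = 1)   # unchanged
second_trace <- shift_trace(x, y, trace = 2)  # moved right by 0.2 and up by 0.4

second_trace
```

Under this assumption, the corners of the stacked traces trace out a parallelogram whose slant is set by the ratio of `y_offset` to `x_offset`.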

![Example blank coord](https://github.com/cxli233/ggtraces/blob/main/Results/blank_coord.svg)

Because the blank coordinate system is a ggplot object, we can add ggplot layers to it, such as geoms, scales, and themes.

The trace plot, in its most basic form, is the blank coordinate system plus `geom_line()` to plot the line traces.

![Example trace plot](https://github.com/cxli233/ggtraces/blob/main/Results/example_1.svg)

This shows two sine waves aligned along a parallelogram.
The result is also a ggplot object, so we can add more ggplot layers to it if needed, for example to replace the default color palette.
It usually takes some final touches to make the plot look nicer.
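
As a small illustration of adding layers to an existing plot object, the sketch below uses a plain ggplot2 line plot as a stand-in for the object returned by the workflow. The toy data, the object names, and the chosen colors are assumptions for the example.

```r
library(ggplot2)

demo_data <- data.frame(
  x = rep(1:3, times = 2),
  y = c(1, 3, 2, 2, 1, 3),
  sample = rep(c("a", "b"), each = 3)
)

# Stand-in for the object returned by plot_traces().
base_plot <- ggplot(demo_data, aes(x = x, y = y, color = sample)) +
  geom_line()

# Adding layers returns a new ggplot object; base_plot is left unchanged.
nicer_plot <- base_plot +
  scale_color_manual(values = c(a = "dodgerblue2", b = "tomato1")) +
  labs(x = "x", y = "y", color = "Sample") +
  theme_classic()

class(nicer_plot)
```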

![Example trace plot, but nicer](https://github.com/cxli233/ggtraces/blob/main/Results/example_1_nicer.svg)

# Real datasets
The best way to use this tool is to run `ggtraces.Rmd` in the same environment (the same RStudio window, in a different tab) as your own analysis.
Doing so deposits the required functions into the environment.
You can then call the functions one by one.

I tried out two real datasets that are very different.
The first one is LC-MS data.
Data from [Li et al., 2022](https://www.biorxiv.org/content/10.1101/2022.07.04.498697v1).
The second one is small RNA metagene (averaged gene) data.
Data from [Li et al., 2020](https://genome.cshlp.org/content/30/2/173.short) and [Li et al., 2022](https://genome.cshlp.org/content/32/2/309.short).

Running `ggtraces_uses.Rmd` in the `Scripts/` directory will generate these graphs.

## LC-MS data
![LC-MS example](https://github.com/cxli233/ggtraces/blob/main/Results/LC_MS_example.svg)
This shows the base peak chromatograms (normalized to the highest peak) of two samples.

## Metagene data
![Metagene example](https://github.com/cxli233/ggtraces/blob/main/Results/metagene_example.svg)
This shows the normalized coverage of 24-nt siRNAs (per 1000 24-nt siRNAs) around transcription start sites, averaged across all genes.

# Getting started

1. Clone the repository to your machine.
2. Run `ggtraces.Rmd` under `Scripts/`. You will need to install the rmarkdown package.
3. Call each function in order.
4. Make final touches (e.g., adjust axis range, axis label, color palette, and so on)
5. Done!
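
If you prefer not to run the `.Rmd` interactively, one possible alternative for step 2 is to extract its code chunks with `knitr::purl()` and then `source()` the resulting script. This is a sketch of that idea rather than a documented part of the workflow. To keep it runnable on its own, it writes a tiny stand-in `.Rmd` to a temporary file; the function `demo_range()` is hypothetical.

```r
library(knitr)

# Stand-in for Scripts/ggtraces.Rmd so that this sketch runs on its own.
fence <- strrep("`", 3)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(
  c(
    paste0(fence, "{r}"),
    "demo_range <- function(v) c(min = min(v), max = max(v))",
    fence
  ),
  rmd_path
)

# purl() extracts the R code chunks from an R Markdown file into a script.
r_path <- tempfile(fileext = ".R")
purl(rmd_path, output = r_path, quiet = TRUE)

# source() then defines the extracted functions in the current session.
source(r_path)
demo_range(c(3, 1, 2))
```

For the real workflow, `rmd_path` would point at `Scripts/ggtraces.Rmd` in your clone. Note that sourcing the extracted script runs every chunk in the file, not only the function definitions.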

# Example script
## Load data
```{r}
metagene <- read_csv("../Data/metagene.csv", col_types = cols())
```
This is already a tidy data frame.
If your data table is not in the tidy format, you'll need to re-format it first.
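
For instance, if your table is in a wide layout with one column per sample, `tidyr::pivot_longer()` is one way to reshape it. The table `wide` and its column names below are hypothetical; `names_to` and `values_to` are set so that the result already carries the `sample` and `y` columns the workflow expects.

```r
library(tidyr)

# A hypothetical wide table: one column per sample.
wide <- data.frame(
  x = c(1, 2, 3),
  sample_A = c(10, 20, 15),
  sample_B = c(5, 25, 30)
)

# Reshape to one row per (x, sample) observation.
tidy <- pivot_longer(
  wide,
  cols = -x,
  names_to = "sample",
  values_to = "y"
)

tidy
```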

## Rename columns
```{r}
metagene_2 <- metagene %>%
  dplyr::rename(x = `bin start`,
                sample = sample_type) %>%
  mutate(y = mena_pro_24 * 1000)
```

The workflow requires `x`, `y`, and `sample` columns.

## Run ggtrace functions one by one
```{r}
# 1. Find the x and y ranges of the data
example3_ranges <- find_xy_ranges(metagene_2)

# 2-3. Build the grid and axis tables for the coordinate system
example3_grid_table <- make_grid_table(example3_ranges, x_offset = 200, y_offset = 150, number_traces = 5)
example3_axis_table <- make_axis_table(example3_ranges)

# 4. Assemble the blank coordinate system
example3_coord <- make_coord(
  grid_table = example3_grid_table,
  axis_table = example3_axis_table,
  ranges = example3_ranges,
  number_traces = 5,
  x_offset = 200,
  y_offset = 150
)

# 5. Map sample IDs to traces
example3_names <- c("sperm", "egg", "zygote", "seedling")
example3_mapping <- map_sample_to_trace(example3_names)

# 6. Plot the traces on the coordinate system
example3_traces <- plot_traces(
  data = metagene_2,
  coord = example3_coord,
  mapping = example3_mapping,
  x_offset = 200,
  y_offset = 150,
  ranges = example3_ranges,
  x_title = "Position relative to TSS",
  y_title = "Normalized\ncoverage",
  sample_ID_title = "Cell type"
)
```
* You will need to provide `x_offset`, `y_offset`, and the number of traces (the `number_traces` argument in the code above). These values differ across experiments.
* You will need to provide the names of the traces. They are provided via `example3_names <- c("sperm", "egg", "zygote", "seedling")`.
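
Before plotting, it can help to confirm that every trace name you supply actually occurs in the `sample` column of your data. The helper below, `check_trace_names()`, is a hypothetical convenience written for this example and is not part of the workflow; it assumes that trace names are meant to match values in `sample`.

```r
# Check that every requested trace name occurs in the data's sample column.
check_trace_names <- function(data, trace_names) {
  missing_names <- setdiff(trace_names, unique(data$sample))
  if (length(missing_names) > 0) {
    stop("Sample IDs not found in data: ", paste(missing_names, collapse = ", "))
  }
  invisible(TRUE)
}

toy <- data.frame(
  x = c(1, 2, 1, 2),
  y = c(0.1, 0.2, 0.3, 0.4),
  sample = c("sperm", "sperm", "egg", "egg")
)

check_trace_names(toy, c("sperm", "egg"))  # passes silently

# A name that is absent from the data raises an error; capture its message.
result <- tryCatch(
  check_trace_names(toy, c("sperm", "zygote")),
  error = function(e) conditionMessage(e)
)
result
```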

## Final touches
Manually adjust axis breaks, axis range, color palette, and axis title position.
Since `example3_traces` is a ggplot object, we can easily make additional customizations.

```{r}
example3_traces +
  geom_segment(x = -Inf, xend = -Inf, y = 0, yend = 800, size = 1.1, color = "grey20") +
  geom_segment(x = -3000, xend = 2000, y = -Inf, yend = -Inf, size = 1.1, color = "grey20") +
  scale_color_manual(values = c("dodgerblue2", "tomato1", "violetred4", "seagreen"),
                     limits = example3_mapping$sample) +
  scale_y_continuous(breaks = c(0, 200, 400, 600, 800)) +
  theme(legend.position = "top",
        axis.title.y = element_text(hjust = 0.4))
```
![Metagene example](https://github.com/cxli233/ggtraces/blob/main/Results/metagene_example.svg)
Done!

# Comparison of perspectives
Different `x_offset` and `y_offset` values change the appearance of the final product.
![LC MS different perspectives](https://github.com/cxli233/ggtraces/blob/main/Results/LC_MS_perspectives.svg)

* High x_offset and low y_offset facilitate comparisons along y axis. It gives the sensation that we are looking at the graph from the side.
* Low x_offset and high y_offset facilitate comparisons along x axis. It gives the sensation that we are looking at the graph from the top.

# Additional features
## Facet plot
A facet plot is a plot type in which each line trace gets its own x and y axes.

```{r}
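# brewer.pal() comes from the RColorBrewer package, which library(tidyverse) does not attach
plot_facet(LC_MS_data_2, x_title = "Retention time (min)", y_title = "Relative intensity") +
  scale_color_manual(values = RColorBrewer::brewer.pal(8, "Set2")[c(1, 4)])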
```
![LC MS facet plot](https://github.com/cxli233/ggtraces/blob/main/Results/LC_MS_facet.svg)

The `plot_facet()` function requires the tidy data frame as input. `x_title` and `y_title` are optional.
Defaults are "x" and "y", respectively.
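
For readers curious how a similar layout can be produced with plain ggplot2, the sketch below uses `facet_wrap()` on toy data. It is not the implementation of `plot_facet()`, which may differ; the data, object names, and styling choices are assumptions for illustration.

```r
library(ggplot2)

# Toy data: two bell-shaped peaks at different positions.
demo_data <- data.frame(
  x = rep(seq(0, 10, by = 0.5), times = 2),
  sample = rep(c("sample_1", "sample_2"), each = 21)
)
peak_position <- ifelse(demo_data$sample == "sample_1", 3, 6)
demo_data$y <- dnorm(demo_data$x, mean = peak_position)

# One panel per sample, stacked vertically, each with its own axes.
facet_sketch <- ggplot(demo_data, aes(x = x, y = y, color = sample)) +
  geom_line() +
  facet_wrap(~ sample, ncol = 1, scales = "free") +
  labs(x = "Retention time (min)", y = "Relative intensity") +
  theme_classic()

facet_sketch
```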

## Pherogram
Pherogram is short for electropherogram, where we imagine the traces are moving down a gel.
The original y value is now represented as color intensity in the heat map.

```{r}
plot_pherogram(data = metagene_2,
               y_title = "Position relative to TSS",
               legend_title = "Normalized\ncoverage",
               mapping = example3_mapping)
```
![Metagene pherogram](https://github.com/cxli233/ggtraces/blob/main/Results/metagene_pherogram.svg)

The `plot_pherogram()` function requires the tidy data frame as input.
The `y_title` argument controls the y axis title (default = "x"), since it was the x value in the original line traces.
The `legend_title` argument controls the title of the color scale (default = "y"), since it was the y value in the original line traces.
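
The same idea can be approximated with plain ggplot2 by drawing one column of tiles per sample, with the original x value on the y axis and the original y value mapped to fill color. The sketch below does this with `geom_tile()` on toy data. It is an assumption about how such a plot could be built, not the code behind `plot_pherogram()`.

```r
library(ggplot2)

# Toy data: one bell-shaped peak per sample, on a shared x grid.
demo_data <- expand.grid(
  x = seq(-3, 2, by = 0.25),
  sample = c("sample_1", "sample_2"),
  stringsAsFactors = FALSE
)
peak_position <- ifelse(demo_data$sample == "sample_1", -1, 0.5)
demo_data$y <- dnorm(demo_data$x, mean = peak_position)

# Samples become lanes; the original y value becomes color intensity.
pherogram_sketch <- ggplot(demo_data, aes(x = sample, y = x, fill = y)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "grey10") +
  labs(x = NULL, y = "x", fill = "y") +
  theme_classic()

pherogram_sketch
```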