Visualizing transcript structure and annotation using ggplot2

Host: GitHub
URL: https://github.com/dzhang32/ggtranscript
Owner: dzhang32
License: other
Created: 2021-12-23T15:28:55.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2022-08-10T21:10:44.000Z (almost 2 years ago)
Last Synced: 2024-01-17T03:18:32.436Z (6 months ago)
Topics: gene-annotation, ggplot-extension, transcripts, visualization
Language: R
Homepage: https://dzhang32.github.io/ggtranscript/
Size: 22.5 MB
Stars: 108
Watchers: 3
Forks: 7
Open Issues: 9
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

Lists

repo-5916-awesome-genome-visualization - ggtranscript (Gene structure)
awesome-genome-visualization - ggtranscript

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%",
    dpi = 300
)
```

# ggtranscript 

[![GitHub issues](https://img.shields.io/github/issues/dzhang32/ggtranscript)](https://github.com/dzhang32/ggtranscript/issues)
[![GitHub pulls](https://img.shields.io/github/issues-pr/dzhang32/ggtranscript)](https://github.com/dzhang32/ggtranscript/pulls)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check-bioc](https://github.com/dzhang32/ggtranscript/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/dzhang32/ggtranscript/actions)
[![Codecov test coverage](https://codecov.io/gh/dzhang32/ggtranscript/branch/main/graph/badge.svg)](https://app.codecov.io/gh/dzhang32/ggtranscript?branch=main)

`ggtranscript` is a `ggplot2` extension that makes it easy to visualize transcript structure and annotation.

## Installation

```{r "install_dev", eval = FALSE}
# you can install the development version of ggtranscript from GitHub:
# install.packages("devtools")
devtools::install_github("dzhang32/ggtranscript")
```

## Usage

`ggtranscript` introduces 5 new geoms (`geom_range()`, `geom_half_range()`, `geom_intron()`, `geom_junction()` and `geom_junction_label_repel()`) and several helper functions designed to facilitate the visualization of transcript structure and annotation. The following guide takes you on a quick tour of these geoms; for a more detailed overview, see the [Getting Started tutorial](https://dzhang32.github.io/ggtranscript/articles/ggtranscript.html).

`geom_range()` and `geom_intron()` enable the plotting of exons and introns, the core components of transcript annotation. `ggtranscript` also provides `to_intron()`, which converts exon co-ordinates to the corresponding introns. Together, these functions let users plot transcript structures with only exons as the required input and just a few lines of code.

```{r geom-range-intron}
library(magrittr)
library(dplyr)
library(ggplot2)
library(ggtranscript)

# to illustrate the package's functionality
# ggtranscript includes example transcript annotation
sod1_annotation %>% head()

# extract exons
sod1_exons <- sod1_annotation %>% dplyr::filter(type == "exon")

sod1_exons %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range(
        aes(fill = transcript_biotype)
    ) +
    geom_intron(
        data = to_intron(sod1_exons, "transcript_name"),
        aes(strand = strand)
    )
```
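
To make the exon-to-intron conversion concrete, the sketch below derives introns from a toy exon table in base R. It only illustrates the idea: `exons_to_introns()` and `toy_exons` are hypothetical names, and the boundary convention used here (an intron runs from the end of one exon to the start of the next) is an assumption, so the output of the real `to_intron()` may differ in its co-ordinates and columns.

```{r sketch-to-intron}
# toy exon table for two hypothetical transcripts (not the sod1_annotation data)
toy_exons <- data.frame(
    transcript_name = c("tx1", "tx1", "tx1", "tx2", "tx2"),
    start = c(100, 400, 800, 100, 800),
    end = c(200, 500, 1000, 200, 1000)
)

# hypothetical helper, not part of ggtranscript:
# within each group, an intron spans from one exon's end to the next exon's start
exons_to_introns <- function(exons, group_var) {
    per_group <- lapply(split(exons, exons[[group_var]]), function(tx) {
        tx <- tx[order(tx$start), ]
        n <- nrow(tx)
        data.frame(
            group = tx[[group_var]][-n],
            start = tx$end[-n],
            end = tx$start[-1],
            type = rep("intron", n - 1)
        )
    })
    introns <- do.call(rbind, per_group)
    names(introns)[1] <- group_var
    rownames(introns) <- NULL
    introns
}

toy_introns <- exons_to_introns(toy_exons, "transcript_name")
toy_introns
```

A transcript with a single exon contributes no rows, which is why only exons are needed as input to draw a complete structure.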

`ggtranscript` provides the helper function `shorten_gaps()`, which reduces the size of the gaps. `shorten_gaps()` then rescales the exon and intron co-ordinates to preserve the original exon alignment. This allows you to home in on the differences in exonic structure, which can be particularly useful if the transcript has relatively long introns.

```{r shorten-gaps}
sod1_rescaled <- shorten_gaps(
    sod1_exons,
    to_intron(sod1_exons, "transcript_name"),
    group_var = "transcript_name"
)

sod1_rescaled %>%
    dplyr::filter(type == "exon") %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range(
        aes(fill = transcript_biotype)
    ) +
    geom_intron(
        data = sod1_rescaled %>% dplyr::filter(type == "intron"),
        arrow.min.intron.length = 200
    )
```
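
The idea behind shortening gaps can be sketched in base R for a single transcript: cap every gap at a fixed width, then shift the exons so that their own widths are unchanged. This is a simplified illustration under stated assumptions, not the `shorten_gaps()` implementation. `cap_gaps()`, `toy_tx` and `max_gap` are hypothetical names, and the sketch does not attempt the alignment across several transcripts that `shorten_gaps()` preserves.

```{r sketch-cap-gaps}
# toy exons for one hypothetical transcript (sorted and non-overlapping)
toy_tx <- data.frame(
    start = c(100, 5000, 9000),
    end = c(200, 5100, 9300)
)

# hypothetical helper, not part of ggtranscript:
# cap each gap between consecutive exons at max_gap and rebuild the co-ordinates
cap_gaps <- function(exons, max_gap = 100) {
    n <- nrow(exons)
    widths <- exons$end - exons$start
    gaps <- exons$start[-1] - exons$end[-n]
    capped <- pmin(gaps, max_gap)
    # every exon is shifted left by the total amount trimmed from preceding gaps
    new_start <- exons$start[1] + c(0, cumsum(widths[-n] + capped))
    data.frame(start = new_start, end = new_start + widths)
}

toy_tx_short <- cap_gaps(toy_tx, max_gap = 100)
toy_tx_short
```

Exon widths are identical before and after, so differences in exonic structure remain visible while the long introns no longer dominate the x-axis.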

`geom_range()` can be used for any range-based genomic annotation. For example, when plotting protein-coding transcripts, users may find it helpful to visually distinguish the coding segments from UTRs. 

```{r geom-range-intron-w-cds}
# filter for only exons from protein coding transcripts
sod1_exons_prot_cod <- sod1_exons %>%
    dplyr::filter(transcript_biotype == "protein_coding")

# obtain cds
sod1_cds <- sod1_annotation %>% dplyr::filter(type == "CDS")

sod1_exons_prot_cod %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range(
        fill = "white",
        height = 0.25
    ) +
    geom_range(
        data = sod1_cds
    ) +
    geom_intron(
        data = to_intron(sod1_exons_prot_cod, "transcript_name"),
        aes(strand = strand),
        arrow.min.intron.length = 500
    )
```
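
The layering above relies on each CDS range lying inside an exon, so the narrower white exon box stays visible only where no CDS is drawn on top of it, that is, over the UTRs. The base R sketch below shows that relationship on made-up co-ordinates. `toy_pc_exons`, `coding_start` and `coding_end` are hypothetical, and in practice the CDS ranges come directly from the annotation, as in the chunk above.

```{r sketch-cds-clip}
# toy exons for one hypothetical protein-coding transcript
toy_pc_exons <- data.frame(
    start = c(100, 400, 800),
    end = c(200, 500, 1000)
)

# hypothetical coding region: translation starts at 150 and stops at 900
coding_start <- 150
coding_end <- 900

# clip each exon to the coding region; the clipped-away parts are UTR
toy_cds <- data.frame(
    start = pmax(toy_pc_exons$start, coding_start),
    end = pmin(toy_pc_exons$end, coding_end)
)

# drop exons that lie entirely outside the coding region
toy_cds <- toy_cds[toy_cds$start <= toy_cds$end, ]
toy_cds
```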

`geom_half_range()` takes advantage of the vertical symmetry of transcript annotation by plotting only half of a range on the top or bottom of a transcript structure. One use case of `geom_half_range()` is to visualize the differences between transcript structures more clearly.

```{r geom-half-range, fig.height = 3}
# extract exons and cds for the two transcripts to be compared
sod1_201_exons <- sod1_exons %>% dplyr::filter(transcript_name == "SOD1-201")
sod1_201_cds <- sod1_cds %>% dplyr::filter(transcript_name == "SOD1-201")
sod1_202_exons <- sod1_exons %>% dplyr::filter(transcript_name == "SOD1-202")
sod1_202_cds <- sod1_cds %>% dplyr::filter(transcript_name == "SOD1-202")

sod1_201_202_plot <- sod1_201_exons %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = "SOD1-201/202"
    )) +
    geom_half_range(
        fill = "white",
        height = 0.125
    ) +
    geom_half_range(
        data = sod1_201_cds
    ) +
    geom_intron(
        data = to_intron(sod1_201_exons, "transcript_name")
    ) +
    geom_half_range(
        data = sod1_202_exons,
        range.orientation = "top",
        fill = "white",
        height = 0.125
    ) +
    geom_half_range(
        data = sod1_202_cds,
        range.orientation = "top",
        fill = "purple"
    ) +
    geom_intron(
        data = to_intron(sod1_202_exons, "transcript_name")
    )

sod1_201_202_plot
```

As a `ggplot2` extension, `ggtranscript` inherits the familiarity and functionality of `ggplot2`. For instance, by leveraging `coord_cartesian()` users can zoom in on regions of interest.

```{r geom-half-range-zoomed, fig.height = 3}
sod1_201_202_plot + coord_cartesian(xlim = c(31659500, 31660000))
```

`geom_junction()` enables the plotting of junction curves, which can be overlaid across transcript structures. `geom_junction_label_repel()` adds a label to junction curves, which can often be useful to mark junctions with a metric of their usage such as read counts.

```{r geom-junction, fig.height = 3}
# ggtranscript includes a set of example (unannotated) junctions
# originating from GTEx and downloaded via the Bioconductor package snapcount
sod1_junctions

# add transcript_name to junctions for plotting
sod1_junctions <- sod1_junctions %>%
  dplyr::mutate(transcript_name = "SOD1-201")

sod1_201_exons %>%
  ggplot(aes(
    xstart = start,
    xend = end,
    y = transcript_name
  )) +
  geom_range(
    fill = "white",
    height = 0.25
  ) +
  geom_range(
    data = sod1_201_cds
  ) +
  geom_intron(
    data = to_intron(sod1_201_exons, "transcript_name")
  ) +
  geom_junction(
    data = sod1_junctions,
    junction.y.max = 0.5
  ) +
  geom_junction_label_repel(
    data = sod1_junctions,
    aes(label = round(mean_count, 2)),
    junction.y.max = 0.5
  )
```
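
When a junction table contains many weakly supported junctions, it can help to filter it and prepare the label column before plotting. The base R sketch below is one possible pre-processing step on made-up data, not something `ggtranscript` requires. `toy_junctions` and the `min_count` threshold are hypothetical, and only the `mean_count` column name is taken from the example above.

```{r sketch-junction-labels}
# toy junction table; the values are made up for illustration
toy_junctions <- data.frame(
    start = c(200, 200, 500),
    end = c(400, 800, 800),
    mean_count = c(0.0312, 1.4567, 0.8249)
)

# hypothetical threshold below which a junction is not drawn
min_count <- 0.1

# keep the better-supported junctions and round their counts for display
toy_kept <- toy_junctions[toy_junctions$mean_count >= min_count, ]
toy_kept$label <- round(toy_kept$mean_count, 2)
toy_kept
```

The resulting `label` column could then be mapped with `aes(label = label)` in place of rounding inside `aes()`.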

Alternatively, users may prefer to map junction read counts to the thickness of the junction curves. Because `ggtranscript` is a `ggplot2` extension, this can be done intuitively by mapping the `size` aesthetic inside the `aes()` of `geom_junction()`. In addition, by modifying `ggplot2` scales and themes, users can easily create informative, publication-ready plots.

```{r geom-junction-pub, fig.height = 3}
sod1_201_exons %>%
  ggplot(aes(
    xstart = start,
    xend = end,
    y = transcript_name
  )) +
  geom_range(
    fill = "white",
    height = 0.25
  ) +
  geom_range(
    data = sod1_201_cds
  ) +
  geom_intron(
    data = to_intron(sod1_201_exons, "transcript_name")
  ) +
  geom_junction(
    data = sod1_junctions,
    aes(size = mean_count),
    junction.y.max = 0.5,
    ncp = 30,
    colour = "purple"
  ) +
  scale_size_continuous(range = c(0.1, 1), guide = "none") +
  xlab("Genomic position (chr21)") +
  ylab("Transcript name") +
  theme_bw()
```

## Citation

```{r citing-ggtranscript}
citation("ggtranscript")
```

## Credits

* `ggtranscript` was developed using `r BiocStyle::Biocpkg("biocthis")`.