Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nschan/syri_gggenomes
Plot syri output via gggenomes
https://github.com/nschan/syri_gggenomes
Last synced: 8 days ago
JSON representation
Plot syri output via gggenomes
- Host: GitHub
- URL: https://github.com/nschan/syri_gggenomes
- Owner: nschan
- Created: 2024-09-12T14:09:18.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-16T12:47:50.000Z (about 2 months ago)
- Last Synced: 2024-09-16T14:54:40.179Z (about 2 months ago)
- Language: R
- Size: 1.48 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
Parse SyRi into R and plot with gggenomes
================
Niklas Schandry# About
I was looking for a way to plot syri-output, similar to what
[`plotsr`](https://github.com/schneebergerlab/plotsr/) does, but with
easier costumization and in R. I could not find anything, so I wrote
something. The files included here in data for demonstration are the
plotsr example files.This requires ‘tidyverse’ (only `dplyr` and `vroom`) and
[`gggenomes`](https://github.com/thackl/gggenomes). This repo also comes
with a snapshot that can be used with `renv::restore()```` r
renv::install("tidyverse","thackl/gggenomes")
`````` r
library(tidyverse)
library(gggenomes)
library(magrittr)
source("functions/parse_syri.R")
```The calculation of polygons for curves between sequences is directly
lifted from [`GENESPACE`](https://github.com/jtlovell/GENESPACE)# Input
The script expects the syri output to be named
`genomeA_on_genomeB.syri.out`, and will split based on this. There is
*no* flexibility here.# Running
In this example, genomeA is col and genomeB is ler.
``` r
dat <- parse_syri("data/col_on_ler.syri.out",
order = data.frame(bin_id = c("col","ler")))
```## Created seqtab
## Created links
## Calculating polygons
# Plotting
``` r
gggenomes::gggenomes(seqs = dat$seqs,
links = dat$links) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.6
) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.8
) +
geom_seq(linewidth = 1) +
geom_bin_label(size=7) +
syri_plot_fills +
ggtitle("Synteny between Col and Ler")
```![](parse_files/figure-gfm/unnamed-chunk-4-1.png)
# Options
## No resizing
By default, short syntenic regions larger than 5000bp are resized to
make them visible. Since this does not reflect the original input, this
can be disabled:``` r
dat <- parse_syri("data/col_on_ler.syri.out",
order = data.frame(bin_id = c("col","ler")),
resize_polygons = F)
```## Created seqtab
## Created links
## Calculating polygons
``` r
gggenomes::gggenomes(seqs = dat$seqs,
links = dat$links) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.6
) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.8
) +
geom_seq(linewidth = 1) +
geom_bin_label(size=7) +
syri_plot_fills +
ggtitle("Synteny between Col and Ler without resizing")
```![](parse_files/figure-gfm/unnamed-chunk-6-1.png)
## Minimum resize size
Only regions larger than `min_polygon_feat_size` are resized (default
5000), this can be modified to also include smaller regions``` r
dat <- parse_syri("data/col_on_ler.syri.out",
order = data.frame(bin_id = c("col","ler")),
resize_polygons = T,
min_polygon_feat_size = 1000)
```## Created seqtab
## Created links
## Calculating polygons
Naturally, this will create a busier plot.
``` r
gggenomes::gggenomes(seqs = dat$seqs,
links = dat$links) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.6
) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.8
) +
geom_seq(linewidth = 1) +
geom_bin_label(size=7) +
syri_plot_fills +
ggtitle("Synteny between Col and Ler, resizing regions larger than 999bp")
```![](parse_files/figure-gfm/unnamed-chunk-8-1.png)
## Resize output size
Regions are resized to have a certain length relative to the chromosome,
controlled by `resize_polygons_size`, which defaults to `0.003` (0.3%)
of the chromosome length. Altering this will make resized regions
larger, or smaller.``` r
dat <- parse_syri("data/col_on_ler.syri.out",
order = data.frame(bin_id = c("col","ler")),
resize_polygons = T,
resize_polygons_size = 0.01)
```## Created seqtab
## Created links
## Calculating polygons
This will produce wider polygons for resized links.
``` r
gggenomes::gggenomes(seqs = dat$seqs,
links = dat$links) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.6
) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.8
) +
geom_seq(linewidth = 1) +
geom_bin_label(size=7) +
syri_plot_fills +
ggtitle("Synteny between Col and Ler")
```![](parse_files/figure-gfm/unnamed-chunk-10-1.png)
# Multiple genomes
Comparing two genomes is nice, but more are better.
`parse_syri` can handle multiple outputs in one go:
``` r
file_list <- list.files("data", full.names = T)
syri_order <- data.frame(bin_id = c("col", "ler", "cvi", "eri"))
dat <- parse_syri(file_list, order = syri_order)
```## Created seqtab
## Created links
## Calculating polygons
Making a plot from this works the same way of making a plot of only one
comparison. The order of sequences is set via the `order` argument to
`parse_syri()```` r
gggenomes::gggenomes(seqs = dat$seqs,
links = dat$links) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.6
) +
geom_polygon(
data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
aes(
x = x,
y = y,
fill = type,
group = link_grp
),
alpha = 0.8
) +
geom_seq(linewidth = 1) +
geom_bin_label(size=7) +
syri_plot_fills +
ggtitle("Synteny between Col - Ler - Cvi - Eri")
```![](parse_files/figure-gfm/unnamed-chunk-12-1.png)