https://github.com/simonhmartin/asynt
Genome alignment and synteny plots
- Host: GitHub
- URL: https://github.com/simonhmartin/asynt
- Owner: simonhmartin
- License: gpl-3.0
- Created: 2021-10-05T15:16:30.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-06-05T16:32:04.000Z (about 3 years ago)
- Last Synced: 2023-06-05T17:30:06.256Z (about 3 years ago)
- Language: R
- Size: 810 KB
- Stars: 13
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-genome-visualization - asynt (Comparative)
README
# Asynt: R functions for exploring synteny using whole genome alignments
* Make diagonal 'dot' plots
* Plot alignment tracts between a pair of genomes
* Merge adjacent alignments into synteny 'blocks' for cleaner plots
See Figures 1 and S1 of our [paper](https://doi.org/10.1098/rstb.2021.0207) for examples of the plots you can make.
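To give a feel for what a dot plot shows, here is a minimal base-R sketch. It does not use asynt's own plotting functions: the toy data frame `aln`, its column names and the helper `dot_segments` are all made up for illustration. Each alignment is drawn as a line segment in reference-versus-query space, and reverse-strand alignments slope the other way.

```r
# Hypothetical toy alignments: reference and query coordinates plus strand.
aln <- data.frame(
  r_start = c(0, 1200, 3000),
  r_end   = c(1000, 2500, 4000),
  q_start = c(0, 1300, 4200),
  q_end   = c(1000, 2600, 5200),
  strand  = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# Turn each alignment into segment endpoints (x = reference, y = query).
# Reverse-strand alignments run from q_end down to q_start.
dot_segments <- function(aln) {
  rev <- aln$strand == "-"
  data.frame(
    x0 = aln$r_start, x1 = aln$r_end,
    y0 = ifelse(rev, aln$q_end, aln$q_start),
    y1 = ifelse(rev, aln$q_start, aln$q_end)
  )
}

seg <- dot_segments(aln)

# Draw to a temporary PDF so the example also runs non-interactively.
out_file <- file.path(tempdir(), "dotplot_sketch.pdf")
pdf(out_file)
plot(NA,
     xlim = range(aln$r_start, aln$r_end),
     ylim = range(aln$q_start, aln$q_end),
     xlab = "Reference position (bp)", ylab = "Query position (bp)")
segments(seg$x0, seg$y0, seg$x1, seg$y1,
         col = ifelse(aln$strand == "-", "red", "black"), lwd = 2)
invisible(dev.off())
```

In an interactive session you can drop the `pdf()` and `dev.off()` calls to draw straight to the plot window.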
## How to use this code
Make sure you have the R package `intervals` installed (for example with `install.packages("intervals")`).
If you already have alignment coordinate files from minimap2 (recommended) or mummer (using nucmer and show-coords), you are ready to go.
Open the script `asynt_example_plots.R` in an interactive R session (e.g. RStudio) and work through it line by line to explore the kinds of plots you can make.
## Where do I get alignments from?
Make alignments between two assemblies (or of a single assembly against itself) using [minimap2](https://github.com/lh3/minimap2) or [mummer](https://mummer4.github.io/).
Here is an example command for minimap2:
`minimap2 -x asm20 reference.fa query.fasta | gzip > mm2asm20.paf.gz`
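`-x asm20` selects the most permissive of minimap2's assembly-to-assembly presets, suited to more divergent genomes (the minimap2 help text lists it for roughly 5% sequence divergence).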
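The command above writes a gzipped PAF file, in which the first twelve tab-separated columns are mandatory and any further columns are optional tags. With `reference.fa` given first, the reference is the PAF "target" and `query.fasta` is the "query". If you want to inspect such a file in plain R, something like the sketch below should work. This is not asynt's import code: the function name `read_paf_minimal` and the column names are chosen here for illustration, and the two toy records are invented.

```r
# Names for the 12 mandatory PAF columns (labels chosen for this sketch).
paf_cols <- c("query", "q_len", "q_start", "q_end", "strand",
              "target", "t_len", "t_start", "t_end",
              "n_match", "aln_len", "mapq")

# Read the mandatory columns of a (optionally gzipped) PAF file.
# Optional tag columns vary in number per line, so lines are split
# manually and only the first 12 fields are kept.
read_paf_minimal <- function(path) {
  lines <- readLines(gzfile(path))
  fields <- strsplit(lines, "\t", fixed = TRUE)
  core <- do.call(rbind, lapply(fields, function(x) x[1:12]))
  paf <- as.data.frame(core, stringsAsFactors = FALSE)
  names(paf) <- paf_cols
  num_cols <- setdiff(paf_cols, c("query", "strand", "target"))
  paf[num_cols] <- lapply(paf[num_cols], as.numeric)
  paf
}

# Two invented PAF records, written to a temporary gzipped file.
toy <- c(
  paste("chr1_q", 5000, 100, 2100, "+", "chr1_ref", 6000, 150, 2150,
        1900, 2000, 60, "tp:A:P", sep = "\t"),
  paste("chr1_q", 5000, 2500, 4000, "-", "chr1_ref", 6000, 3000, 4500,
        1400, 1500, 60, sep = "\t")
)
tmp <- tempfile(fileext = ".paf.gz")
con <- gzfile(tmp, "w")
writeLines(toy, con)
close(con)

paf <- read_paf_minimal(tmp)
print(paf[, c("target", "t_start", "t_end", "query", "q_start", "q_end", "strand")])
```

For a real run you would call `read_paf_minimal("mm2asm20.paf.gz")` instead of using the temporary file.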
## How does asynt infer synteny blocks?
There are some sophisticated tools that use probabilistic approaches for inferring synteny blocks. This is not one of those.
The algorithm has three steps:
1. Alignments are split into ‘sub-blocks’ that each correspond to a unique tract of the reference assembly.
2. Sub-blocks below a minimum size are discarded.
3. Adjacent sub-blocks that are in the same orientation and are below some threshold distance apart are merged to yield syntenic blocks.
These three steps can be performed iteratively, first identifying regions of fine-scale synteny and then building these up into larger syntenic blocks (discarding short overlaps, small inversions, etc.).
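The three steps, followed by a second pass with coarser thresholds, can be sketched in base R as follows. This is an illustration only, not asynt's actual code: the function names, column names, toy data and thresholds are invented, and it handles a single reference/query chromosome pair. It also makes two simplifying assumptions: where alignments overlap on the reference, the longest one claims the tract, and query coordinates are interpolated linearly along an alignment.

```r
# Toy alignments (half-open coordinates, r_end > r_start assumed).
aln <- data.frame(
  r_start = c(0, 980, 2010, 2100, 3500, 10000, 12100),
  r_end   = c(1000, 2000, 2060, 3000, 4500, 12000, 13000),
  q_start = c(0, 970, 5000, 2090, 3490, 20000, 19000),
  q_end   = c(1000, 1990, 5050, 2990, 4490, 22000, 19900),
  strand  = c("+", "+", "-", "+", "+", "-", "-"),
  stringsAsFactors = FALSE
)

# Step 1: cut the reference at every alignment boundary so that each
# sub-block corresponds to a unique reference tract.
split_into_subblocks <- function(aln) {
  if (nrow(aln) == 0) return(aln)
  bp <- sort(unique(c(aln$r_start, aln$r_end)))
  pieces <- list()
  for (i in seq_len(length(bp) - 1)) {
    lo <- bp[i]; hi <- bp[i + 1]
    covering <- which(aln$r_start <= lo & aln$r_end >= hi)
    if (length(covering) == 0) next  # unaligned gap in the reference
    # Assumption: the longest covering alignment claims the tract.
    span <- aln$r_end[covering] - aln$r_start[covering]
    a <- aln[covering[which.max(span)], ]
    # Assumption: query coordinates scale linearly along the alignment.
    f_lo <- (lo - a$r_start) / (a$r_end - a$r_start)
    f_hi <- (hi - a$r_start) / (a$r_end - a$r_start)
    q_len <- a$q_end - a$q_start
    if (a$strand == "+") {
      q <- a$q_start + c(f_lo, f_hi) * q_len
    } else {
      q <- a$q_end - c(f_hi, f_lo) * q_len
    }
    pieces[[length(pieces) + 1]] <- data.frame(
      r_start = lo, r_end = hi, q_start = round(q[1]), q_end = round(q[2]),
      strand = a$strand, stringsAsFactors = FALSE)
  }
  do.call(rbind, pieces)
}

# Step 3: merge neighbours (in reference order) that share an orientation
# and lie within max_gap of each other on both reference and query.
merge_adjacent <- function(sb, max_gap) {
  if (nrow(sb) == 0) return(sb)
  sb <- sb[order(sb$r_start), ]
  merged <- sb[1, ]
  for (i in seq_len(nrow(sb))[-1]) {
    cur <- sb[i, ]
    n <- nrow(merged)
    last <- merged[n, ]
    r_gap <- cur$r_start - last$r_end
    q_gap <- if (cur$strand == "+") cur$q_start - last$q_end else last$q_start - cur$q_end
    if (cur$strand == last$strand && r_gap <= max_gap && abs(q_gap) <= max_gap) {
      merged$r_end[n]   <- cur$r_end
      merged$q_start[n] <- min(last$q_start, cur$q_start)
      merged$q_end[n]   <- max(last$q_end, cur$q_end)
    } else {
      merged <- rbind(merged, cur)
    }
  }
  rownames(merged) <- NULL
  merged
}

# One pass: split (step 1), drop small sub-blocks (step 2), merge (step 3).
synteny_pass <- function(aln, min_size, max_gap) {
  sb <- split_into_subblocks(aln)
  sb <- sb[(sb$r_end - sb$r_start) >= min_size, ]
  merge_adjacent(sb, max_gap)
}

# Two passes: fine-scale blocks first, then coarser thresholds on the result.
fine   <- synteny_pass(aln,  min_size = 100,  max_gap = 200)
coarse <- synteny_pass(fine, min_size = 1000, max_gap = 1000)
print(coarse)
```

With the toy data, the first pass discards a 20 bp overlap tract and a 50 bp inverted alignment and returns three blocks; the second pass joins the two forward-strand blocks, leaving two. Running a single pass with the coarse thresholds would instead discard several of the original alignments before they had a chance to merge, which is the motivation for iterating.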
Because sub-blocks are defined as tracts of the reference, you will get a different result depending on which genome you use as the reference. If possible, use a reference that represents the ancestral state, such that your query genome is being represented as a new arrangement of ancestral blocks.