An open API service indexing awesome lists of open source software.

https://github.com/mhahsler/rmsa

Interface to Popular Multiple Sequence Alignment Tools - R-package
https://github.com/mhahsler/rmsa

bioinformatics sequence-alignment

Last synced: about 1 month ago
JSON representation

Interface to Popular Multiple Sequence Alignment Tools - R-package

Awesome Lists containing this project

README

          

---
output: github_document
---

```{r echo=FALSE, results = 'asis'}
pkg <- "rMSA"

source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg, CRAN = FALSE)
```

Seamlessly interfaces Multiple Sequence Alignment software packages including

* ClustalW
* MAFFT
* MUSCLE
* Kalign
* Boxshade

with the Bioconductor infrastructure. The tools have to be installed separately.

In addition, distances between sets of sequences can be calculated providing.

* k-mer-based distances
* Jensen-Shannon divergence (JSD) between FFPs (Sims and Kim, 2011)
* Cosine distance is used between Composition Vectors (Qi et al, 2007)
* Numerical summarization vector distance (Nagar and Hahsler, 2013)
* Jaccard distance between sets of k-mers.
* Distance based on SimRank (Santis et al, 2011)

* Edit distance/alignment-based distances
* Edit (Levenshtein) Distance (package pwalign)
* Distance based on alignment scores (package pwalign)

* Evolutionary distances
* Evolutionary distances (package ape)

A short guide with examples can be found
[here](http://github.com/mhahsler/rMSA/raw/master/vignettes_real/rMSA.pdf).

```{r echo=FALSE, results = 'asis'}
pkg_install(pkg, CRAN = FALSE)
```

Additional installation instructions can be found [here](INSTALL).

## Examples

Align sequences using clustalW.

```{r}
library("rMSA")

rna <- readRNAStringSet(system.file("examples/RNA_example.fasta", package = "rMSA"))
rna

al <- clustal(rna)
al
```
Cluster mutations of a sequence using SimRank

```{r}
s <- random_sequences(len = 100, number = 1)
ms <- mutations(s, number = 20)
dSimRank <- distSimRank(ms)
plot(as.dendrogram(hclust(dSimRank)), horiz=TRUE, type="triangle")
```

# How to Cite the Use of this Package

```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg)
```

# References

Gao, L; Qi, J (2007 Mar 15). "Whole genome molecular phylogeny of large dsDNA viruses using composition vector method.". BMC evolutionary biology 7: 41. PMID 17359548.

Hahsler M, Nagar A (2024). _rMSA: Interface for Popular Multiple Sequence Alignment Tools_. R package version
0.99.1.

Anurag Nagar; Michael Hahsler (2013). "Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment." BMC Bioinformatics, 14(Suppl. 11), 2013

Santis et al, Simrank: Rapid and sensitive general-purpose k-mer search tool, BMC Ecology 2011, 11:11

Sims, GE; Kim, SH (2011 May 17). "Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs).". Proceedings of the National Academy of Sciences of the United States of America 108 (20): 8329-34. PMID 21536867.

Qi J, Wang B, Hao B: Whole Proteome Prokaryote Phylogeny without Sequence Alignment: A K-String Composition Approach. Journal of Molecular Evolution 2004, 58:1-11.