An open API service indexing awesome lists of open source software.

https://github.com/BioJulia/BioMarkovChains.jl

Representing biological sequences as Markov chains
https://github.com/BioJulia/BioMarkovChains.jl

bioinformatics biology dna julia julialang markov-chain

Last synced: about 2 months ago
JSON representation

Representing biological sequences as Markov chains

Awesome Lists containing this project

README

          




Representing biological sequences as Markov chains


[![Documentation](https://img.shields.io/badge/documentation-online-blue.svg?logo=Julia&logoColor=white)](https://biojulia.dev/BioMarkovChains.jl/dev/)
[![Latest Release](https://img.shields.io/github/release/BioJulia/BioMarkovChains.jl.svg)](https://github.com/BioJulia/BioMarkovChains.jl/releases/latest)
[![DOI](https://zenodo.org/badge/665161607.svg)](https://zenodo.org/badge/latestdoi/665161607)


[![CI Workflow](https://github.com/BioJulia/BioMarkovChains.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/BioJulia/BioMarkovChains.jl/actions/workflows/CI.yml)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/BioJulia/BioMarkovChains.jl/blob/main/LICENSE)
[![Work in Progress](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
[![Downloads](https://img.shields.io/badge/dynamic/json?url=http%3A%2F%2Fjuliapkgstats.com%2Fapi%2Fv1%2Fmonthly_downloads%2FBioMarkovChains&query=total_requests&suffix=%2Fmonth&label=Downloads)](http://juliapkgstats.com/pkg/BioMarkovChains)
[![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)
[![JET](https://img.shields.io/badge/%E2%9C%88%EF%B8%8F%20tested%20with%20-%20JET.jl%20-%20red)](https://github.com/aviatesk/JET.jl)

***

# BioMarkovChains

> A Julia package to represent biological sequences as Markov chains

## Installation


BioMarkovChains is a  


Julia Language

  package. To install BioMarkovChains,
please open
Julia's interactive session (known as REPL)
and press ]
key in the REPL to use the package mode, then type the following command

```julia
pkg> add BioMarkovChains
```

## Creating Markov chain out of DNA sequences

An important step before developing several gene finding algorithms consist of having a Markov chain representation of the DNA. To do so, we implemented the `BioMarkovChain` method that will capture the initials and transition probabilities of a DNA sequence (`LongSequence`) and will create a dedicated object storing relevant information of a DNA Markov chain. Here an example:

Let find one ORF in a random `LongDNA` :

```julia
using BioSequences, BioMarkovChains, GeneFinder

# > 180195.SAMN03785337.LFLS01000089 -> finds only 1 gene in Prodigal (from Pyrodigal tests)
seq = dna"AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAACAGCACTGGCAATCTGACTGTGGGCGGTGTTACCAACGGCACTGCTACTACTGGCAACATCGCACTGACCGGTAACAATGCGCTGAGCGGTCCGGTCAATCTGAATGCGTCGAATGGCACGGTGACCTTGAACACGACCGGCAATACCACGCTCGGTAACGTGACGGCACAAGGCAATGTGACGACCAATGTGTCCAACGGCAGTCTGACGGTTACCGGCAATACGACAGGTGCCAACACCAACCTCAGTGCCAGCGGCAACCTGACCGTGGGTAACCAGGGCAATATCAGTACCGCAGGCAATGCAACCCTGACGGCCGGCGACAACCTGACGAGCACTGGCAATCTGACTGTGGGCGGCGTCACCAACGGCACGGCCACCACCGGCAACATCGCGCTGACCGGTAACAATGCACTGGCTGGTCCTGTCAATCTGAACGCGCCGAACGGCACCGTGACCCTGAACACAACCGGCAATACCACGCTGGGTAATGTCACCGCACAAGGCAATGTGACGACTAATGTGTCCAACGGCAGCCTGACAGTCGCTGGCAATACCACAGGTGCCAACACCAACCTGAGTGCCAGCGGCAATCTGACCGTGGGCAACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAGC"

orfseq = findorfs(seq)[3] |> sequence

21nt DNA Sequence:
ATGCGTCGAATGGCACGGTGA
```

If we translate it, we get a 7aa sequence:

```julia
translate(orfseq)

7aa Amino Acid Sequence:
MRRMAR*
```

Now supposing I do want to see how transitions are occurring in this ORF sequence, the I can use the `BioMarkovChain` method and tune it to 2nd-order Markov chain:

```julia
BioMarkovChain(orfseq, 2)

BioMarkovChain of DNA alphabet and order 1:
- Transition Probability Matrix -> Matrix{Float64}(4 × 4):
0.25 0.25 0.0 0.5
0.25 0.0 0.75 0.0
0.25 0.25 0.25 0.25
0.0 0.25 0.75 0.0
- Initial Probabilities -> Vector{Float64}(4 × 1):
0.2 0.2 0.4 0.2

```

But I can also have a BioMarkovChain instance of the Ammino Acid sequence:

```julia
BioMarkovChain(translate(orfseq), 2)

BioMarkovChain of AminoAcid alphabet and order 1:
- Transition Probability Matrix -> Matrix{Float64}(20 × 20):
0.0 1.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.333 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.5 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
- Initial Probabilities -> Vector{Float64}(20 × 1):
0.167 0.5 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0

```

This is useful to later create HMMs and calculate sequence probability based on a given model, for instance we now have the *E. coli* CDS and No-CDS transition models or Markov chain implemented:

```julia
ECOLICDS

BioMarkovChain of DNA alphabet and order 1:
- Transition Probability Matrix -> Matrix{Float64}(4 × 4):
0.31 0.224 0.199 0.268
0.251 0.215 0.313 0.221
0.236 0.308 0.249 0.207
0.178 0.217 0.338 0.267
- Initial Probabilities -> Vector{Float64}(4 × 1):
0.245 0.243 0.273 0.239
```

What is then the probability of the previous DNA sequence given this model?

```julia
markovprobability(orfseq, model=ECOLICDS, logscale=true)

-39.71754773536592
```

This is off course not very informative, but we can later use different criteria to then classify new ORFs. For a more detailed explanation see the [docs](https://BioJulia.dev/BioMarkovChains.jl/dev/)