https://github.com/BioJulia/BioMarkovChains.jl
Representing biological sequences as Markov chains
- Host: GitHub
- URL: https://github.com/BioJulia/BioMarkovChains.jl
- Owner: BioJulia
- License: mit
- Created: 2023-07-11T15:11:46.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-09T05:52:17.000Z (10 months ago)
- Last Synced: 2024-12-11T11:28:40.038Z (10 months ago)
- Topics: bioinformatics, biology, dna, julia, julialang, markov-chain
- Language: Julia
- Homepage: https://biojulia.dev/BioMarkovChains.jl/dev/
- Size: 3.57 MB
- Stars: 8
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
- Citation: CITATION.cff
# BioMarkovChains
> A Julia package to represent biological sequences as Markov chains
## Installation
BioMarkovChains is a Julia Language package. To install BioMarkovChains, please open Julia's interactive session (known as the REPL), press the `]` key to enter package mode, then type the following command:

```julia
pkg> add BioMarkovChains
```

## Creating Markov chains out of DNA sequences
An important step before developing gene-finding algorithms is to have a Markov chain representation of the DNA. To do so, we implemented the `BioMarkovChain` method, which captures the initial and transition probabilities of a DNA sequence (`LongSequence`) and creates a dedicated object storing the relevant information of a DNA Markov chain. Here is an example.

Let's find one ORF in a random `LongDNA`:
```julia
using BioSequences, BioMarkovChains, GeneFinder

# > 180195.SAMN03785337.LFLS01000089 -> finds only 1 gene in Prodigal (from Pyrodigal tests)
seq = dna"AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAACAGCACTGGCAATCTGACTGTGGGCGGTGTTACCAACGGCACTGCTACTACTGGCAACATCGCACTGACCGGTAACAATGCGCTGAGCGGTCCGGTCAATCTGAATGCGTCGAATGGCACGGTGACCTTGAACACGACCGGCAATACCACGCTCGGTAACGTGACGGCACAAGGCAATGTGACGACCAATGTGTCCAACGGCAGTCTGACGGTTACCGGCAATACGACAGGTGCCAACACCAACCTCAGTGCCAGCGGCAACCTGACCGTGGGTAACCAGGGCAATATCAGTACCGCAGGCAATGCAACCCTGACGGCCGGCGACAACCTGACGAGCACTGGCAATCTGACTGTGGGCGGCGTCACCAACGGCACGGCCACCACCGGCAACATCGCGCTGACCGGTAACAATGCACTGGCTGGTCCTGTCAATCTGAACGCGCCGAACGGCACCGTGACCCTGAACACAACCGGCAATACCACGCTGGGTAATGTCACCGCACAAGGCAATGTGACGACTAATGTGTCCAACGGCAGCCTGACAGTCGCTGGCAATACCACAGGTGCCAACACCAACCTGAGTGCCAGCGGCAATCTGACCGTGGGCAACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAGC"

orfseq = findorfs(seq)[3] |> sequence

21nt DNA Sequence:
ATGCGTCGAATGGCACGGTGA
```

If we translate it, we get a 7aa sequence:
```julia
translate(orfseq)

7aa Amino Acid Sequence:
MRRMAR*
```

Now suppose we want to see how transitions occur in this ORF sequence. We can use the `BioMarkovChain` method and pass `2` as the second argument to request a 2nd-order Markov chain (note that the output reproduced below is labeled order 1):
```julia
BioMarkovChain(orfseq, 2)

BioMarkovChain of DNA alphabet and order 1:
- Transition Probability Matrix -> Matrix{Float64}(4 × 4):
0.25 0.25 0.0 0.5
0.25 0.0 0.75 0.0
0.25 0.25 0.25 0.25
0.0 0.25 0.75 0.0
- Initial Probabilities -> Vector{Float64}(4 × 1):
0.2 0.2 0.4 0.2
```
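
To make the numbers above easier to interpret, here is a minimal sketch in plain Julia of one way such a matrix can be obtained: count every adjacent pair of nucleotides in the ORF and normalize each row. This is not the package's implementation; `transition_counts` and `row_normalize` are hypothetical helper names, and the ORF is handled as an ordinary `String` rather than a `LongSequence`.

```julia
# Illustrative sketch only: NOT the BioMarkovChains.jl implementation.
# `transition_counts` and `row_normalize` are hypothetical helper names.

const NUCLEOTIDES = ['A', 'C', 'G', 'T']

# Count how often each nucleotide (row) is followed by each nucleotide (column).
function transition_counts(seq::AbstractString)
    index = Dict(nt => i for (i, nt) in enumerate(NUCLEOTIDES))
    counts = zeros(Int, 4, 4)
    for i in 1:(length(seq) - 1)
        counts[index[seq[i]], index[seq[i + 1]]] += 1
    end
    return counts
end

# Turn each row of counts into probabilities; all-zero rows stay zero.
function row_normalize(counts::AbstractMatrix{<:Integer})
    totals = sum(counts; dims = 2)
    return counts ./ max.(totals, 1)
end

orf = "ATGCGTCGAATGGCACGGTGA"   # the 21 nt ORF found above
counts = transition_counts(orf)
tpm = row_normalize(counts)

# Share of transitions that start from each nucleotide.
initials = vec(sum(counts; dims = 2)) ./ sum(counts)

display(tpm)
println(initials)
```

With this ORF the sketch gives the same 4 × 4 matrix and the same vector `0.2 0.2 0.4 0.2` as the output above, which suggests that the printed values are first-order transition frequencies.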
But we can also build a `BioMarkovChain` instance from the amino acid sequence:
```julia
BioMarkovChain(translate(orfseq), 2)

BioMarkovChain of AminoAcid alphabet and order 1:
- Transition Probability Matrix -> Matrix{Float64}(20 × 20):
0.0 1.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.333 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.5 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
- Initial Probabilities -> Vector{Float64}(20 × 1):
0.167 0.5 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
```
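
Because most entries of the amino acid matrix are zero, it can help to look only at the transitions that actually occur. The following sketch, again in plain Julia and not the package's own code, stores the observed transitions of the translated ORF in nested dictionaries. `transition_frequencies` is a hypothetical helper name, and the stop symbol `*` is kept as an ordinary state here, which is an assumption of this sketch.

```julia
# Illustrative sketch only, not the package's implementation.
# `transition_frequencies` is a hypothetical helper name.

# Map each observed state to the relative frequencies of the states that follow it.
function transition_frequencies(states::AbstractVector)
    T = eltype(states)
    counts = Dict{T, Dict{T, Int}}()
    for (from, to) in zip(states, states[2:end])
        row = get!(counts, from, Dict{T, Int}())
        row[to] = get(row, to, 0) + 1
    end
    return Dict(from => Dict(to => n / sum(values(row)) for (to, n) in row)
                for (from, row) in counts)
end

protein = collect("MRRMAR*")   # the translated ORF from above, as a Vector{Char}
freqs = transition_frequencies(protein)

println(freqs['A'])   # transitions observed out of alanine
println(freqs['M'])   # transitions observed out of methionine
println(freqs['R'])   # transitions observed out of arginine
```

Alanine is always followed by arginine, methionine is followed by alanine or arginine half of the time each, and arginine is followed by itself one time in three, which is consistent with the `1.0`, `0.5 0.5`, and `0.333` entries in the matrix above.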
This is useful to later create HMMs and calculate the probability of a sequence under a given model. For instance, we now have the *E. coli* CDS and non-CDS transition models (Markov chains) implemented:
```julia
ECOLICDS

BioMarkovChain of DNA alphabet and order 1:
- Transition Probability Matrix -> Matrix{Float64}(4 × 4):
0.31 0.224 0.199 0.268
0.251 0.215 0.313 0.221
0.236 0.308 0.249 0.207
0.178 0.217 0.338 0.267
- Initial Probabilities -> Vector{Float64}(4 × 1):
0.245 0.243 0.273 0.239
```

What, then, is the probability of the previous DNA sequence given this model?
```julia
markovprobability(orfseq, model=ECOLICDS, logscale=true)

-39.71754773536592
```

This is of course not very informative on its own, but we can later use different criteria to classify new ORFs. For a more detailed explanation, see the [docs](https://BioJulia.dev/BioMarkovChains.jl/dev/).
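
As a rough illustration of where a log-scale value like the one above can come from, the sketch below scores the ORF in plain Julia: the log of the initial probability of the first nucleotide plus the log of every transition probability along the sequence. It is not the package's implementation. `log2_markov_probability` is a hypothetical helper name, the matrix holds the rounded `ECOLICDS` values printed above, the base-2 logarithm is an assumption, and the uniform background model is a made-up baseline used only to show one possible classification criterion.

```julia
# Illustrative sketch only, not the package's implementation.
# `log2_markov_probability` is a hypothetical helper name; the numbers are the
# rounded ECOLICDS values printed above.

const NUCLEOTIDES = ['A', 'C', 'G', 'T']

const TPM = [0.310 0.224 0.199 0.268;
             0.251 0.215 0.313 0.221;
             0.236 0.308 0.249 0.207;
             0.178 0.217 0.338 0.267]

const INITIALS = [0.245, 0.243, 0.273, 0.239]

# log2 P(seq) = log2(initial of first symbol) + sum of log2(transition probabilities)
function log2_markov_probability(seq::AbstractString, tpm, initials)
    index = Dict(nt => i for (i, nt) in enumerate(NUCLEOTIDES))
    logp = log2(initials[index[seq[1]]])
    for i in 1:(length(seq) - 1)
        logp += log2(tpm[index[seq[i]], index[seq[i + 1]]])
    end
    return logp
end

orf = "ATGCGTCGAATGGCACGGTGA"
score = log2_markov_probability(orf, TPM, INITIALS)

# Hypothetical background model: every nucleotide and transition equally likely.
background = log2_markov_probability(orf, fill(0.25, 4, 4), fill(0.25, 4))
log_odds = score - background

println(score)
println(log_odds)
```

With the rounded matrix the score comes out at roughly -39.72, close to the value reported above, which is why base 2 was assumed here. A positive `log_odds` means the ORF is more probable under the coding model than under the uniform baseline; comparing against a non-coding model instead would be a natural variation.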