https://github.com/marcom/biostockholm.jl
Julia parser for the Stockholm file format (.sto) used for multiple sequence alignments (Pfam, Rfam, etc)
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/marcom/biostockholm.jl
- Owner: marcom
- License: mit
- Created: 2022-10-06T18:54:18.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-17T20:01:44.000Z (over 1 year ago)
- Last Synced: 2025-02-20T21:23:01.406Z (4 months ago)
- Language: Julia
- Size: 34.2 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# BioStockholm.jl

Julia parser for the [Stockholm file
format](https://en.wikipedia.org/wiki/Stockholm_format) (.sto) used
for multiple sequence alignments of protein, RNA, or DNA sequences
(Pfam, Rfam, etc. databases). This package uses
[Automa.jl](https://github.com/BioJulia/Automa.jl) under the hood to
generate a finite state machine parser.

## Installation

Enter the package mode from the Julia REPL by pressing `]`, then
install with:

```
add BioStockholm
```

## Usage

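For orientation, the keyword names used in the package example in this section (`seq`, `GC`) mirror the line types of the Stockholm format itself: a `# STOCKHOLM 1.0` header, `#=GF`/`#=GS`/`#=GR`/`#=GC` annotation lines, `name sequence` lines, and a closing `//`. The following is a minimal sketch using only Base Julia to show that structure. It is not how BioStockholm parses files (the package generates its parser with Automa.jl), and `sample`, `classify_line`, `kinds`, and `seqs` are hypothetical names chosen for illustration.

```julia
# Illustration only: a tiny hand-written Stockholm document.
sample = """
# STOCKHOLM 1.0
#=GF DE an example alignment
human      ACACGCGAAA.GCGCAA.CAAACGUGCACGG
chimp      GAAUGUGAAAAACACCA.CUCUUGAGGACCU
#=GC SS_cons ...<<<.....>>>....<<....>>.....
//
"""

# Hypothetical helper: label one line by its Stockholm line type.
function classify_line(line::AbstractString)
    s = strip(line)
    isempty(s)                   && return :blank
    startswith(s, "# STOCKHOLM") && return :header
    startswith(s, "#=GF")        && return :GF   # per-file annotation
    startswith(s, "#=GS")        && return :GS   # per-sequence annotation
    startswith(s, "#=GR")        && return :GR   # per-residue annotation
    startswith(s, "#=GC")        && return :GC   # per-column annotation
    s == "//"                    && return :terminator
    startswith(s, "#")           && return :comment
    return :sequence
end

kinds = [classify_line(l) for l in split(sample, '\n'; keepempty=false)]

# Collect sequence lines by name; repeated names (interleaved blocks)
# are concatenated in order of appearance.
seqs = Dict{String,String}()
for l in eachline(IOBuffer(sample))
    if classify_line(l) == :sequence
        name, seq = split(l)
        seqs[name] = get(seqs, name, "") * seq
    end
end
```

This sketch does no validation (for example, it does not check that all sequences have the same length), which is one reason to prefer the package's parser for real files.
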
```julia
using BioStockholm

msa = MSA{Char}(;
    seq = Dict("human"   => "ACACGCGAAA.GCGCAA.CAAACGUGCACGG",
               "chimp"   => "GAAUGUGAAAAACACCA.CUCUUGAGGACCU",
               "bigfoot" => "UUGAG.UUCG..CUCGUUUUCUCGAGUACAC"),
    GC = Dict("SS_cons" => "...<<<.....>>>....<<....>>.....")
)

# read from file
# example2.sto contains an example Stockholm file
msa_path = joinpath(dirname(pathof(BioStockholm)), "..",
                    "test", "example2.sto")
msa_str = read(msa_path, String)
print(msa_str)

# read from a file or parse from a String
msa = read(msa_path, MSA)
msa = parse(MSA, msa_str)

# write to a file
write("foobar.sto", msa)

# pretty-print
print(msa)
print(stdout, msa)
```

## Limitations / TODO

- when writing, long sequences or text are never split over multiple lines
- integrate with BioJulia string types

## Related packages

[MIToS.jl](https://github.com/diegozea/MIToS.jl) is a package for
analysing protein sequences that also supports parsing the Stockholm
format (and many more things).