https://github.com/cmdcolin/maf2bed
Converts multiple alignment format (MAF) files to a bed file for tabixing
- Host: GitHub
- URL: https://github.com/cmdcolin/maf2bed
- Owner: cmdcolin
- Created: 2023-11-01T06:52:28.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2023-12-30T01:21:24.000Z (11 months ago)
- Last Synced: 2024-04-15T02:14:36.615Z (7 months ago)
- Language: Rust
- Size: 13.7 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
# maf2bed
Converts multiple alignment format (MAF) files to a tabix-friendly BED-style
format. Used by the JBrowse mafviewer plugin:
https://github.com/cmdcolin/jbrowse-plugin-mafviewer

## Install
```
cargo install maf2bed
maf2bed --help
```

## Usage
Make sure to specify the 'assembly name' used as the reference for the
bed file as the first argument to maf2bed.

Example:
```
export LC_ALL=C # improves sorting speed
# parallel compression/decompression
pigz -dc file.maf.gz | maf2bed hg19 | sort -k1,1 -k2,2n | bgzip -@8 > file.bed.gz
tabix file.bed.gz
```

It might be possible to skip the sort and LC_ALL=C in some cases, but sorting
ensures that the output is ready for tabix.

## Footnote 1
Converted to Rust from Perl, mostly as a coding exercise, gaining a modest
speedup along the way: https://twitter.com/cmdcolin/status/1719608993310486883

## Footnote 2
There are likely many ways to end up with a MAF file, but one way is to export the MAF from a pangenome graph. This page discusses some examples:
https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/progressive.md#using-the-hal-output

Using "--dupeMode all" is, afaik, recommended for the purposes of the JBrowse 2 mafviewer plugin, because using "--dupeMode single" can cause missing blocks of data.
## Motivation

I wanted to try using the bigMaf (bigBed based) format ecosystem with large MAF
files, but bedToBigBed doesn't support streaming or reading compressed files(?),
so it requires reading big files on disk and in memory. In contrast, the MAF
tabix approach implemented here can be compressed and streamed, which allows
much lower memory usage and disk space.
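
To illustrate why the streaming approach keeps memory low, here is a minimal sketch (not the actual maf2bed source) that reads MAF lines from stdin one at a time and prints only the first three BED columns for the reference assembly. It assumes reference rows are named `<assembly>.<chrom>` and sit on the `+` strand; the helper name `ref_interval` is hypothetical, and the real tool emits additional columns that are not reproduced here.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Parse one MAF "s" line (s src start size strand srcSize text).
/// Returns (chrom, start, end) in BED coordinates if the row belongs to
/// the given reference assembly, otherwise None.
fn ref_interval(line: &str, assembly: &str) -> Option<(String, u64, u64)> {
    let mut fields = line.split_whitespace();
    if fields.next()? != "s" {
        return None; // skip "a" block headers, comments, blank lines
    }
    let src = fields.next()?;
    let start: u64 = fields.next()?.parse().ok()?;
    let size: u64 = fields.next()?.parse().ok()?;
    let strand = fields.next()?;
    // Assumption: reference rows look like "hg19.chr1" and are on "+".
    let chrom = src.strip_prefix(assembly)?.strip_prefix('.')?;
    if strand != "+" {
        return None;
    }
    // MAF start is zero-based and size counts non-gap bases,
    // so the half-open BED interval is [start, start + size).
    Some((chrom.to_string(), start, start + size))
}

fn main() -> io::Result<()> {
    // First argument is the assembly name, mirroring `maf2bed hg19`.
    let assembly = std::env::args().nth(1).unwrap_or_else(|| "hg19".to_string());
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    // Only one line is held in memory at a time.
    for line in stdin.lock().lines() {
        if let Some((chrom, start, end)) = ref_interval(&line?, &assembly) {
            writeln!(out, "{}\t{}\t{}", chrom, start, end)?;
        }
    }
    Ok(())
}
```

Because nothing is buffered beyond the current line, a sketch like this can sit in the middle of a `pigz -dc ... | sort | bgzip` pipeline without loading the MAF file into memory.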