https://github.com/algbio/founderblockgraphs
Constructs segment repeat-free founder block graphs from multiple sequence alignments
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/algbio/founderblockgraphs
- Owner: algbio
- License: gpl-3.0
- Created: 2020-05-08T07:53:52.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2025-07-16T09:47:06.000Z (12 months ago)
- Last Synced: 2025-07-17T13:27:11.032Z (12 months ago)
- Language: C++
- Homepage:
- Size: 228 KB
- Stars: 3
- Watchers: 7
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# founderblockgraphs
Constructs repeat-free/semi-repeat-free non-elastic/elastic founder graphs from multiple sequence alignments.
# getting started
Clone this repository with dependencies:
```
$ git clone --recurse-submodules https://github.com/algbio/founderblockgraphs.git
$ cd founderblockgraphs
```
Build sdsl-lite-v3:
```
$ cd sdsl-lite-v3
$ ./install.sh .
$ cd ..
```
Build this project (`founderblockgraph`, `locate_multiple`, `locate_patterns`):
```
$ make
```
# usage
```
Usage: founderblockgraph --input=MSA.fasta --output={MSA.index|efg.xgfa} [--gfa]
         [--elastic] [--gap-limit=GAPLIMIT] [--threads=THREADNUM]
         [--graphviz-output=efg.dot] [--output-paths] [--ignore-chars="ALPHABET"]

Constructs a semi-repeat-free (Elastic) Founder Graph

Input is MSA given in fasta format. In standard mode (without --elastic), rows
with runs of gaps ‘-’ or N’s ≥ GAPLIMIT will be filtered out.

  -h, --help                    Print help and exit
      --full-help               Print help, including hidden options, and exit
  -V, --version                 Print version and exit
      --input=filename          MSA input path
      --output=filename         Index/EFG output path
      --gap-limit=GAPLIMIT      Gap limit (suppressed by --elastic)
                                  (default=`1')
      --graphviz-output=filename
                                Graphviz output path
      --memory-chart-output=filename
                                Memory chart output path
  -e, --elastic                 Min-max-length semi-repeat-free segmentation
                                  (default=off)
      --gfa                     Saves output in xGFA format  (default=off)
  -p, --output-paths            Print the original sequences as paths of the
                                  xGFA graph (requires --gfa)  (default=off)
      --ignore-chars=STRING     Ignore these characters for the indexability
                                  property/pattern matching
  -t, --threads=THREADNUM       Max # threads  (default=`-1')
```
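The help text says that, in standard mode, rows with runs of gaps or N's of length ≥ GAPLIMIT are filtered out. The sketch below illustrates one possible reading of that rule; it is not taken from this repository's source. The names `has_long_gap_run` and `filter_rows` are hypothetical, and it assumes that a run may mix `-` and `N`, that only uppercase `N` counts, and that the limit is at least 1.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of founderblockgraph.
// Returns true if `row` contains at least `gap_limit` consecutive characters
// that are each '-' or 'N'. Assumptions: mixed runs such as "-N-" count as
// one run, lowercase 'n' does not count, and gap_limit >= 1.
bool has_long_gap_run(const std::string& row, std::size_t gap_limit) {
    std::size_t run = 0;
    for (char c : row) {
        if (c == '-' || c == 'N') {
            ++run;
            if (run >= gap_limit) return true;
        } else {
            run = 0;  // any other character ends the current run
        }
    }
    return false;
}

// Hypothetical helper: keeps only the MSA rows that pass the check above.
std::vector<std::string> filter_rows(const std::vector<std::string>& rows,
                                     std::size_t gap_limit) {
    std::vector<std::string> kept;
    for (const std::string& row : rows) {
        if (!has_long_gap_run(row, gap_limit)) kept.push_back(row);
    }
    return kept;
}
```

Under this reading, the default `--gap-limit=1` would drop every row that contains any gap or `N`, while a larger limit tolerates short runs.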
# todo
- document EFG tricks related to option `--ignore-chars`, to the start and end of sequences, and to initial and ending runs of gaps
- implement validation of .gfa files
- implement pattern matching (`locate_multiple`, `locate_patterns`) on EFGs
- implement min max height segmentation