Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/urbanslug/junctions
Compare pangenomes using ED-Strings
https://github.com/urbanslug/junctions
elastic-degenerate-string pangenomics
Last synced: 18 days ago
JSON representation
Compare pangenomes using ED-Strings
- Host: GitHub
- URL: https://github.com/urbanslug/junctions
- Owner: urbanslug
- License: gpl-3.0
- Created: 2022-09-21T05:15:15.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2024-07-17T06:35:02.000Z (5 months ago)
- Last Synced: 2024-07-17T08:37:24.834Z (5 months ago)
- Topics: elastic-degenerate-string, pangenomics
- Language: C++
- Homepage:
- Size: 3.41 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-pangenomes - junctions - degenerate strings. (A list of software capable of analyzing mainly **eukaryotic** genomes for pangenomics. A new section for microbial genomes has also been added, these tools may not scale to large genomes.)
README
# Pangenome comparison via ED strings
This is a software suite for pangenome comparison via elastic-degenerate (ED) strings.
An ED string is a sequence of sets of strings. Our software currently supports only the DNA alphabet `{A, T, C, G}`, the letter `N` for the indeterminate base, as well as the empty string `Ɛ`. For example, the ED string x below has 3 sets. The first set has two strings (`AGT` and `A`); the second set has three strings (`A`, `T`, and the empty string `Ɛ`), and the third set has one string (`ACGTN`).
```
$ cat x.eds
{AGT,A}{A,T,}{ACGTN}
```## Download the source code
Using `git`
```sh
$ git clone https://github.com/urbanslug/junctions
$ cd junctions
```or using `curl` and `zip`:
```
$ curl -LO https://github.com/urbanslug/junctions/archive/refs/heads/master.zip
$ unzip master.zip
$ cd junctions
```## Compile
Compilation is done with `make` and can be done in different ways.
To create a dynamically linked binary (advisable)
```
$ make
```in case of need of statically linked binary
```
$ make static
```When compiled with MSA support junctions is able to internally convert MSA
files in RC-MSA format to ED strings however this is only supported on newer
x86 processors.To compile a dynamically linked binary
```
$ make WITH_MSA=true
```or for a statically linked binary
```
$ make static WITH_MSA=true
```## Usage and Documentation
Run the following for the help text```
$ ./bin/junctions
```Further documentation can be found in the [wiki](https://github.com/urbanslug/junctions/wiki).
## Example 1: ED string intersection
Consider two ED strings x and y encoded in the corresponding files below:```
$ cat x.eds
{A,AC,TGCT}{CA,}
``````
$ cat y.eds
{,T}{GCA,AC}
```We can determine whether x and y have a nonempty intersection by running the following:
```
$ ./bin/junctions intersect x.eds y.eds
INFO intersection exists
```
Indeed, x and y share the string `AC`.## Example 2: ED Matching Statistics
Consider two ED strings x and y encoded in the corresponding files below:```
$ cat x.eds
{A,AC,TGCT}{CA,}
``````
$ cat y.eds
{,T}{GCA,AC}
```
We can compute their matching statistics by running the following:```
$ ./bin/junctions graph -c 0 -s x.eds y.eds
Similarity measure is: 5
```In particular, the option `-c 0` denotes that no constraint is imposed to the
matching statistics; the option `-s` denotes that a similarity measure will be
computed from the matching statistics.## Example 3: Breakpoint Matching Statistics
In this flavour, matching statistics are considered only between breakpoints.
Consider two ED strings x and y encoded in the corresponding files below:```
$ cat x.eds
{A,AC,TGCT}{CA,}
``````
$ cat y.eds
{,T}{GCA,AC}```
We can compute their breakpoint matching statistics by running the following:```
$ ./bin/junctions graph -c 1 -s x.eds y.eds
Similarity measure is: 3.5
```In particular, the option `-c 1` denotes that a constraint related to breakpoints
is imposed to the matching statistics; the option `-s` denotes that a similarity
measure will be computed from the matching statistics.## Citations
```
Estéban Gabory, Njagi Moses Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, Wiktor Zuba:
Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications. CPM 2023.Estéban Gabory, Njagi Moses Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, Wiktor Zuba:
Pangenome Comparison via ED Strings. [Front. Bioinform.](https://doi.org/10.3389/fbinf.2024.1397036)```