https://github.com/sammyjava/pangenomics

Java code written for pangenomics work

Host: GitHub
URL: https://github.com/sammyjava/pangenomics
Owner: sammyjava
License: mit
Created: 2020-05-29T13:30:03.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-01-05T13:53:14.000Z (almost 4 years ago)
Last Synced: 2025-01-09T08:53:48.199Z (10 months ago)
Language: Java
Size: 5.52 MB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

This directory contains classes for working with pan-genomic graphs and frequented regions, based on the paper:
```
Cleary, et al., "Exploring Frequented Regions in Pan-Genomic Graphs", IEEE/ACM Trans Comput Biol Bioinform. 2018 Aug 9. PMID:30106690 DOI:10.1109/TCBB.2018.2864564
```
This work was funded in part by the National Center for Genome Resources, Santa Fe, NM.

## Building
Dependencies are managed with the [Gradle build tool](https://gradle.org/). To build the distribution, run
```
$ ./gradlew installDist
```
This will create a distribution under `build/install` that is used by the various run scripts.

### org.ncgr.pangenomics
This package contains two subpackages with similarly named classes:

- **org.ncgr.pangenomics.allele**, which contains classes for working with allele-based sequence graphs
- **org.ncgr.pangenomics.genotype**, which contains classes for working with genotype graphs

Basic graph-related classes, not particularly specific to frequented regions:

`PangenomicGraph` extends `org.jgrapht.graph.DirectedAcyclicGraph` and stores a graph, along with methods for reading it in from files and various output methods.
It also has a `main()` method for creating a graph from input data such as a GFA or VCF file.

`Node` encapsulates a node in a Graph: its ID (a long) and, for sequence graphs, its sequence.

`NodeSet` encapsulates a set of nodes in a Graph. It implements `Comparable` and provides a `merge()` method for merging two NodeSets.
(These are called "node clusters" in the paper above, but since I've implemented the class as an extension of `TreeSet`, I've used "Set".)
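
As a rough illustration of the `Node` and `NodeSet` descriptions above, here is a minimal sketch that uses only the Java standard library. The names `SketchNode` and `SketchNodeSet`, the element-by-element ordering in `compareTo()`, and the static two-argument form of `merge()` are assumptions made for this example; the repository's own classes may differ in all of these respects.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical stand-in for Node: an ID (a long) and an optional sequence.
final class SketchNode implements Comparable<SketchNode> {
    final long id;
    final String sequence; // may be null for graphs that carry no sequence

    SketchNode(long id, String sequence) {
        this.id = id;
        this.sequence = sequence;
    }

    // Nodes are ordered and identified by ID alone (an assumption of this sketch).
    @Override public int compareTo(SketchNode other) { return Long.compare(id, other.id); }
    @Override public boolean equals(Object o) { return o instanceof SketchNode && ((SketchNode) o).id == id; }
    @Override public int hashCode() { return Long.hashCode(id); }
    @Override public String toString() { return Long.toString(id); }
}

// Hypothetical stand-in for NodeSet: a TreeSet of nodes that is itself Comparable.
final class SketchNodeSet extends TreeSet<SketchNode> implements Comparable<SketchNodeSet> {
    // Assumed ordering: compare nodes pairwise in ID order; on a common prefix, the smaller set sorts first.
    @Override public int compareTo(SketchNodeSet other) {
        Iterator<SketchNode> a = iterator();
        Iterator<SketchNode> b = other.iterator();
        while (a.hasNext() && b.hasNext()) {
            int c = a.next().compareTo(b.next());
            if (c != 0) return c;
        }
        return Integer.compare(size(), other.size());
    }

    // Merging is a set union; the inputs are left unchanged.
    static SketchNodeSet merge(SketchNodeSet x, SketchNodeSet y) {
        SketchNodeSet merged = new SketchNodeSet();
        merged.addAll(x);
        merged.addAll(y);
        return merged;
    }
}

class NodeSetSketchDemo {
    public static void main(String[] args) {
        SketchNodeSet a = new SketchNodeSet();
        a.add(new SketchNode(1, "A"));
        a.add(new SketchNode(3, "G"));
        SketchNodeSet b = new SketchNodeSet();
        b.add(new SketchNode(2, "C"));
        b.add(new SketchNode(3, "G"));
        System.out.println(SketchNodeSet.merge(a, b)); // prints the union in ID order
    }
}
```

Extending `TreeSet` keeps the nodes sorted by ID, so two sets containing the same nodes always iterate in the same order, which makes a deterministic comparison straightforward.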

`Path` encapsulates a path through a Graph, along with its full sequence in the case of sequence graphs.
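
The idea of a path that carries its full sequence can be sketched as follows. `SketchPath`, `append()`, `getSequence()`, and `subpath()` are hypothetical names chosen for this example and are not taken from the repository; the sketch only assumes that a path's sequence is the concatenation of its nodes' sequences in traversal order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Path: an ordered list of node IDs plus the concatenated sequence.
final class SketchPath {
    final String name;
    final List<Long> nodeIds = new ArrayList<>();
    private final StringBuilder sequence = new StringBuilder();

    SketchPath(String name) {
        this.name = name;
    }

    // Append a node to the end of the path; a null sequence means the graph carries no sequence.
    void append(long nodeId, String nodeSequence) {
        nodeIds.add(nodeId);
        if (nodeSequence != null) sequence.append(nodeSequence);
    }

    // Full sequence of the path: node sequences joined in traversal order.
    String getSequence() {
        return sequence.toString();
    }

    // Contiguous subpath between the first occurrences of two node IDs, inclusive.
    // Returns an empty list if either node is missing or they appear in the wrong order.
    List<Long> subpath(long from, long to) {
        int i = nodeIds.indexOf(from);
        int j = nodeIds.indexOf(to);
        if (i < 0 || j < 0 || i > j) return Collections.emptyList();
        return new ArrayList<>(nodeIds.subList(i, j + 1));
    }
}

class PathSketchDemo {
    public static void main(String[] args) {
        SketchPath p = new SketchPath("sample1");
        p.append(1, "AC");
        p.append(2, "G");
        p.append(3, "TT");
        System.out.println(p.getSequence());
        System.out.println(p.subpath(2, 3));
    }
}
```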

### org.ncgr.pangenomics.[allele/genotype].fr
Code related to frequented regions (FRs).

`FrequentedRegion` represents a frequented region (FR): a NodeSet together with the supporting subpaths drawn from the full set of Paths in a Graph. It also provides a large number of methods for working with the region.

`FRFinder` contains a `main()` method for finding FRs based on a number of parameters.

`FRPair` is a utility class that contains two FRs and the result of merging them, and is used in the search loop in `FRFinder.findFRs()`.
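
To make the role of a pair in a search loop concrete, here is a deliberately simplified sketch. It is not the algorithm implemented in `FRFinder.findFRs()`: the classes `SketchRegion`, `SketchPair`, and `SketchSearch` are hypothetical, and the support rule used here (a path supports a region if it visits every node in it) is a simplifying assumption that ignores the parameters the real finder takes.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical region: a sorted set of node IDs plus the number of paths that support it.
final class SketchRegion {
    final TreeSet<Long> nodes;
    final int support;

    SketchRegion(Set<Long> nodes, List<List<Long>> paths) {
        this.nodes = new TreeSet<>(nodes);
        // Simplifying assumption: a path supports the region if it visits every node in it.
        int count = 0;
        for (List<Long> path : paths) {
            if (path.containsAll(this.nodes)) count++;
        }
        this.support = count;
    }
}

// Hypothetical pair: two regions and the region obtained by merging their node sets.
final class SketchPair {
    final SketchRegion left;
    final SketchRegion right;
    final SketchRegion merged;

    SketchPair(SketchRegion left, SketchRegion right, List<List<Long>> paths) {
        this.left = left;
        this.right = right;
        TreeSet<Long> union = new TreeSet<>(left.nodes);
        union.addAll(right.nodes);
        this.merged = new SketchRegion(union, paths);
    }
}

final class SketchSearch {
    // One step of a greedy search: try every pair and keep the one whose merged
    // region has the highest support. Ties go to the pair found first.
    // Returns null when there are fewer than two regions.
    static SketchPair bestPair(List<SketchRegion> regions, List<List<Long>> paths) {
        SketchPair best = null;
        for (int i = 0; i < regions.size(); i++) {
            for (int j = i + 1; j < regions.size(); j++) {
                SketchPair candidate = new SketchPair(regions.get(i), regions.get(j), paths);
                if (best == null || candidate.merged.support > best.merged.support) {
                    best = candidate;
                }
            }
        }
        return best;
    }
}
```

Keeping the two inputs and the merged result together in one object means a search loop can compare candidates by the merged region's properties and still know which two regions to replace once a candidate is chosen.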

### org.ncgr.svm
LIBSVM-based Support Vector Machine classes.

### org.ncgr.weka
Weka-based supervised classifier classes.
