Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sammyjava/pangenomics
Java code written for pangenomics work
- Host: GitHub
- URL: https://github.com/sammyjava/pangenomics
- Owner: sammyjava
- License: mit
- Created: 2020-05-29T13:30:03.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-01-05T13:53:14.000Z (about 3 years ago)
- Last Synced: 2024-11-11T09:50:13.467Z (2 months ago)
- Language: Java
- Size: 5.52 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
This directory contains classes for working with pan-genomic graphs and frequented regions, based on the paper
```
Cleary, et al., "Exploring Frequented Regions in Pan-Genomic Graphs", IEEE/ACM Trans Comput Biol Bioinform. 2018 Aug 9. PMID:30106690 DOI:10.1109/TCBB.2018.2864564
```
This work was funded in part by the National Center for Genome Resources, Santa Fe, NM.

## Building
The project's dependencies are managed with the [Gradle build tool](https://gradle.org/). To build the distribution, run
```
$ ./gradlew installDist
```
This will create a distribution under `build/install` that is used by the various run scripts.

### org.ncgr.pangenomics
This contains two packages with similarly-named classes:

**org.ncgr.pangenomics.allele** which contains classes for working with allele-based sequence graphs

**org.ncgr.pangenomics.genotype** which contains classes for working with genotype graphs

Basic graph-related classes, not particularly specific to frequented regions:

`PangenomicGraph` extends org.jgrapht.graph.DirectedAcyclicGraph and stores a graph with methods for reading it in from files and various output methods.
There is a `main` class for creating a graph from input data such as a GFA or VCF file.

`Node` encapsulates a node in a Graph: its ID (a long) and, for sequence graphs, its sequence.

`NodeSet` encapsulates a set of nodes in a Graph. NodeSet implements Comparable. There is a method `merge()` for merging two NodeSets.
(These are called "node clusters" in the paper above, but since I've implemented it as an extension of TreeSet, I've used "Set").

`Path` encapsulates a path through a Graph, along with its full sequence in the case of sequence graphs.
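To make the `Node` / `NodeSet` relationship concrete, here is a minimal, self-contained sketch of a sorted node set that extends `TreeSet`, implements `Comparable`, and offers a `merge()` operation. It is not the code in this repository: the class names `SketchNode` and `SketchNodeSet`, the static signature of `merge()`, and the ordering used in `compareTo()` are all assumptions chosen for illustration.

```java
import java.util.Iterator;
import java.util.TreeSet;

/** Hypothetical stand-in for Node: an ID plus an optional sequence. */
final class SketchNode implements Comparable<SketchNode> {
    final long id;
    final String sequence; // may be null for graphs that carry no sequence

    SketchNode(long id, String sequence) {
        this.id = id;
        this.sequence = sequence;
    }

    /** Nodes are ordered by ID only (an assumption of this sketch). */
    @Override
    public int compareTo(SketchNode other) {
        return Long.compare(id, other.id);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SketchNode && ((SketchNode) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }

    @Override
    public String toString() {
        return Long.toString(id);
    }
}

/** Hypothetical stand-in for NodeSet: a sorted set of nodes that can be merged and compared. */
final class SketchNodeSet extends TreeSet<SketchNode> implements Comparable<SketchNodeSet> {

    /** Returns a new set holding the union of the two inputs; neither input is modified. */
    static SketchNodeSet merge(SketchNodeSet a, SketchNodeSet b) {
        SketchNodeSet merged = new SketchNodeSet();
        merged.addAll(a);
        merged.addAll(b);
        return merged;
    }

    /** Compares node by node in sorted order, then by size (an assumed ordering). */
    @Override
    public int compareTo(SketchNodeSet other) {
        Iterator<SketchNode> mine = iterator();
        Iterator<SketchNode> theirs = other.iterator();
        while (mine.hasNext() && theirs.hasNext()) {
            int c = mine.next().compareTo(theirs.next());
            if (c != 0) return c;
        }
        return Integer.compare(size(), other.size());
    }
}
```

Because the set is a `TreeSet`, merging two sets that share a node keeps a single copy of that node, and iteration always returns nodes in ID order.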
### org.ncgr.pangenomics.[allele/genotype].fr
Frequented regions-related code.

`FrequentedRegion` stores a FrequentedRegion, containing a NodeSet along with the supporting subpaths of the full set of Paths in a Graph, and lots of methods.

`FRFinder` contains a `main()` method for finding FRs based on a bunch of parameters.

`FRPair` is a utility class that contains two FRs and the result of merging them, and is used in the search loop in `FRFinder.findFRs()`.
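As a rough illustration of how a pair-and-merge object can drive a search loop, the sketch below greedily merges the pair of regions whose union has the highest support, and stops when no pair reaches a minimum support. This is one plausible shape for such a loop, not the logic of `FRFinder.findFRs()`. The names `GreedyMergeSketch`, `Pair`, `support()` and `findRegions()`, the parameters `alpha` and `minSupport`, and the simplified support measure (the number of paths that visit at least a fraction `alpha` of a region's nodes) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class GreedyMergeSketch {

    /** Simplified support: the number of paths visiting at least alpha * |nodes| of the nodes. */
    static int support(TreeSet<Long> nodes, List<List<Long>> paths, double alpha) {
        int count = 0;
        for (List<Long> path : paths) {
            long hits = nodes.stream().filter(path::contains).count();
            if (hits >= alpha * nodes.size()) count++;
        }
        return count;
    }

    /** Hypothetical analogue of a pair class: two regions plus the result of merging them. */
    static final class Pair {
        final TreeSet<Long> left;
        final TreeSet<Long> right;
        final TreeSet<Long> merged;
        final int support;

        Pair(TreeSet<Long> left, TreeSet<Long> right, List<List<Long>> paths, double alpha) {
            this.left = left;
            this.right = right;
            this.merged = new TreeSet<>(left);
            this.merged.addAll(right);
            this.support = GreedyMergeSketch.support(merged, paths, alpha);
        }
    }

    /** Replaces the best-supported pair with its merge until no pair reaches minSupport. */
    static List<TreeSet<Long>> findRegions(List<List<Long>> paths, double alpha, int minSupport) {
        // Start from one singleton region per distinct node ID.
        TreeSet<Long> allNodes = new TreeSet<>();
        for (List<Long> path : paths) allNodes.addAll(path);
        List<TreeSet<Long>> regions = new ArrayList<>();
        for (Long id : allNodes) {
            TreeSet<Long> single = new TreeSet<>();
            single.add(id);
            regions.add(single);
        }
        while (regions.size() > 1) {
            // Evaluate every pair and remember the first one with the highest support.
            Pair best = null;
            for (int i = 0; i < regions.size(); i++) {
                for (int j = i + 1; j < regions.size(); j++) {
                    Pair candidate = new Pair(regions.get(i), regions.get(j), paths, alpha);
                    if (best == null || candidate.support > best.support) best = candidate;
                }
            }
            if (best.support < minSupport) break;
            regions.remove(best.left);
            regions.remove(best.right);
            regions.add(best.merged);
        }
        return regions;
    }
}
```

Evaluating every pair on every pass is quadratic in the number of regions, which is acceptable for a sketch; a real implementation would likely cache or prune candidate pairs.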
### org.ncgr.svm
LIBSVM-based Support Vector Machine classes.

### org.ncgr.weka
Weka-based supervised classifier classes.