https://github.com/leonardogemin/duohash
DuoHash is an advanced tool for the efficient calculation of forward and reverse hashes of spaced k-mers in nucleotide sequences, improving the analysis of genomic data by reducing processing time and computational resources.
algorithm-optimization bioinformatics computational-biology dna-sequencing genomics hashing sequence-analysis spaced-kmer
- Host: GitHub
- URL: https://github.com/leonardogemin/duohash
- Owner: leonardoGemin
- License: mit
- Created: 2024-06-19T13:59:29.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-08-06T06:42:57.000Z (10 months ago)
- Last Synced: 2024-08-06T08:36:30.285Z (10 months ago)
- Language: Roff
- Homepage:
- Size: 702 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: ReadMe.md
- License: LICENSE
# DuoHash: Improving Spaced k-mer Extraction and Hash Encoding for Bioinformatics Applications
## Methods
The **DuoHash** library provides two classes, **DuoHash** and **DuoHash_multi**, for handling one or multiple spaced seeds, respectively. The methods of the first class are:
- `GetEncoding_naive()`
- `GetEncoding_FSH()`
- `GetEncoding_ISSH()`

The methods of the second class are:
- `GetEncoding_naive()`
- `GetEncoding_FSH()`
- `GetEncoding_ISSH()`
- `GetEncoding_FSH_multi()`
- `GetEncoding_MISSH_v1()`
- `GetEncoding_MISSH_col()`
- `GetEncoding_MISSH_col_parallel()`
- `GetEncoding_MISSH_row()`

Both classes share the `PrintFASTA()` method for saving the resulting spaced k-mers to a file, along with other methods for handling the various parameters.
Each `GetEncoding_<...>()` method comes in four variants. The first only extracts the spaced k-mers and encodes them; the second additionally post-processes the encodings to compute forward and reverse hashes; the third post-processes the encodings to convert them into strings; and the fourth combines the second and third options.
## Installation
Make sure CMake is installed on the system. Download the repository using
```shell
$ git clone https://github.com/leonardoGemin/DuoHash.git
```
and build the library with
```shell
$ make build
```

This builds the static library `build/libDuoHash.a` in the project's directory.
## Usage
To use **DuoHash** in a C++ project:
- Include DuoHash in your code using `#include `
- Add the `include` directory to the header search path (pass `-I./include` to the compiler)
- Link your code against `libDuoHash.a` (pass `-L./build -lDuoHash` to the compiler, after your source files)
- Compile your code with `g++-13`, with `-std=c++0x` (preferably also `-O3`) and `-fopenmp` enabled

## Example
Compile the `example/main.cpp` file with
```shell
$ cd example
$ g++-13 -std=c++0x -O3 -fopenmp -I../include main.cpp -L../build -lDuoHash -o main
```

## Thesis
Link to my Master's thesis: [Gemin_Leonardo.pdf](https://thesis.unipd.it/retrieve/99e7ee7c-1348-45f6-8a02-467bae6b0dbc/Gemin_Leonardo.pdf)