https://github.com/will-rowe/nthash
Go implementation of ntHash
genomics hashing-algorithm
- Host: GitHub
- URL: https://github.com/will-rowe/nthash
- Owner: will-rowe
- License: mit
- Created: 2018-09-18T20:04:35.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2021-09-16T18:43:03.000Z (over 4 years ago)
- Last Synced: 2024-12-07T19:19:18.326Z (about 1 year ago)
- Topics: genomics, hashing-algorithm
- Language: Go
- Size: 20.5 KB
- Stars: 20
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
---
## Overview
This is a Go implementation of the [ntHash](https://github.com/bcgsc/ntHash) recursive hash function for hashing all possible k-mers in a DNA/RNA sequence.
For more information, read the ntHash [paper](http://dx.doi.org/10.1093/bioinformatics/btw397) by Mohamadi et al. or check out their C++ [implementation](https://github.com/bcgsc/ntHash).
This implementation was inspired by [Luiz Irber](https://luizirber.org/) and his recent [blog post](https://blog.luizirber.org/2018/09/13/nthash/) on his cool [Rust ntHash implementation](https://github.com/luizirber/nthash).
I wrote this in Go so that ntHash can be used in my [HULK](https://github.com/will-rowe/hulk) and [GROOT](https://github.com/will-rowe/groot) projects, but feel free to use it in your own.
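To illustrate what "recursive" means here, the following is a minimal, standalone sketch of the forward-strand recurrence described in the ntHash paper. It is not the code from this package. The function names (`seed`, `directHash`, `rollHashes`) are hypothetical, and the seed constants are intended to mirror those of the reference implementation but should be treated as illustrative and checked against it before being relied on. The idea is that the hash of the first k-mer is computed from scratch, and each subsequent hash is derived from the previous one with one rotation and two XORs.

```go
package main

import (
	"fmt"
	"math/bits"
)

// seed maps a nucleotide to a fixed 64-bit word.
// Assumption: these values are meant to match the ntHash reference seeds;
// any fixed 64-bit words would demonstrate the recurrence equally well.
func seed(b byte) uint64 {
	switch b {
	case 'A', 'a':
		return 0x3c8bfbb395c60474
	case 'C', 'c':
		return 0x3193c18562a02b4c
	case 'G', 'g':
		return 0x20323ed082572324
	case 'T', 't', 'U', 'u':
		return 0x295549f54be24456
	}
	return 0 // any other symbol (e.g. N) contributes nothing
}

// directHash computes the forward hash of a single k-mer from scratch:
// the XOR of each base's seed rotated left by its distance from the k-mer end.
func directHash(kmer []byte) uint64 {
	var h uint64
	k := len(kmer)
	for i, b := range kmer {
		h ^= bits.RotateLeft64(seed(b), k-1-i)
	}
	return h
}

// rollHashes returns the forward hash of every k-mer in seq, in order.
// Only the first k-mer is hashed from scratch; the rest reuse the previous value.
func rollHashes(seq []byte, k int) []uint64 {
	if k <= 0 || k > len(seq) {
		return nil
	}
	hashes := make([]uint64, 0, len(seq)-k+1)
	h := directHash(seq[:k])
	hashes = append(hashes, h)
	for i := k; i < len(seq); i++ {
		out, in := seq[i-k], seq[i]
		// rotate the old hash, cancel the base leaving the window, add the new base
		h = bits.RotateLeft64(h, 1) ^ bits.RotateLeft64(seed(out), k) ^ seed(in)
		hashes = append(hashes, h)
	}
	return hashes
}

func main() {
	seq := []byte("ACGTCGTCAGTCGATGCAGTACGTCGTCAGTCGATGCAGT")
	for i, h := range rollHashes(seq, 11) {
		fmt.Printf("%2d %s %016x\n", i, seq[i:i+11], h)
	}
}
```

Because each update touches only the outgoing and incoming base, the cost per k-mer does not depend on k after the first window.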
## Installation
```sh
go get github.com/will-rowe/nthash
```
## Example usage
### Range over ntHash values for a sequence
```go
package main

import (
	"log"

	// explicit alias, since the import path ends in "nthash" but the code refers to ntHash
	ntHash "github.com/will-rowe/nthash"
)

var (
	sequence = []byte("ACGTCGTCAGTCGATGCAGTACGTCGTCAGTCGATGCAGT")
	kmerSize = 11
)

func main() {

	// create the ntHash iterator using a pointer to the sequence and a k-mer size
	hasher, err := ntHash.New(&sequence, kmerSize)

	// check for errors (e.g. bad k-mer size choice)
	if err != nil {
		log.Fatal(err)
	}

	// collect the hashes by ranging over the hash channel produced by the Hash method
	canonical := true
	for hash := range hasher.Hash(canonical) {
		log.Println(hash)
	}
}
```
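The `canonical` flag deserves a short note. Assuming it follows the convention of the ntHash reference implementation, the canonical hash of a k-mer is the smaller of its forward hash and the hash of its reverse complement, so a k-mer and its reverse complement yield the same value. The sketch below shows that idea in isolation. It does not use this package, the helper names (`seed`, `forwardHash`, `reverseComplement`, `canonicalHash`) are hypothetical, the seed constants are illustrative, and it recomputes the reverse strand from scratch rather than rolling it.

```go
package main

import (
	"fmt"
	"math/bits"
)

// seed maps a nucleotide to a fixed 64-bit word (illustrative constants).
func seed(b byte) uint64 {
	switch b {
	case 'A':
		return 0x3c8bfbb395c60474
	case 'C':
		return 0x3193c18562a02b4c
	case 'G':
		return 0x20323ed082572324
	case 'T':
		return 0x295549f54be24456
	}
	return 0 // any other symbol (e.g. N)
}

// forwardHash XORs each base's seed, rotated by its distance from the k-mer end.
func forwardHash(kmer []byte) uint64 {
	var h uint64
	for i, b := range kmer {
		h ^= bits.RotateLeft64(seed(b), len(kmer)-1-i)
	}
	return h
}

// reverseComplement returns the reverse complement of an upper-case DNA k-mer.
func reverseComplement(kmer []byte) []byte {
	comp := map[byte]byte{'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
	rc := make([]byte, len(kmer))
	for i, b := range kmer {
		c, ok := comp[b]
		if !ok {
			c = 'N' // unknown bases stay unknown
		}
		rc[len(kmer)-1-i] = c
	}
	return rc
}

// canonicalHash returns the smaller of the forward and reverse-complement hashes,
// making the result independent of which strand the k-mer was read from.
func canonicalHash(kmer []byte) uint64 {
	fwd := forwardHash(kmer)
	rev := forwardHash(reverseComplement(kmer))
	if rev < fwd {
		return rev
	}
	return fwd
}

func main() {
	kmer := []byte("ACGTCGTCAGT")
	rc := reverseComplement(kmer)
	fmt.Printf("%s %016x\n", kmer, canonicalHash(kmer))
	fmt.Printf("%s %016x\n", rc, canonicalHash(rc))
}
```

Strand-independent hashes are useful when reads may come from either strand, for example when sketching or indexing sequencing data.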