{"id":19154500,"url":"https://github.com/bcgsc/nthash","last_synced_at":"2025-04-15T18:17:43.413Z","repository":{"id":32118217,"uuid":"35690674","full_name":"bcgsc/ntHash","owner":"bcgsc","description":"Fast hash function for DNA/RNA sequences","archived":false,"fork":false,"pushed_at":"2024-04-15T17:20:07.000Z","size":12946,"stargazers_count":100,"open_issues_count":4,"forks_count":13,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-04-15T18:17:23.301Z","etag":null,"topics":["bioinformatics","bloom-filter","genomics","hash","hash-algorithm","hash-methods","k-mer-hashing"],"latest_commit_sha":null,"homepage":"http://bcgsc.github.io/ntHash/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bcgsc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.bib","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-05-15T18:44:10.000Z","updated_at":"2025-04-13T16:01:27.000Z","dependencies_parsed_at":"2024-11-09T08:38:34.215Z","dependency_job_id":null,"html_url":"https://github.com/bcgsc/ntHash","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2FntHash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2FntHash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2FntHash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2FntHash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bcgsc","download_url":"https://codeload.github.com/bcgsc/ntHash/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249126000,"owners_count":21216705,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","bloom-filter","genomics","hash","hash-algorithm","hash-methods","k-mer-hashing"],"created_at":"2024-11-09T08:27:09.591Z","updated_at":"2025-04-15T18:17:43.381Z","avatar_url":"https://github.com/bcgsc.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Release](https://img.shields.io/github/release/bcgsc/ntHash.svg)](https://github.com/bcgsc/ntHash/releases)\n[![Downloads](https://img.shields.io/github/downloads/bcgsc/ntHash/total?logo=github)](https://github.com/bcgsc/ntHash/archive/master.zip)\n[![Issues](https://img.shields.io/github/issues/bcgsc/ntHash.svg)](https://github.com/bcgsc/ntHash/issues)\n\n![Logo](nthash-logo.png)\n\nntHash is an efficient rolling hash function for k-mers and spaced seeds.\n\n# Installation\n\nMake sure [Meson](https://mesonbuild.com/) is installed on the system.\n\nDownload the repository (either from the releases section or by cloning it with `git clone https://github.com/bcgsc/ntHash`). Set up Meson in a build directory of your choice (e.g. `build`) by running the following command in the project's root (include `--prefix=PREFIX` to set the installation prefix to `PREFIX`):\n\n```shell\nmeson setup --buildtype=release --prefix=PREFIX build\n```\n\nThen, install the project and its dependencies using:\n\n```shell\nmeson install -C build\n```\n\nThis will install `include/nthash` and `lib/libnthash.a` into the installation prefix.\n\n# Usage\n\nTo use ntHash in a C++ project:\n- Import ntHash in the code using `#include \u003cnthash/nthash.hpp\u003e`\n- Access ntHash classes from the `nthash` namespace\n- Add the installed `include` directory to the include path (pass `-IPREFIX/include` to the compiler)\n- Link the code with `libnthash.a` (i.e. pass `-LPREFIX/lib -lnthash` to the compiler, where `PREFIX` is the installation prefix)\n- Compile your code with `-std=c++17` (and preferably `-O3`) enabled\n\nRefer to [docs](https://bcgsc.github.io/ntHash/) for more information.\n\n# Examples\n\nThe `nthash::NtHash` class hashes the contiguous k-mers of a sequence, while `nthash::SeedNtHash` hashes spaced seeds. In both cases, `roll()` is called in a loop and the current hashes are read inside it:\n\n```C++\nnthash::NtHash nth(\"TGACTGATCGAGTCGTACTAG\", 1, 5);  // 1 hash per 5-mer\nwhile (nth.roll()) {\n    // use nth.hashes() for canonical hashes\n    //     nth.get_forward_hash() for forward strand hashes\n    //     nth.get_reverse_hash() for reverse strand hashes\n}\n```\n\n```C++\nstd::vector\u003cstd::string\u003e seeds = {\"10101\", \"11011\"};\nnthash::SeedNtHash nth(\"TGACTGATCGAGTCGTACTAG\", seeds, 3, 5);  // 3 hashes per seed, 5-mers\nwhile (nth.roll()) {\n    // nth.hashes()[0] = \"T#A#T\"'s first hash\n    // nth.hashes()[1] = \"T#A#T\"'s second hash\n    // nth.hashes()[2] = \"T#A#T\"'s third hash\n    // nth.hashes()[3] = \"TG#CT\"'s first hash\n}\n```\n\n# For developers\n\nIf you would like to contribute to the development of ntHash, after forking/cloning the repo, create the `build` directory without the release flag:\n\n```shell\nmeson setup build\n```\n\nCompile the code, tests, and benchmarking script using:\n\n```shell\nmeson compile -C build\n```\n\nIf compilation is successful, `libnthash.a` will be available in the `build` folder, and the benchmarking script will be compiled into the `bench` binary in the same folder.\n\nBefore sending a PR, make sure that:\n\n- tests pass by running `meson test -C build` in the project directory\n- code is formatted properly by running `ninja clang-format` in the `build` folder (requires `clang-format` to be available)\n- coding standards are met, meaning `ninja clang-tidy-check` in `build` returns no errors (requires `clang-tools` to be installed)\n- documentation is up to date by running `ninja docs` in `build` (requires [doxygen](https://www.doxygen.nl/))\n\n# Publications\n\nParham Kazemi, Johnathan Wong, Vladimir Nikolić, Hamid Mohamadi, René L Warren, and Inanç Birol.\nntHash2: recursive spaced seed hashing for nucleotide sequences.\n*Bioinformatics* (2022), btac564.\n[doi:10.1093/bioinformatics/btac564](https://doi.org/10.1093/bioinformatics/btac564)\n\nHamid Mohamadi, Justin Chu, Benjamin P Vandervalk, and Inanc Birol.\nntHash: recursive nucleotide hashing.\n*Bioinformatics* (2016) 32 (22): 3492-3494.\n[doi:10.1093/bioinformatics/btw397](http://dx.doi.org/10.1093/bioinformatics/btw397)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbcgsc%2Fnthash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbcgsc%2Fnthash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbcgsc%2Fnthash/lists"}
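The README's examples show only the public loop around `roll()`. The following self-contained C++17 sketch illustrates the kind of recurrence a rolling k-mer hash relies on, in the spirit of the recursive rotate-and-XOR scheme described in the ntHash publications. It also shows which positions a spaced-seed mask such as `10101` (the `T#A#T` in the `SeedNtHash` comments) ignores. It is not the library's implementation: the per-base seed constants, the plain 64-bit rotation, the `forward + reverse` canonical combination, and every name below (`base_seed`, `roll_all`, `spaced_forward`, `KmerHashes`) are hypothetical, and its values will not match `nth.hashes()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical placeholder seeds per base (NOT ntHash's real constants).
// Non-ACGT characters map to 0; this sketch assumes ACGT-only input.
std::uint64_t base_seed(char c) {
  switch (c) {
    case 'A': return 0x9e3779b97f4a7c15ULL;
    case 'C': return 0xc2b2ae3d27d4eb4fULL;
    case 'G': return 0x165667b19e3779f9ULL;
    case 'T': return 0xd6e8feb86659fd93ULL;
    default: return 0;
  }
}

char complement(char c) {
  switch (c) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default: return c;
  }
}

// Plain 64-bit cyclic rotations; the rotation amount is taken modulo 64.
std::uint64_t rotate_left(std::uint64_t x, unsigned r) {
  r &= 63U;
  return r == 0 ? x : (x << r) | (x >> (64U - r));
}

std::uint64_t rotate_right(std::uint64_t x, unsigned r) {
  r &= 63U;
  return r == 0 ? x : (x >> r) | (x << (64U - r));
}

std::string reverse_complement(const std::string& s) {
  std::string rc(s.rbegin(), s.rend());
  for (char& c : rc) c = complement(c);
  return rc;
}

// Direct (non-rolling) hash of one window: XOR of rotated seeds at the
// positions where mask[i] == '1'. An all-'1' mask gives the plain k-mer hash.
std::uint64_t spaced_forward(const std::string& kmer, const std::string& mask) {
  assert(mask.size() == kmer.size());
  const unsigned k = static_cast<unsigned>(kmer.size());
  std::uint64_t h = 0;
  for (unsigned i = 0; i < k; ++i) {
    if (mask[i] == '1') h ^= rotate_left(base_seed(kmer[i]), k - 1 - i);
  }
  return h;
}

struct KmerHashes {
  std::uint64_t forward;  // hash of the k-mer as read
  std::uint64_t reverse;  // hash of its reverse complement
  // Strand-symmetric combination; using a wrapping sum is an assumption.
  std::uint64_t canonical() const { return forward + reverse; }
};

// Hashes every k-mer of seq: the first window directly, then each later
// window in O(1) from the previous one.
std::vector<KmerHashes> roll_all(const std::string& seq, unsigned k) {
  std::vector<KmerHashes> out;
  if (k == 0 || seq.size() < k) return out;
  std::uint64_t fwd = 0;
  std::uint64_t rev = 0;
  for (unsigned i = 0; i < k; ++i) {
    fwd ^= rotate_left(base_seed(seq[i]), k - 1 - i);
    rev ^= rotate_left(base_seed(complement(seq[i])), i);
  }
  out.push_back({fwd, rev});
  for (std::size_t pos = 0; pos + k < seq.size(); ++pos) {
    const char outgoing = seq[pos];
    const char incoming = seq[pos + k];
    // Rotate every term up by one, cancel the outgoing base, add the new one.
    fwd = rotate_left(fwd, 1) ^ rotate_left(base_seed(outgoing), k) ^ base_seed(incoming);
    // Reverse strand: cancel the outgoing base, rotate down, add at the top.
    rev = rotate_right(rev ^ base_seed(complement(outgoing)), 1) ^
          rotate_left(base_seed(complement(incoming)), k - 1);
    out.push_back({fwd, rev});
  }
  return out;
}
```

With the README's sequence, `roll_all("TGACTGATCGAGTCGTACTAG", 5)` yields 17 windows, and each rolled `forward` and `reverse` value matches a direct `spaced_forward` recomputation (with an all-`1` mask) of the k-mer and of its reverse complement, respectively. `spaced_forward("TGACT", "10101")` equals `spaced_forward("TCAGT", "10101")`, since the positions under `0` (the `#` in `T#A#T`) do not contribute. The spaced-seed helper recomputes each window from scratch; ntHash2 also hashes spaced seeds recursively, which this sketch does not attempt.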