{"id":36548657,"url":"https://github.com/will-rowe/hulk","last_synced_at":"2026-01-12T06:19:21.122Z","repository":{"id":57554833,"uuid":"143890875","full_name":"will-rowe/hulk","owner":"will-rowe","description":"Histosketching Using Little Kmers","archived":false,"fork":false,"pushed_at":"2023-05-25T08:44:54.000Z","size":9719,"stargazers_count":55,"open_issues_count":7,"forks_count":4,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-06-20T13:30:57.534Z","etag":null,"topics":["genomics","hashing","microbiome","sketching"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/will-rowe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-08-07T15:10:51.000Z","updated_at":"2024-04-14T21:46:23.000Z","dependencies_parsed_at":"2022-09-26T18:51:28.835Z","dependency_job_id":"a07a7148-3713-40f0-9ec2-52ac2ef10f11","html_url":"https://github.com/will-rowe/hulk","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/will-rowe/hulk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/will-rowe%2Fhulk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/will-rowe%2Fhulk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/will-rowe%2Fhulk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/will-rowe%2Fhulk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/will-rowe","download_url":"https://codeload.github.com/will-rowe/hulk/tar.gz/refs/heads/ma
ster","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/will-rowe%2Fhulk/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28336311,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T06:09:07.588Z","status":"ssl_error","status_checked_at":"2026-01-12T06:05:18.301Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["genomics","hashing","microbiome","sketching"],"created_at":"2026-01-12T06:19:20.507Z","updated_at":"2026-01-12T06:19:21.114Z","avatar_url":"https://github.com/will-rowe.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"paper/img/misc/hulk-logo-with-text.png?raw=true?\" alt=\"hulk-logo\" width=\"250\"\u003e\n    \u003ch3\u003e\u003ca style=\"color:#9900FF\"\u003eH\u003c/a\u003eistosketching \u003ca style=\"color:#9900FF\"\u003eU\u003c/a\u003esing \u003ca style=\"color:#9900FF\"\u003eL\u003c/a\u003eittle \u003ca style=\"color:#9900FF\"\u003eK\u003c/a\u003emers\u003c/h3\u003e\n    \u003chr/\u003e\n    \u003ca href=\"https://travis-ci.org/will-rowe/hulk\"\u003e\u003cimg src=\"https://travis-ci.org/will-rowe/hulk.svg?branch=master\" alt=\"travis\"\u003e\u003c/a\u003e\n    \u003ca 
href='http://hulk.readthedocs.io/en/latest/?badge=latest'\u003e\u003cimg src='https://readthedocs.org/projects/hulk/badge/?version=latest' alt='Documentation Status' /\u003e\u003c/a\u003e\n    \u003ca href=\"https://goreportcard.com/report/github.com/will-rowe/hulk\"\u003e\u003cimg src=\"https://goreportcard.com/badge/github.com/will-rowe/hulk\" alt=\"reportcard\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://zenodo.org/badge/latestdoi/143890875\"\u003e\u003cimg src=\"https://zenodo.org/badge/143890875.svg\" alt=\"DOI\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/will-rowe/hulk/blob/master/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-orange.svg\" alt=\"License\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://bioconda.github.io/recipes/hulk/README.html\"\u003e\u003cimg src=\"https://anaconda.org/bioconda/hulk/badges/downloads.svg\" alt=\"bioconda\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://mybinder.org/v2/gh/will-rowe/hulk/master?filepath=paper%2Fanalysis-notebooks\"\u003e\u003cimg src=\"https://mybinder.org/badge_logo.svg\" alt=\"Binder\"\u003e\u003c/a\u003e\n    \u003chr/\u003e\n\u003c/div\u003e\n\n\u003e UPDATE: JULY 2019\n\n\u003e I no longer work for STFC. All versions of HULK pre 1.0.0 have been renamed and archived to the [STFC github](https://github.com/stfc/histogramSketcher). The STFC Hartree Centre are building genomic solutions based on these and other tools - if you are interested, please [contact them](hartree@stfc.ac.uk).\n\n\u003e This repo now hosts HULK \u003e= version 1.0.0, which is a complete re-implementation of HULK and based solely off the method described in the [open-access paper](https://doi.org/10.1186/s40168-019-0653-2).\n\n\u003e I've tried to keep much of the syntax and existing functionality, but make sure to check the change log below. It's a work in progress but the master branch should be a close drop-in replacement for the old HULK (for sketching at least). 
There are a few algorithmic differences, mainly that HULK now uses **minimizer frequencies** for representing the underlying microbiome sample.\n\n\n\u003e Importantly, this project is now **fully open source**!\n\n## Overview\n\n`HULK` is a tool that creates small, fixed-size sketches from streaming microbiome sequencing data, enabling **rapid metagenomic dissimilarity analysis**. `HULK` approximates a [k-mer spectrum](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0875-7) from a FASTQ data stream, incrementally sketches it and makes similarity search queries against other microbiome sketches.\n\n`HULK` works by collecting **minimizers** from sequences. Minimizers are assigned to a finite number of histogram bins using a [consistent jump hash](https://arxiv.org/abs/1406.2294); these bins are incremented as their corresponding minimizers are found. At set intervals (i.e. after X sequences have been processed), the bins are histosketched by `HULK`. Similarly to [MinHash sketches](https://en.wikipedia.org/wiki/MinHash), histosketches can be used to estimate similarity between sequence data sets.\n\nThe advantages of `HULK` include:\n\n* it's fast and can run on a laptop\n* **hulk sketches** are compact, fixed size and incorporate k-mer frequency information\n* it works on data streams and does not require complete data instances\n* it can use [concept drift](https://en.wikipedia.org/wiki/Concept_drift) for histosketching\n* you get to type `hulk smash` into the command line...\n\nFinally, you can use **hulk sketches** with a Machine Learning classifier to predict microbiome sample origin (see [the paper](https://doi.org/10.1186/s40168-019-0653-2) and [BANNER](https://github.com/will-rowe/banner)).\n\n## Change log\n\n### version 1.0.1 (dev branch)\n\n* WASM interface\n  * run HULK locally and from a browser\n  * based on my [baby-GROOT](https://github.com/will-rowe/baby-groot) user interface\n* HULK will output additional sketches\n  * KMV 
MinHash\n  * HyperMinHash\n* Indexing\n  * re-implementation of the LSH Forest index\n\n### version 1.0.0 (current release)\n\n* fully re-written codebase\n  * I've aimed for it to be largely backwards compatible with previous releases\n* fully open-sourced!\n  * MIT license ([OSI approved](https://opensource.org/licenses))\n* algorithm changes\n  * underlying histogram is now based on minimizer frequencies\n  * count-min sketch for k-mer frequencies is now replaced with a fixed-size array and a jump-hash for minimizer placement\n* changes to the `sketch` subcommand:\n  * sketches saved to JSON by default (à la [sourmash](https://github.com/dib-lab/sourmash))\n  * histosketch count-min sketch is no longer configurable by the user (this was Epsilon and Delta)\n  * spectrum size is determined based on k-mer size\n  * minCount for k-mer frequencies is removed\n* changes to the `smash` subcommand:\n  * operates on JSON input\n  * outputs matrix as csv\n* replaced some unnecessary features\n  * the functionality of the `print` and `distance` subcommands is available in the `smash` subcommand\n\n### pre version 1.0.0\n\n* all versions of HULK (and BANNER) pre v1.0.0 have been moved to the [UKRI github](https://github.com/stfc/histogramSketcher) and renamed. I can no longer work on these code bases.\n\n## Installation\n\nCheck out the [releases](https://github.com/will-rowe/hulk/releases) to download a binary. Alternatively, install using Bioconda or compile the software from source.\n\n### Bioconda\n\nFor versions \u003c1.0.0, use bioconda. I will add the recipe for HULK 1.0.0 asap.\n\n```bash\nconda install -c bioconda hulk\n```\n\n### Source\n\n`HULK` is written in Go (v1.12) - to compile from source you will first need the [Go tool chain](https://golang.org/doc/install). 
Once you have it, try something like this to compile:\n\n```bash\n# Clone this repository\ngit clone https://github.com/will-rowe/hulk.git\n\n# Go into the repository and get the package dependencies\ncd hulk\ngo get -d -t -v ./...\n\n# Run the unit tests\ngo test -v ./...\n\n# Compile the program\ngo build ./\n\n# Call the program\n./hulk --help\n```\n\n## Quick Start\n\n`HULK` is called by typing **hulk**, followed by the subcommand you wish to run. The main subcommands are **sketch** and **smash**:\n\n```bash\n# Create a hulk sketch\ngunzip -c microbiome.fq.gz | hulk sketch -o sketches/sampleA\n\n# Get a pairwise weighted Jaccard similarity matrix for a set of hulk histosketches\nhulk smash -k 31 -m weightedjaccard -d ./sketches -o myOutfile\n```\n\n## Further Information \u0026 Citing\n\nI'm working on some new documentation and this will be available on [readthedocs](http://hulk.readthedocs.io/en/latest/?badge=latest) soon.\n\nA paper describing the `HULK` method is published in Microbiome:\n\n\u003e[Rowe WPM et al. Streaming histogram sketching for rapid microbiome analytics. Microbiome. 2019.](https://doi.org/10.1186/s40168-019-0653-2)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwill-rowe%2Fhulk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwill-rowe%2Fhulk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwill-rowe%2Fhulk/lists"}