{"id":27918322,"url":"https://github.com/axiomhq/hyperminhash","last_synced_at":"2025-05-06T18:21:13.857Z","repository":{"id":66118273,"uuid":"111132367","full_name":"axiomhq/hyperminhash","owner":"axiomhq","description":"HyperMinHash: Bringing intersections to HyperLogLog","archived":false,"fork":false,"pushed_at":"2018-03-09T23:51:48.000Z","size":18,"stargazers_count":304,"open_issues_count":1,"forks_count":18,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-05-04T11:50:15.224Z","etag":null,"topics":["estimation","hyperloglog"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/axiomhq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-11-17T17:33:54.000Z","updated_at":"2025-03-30T03:53:33.000Z","dependencies_parsed_at":"2023-02-20T17:30:55.862Z","dependency_job_id":null,"html_url":"https://github.com/axiomhq/hyperminhash","commit_stats":null,"previous_names":["seiflotfy/hyperminhash"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axiomhq%2Fhyperminhash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axiomhq%2Fhyperminhash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axiomhq%2Fhyperminhash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axiomhq%2Fhyperminhash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/axiomhq","download_url":"https://codeload.github.com/axiomhq/hyperminhash/tar.gz/refs/heads/master","host":{"nam
e":"GitHub","url":"https://github.com","kind":"github","repositories_count":252741731,"owners_count":21797074,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["estimation","hyperloglog"],"created_at":"2025-05-06T18:21:13.232Z","updated_at":"2025-05-06T18:21:13.848Z","avatar_url":"https://github.com/axiomhq.png","language":"Go","readme":"# HyperMinSketch\n\nBesides being a compact and fairly fast HyperLogLog implementation for cardinality counting, this modified HyperLogLog supports **intersection** and **similarity** estimation between different HyperLogLog sketches.\n\n## Details\nA simple implementation of HyperLogLog (LogLog-Beta to be specific):\n* 16-bit registers instead of 6-bit; the additional 10 bits hold b-bit signatures\n* The similarity function estimates Jaccard indices (numbers between 0 and 1) of 0.01 for set cardinalities on the order of 1e9 with accuracy around 5%\n* Intersection applies the Jaccard index to the cardinality of the union of the sets (|A ∩ B| = J · |A ∪ B|) to return the cardinality of the intersection\n\n*The work is based on [\"HyperMinHash: Jaccard index sketching in LogLog space - Yun William Yu, Griffin M. 
Weber\"](https://arxiv.org/pdf/1710.08436.pdf)*\n\n## Example Usage\n```go\nsk1 := hyperminhash.New()\nsk2 := hyperminhash.New()\n\nfor i := 0; i \u003c 10000; i++ {\n    sk1.Add([]byte(strconv.Itoa(i)))\n}\n\nsk1.Cardinality() // 10001 (should be 10000)\n\nfor i := 3333; i \u003c 23333; i++ {\n    sk2.Add([]byte(strconv.Itoa(i)))\n}\n\nsk2.Cardinality()     // 19977 (should be 20000)\nsk1.Similarity(sk2)   // 0.284589082 (should be 0.2857326533)\nsk1.Intersection(sk2) // 6623 (should be 6667)\n\nsk1.Merge(sk2)\nsk1.Cardinality() // 23271 (should be 23333)\n```\n\n## Results\n\n### Max Cardinality 1e3\n\n| Set1 | HLL1 | Set2 | HLL2 | S1 ∪ S2 | HLL1 ∪ HLL2 | S1 ∩ S2 | HLL1 ∩ HLL2 |\n|---|---|---|---|---|---|---|---|\n| 350 | 352 | 752 | 752 | 831 | 832 | **271** (32.611312%) | **274** (32.932692%) |\n| 746 | 748 | 591 | 590 | 834 | 835 | **503** (60.311751%) | **501** (60.000000%) |\n| 248 | 248 | 789 | 791 | 897 | 899 | **140** (15.607581%) | **144** (16.017798%) |\n| 9 | 9 | 818 | 818 | 824 | 825 | **3** (0.364078%) | **3** (0.363636%) |\n| 408 | 411 | 412 | 408 | 771 | 771 | **49** (6.355383%) | **47** (6.095979%) |\n\n\n### Max Cardinality 1e4\n\n| Set1 | HLL1 | Set2 | HLL2 | S1 ∪ S2 | HLL1 ∪ HLL2 | S1 ∩ S2 | HLL1 ∩ HLL2 |\n|---|---|---|---|---|---|---|---|\n| 2126 | 2138 | 1162 | 1158 | 3063 | 3060 | **225** (7.345739%) | **223** (7.287582%) |\n| 7767 | 7706 | 7054 | 7064 | 8889 | 8887 | **5932** (66.734166%) | **5888** (66.254079%) |\n| 842 | 844 | 5183 | 5135 | 5880 | 5842 | **145** (2.465986%) | **135** (2.310852%) |\n| 6833 | 6791 | 664 | 666 | 7410 | 7345 | **87** (1.174089%) | **89** (1.211709%) |\n| 1814 | 1820 | 6214 | 6169 | 7697 | 7639 | **331** (4.300377%) | **320** (4.189030%) |\n\n\n### Max Cardinality 1e5\n\n| Set1 | HLL1 | Set2 | HLL2 | S1 ∪ S2 | HLL1 ∪ HLL2 | S1 ∩ S2 | HLL1 ∩ HLL2 |\n|---|---|---|---|---|---|---|---|\n| 29667 | 29540 | 88700 | 88167 | 92444 | 91667 | **25923** (28.041842%) | **25036** (27.311901%) |\n| 79242 | 78731 | 30216 | 
30137 | 83502 | 82953 | **25956** (31.084285%) | **25995** (31.337022%) |\n| 57830 | 57223 | 79550 | 79194 | 82112 | 81595 | **55268** (67.308067%) | **54684** (67.018812%) |\n| 64610 | 63501 | 21696 | 21729 | 75895 | 74816 | **10411** (13.717636%) | **10083** (13.477064%) |\n| 92204 | 91453 | 96417 | 95556 | 165025 | 163370 | **23596** (14.298440%) | **24130** (14.770154%) |\n\n\n### Max Cardinality 1e6\n\n| Set1 | HLL1 | Set2 | HLL2 | S1 ∪ S2 | HLL1 ∪ HLL2 | S1 ∩ S2 | HLL1 ∩ HLL2 |\n|---|---|---|---|---|---|---|---|\n| 150443 | 149810 | 974366 | 979514 | 1088517 | 1096991 | **36292** (3.334077%) | **37417** (3.410876%) |\n| 156337 | 155347 | 19083 | 19070 | 167353 | 165433 | **8067** (4.820350%) | **8017** (4.846071%) |\n| 800969 | 802044 | 51053 | 51244 | 851388 | 853396 | **634** (0.074467%) | **511** (0.059878%) |\n| 176155 | 174707 | 520111 | 516822 | 570092 | 569289 | **126174** (22.132217%) | **123766** (21.740452%) |\n| 485954 | 481362 | 967341 | 972651 | 1081990 | 1091296 | **371305** (34.316861%) | **376007** (34.455088%) |\n\n\n### Max Cardinality 1e7\n\n| Set1 | HLL1 | Set2 | HLL2 | S1 ∪ S2 | HLL1 ∪ HLL2 | S1 ∩ S2 | HLL1 ∩ HLL2 |\n|---|---|---|---|---|---|---|---|\n| 7132942 | 7150720 | 122116 | 121539 | 7243153 | 7261709 | **11905** (0.164362%) | **12550** (0.172824%) |\n| 8646240 | 8649049 | 1277784 | 1295017 | 9821480 | 9854242 | **102544** (1.044079%) | **99163** (1.006298%) |\n| 4192390 | 4164637 | 2788913 | 2779975 | 4526476 | 4499897 | **2454827** (54.232630%) | **2454356** (54.542493%) |\n| 9803344 | 9826412 | 1705700 | 1715798 | 10255010 | 10262719 | **1254034** (12.228501%) | **1273821** (12.412120%) |\n| 1308849 | 1322604 | 9940327 | 9971519 | 11179030 | 11201850 | **70146** (0.627478%) | **80717** (0.720568%) |\n\n\n### Max Cardinality 1e8\n\n| Set1 | HLL1 | Set2 | HLL2 | S1 ∪ S2 | HLL1 ∪ HLL2 | S1 ∩ S2 | HLL1 ∩ HLL2 |\n|---|---|---|---|---|---|---|---|\n| 13237748 | 13298469 | 57073758 | 57124720 | 59474437 | 59394847 | **10837069** 
(18.221390%) | **11143669** (18.762013%) |\n| 90757994 | 88576114 | 5717797 | 5701796 | 95061178 | 93016636 | **1414613** (1.488108%) | **1350058** (1.451416%) |\n| 60150663 | 60033013 | 79238333 | 77672994 | 110438475 | 108311818 | **28950521** (26.214162%) | **27666946** (25.543792%) |\n| 30187492 | 30718889 | 37756209 | 37153655 | 67443566 | 66938074 | **500135** (0.741561%) | **447406** (0.668388%) |\n| 53196095 | 53461989 | 48344583 | 47535284 | 93284291 | 91321031 | **8256387** (8.850780%) | **8036467** (8.800237%) |","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faxiomhq%2Fhyperminhash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faxiomhq%2Fhyperminhash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faxiomhq%2Fhyperminhash/lists"}