{"id":13639352,"url":"https://github.com/shenwei356/countminsketch","last_synced_at":"2026-01-29T13:15:29.327Z","repository":{"id":21531539,"uuid":"24850821","full_name":"shenwei356/countminsketch","owner":"shenwei356","description":"An implementation of Count-Min Sketch in Golang","archived":false,"fork":false,"pushed_at":"2016-05-19T11:05:57.000Z","size":10,"stargazers_count":32,"open_issues_count":1,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-08-03T01:14:30.066Z","etag":null,"topics":["bioinformatics","count-min-sketch"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shenwei356.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-10-06T15:08:29.000Z","updated_at":"2024-07-11T08:26:00.000Z","dependencies_parsed_at":"2022-08-21T16:41:22.951Z","dependency_job_id":null,"html_url":"https://github.com/shenwei356/countminsketch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenwei356%2Fcountminsketch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenwei356%2Fcountminsketch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenwei356%2Fcountminsketch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenwei356%2Fcountminsketch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shenwei356","download_url":"https://codeload.github.com/shenwei356/countminsketch/tar.gz/refs/heads/master","
host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223810375,"owners_count":17206753,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","count-min-sketch"],"created_at":"2024-08-02T01:00:59.810Z","updated_at":"2026-01-29T13:15:29.266Z","avatar_url":"https://github.com/shenwei356.png","language":"Go","funding_links":[],"categories":["Data Manipulation and Querying"],"sub_categories":[],"readme":"countminsketch\n========\n\nAn implementation of Count-Min Sketch in Golang.\n\nIntroduction to Count-Min Sketch, from Wikipedia[1]:\n\n\u003e    The Count–min sketch (or CM sketch) is a probabilistic sub-linear space\n\u003e    streaming algorithm which can be used to summarize a data stream in many\n\u003e    different ways. The algorithm was invented in 2003 by Graham Cormode and\n\u003e    S. Muthu Muthukrishnan.\n\u003e    \n\u003e    Count–min sketches are somewhat similar to Bloom filters; the main\n\u003e    distinction is that Bloom filters represent sets, while CM sketches\n\u003e    represent multisets and frequency tables. Spectral Bloom filters with\n\u003e    multi-set policy, are conceptually isomorphic to the Count-Min Sketch.\n\nThe code is heavily inspired by an implementation of Bloom filters in Go,\n[bloom](https://github.com/willf/bloom).\n\nAs in bloom, the hashing function used is FNV, provided by the Go package\n`hash/fnv`. For each item, the 64-bit FNV hash is computed, and its upper and lower\n32-bit halves, call them _h1_ and _h2_, are used. 
Then the _i_-th hashing\nfunction is:\n\n    h1 + h2*i\n\nSketch Accuracy\n-------------\n\nAccuracy guarantees are stated in terms of a pair of user-specified parameters,\nε and δ: the error in answering a query is at most ε times the total count,\nwith probability at least δ[2]. In this package δ is the confidence level, as in\nthe Usage example below.\n\nFor a sketch of size _w_ × _d_ with total count _N_, any\nestimate has error at most _2N/w_ with probability at least 1 - (1/2)^_d_.\nSo setting the parameters _w_ and _d_ large enough allows us to achieve\nvery high accuracy while using relatively little space[3].\n\nSuppose we want an error of at most 0.1% (of the sum of all frequencies)\nwith 99.9% certainty. Then we want 2/_w_ = 1/1000, so we set _w_ = 2000,\nand (1/2)^_d_ = 0.001, i.e. _d_ = log 0.001 / log 0.5 ≈ 9.97, rounded up to 10. Using uint counters,\nthe space required by the array of counters is _w_ × _d_ × 4 = 80KB on a 32-bit\nplatform, and _w_ × _d_ × 8 = 160KB on a 64-bit platform[3].\n\nTo create a sketch with a given error rate and confidence, use the constructor `NewWithEstimates`.\n\nParallelization\n-----------\n\nThe parallelizable part of Count-Min Sketch is the hashing step, but this implementation\ncomputes only one base hash per item, so parallelization is not necessary.\n\nIf you need parallelism, split the data and count each part separately. 
Finally, `Merge` the partial sketches.\n\nInstall\n-------\n\nThis package is \"go-gettable\"; just run:\n\n    go get github.com/shenwei356/countminsketch\n\nUsage\n-------------\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"os\"\n\n\t\"github.com/shenwei356/countminsketch\"\n)\n\n// checkerr is a minimal helper assumed by this example; it is not part of the package.\nfunc checkerr(err error) {\n\tif err != nil {\n\t\tpanic(err)\n\t}\n}\n\nfunc main() {\n\t// Size the sketch from an error rate ε and a confidence δ.\n\tvar epsilon, delta float64\n\tepsilon, delta = 0.1, 0.9\n\ts := countminsketch.NewWithEstimates(epsilon, delta)\n\tfmt.Printf(\"ε: %f, δ: %f -\u003e d: %d, w: %d\\n\", epsilon, delta, s.D(), s.W())\n\n\tepsilon, delta = 0.0001, 0.9999\n\ts = countminsketch.NewWithEstimates(epsilon, delta)\n\tfmt.Printf(\"ε: %f, δ: %f -\u003e d: %d, w: %d\\n\", epsilon, delta, s.D(), s.W())\n\n\tkey := \"abc\"\n\ts.UpdateString(key, 1)\n\tfmt.Printf(\"%s:%d\\n\\n\", key, s.EstimateString(key))\n\n\t// Save the sketch to a file and load it back.\n\tfile := \"data\"\n\ts.UpdateString(key, 2)\n\t_, err := s.WriteToFile(file)\n\tcheckerr(err)\n\tdefer func() {\n\t\terr := os.Remove(file)\n\t\tcheckerr(err)\n\t}()\n\n\tcm, err := countminsketch.NewFromFile(file)\n\tcheckerr(err)\n\n\tfmt.Printf(\"%s:%d\\n\", key, cm.EstimateString(key))\n\n\t// Round-trip a sketch through JSON.\n\ts = countminsketch.NewWithEstimates(0.1, 0.9)\n\ts.UpdateString(key, 10)\n\tbytes, err := s.MarshalJSON()\n\tcheckerr(err)\n\tfmt.Println(string(bytes))\n\n\terr = s.UnmarshalJSON(bytes)\n\tcheckerr(err)\n\ts.UpdateString(key, 10)\n\n\tfmt.Printf(\"%s:%d\\n\", key, s.EstimateString(key))\n}\n```\n\nOutput\n\n    ε: 0.100000, δ: 0.900000 -\u003e d: 4, w: 20\n    ε: 0.000100, δ: 0.999900 -\u003e d: 14, w: 20000\n    abc:1\n\n    abc:3\n    {\"d\":4,\"w\":20,\"count\":[[0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0],[0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10]]}\n    abc:20\n\nBenchmark\n--------\n\n    Benchmark_Update_ε0_001_δ0_999           5000000               515 ns/op\n    Benchmark_Estimates_ε0_001_δ0_999        5000000               481 ns/op\n    Benchmark_Update_ε0_000001_δ0_9999       
2000000               941 ns/op\n    Benchmark_Estimates_ε0_000001_δ0_9999    2000000               841 ns/op\n\n\nDocumentation\n-------------\n\n[![GoDoc](https://godoc.org/github.com/shenwei356/countminsketch?status.svg)](https://godoc.org/github.com/shenwei356/countminsketch)\n\nReference\n-------------\n1. [Wikipedia](http://en.wikipedia.org/wiki/Count%E2%80%93min_sketch)\n2. [An Improved Data Stream Summary: The Count-Min Sketch and its Applications](http://www.cse.unsw.edu.au/~cs9314/07s1/lectures/Lin_CS9314_References/cm-latin.pdf)\n3. [Approximating Data with the Count-Min Data Structure](http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf)\n4. [https://github.com/jehiah/countmin](https://github.com/jehiah/countmin)\n5. [https://github.com/mtchavez/countmin](https://github.com/mtchavez/countmin)\n\nCopyright\n--------\n\nCopyright (c) 2014-2016, Wei Shen (shenwei356@gmail.com)\n\n[MIT License](https://github.com/shenwei356/countminsketch/blob/master/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshenwei356%2Fcountminsketch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshenwei356%2Fcountminsketch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshenwei356%2Fcountminsketch/lists"}