{"id":21504408,"url":"https://github.com/keilerkonzept/topk","last_synced_at":"2025-04-23T20:15:30.931Z","repository":{"id":255438939,"uuid":"849512520","full_name":"keilerkonzept/topk","owner":"keilerkonzept","description":"Sliding-window and regular top-K sketches, based on HeavyKeeper","archived":false,"fork":false,"pushed_at":"2025-02-24T07:44:40.000Z","size":81,"stargazers_count":10,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-23T20:15:15.292Z","etag":null,"topics":["go","golang","heavy-hitters","heavy-keeper","heavykeeper","probabilistic-data-structures","sketch","sliding-window","top-k"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/keilerkonzept.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-29T18:10:41.000Z","updated_at":"2025-04-19T17:15:02.000Z","dependencies_parsed_at":"2024-09-02T04:57:19.215Z","dependency_job_id":"ec296692-e099-4f94-996e-6c9bd2da2765","html_url":"https://github.com/keilerkonzept/topk","commit_stats":null,"previous_names":["keilerkonzept/topk"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keilerkonzept%2Ftopk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keilerkonzept%2Ftopk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keilerkonzept%2Ftopk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keilerkonzept%2Ftopk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/keilerkonzept","download_url":"https://codeload.github.com/keilerkonzept/topk/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250506142,"owners_count":21441723,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","golang","heavy-hitters","heavy-keeper","heavykeeper","probabilistic-data-structures","sketch","sliding-window","top-k"],"created_at":"2024-11-23T18:59:16.518Z","updated_at":"2025-04-23T20:15:30.900Z","avatar_url":"https://github.com/keilerkonzept.png","language":"Go","funding_links":[],"categories":["Science and Data Analysis","科学与数据分析"],"sub_categories":["HTTP Clients","HTTP客户端"],"readme":"# topk\n[![Coverage](https://img.shields.io/badge/Coverage-97.8%25-brightgreen)](https://github.com/keilerkonzept/topk/actions/workflows/gocover.yaml)\n\n[![Go Reference](https://pkg.go.dev/badge/github.com/keilerkonzept/topk.svg)](https://pkg.go.dev/github.com/keilerkonzept/topk)\n[![Go Report Card](https://goreportcard.com/badge/github.com/keilerkonzept/topk)](https://goreportcard.com/report/github.com/keilerkonzept/topk)\n[![Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)  \n\nSliding-window and regular top-K sketches.\n\n- A fast implementation of the [**HeavyKeeper top-K sketch**](https://www.usenix.org/conference/atc18/presentation/gong) inspired by the [segmentio implementation](https://github.com/segmentio/topk) and [RedisBloom implementation](https://github.com/RedisBloom/RedisBloom/blob/b5916e1b9fba17829c3e329c127b99d706eb31f6/src/topk.c). [Significantly faster (~1.5x)](#comparison-with-segmentiotopk) than [segmentio/topk](https://github.com/segmentio/topk) on small sketches (k \u003c= 1000) and [much faster (10x-90x)](#comparison-with-segmentiotopk) on large sketches (k \u003e= 10000).\n- A **sliding-window top-K sketch**, also based on HeavyKeeper, as described in [\"A Sketch Framework for Approximate Data Stream Processing in Sliding Windows\"](https://yangtonghome.github.io/uploads/SlidingSketch_TKDE2022_final.pdf)\n\n```go\nimport (\n\t\"github.com/keilerkonzept/topk\" // plain sketch\n\t\"github.com/keilerkonzept/topk/sliding\" // sliding-window sketch\n)\n```\n\n[Demo application](https://github.com/keilerkonzept/sliding-topk-tui-demo): top K requesting IPs within a sliding time window from a [web server access logs dataset](https://www.kaggle.com/datasets/eliasdabbas/web-server-access-logs)\n\u003cp\u003e\n    \u003cimg src=\"https://www.keilerkonzept.com/sliding-topk-demo.gif\" width=\"100%\" alt=\"Sliding Top-K Demo Application\"\u003e\n\u003c/p\u003e\n\n## Contents\n\n- [Examples](#examples)\n    - [Top-K Sketch](#top-k-sketch)\n    - [Sliding-window Top-K Sketch](#sliding-window-top-k-sketch)\n- [Benchmarks](#benchmarks)\n    - [Top-K Sketch](#top-k-sketch)\n    - [Sliding-Window Top-K Sketch](#sliding-window-top-k-sketch)\n    - [Comparison with segmentio/topk](#comparison-with-segmentiotopk)\n\n## Examples\n\n### Top-K Sketch\n\n```go\npackage main\n\nimport (\n\t\"log\"\n\t\"github.com/keilerkonzept/topk\"\n)\n\nfunc main() {\n\t// make a new sketch keeping track of k=3 items using 1024x3 = 3072 buckets.\n\tsketch := topk.New(3, topk.WithWidth(1024), topk.WithDepth(3))\n\n\tlog.Println(\"the sketch takes up\", sketch.SizeBytes(), \"bytes in memory\")\n\n\tsketch.Incr(\"an item\")            // count \"an item\" 1 time\n\tsketch.Add(\"an item\", 123)        // count \"an item\" 123 times\n\tsketch.Add(\"another item\", 4)     // count \"another item\" 4 times\n\tsketch.Add(\"an item\", 5)          // count \"an item\" 5 more times\n\tsketch.Add(\"yet another item\", 6) // count \"yet another item\" 6 times\n\n\tif sketch.Query(\"an item\") {\n\t\t// \"an item\" is in the top K items observed within the last 60 ticks\n\t}\n\n\t_ = sketch.Count(\"another item\") // return the estimated count for \"another item\"\n\n\t// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.\n\tfor _, entry := range sketch.SortedSlice() {\n\t\tlog.Println(entry.Item, \"has been counted\", entry.Count, \"times\")\n\t}\n\n\t// Iter is an interator over the (*not* sorted) current top-K entries.\n\tfor entry := range sketch.Iter {\n\t\tlog.Println(entry.Item, \"has been counted\", entry.Count, \"times\")\n\t}\n\tsketch.Reset() // reset to New() state\n}\n```\n\n\n### Sliding-window Top-K Sketch\n\n```go\npackage main\n\nimport (\n\t\"log\"\n\t\"github.com/keilerkonzept/topk/sliding\"\n)\n\nfunc main() {\n\t// make a new sketch keeping track of k=3 items over a window of the last 60 ticks\n\t// use width=1024 x depth=3 = 3072 buckets\n\tsketch := sliding.New(3, 60, sliding.WithWidth(1024), sliding.WithDepth(3))\n\n\tlog.Println(\"the sketch takes up\", sketch.SizeBytes(), \"bytes in memory\")\n\n\tsketch.Incr(\"an item\")            // count \"an item\" 1 time\n\tsketch.Add(\"an item\", 123)        // count \"an item\" 123 times\n\tsketch.Tick()                     // advance time by one tick\n\tsketch.Add(\"another item\", 4)     // count \"another item\" 4 times\n\tsketch.Ticks(2)                   // advance time by two ticks\n\tsketch.Add(\"an item\", 5)          // count \"an item\" 5 more times\n\tsketch.Add(\"yet another item\", 6) // count \"yet another item\" 6 times\n\n\tif sketch.Query(\"an item\") {\n\t\t// \"an item\" is in the top K items observed within the last 60 ticks\n\t}\n\n\t_ = sketch.Count(\"another item\") // return the estimated count for \"another item\"\n\n\t// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.\n\tfor _, entry := range sketch.SortedSlice() {\n\t\tlog.Println(entry.Item, \"has been counted\", entry.Count, \"times\")\n\t}\n\n\t// Iter is an interator over the (*not* sorted) current top-K entries.\n\tfor entry := range sketch.Iter {\n\t\tlog.Println(entry.Item, \"has been counted\", entry.Count, \"times\")\n\t}\n\tsketch.Reset() // reset to New() state\n}\n```\n\n## Benchmarks\n\n### Top-K Sketch\n\n```\ngoos: darwin\ngoarch: arm64\npkg: github.com/keilerkonzept/topk\ncpu: Apple M1 Pro\n```\n\nThe `Add` benchmark performs random increments in the interval [1,10).\n\n| Operation |   K | Depth | Width |        time |  bytes |      allocs |\n|-----------|----:|------:|------:|------------:|-------:|------------:|\n| `Add`     |  10 |     3 |  1024 | 358.6 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     |  10 |     3 |  8192 | 375.0 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     |  10 |     4 |  1024 | 449.9 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     |  10 |     4 |  8192 | 436.0 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     | 100 |     3 |  1024 | 371.5 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     | 100 |     3 |  8192 | 387.9 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     | 100 |     4 |  1024 | 452.3 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     | 100 |     4 |  8192 | 471.4 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    |  10 |     3 |  1024 | 257.2 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    |  10 |     3 |  8192 | 232.3 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    |  10 |     4 |  1024 | 249.1 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    |  10 |     4 |  8192 | 251.2 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    | 100 |     3 |  1024 | 264.2 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    | 100 |     3 |  8192 | 227.4 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    | 100 |     4 |  1024 | 267.1 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    | 100 |     4 |  8192 | 261.3 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   |  10 |     3 |  1024 | 216.0 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   |  10 |     3 |  8192 | 215.4 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   |  10 |     4 |  1024 | 220.0 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   |  10 |     4 |  8192 | 269.3 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   | 100 |     3 |  1024 | 235.1 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   | 100 |     3 |  8192 | 277.1 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   | 100 |     4 |  1024 | 278.7 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   | 100 |     4 |  8192 | 302.2 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   |  10 |     3 |  1024 | 129.6 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   |  10 |     3 |  8192 | 98.21 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   |  10 |     4 |  1024 | 129.9 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   |  10 |     4 |  8192 | 114.3 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   | 100 |     3 |  1024 | 141.2 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   | 100 |     3 |  8192 | 140.8 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   | 100 |     4 |  1024 | 131.1 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   | 100 |     4 |  8192 | 109.8 ns/op | 0 B/op | 0 allocs/op |\n\n### Sliding-Window Top-K Sketch\n\n```\ngoos: darwin\ngoarch: arm64\npkg: github.com/keilerkonzept/topk/sliding\ncpu: Apple M1 Pro\n```\n\nThe `Add` benchmark performs random increments in the interval [1,10).\n\n| Operation |   K | Depth | Width | Window size | History size |        time |  bytes |      allocs |\n|-----------|----:|------:|------:|------------:|-------------:|------------:|-------:|------------:|\n| `Add`     |  10 |     3 |  1024 |         100 |           50 | 696.9 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     |  10 |     3 |  1024 |         100 |          100 |  1051 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     |  10 |     3 |  8192 |         100 |           50 | 784.9 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     |  10 |     3 |  8192 |         100 |          100 |  1146 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     | 100 |     3 |  1024 |         100 |           50 | 712.9 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     | 100 |     3 |  1024 |         100 |          100 |  1054 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     | 100 |     3 |  8192 |         100 |           50 | 763.3 ns/op | 0 B/op | 0 allocs/op |\n| `Add`     | 100 |     3 |  8192 |         100 |          100 |  1139 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    |  10 |     3 |  1024 |         100 |           50 | 434.9 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    |  10 |     3 |  1024 |         100 |          100 | 560.7 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    |  10 |     3 |  8192 |         100 |           50 | 501.1 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    |  10 |     3 |  8192 |         100 |          100 | 728.7 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    | 100 |     3 |  1024 |         100 |           50 | 425.6 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    | 100 |     3 |  1024 |         100 |          100 | 580.0 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    | 100 |     3 |  8192 |         100 |           50 | 497.8 ns/op | 0 B/op | 0 allocs/op |\n| `Incr`    | 100 |     3 |  8192 |         100 |          100 | 746.2 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   |  10 |     3 |  1024 |         100 |           50 | 228.5 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   |  10 |     3 |  1024 |         100 |          100 | 209.3 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   |  10 |     3 |  8192 |         100 |           50 | 234.5 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   |  10 |     3 |  8192 |         100 |          100 | 230.7 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   | 100 |     3 |  1024 |         100 |           50 | 237.5 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   | 100 |     3 |  1024 |         100 |          100 | 242.8 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   | 100 |     3 |  8192 |         100 |           50 | 246.5 ns/op | 0 B/op | 0 allocs/op |\n| `Count`   | 100 |     3 |  8192 |         100 |          100 | 243.4 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   |  10 |     3 |  1024 |         100 |           50 | 101.7 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   |  10 |     3 |  1024 |         100 |          100 | 104.8 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   |  10 |     3 |  8192 |         100 |           50 | 114.0 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   |  10 |     3 |  8192 |         100 |          100 | 114.5 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   | 100 |     3 |  1024 |         100 |           50 | 135.9 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   | 100 |     3 |  1024 |         100 |          100 | 118.5 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   | 100 |     3 |  8192 |         100 |           50 | 130.1 ns/op | 0 B/op | 0 allocs/op |\n| `Query`   | 100 |     3 |  8192 |         100 |          100 | 131.5 ns/op | 0 B/op | 0 allocs/op |\n| `Tick`    |  10 |     3 |  1024 |         100 |           50 |  4191 ns/op | 0 B/op | 0 allocs/op |\n| `Tick`    |  10 |     3 |  1024 |         100 |          100 |  7010 ns/op | 0 B/op | 0 allocs/op |\n| `Tick`    |  10 |     3 |  8192 |         100 |           50 | 28699 ns/op | 0 B/op | 0 allocs/op |\n| `Tick`    |  10 |     3 |  8192 |         100 |          100 | 90979 ns/op | 0 B/op | 0 allocs/op |\n| `Tick`    | 100 |     3 |  1024 |         100 |           50 |  6539 ns/op | 0 B/op | 0 allocs/op |\n| `Tick`    | 100 |     3 |  1024 |         100 |          100 |  9343 ns/op | 0 B/op | 0 allocs/op |\n| `Tick`    | 100 |     3 |  8192 |         100 |           50 | 31349 ns/op | 0 B/op | 0 allocs/op |\n| `Tick`    | 100 |     3 |  8192 |         100 |          100 | 87488 ns/op | 0 B/op | 0 allocs/op |\n\n### Comparison with [segmentio/topk](https://github.com/segmentio/topk)\n\nUsing [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat):\n```sh\n$ go test -run='^$' -bench=BenchmarkSketchAddForComparison -count=10 | tee new.txt\n$ go test -run='^$' -bench=BenchmarkSegmentioTopkSample -count=10 | tee old.txt\n$ benchstat -row /K,/Depth,/Width,/Decay -col .name old.txt new.txt\n```\n\n```\ngoos: darwin\ngoarch: arm64\npkg: github.com/keilerkonzept/topk\ncpu: Apple M1 Pro\n```\n\n| K      | Depth | Width   | Decay | `segmentio/topk` (sec/op) | this package (sec/op) | diff                   |\n|--------|-------|---------|-------|---------------------------|-----------------------|------------------------|\n| 10     | 3     | 256     | 0.6   | 641.0n ± 1%               | 373.5n ±  3%          | **-41.73%** (p=0.000 n=10) |\n| 10     | 3     | 256     | 0.8   | 602.6n ± 1%               | 387.3n ±  2%          | **-35.73%** (p=0.000 n=10) |\n| 10     | 3     | 256     | 0.9   | 550.4n ± 4%               | 431.3n ±  2%          | **-21.63%** (p=0.000 n=10) |\n| 100    | 4     | 460     | 0.6   | 763.8n ± 2%               | 427.0n ±  1%          | **-44.09%** (p=0.000 n=10) |\n| 100    | 4     | 460     | 0.8   | 720.9n ± 2%               | 459.1n ±  4%          | **-36.30%** (p=0.000 n=10) |\n| 100    | 4     | 460     | 0.9   | 660.6n ± 3%               | 539.0n ± 22%          | **-18.41%** (p=0.005 n=10) |\n| 1000   | 6     | 6907    | 0.6   | 1107.0n ± 2%              | 555.9n ±  8%          | **-49.79%** (p=0.000 n=10) |\n| 1000   | 6     | 6907    | 0.8   | 1040.0n ± 4%              | 613.4n ±  2%          | **-41.02%** (p=0.000 n=10) |\n| 1000   | 6     | 6907    | 0.9   | 936.5n ± 1%               | 731.5n ±  2%          | **-21.89%** (p=0.000 n=10) |\n| 10000  | 9     | 92103   | 0.6   | 10.693µ ± 2%              | 1.058µ ±  2%          | **-90.11%** (p=0.000 n=10) |\n| 10000  | 9     | 92103   | 0.8   | 10.667µ ± 1%              | 1.182µ ±  6%          | **-88.92%** (p=0.000 n=10) |\n| 10000  | 9     | 92103   | 0.9   | 10.724µ ± 1%              | 1.288µ ±  2%          | **-87.98%** (p=0.000 n=10) |\n| 100000 | 11    | 1151292 | 0.6   | 89.385µ ± 0%              | 1.674µ ±  1%          | **-98.13%** (p=0.000 n=10) |\n| 100000 | 11    | 1151292 | 0.8   | 89.349µ ± 1%              | 1.708µ ±  1%          | **-98.09%** (p=0.000 n=10) |\n| 100000 | 11    | 1151292 | 0.9   | 89.284µ ± 1%              | 1.705µ ±  1%          | **-98.09%** (p=0.000 n=10) |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeilerkonzept%2Ftopk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkeilerkonzept%2Ftopk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeilerkonzept%2Ftopk/lists"}