{"id":13571252,"url":"https://github.com/alecmocatta/streaming_algorithms","last_synced_at":"2025-04-09T13:08:45.269Z","repository":{"id":52421808,"uuid":"149362593","full_name":"alecmocatta/streaming_algorithms","owner":"alecmocatta","description":"Performant implementations of various streaming algorithms, including Count–min sketch, Top k, HyperLogLog, Reservoir sampling.","archived":false,"fork":false,"pushed_at":"2024-08-14T21:14:27.000Z","size":125,"stargazers_count":85,"open_issues_count":7,"forks_count":11,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-06T22:42:44.884Z","etag":null,"topics":["data-structures","hyperloglog","probabilistic-data-structures","rust","streaming-algorithms"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alecmocatta.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-18T23:06:22.000Z","updated_at":"2024-11-11T17:11:25.000Z","dependencies_parsed_at":"2024-12-20T20:13:28.494Z","dependency_job_id":"63452433-d462-42cc-b1f2-384a32b712d0","html_url":"https://github.com/alecmocatta/streaming_algorithms","commit_stats":{"total_commits":35,"total_committers":4,"mean_commits":8.75,"dds":"0.17142857142857137","last_synced_commit":"99522db25ab4f7a7ba91c793b7568cc1c62afa56"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alecmocatta%2Fstreaming_algorithms","tags_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/alecmocatta%2Fstreaming_algorithms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alecmocatta%2Fstreaming_algorithms/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alecmocatta%2Fstreaming_algorithms/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alecmocatta","download_url":"https://codeload.github.com/alecmocatta/streaming_algorithms/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248045233,"owners_count":21038553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-structures","hyperloglog","probabilistic-data-structures","rust","streaming-algorithms"],"created_at":"2024-08-01T14:01:00.235Z","updated_at":"2025-04-09T13:08:45.251Z","avatar_url":"https://github.com/alecmocatta.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# streaming_algorithms\n\n[![Crates.io](https://img.shields.io/crates/v/streaming_algorithms.svg?maxAge=86400)](https://crates.io/crates/streaming_algorithms)\n[![MIT / Apache 2.0 licensed](https://img.shields.io/crates/l/streaming_algorithms.svg?maxAge=2592000)](#License)\n[![Build Status](https://dev.azure.com/alecmocatta/streaming_algorithms/_apis/build/status/tests?branchName=master)](https://dev.azure.com/alecmocatta/streaming_algorithms/_build?definitionId=16)\n\n[📖 Docs](https://docs.rs/streaming_algorithms) | [💬 Chat](https://constellation.zulipchat.com/#narrow/stream/213236-subprojects)\n\nSIMD-accelerated 
implementations of various [streaming algorithms](https://en.wikipedia.org/wiki/Streaming_algorithm).\n\nThis library is a work in progress. PRs are very welcome! Currently implemented algorithms include:\n\n * Count–min sketch\n * Top k (Count–min sketch plus a doubly linked hashmap to track heavy hitters / top k keys when ordered by aggregated value)\n * HyperLogLog\n * Reservoir sampling\n\nA goal of this library is to enable composition of these algorithms; for example Top k + HyperLogLog to enable an approximate version of something akin to `SELECT key FROM table GROUP BY key ORDER BY COUNT(DISTINCT value) DESC LIMIT k`.\n\nRun your application with `RUSTFLAGS=\"-C target-cpu=native\"` and the `nightly` feature to benefit from the SIMD-acceleration like so:\n\n```bash\nRUSTFLAGS=\"-C target-cpu=native\" cargo run --features \"streaming_algorithms/nightly\" --release\n```\n\nSee [this gist](https://gist.github.com/debasishg/8172796) for a good list of further algorithms to be implemented. 
Other resources are [Probabilistic data structures – Wikipedia](https://en.wikipedia.org/wiki/Category:Probabilistic_data_structures), [DataSketches – A similar Java library originating at Yahoo](https://datasketches.github.io/), and [Algebird – A similar Scala library originating at Twitter](https://github.com/twitter/algebird).\n\nAs these implementations are often in hot code paths, `unsafe` is used, albeit only when necessary to a) achieve the asymptotically optimal algorithm or b) mitigate an observed bottleneck.\n\n## License\nLicensed under either of\n\n * Apache License, Version 2.0, ([LICENSE-APACHE.txt](LICENSE-APACHE.txt) or http://www.apache.org/licenses/LICENSE-2.0)\n * MIT license ([LICENSE-MIT.txt](LICENSE-MIT.txt) or http://opensource.org/licenses/MIT)\n\nat your option.\n\nUnless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falecmocatta%2Fstreaming_algorithms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falecmocatta%2Fstreaming_algorithms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falecmocatta%2Fstreaming_algorithms/lists"}