{"id":17901609,"url":"https://github.com/liorkogan/streamsampler","last_synced_at":"2026-02-21T09:32:18.248Z","repository":{"id":79808074,"uuid":"48119909","full_name":"LiorKogan/StreamSampler","owner":"LiorKogan","description":"A header-only C++ library implementing seven reservoir sampling algorithms for streaming data.","archived":false,"fork":false,"pushed_at":"2026-01-23T19:16:37.000Z","size":176,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-01-24T09:07:42.875Z","etag":null,"topics":["algorithm","online-algorithm","probabilistic","reservoir-sampling","stream","stream-sampler"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LiorKogan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2015-12-16T15:52:04.000Z","updated_at":"2026-01-23T19:16:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"d41bd343-2c22-4266-8085-2e15386d6625","html_url":"https://github.com/LiorKogan/StreamSampler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LiorKogan/StreamSampler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LiorKogan%2FStreamSampler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LiorKogan%2FStreamSampler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/LiorKogan%2FStreamSampler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LiorKogan%2FStreamSampler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LiorKogan","download_url":"https://codeload.github.com/LiorKogan/StreamSampler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LiorKogan%2FStreamSampler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29678237,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T06:23:40.028Z","status":"ssl_error","status_checked_at":"2026-02-21T06:23:39.222Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","online-algorithm","probabilistic","reservoir-sampling","stream","stream-sampler"],"created_at":"2024-10-28T16:02:50.916Z","updated_at":"2026-02-21T09:32:18.217Z","avatar_url":"https://github.com/LiorKogan.png","language":"C++","readme":"# StreamSampler\n\nA header-only C++11 library\n\nCopyright © 2015 Lior Kogan (koganlior1 [at] gmail [dot] com)\n\nReleased under the Apache License, Version 2.0\n\n--\n\nA [stream](https://en.wikipedia.org/wiki/Stream_(computing)) is a sequence of data elements made available over time. 
The number of elements in the stream is usually unknown a priori and can be very large.\n\nA _simple random sample_ of a stream is a subset of the stream elements, such that each stream element (from the start of the sampling until the latest available element) has an equal probability of being included in the sample.\n\nA **stream sampler** maintains one or more simple random samples, each with a fixed number of elements. As stream elements become available, the samples are updated to remain simple random samples.\nStream samplers are implemented using [online algorithms](https://en.wikipedia.org/wiki/Online_algorithm): the size of the stream is unknown, and only [one pass](https://en.wikipedia.org/wiki/One-pass_algorithm) over the stream is possible. \nThe time complexity of stream samplers is linear or [sub-linear](https://en.wikipedia.org/wiki/Time_complexity#Sub-linear_time) and the space complexity is constant.\n\nThe following seven unweighted [sampling without replacement](https://en.wikipedia.org/wiki/Simple_random_sample) [reservoir](https://en.wikipedia.org/wiki/Reservoir_sampling) [randomized](https://en.wikipedia.org/wiki/Randomized_algorithm) algorithms are implemented:\n\n - R    : Presented in [\"The Art of Computer Programming\" [Knuth] Vol.2, 3.4.2 Algorithm R](https://books.google.co.il/books?id=Zu-HAwAAQBAJ\u0026printsec=frontcover\u0026hl=iw\u0026source=gbs_ge_summary_r\u0026cad=0#v=onepage\u0026q\u0026f=false) (Reservoir Sampling) [attributed to Waterman](https://markkm.com/blog/reservoir-sampling/), modified according to Ex.10\n - X,Y,Z: Presented in [\"Random Sampling with a Reservoir\"](http://www.cs.umd.edu/~samir/498/vitter.pdf) [Jeffrey Scott Vitter, 1985]\n - K,L,M: Presented in [\"Reservoir-Sampling Algorithms of Time Complexity O(n(1+log(N)-log(n)))\"](http://dl.acm.org/citation.cfm?id=198435) [Kim-Hung Li, 1994]\n\nAlgorithm R is the standard 'textbook algorithm'. 
Algorithms X, Y, Z, K, L, and M offer a large performance improvement by drawing the number of stream elements to skip at each stage, so far fewer random numbers need to be generated, especially for large streams (hence the sub-linear time complexity). Z, K, L, and M are typically much faster than R, while M is usually the most performant.\n\nIn all these papers, the algorithms were formulated to control the element fetching from the stream (an external function, *GetNextElement()*, is called by the algorithms). Such flow control is usually less suitable for real-world scenarios. In this implementation, the algorithms were reformulated such that a process can fetch elements from the stream and call a member function of the stream sampler class - *AddElement*. This function returns the number of stream elements the caller should skip before calling it again.\n\nThis implementation also extends the algorithms by supporting the construction of multiple independent samples.\n\nTwo versions of *AddElement* are provided: one with copy semantics (*AddElement(const ElementType\u0026 Element)*) and one with move semantics (*AddElement(ElementType\u0026\u0026 Element)*).\n\n*StreamSamplerTest* contains a usage example, *StreamSamplerExample()*; a comparative performance benchmark function, *StreamSamplerPerformanceBenchmark()*; and a uniformity test function, *StreamSamplerTestUniformity()*.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliorkogan%2Fstreamsampler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fliorkogan%2Fstreamsampler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliorkogan%2Fstreamsampler/lists"}