{"id":36614274,"url":"https://github.com/gstamatelat/random-sampling","last_synced_at":"2026-01-12T09:04:38.743Z","repository":{"id":49003764,"uuid":"111946494","full_name":"gstamatelat/random-sampling","owner":"gstamatelat","description":"A collection of algorithms in Java 8 for the problem of random sampling with a reservoir","archived":false,"fork":false,"pushed_at":"2022-11-20T14:52:30.000Z","size":438,"stargazers_count":33,"open_issues_count":8,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-07-28T09:58:53.079Z","etag":null,"topics":["algorithm","random-sampling","reservoir-sampling","stream-sampling"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gstamatelat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-24T18:33:15.000Z","updated_at":"2023-07-05T18:53:48.000Z","dependencies_parsed_at":"2023-01-22T16:00:39.570Z","dependency_job_id":null,"html_url":"https://github.com/gstamatelat/random-sampling","commit_stats":null,"previous_names":[],"tags_count":28,"template":null,"template_full_name":null,"purl":"pkg:github/gstamatelat/random-sampling","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gstamatelat%2Frandom-sampling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gstamatelat%2Frandom-sampling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gstamatelat%2Frandom-sampling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gstamatelat%2Frandom-sampling/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/gstamatelat","download_url":"https://codeload.github.com/gstamatelat/random-sampling/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gstamatelat%2Frandom-sampling/sbom","scorecard":{"id":447348,"data":{"date":"2025-08-11","repo":{"name":"github.com/gstamatelat/random-sampling","commit":"e3ad270337d56ef486e7293cad3c18bca0f22a24"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.6,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue 
activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":9,"reason":"binaries present in source code","details":["Warn: binary detected: gradle/wrapper/gradle-wrapper.jar:1"],"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.md:0","Info: FSF or OSI recognized license: MIT License: LICENSE.md:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v0.28 not signed: https://api.github.com/repos/gstamatelat/random-sampling/releases/76983757","Warn: release artifact v0.27 not signed: https://api.github.com/repos/gstamatelat/random-sampling/releases/76877345","Warn: release artifact v0.26 not signed: https://api.github.com/repos/gstamatelat/random-sampling/releases/76708248","Warn: release artifact v0.25 not signed: https://api.github.com/repos/gstamatelat/random-sampling/releases/76094585","Warn: release artifact v0.24 not signed: https://api.github.com/repos/gstamatelat/random-sampling/releases/75822623","Warn: release artifact v0.28 does not have 
provenance: https://api.github.com/repos/gstamatelat/random-sampling/releases/76983757","Warn: release artifact v0.27 does not have provenance: https://api.github.com/repos/gstamatelat/random-sampling/releases/76877345","Warn: release artifact v0.26 does not have provenance: https://api.github.com/repos/gstamatelat/random-sampling/releases/76708248","Warn: release artifact v0.25 does not have provenance: https://api.github.com/repos/gstamatelat/random-sampling/releases/76094585","Warn: release artifact v0.24 does not have provenance: https://api.github.com/repos/gstamatelat/random-sampling/releases/75822623"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 9 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-19T07:09:43.541Z","repository_id":49003764,"created_at":"2025-08-19T07:09:43.542Z","updated_at":"2025-08-19T07:09:43.542Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28337617,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T06:09:07.588Z","status":"ssl_error","status_checked_at":"2026-01-12T06:05:18.301Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","random-sampling","reservoir-sampling","stream-sampling"],"created_at":"2026-01-12T09:04:35.639Z","updated_at":"2026-01-12T09:04:38.737Z","avatar_url":"https://github.com/gstamatelat.png","language":"Java","readme":"# Random Sampling\n\nA collection of algorithms in Java 8 for the problem of random sampling with a\nreservoir.\n\nReservoir sampling is a family of randomized algorithms for randomly choosing a\nsample of `k` items from a list `S` containing `n` items, where `n` is either a\nvery large or unknown number. Typically `n` is large enough that the list\ndoesn't fit into main memory. [1] In this context, the sample of `k` items will\nbe referred to as ***sample*** and the list `S` as ***stream***.\n\nThis package divides these algorithms into two main categories: those that\nassign a weight to each item of the source stream and those that don't. These\nwill be referred to as weighted and unweighted random sampling algorithms,\nrespectively. In unweighted algorithms, each item in the stream has probability\n`k/n` of appearing in the sample. In weighted algorithms, this probability\ndepends on the extra parameter `weight`. 
Each algorithm may interpret this\nparameter differently; for example, [2] describes two possible interpretations.\n\n## Using\n\nYou can add the library as a dependency to your project as follows:\n\nUsing Maven\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003egr.james\u003c/groupId\u003e\n    \u003cartifactId\u003erandom-sampling\u003c/artifactId\u003e\n    \u003cversion\u003e0.28\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nUsing Gradle (declare the dependency with only one of these configurations)\n\n```gradle\nimplementation 'gr.james:random-sampling:0.28' // Used internally by your project\napi            'gr.james:random-sampling:0.28' // Exposed in your public API (java-library plugin)\n```\n\n## Examples\n\nSelect 10 numbers at random in the range [1,100]. Each number has a 10%\nprobability of appearing in the sample.\n\n```java\nRandomSampling\u003cInteger\u003e rs = new WatermanSampling\u003c\u003e(10, new Random());\nrs.feed(IntStream.rangeClosed(1, 100).boxed().iterator());\nCollection\u003cInteger\u003e sample = rs.sample();\nSystem.out.println(sample);\n```\n\nSelect 5 random tokens from standard input.\n\n```java\nRandomSampling\u003cString\u003e rs = new VitterXSampling\u003c\u003e(5, new Random());\nrs.feed(new Scanner(System.in));\nSystem.out.println(rs.sample());\n```\n\nSame example using Algorithm Z.\n\n```java\nRandomSampling\u003cString\u003e rs = new VitterZSampling\u003c\u003e(5, new Random());\nrs.feed(new Scanner(System.in));\nSystem.out.println(rs.sample());\n```\n\nSelect 2 terms from a vocabulary, based on their weight.\n\n```java\nWeightedRandomSampling\u003cString\u003e rs = new EfraimidisSampling\u003c\u003e(2, new Random());\nrs.feed(\"collection\", 1);\nrs.feed(\"algorithms\", 2);\nrs.feed(\"java\", 2);\nrs.feed(\"random\", 3);\nrs.feed(\"sampling\", 4);\nrs.feed(\"reservoir\", 5);\nSystem.out.println(rs.sample());\n```\n\nUnweighted random sampling using the Java 8 stream API.\n\n```java\nRandomSamplingCollector\u003cInteger\u003e collector = WatermanSampling.collector(5, new Random());\nCollection\u003cInteger\u003e sample = 
IntStream.range(0, 20).boxed().collect(collector);\nSystem.out.println(sample);\n```\n\nWeighted random sampling using the Java 8 stream API.\n\n```java\nWeightedRandomSamplingCollector\u003cString\u003e collector = ChaoSampling.weightedCollector(2, new Random());\nMap\u003cString, Double\u003e map = new HashMap\u003c\u003e();\nmap.put(\"collection\", 1.0);\nmap.put(\"algorithms\", 2.0);\nmap.put(\"java\", 2.0);\nmap.put(\"random\", 3.0);\nmap.put(\"sampling\", 4.0);\nmap.put(\"reservoir\", 5.0);\nCollection\u003cString\u003e sample = map.entrySet().stream().collect(collector);\nSystem.out.println(sample);\n```\n\n## Algorithms\n\n| Class                       | Algorithm                     | Space  | Weighted |\n| :-------------------------- | :---------------------------- | :----- | :------- |\n| `WatermanSampling`          | Algorithm R by Waterman       | `O(k)` |          |\n| `VitterXSampling`           | Algorithm X by Vitter         | `O(k)` |          |\n| `VitterZSampling`           | Algorithm Z by Vitter         | `O(k)` |          |\n| `LiLSampling`               | Algorithm L by Li             | `O(k)` |          |\n| `EfraimidisSampling`        | Algorithm A-Res by Efraimidis | `O(k)` | \u0026#10004; |\n| `ChaoSampling`              | Algorithm by Chao             | `O(k)` | \u0026#10004; |\n| `SequentialPoissonSampling` | Algorithm by Ohlsson          | `O(k)` | \u0026#10004; |\n| `ParetoSampling`            | Algorithm by Rosén            | `O(k)` | \u0026#10004; |\n\n### 1 Algorithm R by Waterman\n\nSignature: `WatermanSampling` implements `RandomSampling`\n\n#### References\n\n- Knuth, Donald E. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, Section 3.4.2: Random Sampling and Shuffling.\n\n### 2 Algorithm X by Vitter\n\nSignature: `VitterXSampling` implements `RandomSampling`\n\n#### References\n\n- [Vitter, Jeffrey S. 
\"Random sampling with a reservoir.\" ACM Transactions on Mathematical Software (TOMS) 11.1 (1985): 37-57.](https://doi.org/10.1145/3147.3165)\n\n### 3 Algorithm Z by Vitter\n\nSignature: `VitterZSampling` implements `RandomSampling`\n\n#### References\n\n- [Vitter, Jeffrey S. \"Random sampling with a reservoir.\" ACM Transactions on Mathematical Software (TOMS) 11.1 (1985): 37-57.](https://doi.org/10.1145/3147.3165)\n\n### 4 Algorithm L by Li\n\nSignature: `LiLSampling` implements `RandomSampling`\n\n#### References\n\n- [Li, Kim-Hung. \"Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n))).\" ACM Transactions on Mathematical Software (TOMS) 20.4 (1994): 481-493.](https://doi.org/10.1145/198429.198435)\n\n### 5 Algorithm A-Res by Efraimidis\n\nSignature: `EfraimidisSampling` implements `WeightedRandomSampling`\n\n#### References\n\n- [Efraimidis, Pavlos S., and Paul G. Spirakis. \"Weighted random sampling with a reservoir.\" Information Processing Letters 97.5 (2006): 181-185.](https://doi.org/10.1016/j.ipl.2005.11.003)\n\n### 6 Algorithm by Chao\n\nSignature: `ChaoSampling` implements `WeightedRandomSampling`\n\n#### References\n\n- [Chao, M. T. \"A general purpose unequal probability sampling plan.\" Biometrika 69.3 (1982): 653-656.](https://doi.org/10.2307/2336002)\n- [Sugden, R. A. \"Chao's list sequential scheme for unequal probability sampling.\" Journal of Applied Statistics 23.4 (1996): 413-421.](https://doi.org/10.1080/02664769624152)\n\n### 7 Algorithm by Ohlsson\n\nSignature: `SequentialPoissonSampling` implements `WeightedRandomSampling`\n\n#### References\n\n- [Ohlsson, Esbjörn. \"Sequential Poisson sampling.\" Journal of Official Statistics 14.2 (1998): 149.](https://www.mendeley.com/catalogue/95bcff1f-86be-389c-ab3f-717796d22abd/)\n\n### 8 Algorithm by Rosén\n\nSignature: `ParetoSampling` implements `WeightedRandomSampling`\n\n#### References\n\n- [Rosén, Bengt. 
\"Asymptotic theory for order sampling.\" Journal of Statistical Planning and Inference 62.2 (1997): 135-158.](https://doi.org/10.1016/S0378-3758(96)00185-1)\n- [Rosén, Bengt. \"On sampling with probability proportional to size.\" Journal of Statistical Planning and Inference 62.2 (1997): 159-191.](https://doi.org/10.1016/S0378-3758(96)00186-3)\n\n## References\n\n[1] [Wikipedia contributors. \"Reservoir sampling.\" Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 17 Oct. 2017. Web. 21 Nov. 2017.](https://en.wikipedia.org/wiki/Reservoir_sampling)\n\n[2] [Efraimidis, Pavlos S. \"Weighted random sampling over data streams.\" Algorithms, Probability, Networks, and Games. Springer International Publishing, 2015. 183-195.](https://doi.org/10.1007/978-3-319-24024-4_12)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgstamatelat%2Frandom-sampling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgstamatelat%2Frandom-sampling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgstamatelat%2Frandom-sampling/lists"}