[![Docs](https://godoc.org/github.com/lightstep/varopt?status.svg)](https://godoc.org/github.com/lightstep/varopt)

# VarOpt Sampling Algorithm

This is an implementation of VarOpt, an unbiased weighted sampling
algorithm described in the paper [Stream sampling for variance-optimal
estimation of subset sums](https://arxiv.org/pdf/0803.0473.pdf) (2008)
by Edith Cohen, Nick Duffield, Haim Kaplan, Carsten Lund, and Mikkel
Thorup.

VarOpt is a reservoir-type sampler that maintains a fixed-size sample
and provides a mechanism for merging unequal-weight samples.

This repository also includes a simple reservoir sampling algorithm,
often useful in conjunction with weighted reservoir sampling, that
implements Algorithm R from [Random sampling with a
reservoir](https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_R)
(1985) by Jeffrey Vitter.

## Usage: Natural Weights

A typical use of VarOpt sampling is to estimate network flows from
sampled packets. In this use case, the weight applied to each sample
is the size of the packet. Because VarOpt computes an unbiased
sample, sampled data points can be summarized along secondary
dimensions. For example, we can select the sampled packets that share
a secondary attribute and sum their sample weights; the result is an
unbiased estimate of the total size of all packets in the original
population that have that attribute.

See [weighted_test.go](https://github.com/lightstep/varopt/blob/master/weighted_test.go) for an example.

## Usage: Inverse-probability Weights

Another use of VarOpt sampling applies inverse-probability weights to
estimate frequencies while simultaneously controlling sample
diversity. Suppose a sequence of observations can be naturally
categorized into N different buckets. The goal in this case is to
compute a sample in which each bucket is well represented, while
maintaining frequency estimates.

In this use case, the weight assigned to each observation is the
inverse of the probability of the bucket it belongs to. Weighted
sampling with inverse-probability weights yields a sample whose
expected composition is uniform across buckets; in this example we
expect an equal number of sampled observations from each bucket. Each
sampled observation represents a frequency equal to its sample weight
(computed by VarOpt) divided by its original weight (the inverse
probability).

See [frequency_test.go](https://github.com/lightstep/varopt/blob/master/frequency_test.go) for an example.

## Usage: Merging Samples

VarOpt supports merging independently collected samples one
observation at a time. This is useful for building distributed
sampling schemes. In this use case, each node in a distributed system
computes a weighted sample. To combine samples, input every sampled
observation, together with its sample weight as computed by VarOpt,
into a new VarOpt sample.
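The frequency estimate described under *Usage: Inverse-probability Weights* can be written out as follows. The notation is ours, not the repository's. Observation $i$ falls in bucket $b(i)$, whose probability is $p_{b(i)}$, and $S$ is the VarOpt sample. $w_i$ is the original weight, and $\hat{w}_i$ is the sample weight VarOpt assigns to a kept observation. $\hat{n}_b$ is the estimated number of observations in bucket $b$.

$$
w_i = \frac{1}{p_{b(i)}}, \qquad
\hat{n}_b = \sum_{i \in S:\, b(i) = b} \frac{\hat{w}_i}{w_i}
$$

For example, suppose an observation comes from a bucket with probability $0.01$. It enters the sampler with weight $w_i = 100$. If VarOpt keeps it with sample weight $\hat{w}_i = 250$, it counts as $250 / 100 = 2.5$ observations of its bucket.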