https://github.com/lightstep/varopt
A Golang implementation of the VarOpt weighted sampling algorithm
- Host: GitHub
- URL: https://github.com/lightstep/varopt
- Owner: lightstep
- License: apache-2.0
- Created: 2019-11-03T16:49:11.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2023-11-29T23:24:45.000Z (12 months ago)
- Last Synced: 2024-06-19T03:10:02.534Z (5 months ago)
- Topics: algorithm, sampling, sampling-weighting
- Language: Go
- Homepage:
- Size: 62.5 KB
- Stars: 10
- Watchers: 10
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
[![Docs](https://godoc.org/github.com/lightstep/varopt?status.svg)](https://godoc.org/github.com/lightstep/varopt)
# VarOpt Sampling Algorithm
This is an implementation of VarOpt, an unbiased weighted sampling
algorithm described in the paper [Stream sampling for variance-optimal
estimation of subset sums](https://arxiv.org/pdf/0803.0473.pdf) (2008)
by Edith Cohen, Nick Duffield, Haim Kaplan, Carsten Lund, and Mikkel
Thorup.

VarOpt is a reservoir-type sampler that maintains a fixed-size sample
and provides a mechanism for merging unequal-weight samples.

This repository also includes a simple reservoir sampling algorithm,
often useful in conjunction with weighted reservoir sampling, that
implements Algorithm R from [Random sampling with a
reservoir](https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_R)
(1985) by Jeffrey Vitter.

## Usage: Natural Weights
A typical use of VarOpt sampling is to estimate network flows using
sample packets. In this use-case, the weight applied to each sample
is the size of the packet. Because VarOpt computes an unbiased
sample, sample data points can be summarized along secondary
dimensions. For example, we can select a subset of sampled packets
according to a secondary attribute, sum the sample weights, and the
result is expected to equal the total size of the packets corresponding
to the secondary attribute in the original population.

See [weighted_test.go](https://github.com/lightstep/varopt/blob/master/weighted_test.go) for an example.
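
The subset-sum estimate described above can be sketched with the standard library alone. This is a minimal illustration, not the library's API: the `SampledPacket` type, its fields, and the `EstimateBytes` function are hypothetical names, and the sketch assumes the sampler has already produced an adjusted weight (in bytes) for each retained packet.

```go
package main

// SampledPacket is a hypothetical record for one packet retained by a
// weighted sampler. AdjustedWeight is the weight the sampler reports for
// the packet, which may be larger than the packet's original size.
type SampledPacket struct {
	Protocol       string  // secondary attribute used to select a subset
	AdjustedWeight float64 // bytes this sampled packet stands for
}

// EstimateBytes sums the adjusted weights of the sampled packets whose
// Protocol matches. When the sample is unbiased, this sum is an estimate
// of the total bytes sent with that protocol in the original population.
func EstimateBytes(sample []SampledPacket, protocol string) float64 {
	total := 0.0
	for _, p := range sample {
		if p.Protocol == protocol {
			total += p.AdjustedWeight
		}
	}
	return total
}
```

Calling `EstimateBytes(sample, "tcp")` on a slice of sampled packets returns the estimated TCP byte count. The same loop works for any secondary attribute, because the estimate only depends on the adjusted weights.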
## Usage: Inverse-probability Weights
Another use for VarOpt sampling uses inverse-probability weights to
estimate frequencies while simultaneously controlling sample
diversity. Suppose a sequence of observations can be naturally
categorized into N different buckets. The goal in this case is to
compute a sample where each bucket is well represented, while
maintaining frequency estimates.

In this use-case, the weight assigned to each observation is the
inverse probability of the bucket it belongs to. Weighted sampling
with inverse-probability weights yields a uniform expected value: in
this example we expect an equal number of sampled observations to fall
into each bucket. Each observation represents a frequency equal to its
sample weight (computed by VarOpt) divided by its original weight (the
inverse probability).

See [frequency_test.go](https://github.com/lightstep/varopt/blob/master/frequency_test.go) for an example.
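
The frequency arithmetic above can be sketched as follows. The names here (`SampledObservation`, `InverseProbabilityWeight`, `EstimateFrequencies`) are hypothetical and are not part of the library. The sketch assumes each bucket's probability is known and positive, and that the sampler has already reported an adjusted weight for every retained observation.

```go
package main

// SampledObservation is a hypothetical record for one retained observation.
type SampledObservation struct {
	Bucket         string
	OriginalWeight float64 // input weight: 1 / probability of the bucket
	AdjustedWeight float64 // sample weight reported by the sampler
}

// InverseProbabilityWeight returns the input weight for an observation
// whose bucket occurs with probability p (assumed to be > 0).
func InverseProbabilityWeight(p float64) float64 {
	return 1 / p
}

// EstimateFrequencies returns, per bucket, the estimated number of
// observations in the original stream. Each sampled observation
// contributes its adjusted weight divided by its original weight.
func EstimateFrequencies(sample []SampledObservation) map[string]float64 {
	freq := make(map[string]float64)
	for _, o := range sample {
		freq[o.Bucket] += o.AdjustedWeight / o.OriginalWeight
	}
	return freq
}
```

For instance, if a common bucket has probability 0.5 (input weight 2) and a rare bucket has probability 0.125 (input weight 8), and every retained observation has adjusted weight 16, then each common observation stands for 8 originals and each rare one for 2. An equal number of retained observations per bucket therefore still reproduces the 4:1 frequency ratio.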
## Usage: Merging Samples
VarOpt supports merging independently collected samples one
observation at a time. This is useful for building distributed
sampling schemes. In this use-case, each node in a distributed system
computes a weighted sample. To combine samples, simply input all the
observations and their corresponding weights into a new VarOpt sample.