Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/rozap/spacesaving

stream count distinct element estimation
https://github.com/rozap/spacesaving

Last synced: 8 days ago
JSON representation

stream count distinct element estimation

Awesome Lists containing this project

README

        

# Spacesaving

Simple algorithm to estimate distinct elements in an unbounded stream using bounded space. The estimate is the upper bound on the element's actual count.

[Docs on hex](http://hexdocs.pm/spacesaving/Spacesaving.html)

## Usage

Add it to you `mix.exs` deps
```elixir
{:spacesaving, "~> 0.0.2"}
```

Init with 3 spaces, so we track 3 elements
```elixir
import Spacesaving

state = init(3)
```

Push some elements
```elixir
state = state
|> push(:foo) |> push(:foo) |> push(:foo) |> push(:foo)
|> push(:bar) |> push(:bar) |> push(:bar)
|> push(:baz) |> push(:baz)
|> push(:buzz)
```

Get the top k elements
```elixir
top(state, 2) # This will be [foo: 4, bar: 3]
top(state, 3) # This will be [foo: 4, bar: 3, buzz: 3], so the inaccuracy starts to come into play when an element is kicked out, and the estimate is the upper bound
```

Merge two states
```elixir
left = init(4) |> push(:foo) |> push(:bar)
right = init(4) |> push(:foo) |> push(:baz)

merge(left, right)
|> top(3) # Would be [foo: 2, bar: 1, baz: 1]
```

## References
[Original Paper](http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf)