https://github.com/rozap/spacesaving
stream count distinct element estimation
- Host: GitHub
- URL: https://github.com/rozap/spacesaving
- Owner: rozap
- License: mit
- Created: 2016-02-22T06:07:48.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-04-11T19:23:59.000Z (over 8 years ago)
- Last Synced: 2024-10-04T23:16:03.347Z (about 1 month ago)
- Language: Elixir
- Size: 8.79 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- freaking_awesome_elixir - Elixir - stream count distinct element estimation using the "space saving" algorithm. (Algorithms and Data structures)
- fucking-awesome-elixir - spacesaving - stream count distinct element estimation using the "space saving" algorithm. (Algorithms and Data structures)
- awesome-elixir - spacesaving - stream count distinct element estimation using the "space saving" algorithm. (Algorithms and Data structures)
README
# Spacesaving
A simple algorithm for estimating the counts of the most frequent elements in an unbounded stream using bounded space. Each estimate is an upper bound on the element's actual count.
[Docs on hex](http://hexdocs.pm/spacesaving/Spacesaving.html)
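
To make the upper-bound behaviour concrete, here is a minimal sketch of the space-saving update rule as it is commonly described. It is not this package's implementation: the module `SpaceSavingSketch` and its functions `new/1`, `push/2` and `top/2` are hypothetical names used only for illustration, and ties are broken by key order as an arbitrary choice.

```elixir
defmodule SpaceSavingSketch do
  # Hypothetical illustration module; not part of the spacesaving package.

  # State: a fixed capacity plus a map of element => estimated count.
  def new(capacity) when is_integer(capacity) and capacity > 0 do
    %{capacity: capacity, counts: %{}}
  end

  def push(%{capacity: capacity, counts: counts} = state, item) do
    cond do
      # Already tracked: bump its counter.
      Map.has_key?(counts, item) ->
        %{state | counts: Map.update!(counts, item, &(&1 + 1))}

      # A free slot remains: start tracking with a count of 1.
      map_size(counts) < capacity ->
        %{state | counts: Map.put(counts, item, 1)}

      # Full: evict the element with the smallest count. The newcomer
      # inherits that count plus one, which is why estimates can only
      # overshoot the true count, never undershoot it.
      true ->
        {victim, min} = Enum.min_by(counts, fn {_key, count} -> count end)
        %{state | counts: counts |> Map.delete(victim) |> Map.put(item, min + 1)}
    end
  end

  # Top k elements by estimated count, highest first (ties by key order).
  def top(%{counts: counts}, k) do
    counts
    |> Enum.sort_by(fn {key, count} -> {-count, key} end)
    |> Enum.take(k)
  end
end
```

With a capacity of 3, pushing `:foo` four times, `:bar` three times, `:baz` twice and `:buzz` once evicts `:baz` (count 2) and records `:buzz` with an estimate of 3, even though it was seen only once.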
## Usage
Add it to your `mix.exs` deps:
```elixir
{:spacesaving, "~> 0.0.2"}
```

Init with 3 spaces, so we track 3 elements:
```elixir
import Spacesaving

state = init(3)
```

Push some elements:
```elixir
state = state
|> push(:foo) |> push(:foo) |> push(:foo) |> push(:foo)
|> push(:bar) |> push(:bar) |> push(:bar)
|> push(:baz) |> push(:baz)
|> push(:buzz)
```

Get the top k elements:
```elixir
top(state, 2) # This will be [foo: 4, bar: 3]
# Inaccuracy appears once an element has been kicked out: :buzz was
# pushed only once, but its estimate is an upper bound on its count.
top(state, 3) # This will be [foo: 4, bar: 3, buzz: 3]
```

Merge two states:
```elixir
left = init(4) |> push(:foo) |> push(:bar)
right = init(4) |> push(:foo) |> push(:baz)

merge(left, right)
|> top(3) # Would be [foo: 2, bar: 1, baz: 1]
```

## References
[Original Paper](http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf)