https://github.com/emaadmanzoor/streaming-unique-counting

Approximately counting unique items in a stream. Algorithms course project at KAUST.

# Streaming Unique Counting
*Algorithms, KAUST, Fall 2013 - with Prof. Mikhail Moshkov.*

Implementations of HyperLogLog and adaptive sampling for approximately counting the unique items in a stream. A [report](http://www.eyeshalfclosed.com/docs/CS260_Final_Report.pdf) describing the empirical comparison results is also available.
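
For readers new to the two algorithms, here is a minimal sketch of each. It is not the code in this repository. The function names (`hash64`, `hyperloglog_estimate`, `adaptive_sampling_estimate`) and the parameters `p` and `capacity` are hypothetical, and `hashlib.blake2b` stands in for `mmh3` so the snippet runs with the standard library alone.

```python
import hashlib
import math


def hash64(item: str) -> int:
    """64-bit hash of a string (assumption: hashlib stand-in, not mmh3)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hyperloglog_estimate(stream, p: int = 12) -> float:
    """Estimate the number of distinct items using m = 2**p registers."""
    m = 1 << p
    registers = [0] * m
    for item in stream:
        h = hash64(item)
        index = h >> (64 - p)                    # top p bits choose a register
        rest = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)             # bias constant, valid for m >= 128
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:            # small-range (linear counting) correction
        estimate = m * math.log(m / zeros)
    return estimate


def adaptive_sampling_estimate(stream, capacity: int = 1024) -> int:
    """Estimate the number of distinct items with a bounded sample of hashes."""
    depth = 0        # a hash is sampled only if its low `depth` bits are zero
    sample = set()
    for item in stream:
        h = hash64(item)
        if h & ((1 << depth) - 1) == 0:
            sample.add(h)
            while len(sample) > capacity:        # overflow: halve the sampling rate
                depth += 1
                mask = (1 << depth) - 1
                sample = {x for x in sample if x & mask == 0}
    # each surviving hash represents about 2**depth distinct items
    return len(sample) * (1 << depth)


if __name__ == "__main__":
    words = "the cat sat on the mat and the dog sat too".split()
    print("exact:", len(set(words)))
    print("hyperloglog:", round(hyperloglog_estimate(words)))
    print("adaptive:", adaptive_sampling_estimate(words))
```

Both sketches use memory that is fixed in advance (`2**p` registers, or at most `capacity` stored hashes) regardless of stream length. Larger values of either parameter generally trade memory for lower relative error.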

## Quickstart

```
pip install mmh3
git clone git@github.com:emaadmanzoor/streaming-unique-counting.git
cd streaming-unique-counting

python cardinality_estimation.py "test_data/wuthering_heights.txt" 0 exact
python cardinality_estimation.py "test_data/wuthering_heights.txt" 0 adaptive 1150

python cardinality_estimation.py "test_data/big_test/" 1 exact
python cardinality_estimation.py "test_data/big_test/" 1 adaptive 5000
```

## Contributors

* [Emaad Ahmed Manzoor](http://eyeshalfclosed.com)
* Tariq Alturkestani
* Jumana Baghabra
* Fatemah Alzayer
* Meshari Alazmi