https://github.com/venantius/droplet
Droplet is a Python library for sampling, sketching, and summarizing data from massive data streams.
- Host: GitHub
- URL: https://github.com/venantius/droplet
- Owner: venantius
- Created: 2013-03-28T03:54:04.000Z (almost 13 years ago)
- Default Branch: master
- Last Pushed: 2013-11-01T19:42:58.000Z (about 12 years ago)
- Last Synced: 2025-03-23T20:43:42.257Z (10 months ago)
- Language: Python
- Homepage:
- Size: 2.44 MB
- Stars: 6
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
Droplet
=======
Introduction
------------
Droplet is a Python library for sampling, sketching and summarizing massive data streams. More information can be found on the [wiki](https://github.com/venantius/droplet/wiki).
Current status is PRE-ALPHA. Do not expect anything to work.
Contents
--------
Samplers:
* L0-sampler
Sketches:
* Count-min (TODO)
* Top-k (TODO)
* HyperLogLog (TODO)
Summaries:
* TBD
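
Count-min is listed above as a TODO, so Droplet does not ship one yet. For readers unfamiliar with the idea, here is a minimal, generic sketch of a count-min structure. The class name `CountMinSketch`, its parameters, and the use of `hashlib.blake2b` (rather than `mmh3`) are assumptions made for illustration only and are not Droplet's API.

```python
import hashlib


class CountMinSketch:
    """Hypothetical count-min sketch for illustration; not part of Droplet."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        # One row of counters per hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive a distinct hash per row by salting with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum across rows
        # is an upper bound on the true count and the tightest one available.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    sketch.add(word)
```

The estimate never falls below the true count; it can exceed it when different items collide in every row, which becomes less likely as `width` and `depth` grow.
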
Installation guide
------------------
Pretty simple, really. From the terminal:
    git clone https://github.com/venantius/droplet.git
    cd droplet
    python setup.py install
Usage
-----
Droplet is designed for use with massive data streams (GB+, TB+, etc.) that may only be read once.
EXAMPLE GOES HERE
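
Droplet's own API is not documented here, so the following is only a generic sketch of what processing a read-once stream looks like. It uses reservoir sampling (Algorithm R), which is a different technique from Droplet's L0-sampler; the function `reservoir_sample` and the generator `numbers` are hypothetical names introduced for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly from a stream in one pass.

    Hypothetical helper for illustration; not part of Droplet.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def numbers():
    # Stand-in for a stream that can only be consumed once.
    for n in range(100_000):
        yield n


sample = reservoir_sample(numbers(), k=5, rng=random.Random(0))
```

Memory use stays proportional to `k` regardless of stream length, which is the property that matters when the input is too large to store or re-read.
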
Dependencies
------------
PyPI:
- mmh3
License
-------
Droplet is licensed under the Apache license.