https://github.com/jettify/uddsketch-py
uddsketch data structure for fast and accurate tracking of quantiles in data stream.
https://github.com/jettify/uddsketch-py
ddsketch quantile uddsketch
Last synced: 17 days ago
JSON representation
uddsketch data structure for fast and accurate tracking of quantiles in data stream.
- Host: GitHub
- URL: https://github.com/jettify/uddsketch-py
- Owner: jettify
- License: apache-2.0
- Created: 2022-02-07T01:38:46.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2023-07-01T21:05:11.000Z (almost 2 years ago)
- Last Synced: 2025-03-25T09:52:56.406Z (about 1 month ago)
- Topics: ddsketch, quantile, uddsketch
- Language: Python
- Homepage:
- Size: 66.4 KB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 9
-
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
Awesome Lists containing this project
README
uddsketch
=============
.. image:: https://github.com/jettify/uddsketch-py/workflows/CI/badge.svg
:target: https://github.com/jettify/uddsketch-py/actions?query=workflow%3ACI
:alt: GitHub Actions status for master branch
.. image:: https://codecov.io/gh/jettify/uddsketch-py/branch/master/graph/badge.svg
:target: https://codecov.io/gh/jettify/uddsketch-py
.. image:: https://img.shields.io/pypi/pyversions/uddsketch.svg
:target: https://pypi.org/project/uddsketch
.. image:: https://img.shields.io/pypi/v/uddsketch.svg
:target: https://pypi.python.org/pypi/uddsketch
..
.. image:: https://readthedocs.org/projects/uddsketch/badge/?version=latest
:target: https://uddsketch.readthedocs.io/en/latest/?badge=latest
:alt: Documentation Status**uddsketch** data structure for fast and accurate tracking of quantiles in
data streams. Implantation is based on ideas described in DDSketch_ and
UDDSketch_ papers.Simple example
--------------.. code:: python
from uddsketch import UDDSketch
hist = UDDSketch(initial_error=0.01)
# Populate structure with dummy data
for i in range(0, 100):
hist.add(0.1 * i)# Estimate p95 percentile
q = hist.quantile(0.95)
print(f"quantie: {q}")
# quantie: 9.487973051696695# Estimate median
m = hist.median()
print(f"median: {m}")
# median: 4.903763642930295Merging Two Sketches
--------------------.. code:: python
import numpy as np
from uddsketch import UDDSketch
def main():
# fix seed for reproducible results
rs = np.random.RandomState(seed=42)
# generate dummy data
data = ((rs.pareto(3.0, 5000) + 1) * 2.0).tolist()# add first half to first sketch
hist1 = UDDSketch(initial_error=0.01, max_buckets=128)
for v in data[:500]:
hist1.add(v)# add second half to other sketch
hist2 = UDDSketch(initial_error=0.01, max_buckets=128)
for v in data[500:]:
hist2.add(v)hist1.merge(hist2)
combined_count = hist1.num_values
print(f"Num values added to combined sketch: {combined_count}")
# Num values added to combined sketch: 5000# Estimate combined p95 percentile
q = hist1.quantile(0.95)
print(f"quantie: {q}")
# quantie: 5.419515033387234# Estimate combined median
m = hist1.median()
print(f"median: {m}")
# median: 2.5344610207788048if __name__ == "__main__":
main()Installation
------------
Installation process is simple, just::$ pip install uddsketch
References
----------1. Charles Masson, Jee E. Rim and Homin K. Lee. *DDSketch: a fast and fully-mergeable quantile sketch
with relative-error guarantees.* [https://www.vldb.org/pvldb/vol12/p2195-masson.pdf]2. I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno and G. Morleo. *UDDSketch: Accurate Tracking of
Quantiles in Data Streams.* [https://arxiv.org/abs/2004.08604].. _DDSketch: https://www.vldb.org/pvldb/vol12/p2195-masson.pdf
.. _UDDSketch: https://arxiv.org/abs/2004.08604