{"id":19054383,"url":"https://github.com/datadog/sketches-py","last_synced_at":"2025-04-09T05:10:53.380Z","repository":{"id":39678761,"uuid":"150105762","full_name":"DataDog/sketches-py","owner":"DataDog","description":"Python implementations of the distributed quantile sketch algorithm DDSketch","archived":false,"fork":false,"pushed_at":"2024-09-03T22:01:50.000Z","size":171,"stargazers_count":86,"open_issues_count":4,"forks_count":18,"subscribers_count":25,"default_branch":"master","last_synced_at":"2025-04-02T04:04:00.736Z","etag":null,"topics":["ddsketch","quantile"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataDog.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-24T13:20:19.000Z","updated_at":"2024-12-18T10:00:51.000Z","dependencies_parsed_at":"2024-11-23T01:02:26.043Z","dependency_job_id":"14be2788-18b1-4bfd-b159-55c7545cf61d","html_url":"https://github.com/DataDog/sketches-py","commit_stats":{"total_commits":121,"total_committers":8,"mean_commits":15.125,"dds":0.5289256198347108,"last_synced_commit":"c0b33bcf2dadbd0b16ee44cecfcceae86e93fd6c"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataDog%2Fsketches-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataDog%2Fsketches-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataDog%2Fsketches-py/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataDog%2Fsketches-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataDog","download_url":"https://codeload.github.com/DataDog/sketches-py/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247980837,"owners_count":21027808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ddsketch","quantile"],"created_at":"2024-11-08T23:38:14.064Z","updated_at":"2025-04-09T05:10:53.359Z","avatar_url":"https://github.com/DataDog.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ddsketch\n\nThis repo contains the Python implementation of the distributed quantile sketch\nalgorithm DDSketch [1]. DDSketch has relative-error guarantees for any quantile\nq in [0, 1]. That is, if the true value of the q-quantile is `x`, then DDSketch\nreturns a value `y` such that `|x-y| / x \u003c e`, where `e` is the relative error\nparameter. (The default here is set to 0.01.)  DDSketch is also fully mergeable,\nmeaning that multiple sketches from distributed systems can be combined in a\ncentral node.\n\nOur default implementation, `DDSketch`, is guaranteed [1] to not grow too large\nin size for any data that can be described by a distribution whose tails are\nsub-exponential.\n\nWe also provide implementations (`LogCollapsingLowestDenseDDSketch` and\n`LogCollapsingHighestDenseDDSketch`) where the q-quantile will be accurate up to\nthe specified relative error for q that is not too small (or large). 
Concretely,\nthe q-quantile will be accurate up to the specified relative error as long as it\nbelongs to one of the `m` bins kept by the sketch.  If the data is time in\nseconds, the default of `m = 2048` covers 80 microseconds to 1 year.\n\n## Installation\n\nTo install this package, run `pip install ddsketch`, or clone the repo and run\n`python setup.py install`. This package depends on `numpy` and `protobuf`. (The\nprotobuf dependency can be removed if it's not applicable.)\n\n## Usage\n```\nfrom ddsketch import DDSketch\n\nsketch = DDSketch()\n```\nAdd values to the sketch.\n```\nimport numpy as np\n\nvalues = np.random.normal(size=500)\nfor v in values:\n  sketch.add(v)\n```\nFind the quantiles of `values` to within the relative error.\n```\nquantiles = [sketch.get_quantile_value(q) for q in [0.5, 0.75, 0.9, 1]]\n```\nMerge another `DDSketch` into `sketch`.\n```\nanother_sketch = DDSketch()\nother_values = np.random.normal(size=500)\nfor v in other_values:\n  another_sketch.add(v)\nsketch.merge(another_sketch)\n```\nThe quantiles of `values` concatenated with `other_values` are still accurate to within the relative error.\n\n## Development\n\nTo work on ddsketch, a Python interpreter must be installed. 
It is recommended to use the provided development\ncontainer (requires [docker](https://www.docker.com/)) which includes all the required Python interpreters.\n\n    docker-compose run dev\n\nOr, if developing outside of docker, it is recommended to use a virtual environment:\n\n    pip install virtualenv\n    virtualenv --python=3 .venv\n    source .venv/bin/activate\n\n\n### Testing\n\nTo run the tests, install `riot`:\n\n    pip install riot\n\nReplace the Python version with the interpreter(s) available.\n\n    # Run tests with Python 3.9\n    riot run -p3.9 test\n\n### Release notes\n\nNew features, bug fixes, deprecations and other breaking changes must have\nrelease notes included.\n\nTo generate a release note for the change:\n\n    riot run reno new \u003cshort-description-of-change-no-spaces\u003e\n\nEdit the generated file to include notes on the changes made in the commit/PR\nand commit it.\n\n\n### Formatting\n\nFormat code with\n\n    riot run fmt\n\n\n### Type-checking\n\nType checking is done with [mypy](http://mypy-lang.org/):\n\n    riot run mypy\n\n\n### Linting\n\nLint the code with [flake8](https://flake8.pycqa.org/en/latest/):\n\n    riot run flake8\n\n\n### Protobuf\n\nThe protobuf is stored in the go repository: https://github.com/DataDog/sketches-go/blob/master/ddsketch/pb/ddsketch.proto\n\nInstall the minimum required protoc and generate the Python code:\n\n```sh\ndocker run -v $PWD:/code -it ubuntu:18.04 /bin/bash\napt update \u0026\u0026 apt install protobuf-compiler  # default is 3.0.0\nprotoc --proto_path=ddsketch/pb/ --python_out=ddsketch/pb/ ddsketch/pb/ddsketch.proto\n```\n\n\n### Releasing\n\n1. 
Generate the release notes and use [`pandoc`](https://pandoc.org/) to format\nthem for GitHub:\n```bash\n    git checkout master \u0026\u0026 git pull\n    riot run -s reno report --no-show-source | pandoc -f rst -t gfm --wrap=none\n```\n   Copy the output into a new release: https://github.com/DataDog/sketches-py/releases/new.\n\n2. Enter a tag for the release (following [`semver`](https://semver.org)) (e.g. `v1.1.3`, `v1.0.3`, `v1.2.0`).\n3. Use the tag without the `v` as the title.\n4. Save the release as a draft and pass the link to someone else to give a quick review.\n5. If all looks good, hit publish.\n\n\n## References\n[1] Charles Masson, Jee E. Rim, and Homin K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12): 2195-2205, 2019. (The code referenced in the paper, including our implementation of the Greenwald-Khanna (GK) algorithm, can be found at: https://github.com/DataDog/sketches-py/releases/tag/v0.1 )\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatadog%2Fsketches-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatadog%2Fsketches-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatadog%2Fsketches-py/lists"}