{"id":15637126,"url":"https://github.com/barrust/pyprobables","last_synced_at":"2025-04-09T04:00:17.700Z","repository":{"id":22537706,"uuid":"96632466","full_name":"barrust/pyprobables","owner":"barrust","description":"Probabilistic data structures in python http://pyprobables.readthedocs.io/en/latest/index.html","archived":false,"fork":false,"pushed_at":"2024-12-27T05:41:39.000Z","size":4558,"stargazers_count":116,"open_issues_count":2,"forks_count":11,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-02T03:01:45.169Z","etag":null,"topics":["bitarray","bloom-filter","count-mean-min-sketch","count-mean-sketch","count-min-sketch","counting-bloom-filter","counting-cuckoo-filter","cuckoo-filter","data-analysis","data-mining","data-science","data-structures","datastructures","heavy-hitters","probabilistic-programming","probability","python","quotient-filter","stream-threshold"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/barrust.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-08T17:02:44.000Z","updated_at":"2025-03-22T01:45:29.000Z","dependencies_parsed_at":"2024-01-30T03:40:07.330Z","dependency_job_id":"d4ca127d-5635-461d-9fe0-afa54d447df8","html_url":"https://github.com/barrust/pyprobables","commit_stats":{"total_commits":139,"total_committers":8,"mean_commits":17.375,"dds":0.1079136690647482,"last_synced_commit":"d88115f149cde60e96e06dada4092d63f599918e"},"previous_names":[],"tags_count":30,"template":false,"template_f
ull_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barrust%2Fpyprobables","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barrust%2Fpyprobables/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barrust%2Fpyprobables/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barrust%2Fpyprobables/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/barrust","download_url":"https://codeload.github.com/barrust/pyprobables/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247974716,"owners_count":21026744,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitarray","bloom-filter","count-mean-min-sketch","count-mean-sketch","count-min-sketch","counting-bloom-filter","counting-cuckoo-filter","cuckoo-filter","data-analysis","data-mining","data-science","data-structures","datastructures","heavy-hitters","probabilistic-programming","probability","python","quotient-filter","stream-threshold"],"created_at":"2024-10-03T11:10:21.769Z","updated_at":"2025-04-09T04:00:17.655Z","avatar_url":"https://github.com/barrust.png","language":"Python","readme":"PyProbables\n===========\n\n.. image:: https://img.shields.io/badge/license-MIT-blue.svg\n    :target: https://opensource.org/licenses/MIT/\n    :alt: License\n.. image:: https://img.shields.io/github/release/barrust/pyprobables.svg\n    :target: https://github.com/barrust/pyprobables/releases\n    :alt: GitHub release\n.. 
image:: https://github.com/barrust/pyprobables/workflows/Python%20package/badge.svg\n    :target: https://github.com/barrust/pyprobables/actions?query=workflow%3A%22Python+package%22\n    :alt: Build Status\n.. image:: https://codecov.io/gh/barrust/pyprobables/branch/master/graph/badge.svg?token=OdETiNgz9k\n    :target: https://codecov.io/gh/barrust/pyprobables\n    :alt: Test Coverage\n.. image:: https://readthedocs.org/projects/pyprobables/badge/?version=latest\n    :target: http://pyprobables.readthedocs.io/en/latest/?badge=latest\n    :alt: Documentation Status\n.. image:: https://badge.fury.io/py/pyprobables.svg\n    :target: https://pypi.org/project/pyprobables/\n    :alt: Pypi Release\n.. image:: https://pepy.tech/badge/pyprobables\n    :target: https://pepy.tech/project/pyprobables\n    :alt: Downloads\n\n**pyprobables** is a pure-python library for probabilistic data structures.\nThe goal is to provide the developer with a pure-python implementation of\ncommon probabilistic data-structures to use in their work.\n\nTo achieve better raw performance, it is recommended to supply an alternative\nhashing algorithm that has been compiled in C. This could include using the\nmd5 and sha512 algorithms provided or installing a third party package and\nwriting your own hashing strategy. Some options include the murmur hash\n`mmh3 \u003chttps://github.com/hajimes/mmh3\u003e`__ or those from the\n`pyhash \u003chttps://github.com/flier/pyfasthash\u003e`__ library. 
Each data object in\n**pyprobables** makes it easy to pass in a custom hashing function.\n\nRead more in `Supplying a pre-defined, alternative hashing strategies`_\nor `Defining hashing function using the provided decorators`_.\n\nInstallation\n------------------\n\nPip Installation:\n\n::\n\n    $ pip install pyprobables\n\nTo install from source, clone the `repository on GitHub\n\u003chttps://github.com/barrust/pyprobables\u003e`__, then run from the folder:\n\n::\n\n    $ python setup.py install\n\n`pyprobables` supports Python 3.6 - 3.11+\n\nFor *Python 2.7* support, install `release 0.3.2 \u003chttps://github.com/barrust/pyprobables/releases/tag/v0.3.2\u003e`__\n\n::\n\n    $ pip install pyprobables==0.3.2\n\n\nAPI Documentation\n---------------------\n\nThe documentation is hosted on\n`readthedocs.io \u003chttp://pyprobables.readthedocs.io/en/latest/code.html#api\u003e`__\n\nYou can build the documentation locally by running:\n\n::\n\n    $ pip install sphinx\n    $ cd docs/\n    $ make html\n\n\n\nAutomated Tests\n------------------\n\nTo run the automated tests, run the following command from the\ndownloaded folder:\n\n::\n\n  $ python setup.py test\n\n\n\nQuickstart\n------------------\n\nImport pyprobables and setup a Bloom Filter\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n    from probables import BloomFilter\n    blm = BloomFilter(est_elements=1000, false_positive_rate=0.05)\n    blm.add('google.com')\n    blm.check('facebook.com')  # should return False\n    blm.check('google.com')  # should return True\n\n\nImport pyprobables and setup a Count-Min Sketch\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code:: python\n\n    from probables import CountMinSketch\n    cms = CountMinSketch(width=1000, depth=5)\n    cms.add('google.com')  # should return 1\n    cms.add('facebook.com', 25)  # insert 25 at once; should return 25\n\n\nImport pyprobables and setup a Cuckoo Filter\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n    from probables import CuckooFilter\n    cko = CuckooFilter(capacity=100, max_swaps=10)\n    cko.add('google.com')\n    cko.check('facebook.com')  # should return False\n    cko.check('google.com')  # should return True\n\n\nImport pyprobables and setup a Quotient Filter\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n    from probables import QuotientFilter\n    qf = QuotientFilter(quotient=24)\n    qf.add('google.com')\n    qf.check('facebook.com')  # should return False\n    qf.check('google.com')  # should return True\n\n\nSupplying a pre-defined, alternative hashing strategies\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n    from probables import BloomFilter\n    from probables.hashes import default_sha256\n    blm = BloomFilter(est_elements=1000, false_positive_rate=0.05,\n                      hash_function=default_sha256)\n    blm.add('google.com')\n    blm.check('facebook.com')  # should return False\n    blm.check('google.com')  # should return True\n\n\n.. _use-custom-hashing-strategies:\n\nDefining hashing function using the provided decorators\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n    import mmh3  # murmur hash 3 implementation (pip install mmh3)\n    from probables.hashes import hash_with_depth_bytes\n    from probables import BloomFilter\n\n    @hash_with_depth_bytes\n    def my_hash(key, depth):\n        return mmh3.hash_bytes(key, seed=depth)\n\n    blm = BloomFilter(est_elements=1000, false_positive_rate=0.05, hash_function=my_hash)\n\n.. 
code:: python\n\n    import hashlib\n    from probables.hashes import hash_with_depth_int\n    from probables.constants import UINT64_T_MAX\n    from probables import BloomFilter\n\n    @hash_with_depth_int\n    def my_hash(key, seed=0, encoding=\"utf-8\"):\n        max64mod = UINT64_T_MAX + 1\n        val = int(hashlib.sha512(key.encode(encoding)).hexdigest(), 16)\n        val += seed  # not a good example, but uses the seed value\n        return val % max64mod\n\n    blm = BloomFilter(est_elements=1000, false_positive_rate=0.05, hash_function=my_hash)\n\n\nSee the `API documentation \u003chttp://pyprobables.readthedocs.io/en/latest/code.html#api\u003e`__\nfor other data structures available and the\n`quickstart page \u003chttp://pyprobables.readthedocs.io/en/latest/quickstart.html#quickstart\u003e`__\nfor more examples!\n\n\nChangelog\n------------------\n\nPlease see the `changelog\n\u003chttps://github.com/barrust/pyprobables/blob/master/CHANGELOG.md\u003e`__ for a list\nof all changes.\n\n\nBackward Compatible Changes\n---------------------------\n\nIf you are using previously exported probabilistic data structures (v0.4.1 or below)\nand used the default hashing strategy, you will want to use the following code\nto mimic the original default hashing algorithm.\n\n.. 
code:: python\n\n    from probables import BloomFilter\n    from probables.constants import UINT64_T_MAX\n    from probables.hashes import hash_with_depth_int\n\n    @hash_with_depth_int\n    def old_fnv1a(key, depth=1):\n        return tmp_fnv_1a(key)\n\n    def tmp_fnv_1a(key):\n        # 64-bit FNV-1a over the characters of key\n        max64mod = UINT64_T_MAX + 1\n        hval = 14695981039346656073\n        fnv_64_prime = 1099511628211\n        tmp = map(ord, key)\n        for t_str in tmp:\n            hval ^= t_str\n            hval *= fnv_64_prime\n            hval %= max64mod\n        return hval\n\n    blm = BloomFilter(filepath=\"old-file-path.blm\", hash_function=old_fnv1a)\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbarrust%2Fpyprobables","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbarrust%2Fpyprobables","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbarrust%2Fpyprobables/lists"}