{"id":13700212,"url":"https://github.com/ikegami-yukino/madoka-python","last_synced_at":"2025-07-04T00:38:56.022Z","repository":{"id":11132985,"uuid":"13496280","full_name":"ikegami-yukino/madoka-python","owner":"ikegami-yukino","description":"Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library)","archived":false,"fork":false,"pushed_at":"2019-02-10T18:35:03.000Z","size":237,"stargazers_count":26,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-05T22:42:11.370Z","etag":null,"topics":["counter","data-sketches","memory-efficient","probabilistic-data-structures","python-wrapper"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ikegami-yukino.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-10-11T10:31:05.000Z","updated_at":"2025-01-14T09:01:41.000Z","dependencies_parsed_at":"2022-09-01T20:40:56.396Z","dependency_job_id":null,"html_url":"https://github.com/ikegami-yukino/madoka-python","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ikegami-yukino/madoka-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikegami-yukino%2Fmadoka-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikegami-yukino%2Fmadoka-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikegami-yukino%2Fmadoka-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikegami-yukino%2Fmadoka-python/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ikegami-yukino","download_url":"https://codeload.github.com/ikegami-yukino/madoka-python/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikegami-yukino%2Fmadoka-python/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259379926,"owners_count":22848785,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["counter","data-sketches","memory-efficient","probabilistic-data-structures","python-wrapper"],"created_at":"2024-08-02T20:00:50.458Z","updated_at":"2025-07-04T00:38:56.002Z","avatar_url":"https://github.com/ikegami-yukino.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"madoka\n===========\n\n|travis| |coveralls| |pyversion| |version| |license|\n\nMadoka is an implementation of a Count-Min sketch data structure for summarizing data streams.\n\nString-int pairs in a Madoka-Sketch may take less memory than in a standard Python dict, Counter, or Redis.\n\nThe counting error rate is about 0.0911%.\n\nMore details are described in `Benchmark.ipynb`_\n\n.. _Benchmark.ipynb: https://github.com/ikegami-yukino/madoka-python/blob/master/Benchmark.ipynb\n\nThis module is based on the `madoka`_ C++ library.\n\n.. _madoka: https://github.com/s-yata/madoka\n\nNOTE: A Madoka-Sketch does not keep an index of keys, so it cannot list all keys the way Python's `dict.keys()` does. 
However, when the `k` parameter is passed to the constructor, the `most_common` method (which returns up to `k` key-value pairs) is available.\n\nContributions are welcome!\n\nInstallation\n============\n\n::\n\n $ pip install madoka\n\nClass\n============\n\nMadoka provides several classes that share the same interface. They differ only in the data type of the stored values, so you can choose the one that suits your purpose.\n\nFor example, if you want to count float data, it is preferable to choose the CroquisFloat or CroquisDouble class.\n\n- Sketch\n  - storing unsigned long long (64bit) and fast implementation\n- CroquisFloat\n  - storing float (32bit)\n- CroquisDouble\n  - storing double (64bit)\n- CroquisUint8\n  - storing unsigned char (8bit)\n- CroquisUint16\n  - storing unsigned short (16bit)\n- CroquisUint32\n  - storing unsigned int (32bit)\n- CroquisUint64\n  - storing unsigned long long (64bit)\n\nUsage\n=====\n\nThe rest of this section describes the Sketch class.\nThe Croquis classes have mostly the same interface,\nso you can use them in the same way as the Sketch class.\nIn that case, replace \"Sketch\" with the intended class name.\n\n\nCreate a new sketch\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e import madoka\n \u003e\u003e\u003e sketch = madoka.Sketch()\n\n- Sketch madoka.Sketch([width=1048576, max_value=35184372088831, path='', flags=0, seed=0, k=5])\n\n  - `width` is the size of the register. If you are worried about estimation error, increase `width`. The larger `width` is, the fewer mistakes madoka makes in estimating values, but the more memory it consumes.\n\n  - Permission of `path` should be 644\n\n  - `k` is the Top-K size used by the `most_common` method. 
If you don't need the `most_common` method, setting `k=0` is recommended because it is slightly faster.\n\n  - `madoka.Sketch()` calls `madoka.Sketch.create()`, so you don't have to call `create()` explicitly during initialization\n\n\nIncrement a key value\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch['mami'] += 1\n\nor\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.inc('mami')\n\n\n- int inc(key[, key_length=0])\n\n  - Note that `key_length` is determined automatically when it is not given. Thus, the order of parameters differs from the original madoka C++ library.\n\n\nAdd a value to the current key value\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch['mami'] += 6\n\nor\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.add('mami', 6)\n\n\n- int add(key, value[, key_length=0])\n\n  - Note that `key_length` is determined automatically when it is not given. Thus, the order of parameters differs from the original madoka C++ library.\n\n\n\nUpdate a key value\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch['mami'] = 6\n\nor\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.set('mami', 6)\n\n\n- void set(key, value[, key_length=0])\n\n  * Note that `set()` does nothing when the given value is not greater than the current key value.\n\n  * Also note that the new value is saturated when the given value is greater than the upper limit.\n\n  * Additionally note that `key_length` is determined automatically when it is not given. Thus, the order of parameters differs from the original madoka C++ library.\n\n\nGet a key value\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch['mami']\n\nor\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.get('mami')\n\n\n- int get(key[, key_length=0])\n\n  - Note that `key_length` is determined automatically when it is not given. 
Thus, the order of parameters differs from the original madoka C++ library.\n\nGet all values\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.values()\n\n\n- generator\u003cint\u003e values()\n\n  - Note that processing time increases with the sketch's width. This method may be slow, so setting width to less than 1000000 when creating the sketch is recommended.\n\nSave a sketch to a file\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.save('example.madoka')\n\n- void save(path)\n\n  - Permission of `path` should be 644\n\nLoad a sketch from a file\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.load('example.madoka')\n\n- void load(path)\n\n  - Permission of `path` should be 644\n\nClear a sketch\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.clear()\n\n- void clear()\n\n  * Deletes all key-value pairs. Unlike `create()`, it keeps the current settings.\n\n\nInitialize a sketch with settings change\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.create()\n\n- void create([width=0, max_value=0, path=NULL, flags=0, seed=0])\n\n  - Permission of the file given to `path` should be 644\n\nCopy a sketch\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.copy(other_sketch)\n\n- void copy(Sketch)\n\n\nMerge two sketches\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch += other_sketch\n\nor\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.merge(other_sketch)\n\n- void merge(Sketch[, lhs_filter=None, rhs_filter=None])\n\n  - `lhs_filter` is applied to this sketch (self); `rhs_filter` is applied to the given sketch\n\n\nShrink a sketch\n--------------------------------------------\n\n.. 
code:: python\n\n \u003e\u003e\u003e sketch.shrink(sketch, width=1000)\n\n- void shrink(Sketch[, width=0, max_value=0, filter=None, path=None, flags=0])\n\n  - When width \u003e 0, width must be less than the width of the source sketch\n\n  - Permission of `path` should be 644\n\n\nGet summed sketch\n-----------------------\n\n.. code:: python\n\n \u003e\u003e\u003e summed_sketch = sketch + other_sketch\n\n- Creates a new summed sketch, so the original sketches are not modified\n\nGet summed sketch by dict\n--------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e summed_sketch = sketch + {'mami': 1, 'kyoko': 2}\n\n- Creates a new summed sketch, so the original sketch is not modified\n\n\nCheck whether sketch contains key value\n-----------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e 'mami' in sketch\n\n\nGet inner product of two sketches\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.inner_product(other_sketch)\n\n- list\u003cfloat\u003e inner_product(Sketch)\n\n  - Returns [inner product, squared length of the left-hand sketch (float), squared length of the right-hand sketch (float)]\n\nGet median value\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch['madoka'] = 1\n \u003e\u003e\u003e sketch['mami'] = 2\n \u003e\u003e\u003e sketch['sayaka'] = 3\n \u003e\u003e\u003e sketch['kyouko'] = 4\n \u003e\u003e\u003e sketch['homura'] = 5\n \u003e\u003e\u003e sketch.median()  # =\u003e 3\n\n- int or float median()\n\nApply filter into all values\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.filter(lambda x: x + 1)\n\n- void filter(Callable[, apply_zerovalue=False])\n\n  - If apply_zerovalue = True, the filter is also applied to zero values (this may be slow) (version 0.6 or later)\n\n  - Note that processing time increases with the sketch's width. 
If this method feels slow, setting width to less than 1000000 when creating the sketch is recommended\n\nSet values from dict\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.fromdict({'mami': 14, 'madoka': 13})\n\nor\n\n.. code:: python\n\n \u003e\u003e\u003e sketch += {'mami': 14, 'madoka': 13}\n\n\n- void fromdict(dict)\n\nGet most common keys\n--------------------------------------------\n\n.. code:: python\n\n \u003e\u003e\u003e sketch.most_common()\n\n- generator most_common([k=5])\n\n  - returns up to `k` key-value pairs\n\n  - Note that this method requires the `k` parameter to be set in the constructor.\n\nLicense\n=========\n\n- Wrapper code is licensed under New BSD License.\n- Bundled `madoka`_ C++ library is licensed under the Simplified BSD License.\n\n\n.. |travis| image:: https://travis-ci.org/ikegami-yukino/madoka-python.svg?branch=master\n    :target: https://travis-ci.org/ikegami-yukino/madoka-python\n    :alt: travis-ci.org\n\n.. |coveralls| image:: https://coveralls.io/repos/ikegami-yukino/madoka-python/badge.svg\n    :target: https://coveralls.io/r/ikegami-yukino/madoka-python\n    :alt: coveralls.io\n\n.. |pyversion| image:: https://img.shields.io/pypi/pyversions/madoka.svg\n\n.. |version| image:: https://img.shields.io/pypi/v/madoka.svg\n    :target: http://pypi.python.org/pypi/madoka/\n    :alt: latest version\n\n.. |license| image:: https://img.shields.io/pypi/l/madoka.svg\n    :target: http://pypi.python.org/pypi/madoka/\n    :alt: license\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fikegami-yukino%2Fmadoka-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fikegami-yukino%2Fmadoka-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fikegami-yukino%2Fmadoka-python/lists"}