{"id":15650028,"url":"https://github.com/barrust/count-min-sketch","last_synced_at":"2025-04-30T17:09:11.733Z","repository":{"id":46554809,"uuid":"92107497","full_name":"barrust/count-min-sketch","owner":"barrust","description":"Count-Min Sketch Implementation in C","archived":false,"fork":false,"pushed_at":"2024-02-04T13:55:56.000Z","size":65,"stargazers_count":47,"open_issues_count":2,"forks_count":16,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-30T17:09:00.023Z","etag":null,"topics":["c","count-mean-min-sketch","count-min-sketch","data-structures","probabilistic","probabilistic-programming"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/barrust.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-22T23:15:34.000Z","updated_at":"2025-03-11T10:32:17.000Z","dependencies_parsed_at":"2024-10-23T01:48:22.279Z","dependency_job_id":"20c37f78-03ef-4892-aa99-2daa52f31849","html_url":"https://github.com/barrust/count-min-sketch","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barrust%2Fcount-min-sketch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barrust%2Fcount-min-sketch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barrust%2Fcount-min-sketch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barrust%2Fcount-min-sketch/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/barrust","download_url":"https://codeload.github.com/barrust/count-min-sketch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251748950,"owners_count":21637418,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","count-mean-min-sketch","count-min-sketch","data-structures","probabilistic","probabilistic-programming"],"created_at":"2024-10-03T12:33:02.989Z","updated_at":"2025-04-30T17:09:11.328Z","avatar_url":"https://github.com/barrust.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# count-min-sketch\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![GitHub release](https://img.shields.io/github/v/release/barrust/count-min-sketch.svg)](https://github.com/barrust/count-min-sketch/releases)\n[![C/C++ CI](https://github.com/barrust/count-min-sketch/workflows/C/C++%20CI/badge.svg?branch=master)](https://github.com/barrust/count-min-sketch/actions)\n[![codecov](https://codecov.io/gh/barrust/count-min-sketch/branch/master/graph/badge.svg)](https://codecov.io/gh/barrust/count-min-sketch)\n\n\nA Count-Min Sketch implementation in **C**.\n\nCount-Min Sketch is a probabilistic data structure that uses sublinear space\nto store the approximate count, or frequency, of occurrences of elements added\nto the data structure. 
Because of how elements are stored, an element may be\novercounted but is never undercounted.\n\nTo use the library, copy the `src/count_min_sketch.h` and\n`src/count_min_sketch.c` files into your project and include the header where needed.\n\n## License:\nMIT 2017\n\n# Point Query Strategies\nThe generic method for querying the count-min sketch for the number of times an\nelement was inserted is to return the minimum of the values it maps to across all rows of the\ndata structure. This is the largest number of times the element may have been\ninserted, so the estimate has a known upward bias: it is always greater than or\nequal to the actual value but ***never*** lower.\n\nTo help account for this bias, there are two other methods of querying the\ndata. One is to use the mean of the per-row results. This produces larger answers,\nbut is useful when elements can be removed from the count-min sketch.\n\nThe other option is the count-mean-min query strategy. This strategy\nattempts to remove the bias by taking the median of the following value, computed\nfor each row (where `bin[i]` is the counter selected by that row's hash and\n`number-elements` is the total number of elements added to the sketch):\n`bin[i] - ((number-elements - bin[i]) / (width - 1))`\n\nFor a good description of different uses and methods of the count-min sketch,\nread [this article](https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/).\n\nFor a **Python version**, check out [pyprobables](https://github.com/barrust/pyprobables),\nwhich produces binary-compatible output.\n\n\n## Main Features\n* Ability to add and remove elements from the Count-Min Sketch\n    * Increment or add `x` elements at once\n    * Decrement or remove `x` elements at once\n* Ability to look up elements in the data structure\n* Add, remove, or look up elements based on pre-calculated hashes\n* Ability to set depth \u0026 width or have the library calculate them based on\nerror and confidence\n* Multiple lookup types:\n    * ***Minimum:*** largest possible 
number of insertions, obtained by taking the\n    minimum result across rows\n    * ***Mean:*** useful when removals and negative counts are possible, but\n    increases the overcount\n    * ***Mean-Min:*** attempts to take bias into account; results are less\n    skewed upward than with the mean lookup\n* Export and import a count-min sketch to and from a file\n* Ability to merge multiple count-min sketches together\n\n## Future Enhancements\n* add a method to calculate the possible bias (?)\n* add the ability to do everything directly on disk (?)\n* add import / export to hex (?)\n\n## Usage:\n``` c\n#include \u003cstdio.h\u003e\n#include \"count_min_sketch.h\"\n\nint main(void) {\n    CountMinSketch cms;\n    cms_init(\u0026cms, 10000, 7);  /* width = 10000, depth = 7 */\n\n    int i, res;\n    for (i = 0; i \u003c 10; i++) {\n        res = cms_add(\u0026cms, \"this is a test\");\n    }\n\n    res = cms_check(\u0026cms, \"this is a test\");\n    if (res != 10) {\n        printf(\"Error with lookup: %d\\n\", res);\n    }\n    cms_destroy(\u0026cms);\n    return 0;\n}\n```\n\n\n## Required Compile Flags\n`-lm` (link against the math library)\n\n\n## Backward Compatible Hash Function\nTo use count-min sketches created with v0.1.8 or lower, which used the original default hashing\nalgorithm, use the following code as the hash function:\n\n``` c\n#include \u003cinttypes.h\u003e  /* uint64_t, PRIx64 */\n#include \u003cstdio.h\u003e     /* snprintf */\n#include \u003cstdlib.h\u003e    /* calloc */\n#include \u003cstring.h\u003e    /* strlen */\n\nstatic uint64_t old_fnv_1a(const char* key);\n\n/* NOTE: The caller will free the results */\nstatic uint64_t* original_default_hash(unsigned int num_hashes, const char* str) {\n    uint64_t *results = (uint64_t*)calloc(num_hashes, sizeof(uint64_t));\n    char key[17] = {0}; // 16 hex digits + NUL; largest value is FFFFFFFFFFFFFFFF\n    if (results == NULL) {\n        return NULL;\n    }\n    results[0] = old_fnv_1a(str); // use the old FNV-1a variant for the first hash too\n    for (unsigned int i = 1; i \u003c num_hashes; ++i) {\n        // re-hash the hex string of the previous hash\n        snprintf(key, sizeof(key), \"%\" PRIx64, results[i-1]);\n        results[i] = old_fnv_1a(key);\n    }\n    return results;\n}\n\nstatic uint64_t old_fnv_1a(const char* key) {\n    // FNV-1a hash (http://www.isthe.com/chongo/tech/comp/fnv/)\n    size_t i, len = strlen(key);\n    // FNV_OFFSET 64 bit (old value; the standard offset basis is 14695981039346656037ULL)\n    uint64_t h = 14695981039346656073ULL;\n    for (i = 0; i \u003c len; ++i) {\n        h = h ^ (unsigned char) key[i];\n        h = h * 1099511628211ULL; // FNV_PRIME 
64 bit\n    }\n    return h;\n}\n```\n\nIf you only work with older count-min sketches, you can update the library's\n64-bit FNV offset constant (the line commented `// FNV_OFFSET 64 bit`) to `14695981039346656073ULL`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbarrust%2Fcount-min-sketch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbarrust%2Fcount-min-sketch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbarrust%2Fcount-min-sketch/lists"}