{"id":20153552,"url":"https://github.com/seqan/hibf","last_synced_at":"2025-07-25T06:37:07.436Z","repository":{"id":92183681,"uuid":"599029859","full_name":"seqan/hibf","owner":"seqan","description":"HIBF and IBF","archived":false,"fork":false,"pushed_at":"2025-03-31T10:33:59.000Z","size":2211,"stargazers_count":4,"open_issues_count":10,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-31T11:31:21.559Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://docs.seqan.de/hibf","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seqan.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-08T09:58:46.000Z","updated_at":"2025-03-31T10:34:02.000Z","dependencies_parsed_at":"2024-03-03T21:23:29.989Z","dependency_job_id":"8274fa0e-5317-4f1a-a026-22e651e65946","html_url":"https://github.com/seqan/hibf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":"seqan/library-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seqan%2Fhibf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seqan%2Fhibf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seqan%2Fhibf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seqan%2Fhibf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seqan","download_url":"https://codeload.github.com/seqan/hibf/tar.gz/refs/hea
ds/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248114961,"owners_count":21050145,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T23:19:41.602Z","updated_at":"2025-04-09T21:33:11.988Z","avatar_url":"https://github.com/seqan.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\nSPDX-FileCopyrightText: 2006-2025, Knut Reinert \u0026 Freie Universität Berlin\nSPDX-FileCopyrightText: 2016-2025, Knut Reinert \u0026 MPI für molekulare Genetik\nSPDX-License-Identifier: CC-BY-4.0\n--\u003e\n\n# HIBF\n\n[![build status][1]][2]\n[![codecov][3]][4]\n[![license][5]][6]\n![platforms][9]\n\u003c!-- [![latest release][7]][8] --\u003e\n\n\u003c!--\n    Above uses reference-style links with numbers.\n    See also https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#links.\n\n    For example, `[![build status][1]][2]` evaluates to the following:\n        `[link_text][2]`\n        `[2]` is a reference to a link, i.e. `[link_text](https://...)`\n\n        `[link_text]` = `[![build status][1]]`\n        `[1]` is once again a reference to a link - this time an image, i.e. `[![build status](https://...)]\n        `![build status]` is the text that should be displayed if the linked resource (`[1]`) is not available\n\n    `[![build status][1]][2]` hence means:\n    Show the picture linked under `[1]`. 
In case it cannot be displayed, show the text \"build status\" instead.\n    The picture, or alternative text, should link to `[2]`.\n--\u003e\n\n[1]: https://img.shields.io/github/actions/workflow/status/seqan/hibf/ci_linux.yml?branch=main\u0026style=flat\u0026logo=github\u0026label=CI \"Open GitHub actions page\"\n[2]: https://github.com/seqan/hibf/actions?query=branch%3Amain\n[3]: https://codecov.io/gh/seqan/hibf/branch/main/graph/badge.svg?token=BH1FQiBBle \"Open Codecov page\"\n[4]: https://codecov.io/gh/seqan/hibf\n[5]: https://img.shields.io/badge/license-BSD-green.svg \"Open Copyright page\"\n[6]: https://github.com/seqan/hibf/blob/main/LICENSE.md\n[7]: https://img.shields.io/github/release/seqan/hibf.svg \"Get the latest release\"\n[8]: https://github.com/seqan/hibf/releases/latest\n[9]: https://img.shields.io/badge/platform-linux%20%7C%20bsd%20%7C%20osx-informational.svg\n\nThis library contains the HIBF and layout algorithm.\n\n## Quick start\n\nTo use the HIBF lib in your app:\n\n```cmake\ninclude (FetchContent)\nFetchContent_Declare (\n    hibf_fetch_content\n    GIT_REPOSITORY \"https://github.com/seqan/hibf\"\n    GIT_TAG \"main\")\noption (INSTALL_HIBF \"\" OFF)\nFetchContent_MakeAvailable (hibf_fetch_content)\n\n# ...\n\ntarget_link_libraries (\u003cyour_app\u003e PUBLIC seqan::hibf)\n```\n\nA quick overview on how to use the HIBF lib:\n\n\u003c!-- MARKDOWN-AUTO-DOCS:START (CODE:src=./test/snippet/readme.cpp) --\u003e\n\u003c!-- The below code snippet is automatically added from ./test/snippet/readme.cpp --\u003e\n```cpp\n// SPDX-FileCopyrightText: 2006-2025, Knut Reinert \u0026 Freie Universität Berlin\n// SPDX-FileCopyrightText: 2016-2025, Knut Reinert \u0026 MPI für molekulare Genetik\n// SPDX-License-Identifier: CC0-1.0\n\n#include \u003ccstddef\u003e    // for size_t\n#include \u003ccstdint\u003e    // for uint64_t\n#include \u003cfunctional\u003e // for function\n#include \u003ciostream\u003e   // for basic_ostream, operator\u003c\u003c, 
cout\n#include \u003cranges\u003e     // for __fn, iota, views\n#include \u003cvector\u003e     // for vector\n\n#include \u003chibf/config.hpp\u003e                                // for insert_iterator, config\n#include \u003chibf/hierarchical_interleaved_bloom_filter.hpp\u003e // for hierarchical_interleaved_bloom_filter\n\nint main()\n{\n    // Let's say we have groups that have data that we find interesting.\n    // For example, each file of the RefSeq data set could be such a group.\n    // In the context of the HIBF, we call such groups user bins.\n\n    // Given a query, we want to quickly determine which user bins this query is likely to occur in.\n    // This is also called Approximate Membership Query (AMQ).\n\n    // In this example, we have three user bins. Each of these user bins is characterized by a range of\n    // unsigned integer values. Some popular techniques for obtaining such unsigned integers from\n    // biological sequences include k-mers, minimisers, and syncmers.\n\n    // For clarity, we show each user bin individually before copying them to user_bin_data.\n    std::vector\u003cuint64_t\u003e user_bin_1{1u, 2u, 3u, 4u, 5u, 6u, 7u, 8u, 9u, 10u};\n    std::vector\u003cuint64_t\u003e user_bin_2{1u, 2u, 3u, 4u, 5u};\n    std::vector\u003cuint64_t\u003e user_bin_3{3u, 9u, 11u};\n    std::vector\u003cstd::vector\u003cuint64_t\u003e\u003e user_bin_data{user_bin_1, user_bin_2, user_bin_3};\n\n    // The HIBF uses a config. There are two required options:\n    // 1) The number of user bins: 3 (user_bin_data.size())\n    // 2) A function to access the input data.\n    //    The signature is (size_t const user_bin_id, seqan::hibf::insert_iterator it). You need to\n    //    provide the function body, and the hibf lib will use this function to access the data of each\n    //    user bin. 
When this function is called by the library with a specific user_bin_id, all\n    //    unsigned integer values (data) belonging to this user bin have to be assigned to the\n    //    seqan::hibf::insert_iterator.\n    //    Conveniently, this function can be a lambda, and hence capture data outside the function body.\n    auto get_user_bin_data = [\u0026](size_t const user_bin_id, seqan::hibf::insert_iterator it)\n    {\n        for (auto value : user_bin_data[user_bin_id])\n            it = value;\n    };\n\n    // Now we can construct a config, any other settings are optional. We have included some interesting\n    // settings with their respective default values here.\n    seqan::hibf::config config{.input_fn = get_user_bin_data, // required\n                               .number_of_user_bins = 3u,     // required\n                               .number_of_hash_functions = 2u,\n                               .maximum_fpr = 0.05,\n                               .threads = 1u};\n\n    // The HIBF constructor will determine a hierarchical layout for the user bins and build the filter.\n    seqan::hibf::hierarchical_interleaved_bloom_filter hibf{config};\n\n    // Now we can search for some query.\n    std::vector\u003cuint64_t\u003e query1{3u, 9u, 12u, 14u};\n\n    // For this, we use the membership agent of the HIBF. This agent only needs to be created once and\n    // can be reused for multiple subsequent queries.\n    // If you are using multiple threads in your app, each thread should have its own membership agent.\n    auto agent = hibf.membership_agent();\n\n    // The membership_for function takes the query and a threshold. 
Here, a threshold of two means that\n    // at least (\u003e=) 2 values of the query must be found within a user bin to be a hit.\n    // While exact thresholds can be obtained for some approaches such as k-mers, another popular\n    // approach is to require at least x% of the values in the query to hit.\n    // For example, a threshold of 2 equals 50% of the values in query1 (4 values).\n    // This threshold needs to be provided by the user. In general, some care should be taken with the\n    // threshold. A low threshold requires a traversal of more parts of the hierarchy and slows down\n    // the search.\n    // Note that we bind the result with a `\u0026` to avoid copies!\n    auto \u0026 result1 = agent.membership_for(query1, 2u);\n\n    // query1 hits in user_bin_1 and user_bin_3, which have the IDs 0 and 2, respectively.\n    for (uint64_t hit_user_bin : result1)\n        std::cout \u003c\u003c hit_user_bin \u003c\u003c ' '; // The results are not sorted: 2 0\n    std::cout \u003c\u003c '\\n';\n\n    // Another query.\n    // A query is simply a range of unsigned integer values, e.g., it does not have to be a vector.\n    auto query2 = std::views::iota(0u, 15u); // 0,1,2,...,14\n    auto \u0026 result2 = agent.membership_for(query2, 5u);\n    agent.sort_results(); // Sort the results.\n\n    // query2 hits in user_bin_1 and user_bin_2, which have the IDs 0 and 1, respectively.\n    for (uint64_t hit_user_bin : result2)\n        std::cout \u003c\u003c hit_user_bin \u003c\u003c ' '; // The results are sorted: 0 1\n    std::cout \u003c\u003c '\\n';\n}\n```\n\u003c!-- The below code snippet is automatically added from ./test/snippet/readme.cpp --\u003e\n\u003c!-- MARKDOWN-AUTO-DOCS:END --\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseqan%2Fhibf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseqan%2Fhibf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseqan%2Fhibf/lists"}