{"id":48333920,"url":"https://github.com/imartayan/cbl","last_synced_at":"2026-04-05T01:36:42.218Z","repository":{"id":217947682,"uuid":"730137723","full_name":"imartayan/CBL","owner":"imartayan","description":"A Rust library providing fully dynamic sets of k-mers with high locality","archived":false,"fork":false,"pushed_at":"2026-02-12T12:35:41.000Z","size":242,"stargazers_count":47,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-04-05T01:36:40.124Z","etag":null,"topics":["bioinformatics","dynamic","indexing","k-mers","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/imartayan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-12-11T09:36:38.000Z","updated_at":"2026-02-12T12:35:45.000Z","dependencies_parsed_at":"2024-01-18T22:58:19.042Z","dependency_job_id":"86fb47ab-2f1a-4ecd-adc7-abcccbf01c51","html_url":"https://github.com/imartayan/CBL","commit_stats":{"total_commits":176,"total_committers":1,"mean_commits":176.0,"dds":0.0,"last_synced_commit":"579eeb334615f066385de04b07792f766ce3efb4"},"previous_names":["imartayan/cbl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/imartayan/CBL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imartayan%2FCBL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imartayan%2FCBL/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imartayan%2FCBL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imartayan%2FCBL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/imartayan","download_url":"https://codeload.github.com/imartayan/CBL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imartayan%2FCBL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31421869,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T00:25:07.052Z","status":"ssl_error","status_checked_at":"2026-04-05T00:25:05.923Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","dynamic","indexing","k-mers","rust"],"created_at":"2026-04-05T01:36:41.730Z","updated_at":"2026-04-05T01:36:42.211Z","avatar_url":"https://github.com/imartayan.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Conway-Bromage-Lyndon\n\nA Rust library providing fully dynamic sets of *k*-mers with high [locality](https://en.wikipedia.org/wiki/Locality_of_reference).\n\nThe data structure is described in [Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets](https://doi.org/10.1093/bioinformatics/btae217). Please [cite it](#citation) if you use this library.\n\nIt supports the following operations:\n- inserting a single *k*-mer (with `insert`), or every *k*-mer from a sequence (with `insert_seq`)\n- deleting a single *k*-mer (with `remove`), or every *k*-mer from a sequence (with `remove_seq`)\n- testing membership of a single *k*-mer (with `contains`), or of every *k*-mer from a sequence (with `contains_seq`)\n- iterating over the *k*-mers stored in the set (with `iter`)\n- union / intersection / difference of two sets (with `|` / `\u0026` / `-`)\n- (de)serialization with [serde](https://serde.rs/)\n\n## Requirements\n\n### Rust nightly 1.77+\n\nIf you have not installed Rust yet, please visit [rustup.rs](https://rustup.rs/) to install it.\nThis library relies on nightly features of the Rust compiler (version 1.77+); you can install the latest nightly toolchain with\n```sh\nrustup install nightly\n```\n\nIf you don't want to pass the `+nightly` flag every time you run `cargo`, you can set nightly as the default toolchain with\n```sh\nrustup default nightly\n```\n\n### Additional headers for Linux\n\nThis library uses C++ bindings for the [sux](https://github.com/vigna/sux) library and [tiered vectors](https://github.com/mettienne/tiered-vector).\nDepending on your configuration, some headers used by the bindings might be missing; in that case, please install the following packages:\n\n#### Ubuntu\n\n```sh\nsudo apt install -y libstdc++-12-dev libclang-dev\n```\n\n#### Fedora\n\n```sh\nsudo dnf install -y clang15-devel\n```\n\n## Using the library\n\nYou can add `CBL` to an existing Rust project with\n```sh\ncargo +nightly add --git https://github.com/imartayan/CBL.git\n```\nor by adding the following dependency to the `[dependencies]` section of your `Cargo.toml`\n```toml\ncbl = { git = \"https://github.com/imartayan/CBL.git\" }\n```\nIf the build fails, try installing the [additional headers](#additional-headers-for-linux).\n\n### Choosing the right parameters\n\nThe `CBL` struct takes two main generic parameters:\n- a constant integer `K` specifying the size of the *k*-mers\n- an integer type `T` (e.g. `u32`, `u64`, `u128`) that must be large enough to store both a *k*-mer *and* its number of bits together\n\nTherefore `T` must be able to hold $2k + \\lg(2k)$ bits.\nIn particular, since Rust's primitive integer types are at most 128 bits wide, `K` must be ≤ 59.\n\nAdditionally, you can specify an optional third parameter, `PREFIX_BITS`, which determines the size of the underlying bitvector.\nChanging this parameter affects both the space usage and the query time of the data structure; see the paper for more details.\n\n### Example usage\n\nThe following example inserts every *k*-mer of a FASTA/Q file (given as the first command-line argument) into a `CBL` index. It also uses the `needletail` crate to parse the file, so `needletail` must be listed in your dependencies as well.\n\n```rs\nuse cbl::CBL;\nuse needletail::parse_fastx_file;\nuse std::env::args;\n\n// define the parameters K and T\nconst K: usize = 25;\ntype T = u64; // T must be large enough to store 2K + lg(2K) bits\n\nfn main() {\n    let args: Vec\u003cString\u003e = args().collect();\n    let input_filename = args.get(1).expect(\"No argument given\");\n\n    // create a CBL index with parameters K and T\n    let mut cbl = CBL::\u003cK, T\u003e::new();\n\n    let mut reader = parse_fastx_file(input_filename).unwrap();\n    // for each sequence of the FASTA/Q file\n    while let Some(record) = reader.next() {\n        let seqrec = record.expect(\"Invalid record\");\n\n        // insert each k-mer of the sequence into the index\n        cbl.insert_seq(\u0026seqrec.seq());\n    }\n}\n```\n\n## Building from source\n\nYou can clone the repository and its submodules with\n```sh\ngit clone --recursive https://github.com/imartayan/CBL.git\n```\n\nIf you did not use the `--recursive` flag, make sure to initialize the submodules with\n```sh\ngit submodule update --init --recursive\n```\n\n### Running the binaries\n\nYou can compile the binaries with\n```sh\ncargo +nightly build --release --examples\n```\nIf the build fails, try installing the [additional headers](#additional-headers-for-linux).\n\nBy default, the binaries are compiled with `K` fixed to 25; you can compile them with a different `K` by setting the `K` environment variable:\n```sh\nK=59 cargo +nightly build --release --examples\n```\nNote that `K` values ≥ 60 are not supported by this library.\n\nSimilarly, `PREFIX_BITS` defaults to 24, and you can change it with\n```sh\nK=59 PREFIX_BITS=28 cargo +nightly build --release --examples\n```\nNote that `PREFIX_BITS` values ≥ 29 are not supported by this library.\n\nOnce compiled, the main binary is located at `target/release/examples/cbl`.\nIt supports the following commands:\n```text\nUsage: cbl \u003cCOMMAND\u003e\n\nCommands:\n  build        Build an index containing the k-mers of a FASTA/Q file\n  count        Count the k-mers contained in an index\n  list         List the k-mers contained in an index\n  query        Query an index for every k-mer contained in a FASTA/Q file\n  insert       Add the k-mers of a FASTA/Q file to an index\n  remove       Remove the k-mers of a FASTA/Q file from an index\n  merge        Compute the union of two indexes\n  inter        Compute the intersection of two indexes\n  diff         Compute the difference of two indexes\n  sym-diff     Compute the symmetric difference of two indexes\n  repartition  Show the repartition of the k-mers in the data structure\n  help         Print this message or the help of the given subcommand(s)\n\nOptions:\n  -h, --help     Print help\n  -V, --version  Print version\n```\n\n### Running the tests\n\nYou can run all the tests with\n```sh\ncargo +nightly test --lib\n```\n\n### Building the documentation\n\nYou can build the documentation of the library and open it in your browser with\n```sh\ncargo +nightly doc --lib --no-deps --open\n```\n\n## Citation\n\n\u003e Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets. Martayan, I., Cazaux, B., Limasset, A., and Marchet, C. https://doi.org/10.1093/bioinformatics/btae217\n\n```bibtex\n@article{cbl,\n  title   = {{Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets}},\n  author  = {Martayan, Igor and Cazaux, Bastien and Limasset, Antoine and Marchet, Camille},\n  journal = {Bioinformatics},\n  volume  = {40},\n  number  = {Supplement_1},\n  pages   = {i48-i57},\n  year    = {2024},\n  month   = {06},\n  issn    = {1367-4811},\n  doi     = {10.1093/bioinformatics/btae217},\n  url     = {https://doi.org/10.1093/bioinformatics/btae217},\n  eprint  = {https://academic.oup.com/bioinformatics/article-pdf/40/Supplement\\_1/i48/58354678/btae217.pdf}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimartayan%2Fcbl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimartayan%2Fcbl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimartayan%2Fcbl/lists"}