{"id":25959761,"url":"https://github.com/ajalab/fm-index","last_synced_at":"2025-03-04T18:48:26.831Z","repository":{"id":57631175,"uuid":"213405623","full_name":"ajalab/fm-index","owner":"ajalab","description":"FM-Index for Rust","archived":false,"fork":false,"pushed_at":"2025-02-23T06:05:08.000Z","size":326,"stargazers_count":9,"open_issues_count":21,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-23T07:17:55.686Z","etag":null,"topics":["fm-index","rust","succinct-data-structure"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ajalab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-07T14:28:56.000Z","updated_at":"2025-02-16T14:57:47.000Z","dependencies_parsed_at":"2025-01-14T02:26:04.637Z","dependency_job_id":"af17c385-008d-41b4-9816-2966c8314d40","html_url":"https://github.com/ajalab/fm-index","commit_stats":{"total_commits":76,"total_committers":1,"mean_commits":76.0,"dds":0.0,"last_synced_commit":"249294233135c2a7e3476a8dda3743a6c2afe3d3"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajalab%2Ffm-index","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajalab%2Ffm-index/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajalab%2Ffm-index/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajalab%2Ffm-index/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ajalab","download_url":"https://codeload.github.com/ajalab/fm-index/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241904719,"owners_count":20040021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fm-index","rust","succinct-data-structure"],"created_at":"2025-03-04T18:48:26.178Z","updated_at":"2025-03-04T18:48:26.823Z","avatar_url":"https://github.com/ajalab.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fm-index\n\n[![Crate](https://img.shields.io/crates/v/fm-index.svg)](https://crates.io/crates/fm-index)\n[![Doc](https://docs.rs/fm-index/badge.svg)](https://docs.rs/fm-index)\n\nThis crate provides implementations of\n[FM-Index](https://en.wikipedia.org/wiki/FM-index) and its variants.\n\nFM-Index, originally proposed by Paolo Ferragina and Giovanni Manzini [^1],\nis a compressed full-text index which supports the following queries:\n\n- `count`: Given a pattern string, counts the number of its occurrences.\n- `locate`: Given a pattern string, lists all positions of its occurrences.\n- `extract`: Given an integer, gets the character of the text at that position.\n\nThe `fm-index` crate does not support the third query (extracting a\ncharacter from an arbitrary position). 
Instead, it provides backward/forward\niterators that return the text characters starting from a search result.\n\n## Usage\n\nAdd this to your `Cargo.toml`.\n\n```toml\n[dependencies]\nfm-index = \"0.2\"\n```\n\n## Example\n```rust\nuse fm_index::converter::RangeConverter;\nuse fm_index::FMIndex;\n\n// Prepare a text string to search for patterns.\nlet text = concat!(\n    \"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\",\n    \"Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\",\n    \"Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.\",\n    \"Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\",\n).as_bytes().to_vec();\n\n// Converter converts each character into packed representation.\n// `' '` ~ `'~'` represents a range of ASCII printable characters.\nlet converter = RangeConverter::new(b' ', b'~');\n\nlet index = SearchIndexBuilder::with_converter(converter)\n    // the sampling level determines how much is retained in order to support `locate`\n    // queries. `0` retains the full information, but we don't need the whole array\n    // since we can interpolate missing elements in a suffix array from others. A sampler\n    // will _sieve_ a suffix array for this purpose. If you don't need `locate` queries\n    // you can save the memory by not setting a sampling level. 
\n    .sampling_level(2)\n    .build(text);\n\n// Search for a pattern string.\nlet pattern = \"dolor\";\nlet search = index.search_backward(pattern);\n\n// Count the number of occurrences.\nlet n = search.count();\nassert_eq!(n, 4);\n\n// List the positions of all occurrences.\nlet positions = search.locate();\nassert_eq!(positions, vec![246, 12, 300, 103]);\n\n// Extract preceding characters from a search position.\nlet i = 0;\nlet mut prefix = search.iter_backward(i).take(16).collect::\u003cVec\u003cu8\u003e\u003e();\nprefix.reverse();\nassert_eq!(prefix, b\"Duis aute irure \".to_owned());\n\n// Extract succeeding characters from a search position.\nlet i = 3;\nlet postfix = search.iter_forward(i).take(20).collect::\u003cVec\u003cu8\u003e\u003e();\nassert_eq!(postfix, b\"dolore magna aliqua.\".to_owned());\n\n// Search can be chained backward.\nlet search_chained = search.search_backward(\"et \");\nassert_eq!(search_chained.count(), 1);\n```\n\n## Implementations\n\n### FM-Index\n\nThe implementation is based on [^1]. The index is constructed with a suffix\narray generated by SA-IS [^3] in _O(n)_ time, where _n_ is the size of the text\nstring.\n\nIt consists of\n\n- a Burrows-Wheeler transform (BWT) of the text string represented with a\n  _wavelet matrix_ [^4]\n- an array of size _O(σ)_ (_σ_: number of characters) which stores the number\n  of characters smaller than a given character\n- a (sampled) suffix array\n\n### Run-Length FM-Index\n\nBased on [^2]. 
The index is constructed with a suffix array generated by SA-IS\n[^3].\n\nIt consists of\n\n- a wavelet matrix that stores the run heads of BWT of the text string\n- a succinct bit vector which stores the run lengths of BWT of the text string\n- a succinct bit vector which stores the run lengths of BWT of the text string\n  sorted in alphabetical order of corresponding run heads\n- an array of size _O(σ)_ (_σ_: number of characters) which stores the number\n  of characters smaller than a given character in run heads\n\n## Reference\n\n[^1]: Ferragina, P., \u0026 Manzini, G. (2000). Opportunistic data structures with\napplications. Annual Symposium on Foundations of Computer Science \\- Proceedings, 390–398. \u003chttps://doi.org/10.1109/sfcs.2000.892127\u003e\n\n[^2]: Mäkinen, V., \u0026 Navarro, G. (2005). Succinct suffix arrays based on\nrun-length encoding. In Lecture Notes in Computer Science (Vol. 3537).\n\u003chttps://doi.org/10.1007/11496656_5\u003e\n\n[^3]: Ge Nong, Sen Zhang, \u0026 Wai Hong Chan. (2010). Two Efficient Algorithms for\nLinear Time Suffix Array Construction. IEEE Transactions on Computers, 60(10),\n1471–1484. \u003chttps://doi.org/10.1109/tc.2010.188\u003e\n\n[^4]: Claude F., Navarro G. (2012). The Wavelet Matrix. In: Calderón-Benavides\nL., González-Caro C., Chávez E., Ziviani N. (eds) String Processing and\nInformation Retrieval. SPIRE 2012. \u003chttps://doi.org/10.1007/978-3-642-34109-0_18\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajalab%2Ffm-index","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fajalab%2Ffm-index","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajalab%2Ffm-index/lists"}