{"id":13440079,"url":"https://github.com/BurntSushi/suffix","last_synced_at":"2025-03-20T09:31:43.851Z","repository":{"id":42568877,"uuid":"28555038","full_name":"BurntSushi/suffix","owner":"BurntSushi","description":"Fast suffix arrays for Rust (with Unicode support).","archived":false,"fork":false,"pushed_at":"2023-10-10T13:44:22.000Z","size":207,"stargazers_count":268,"open_issues_count":4,"forks_count":30,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-19T10:59:07.432Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BurntSushi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2014-12-28T06:01:52.000Z","updated_at":"2025-03-11T17:32:34.000Z","dependencies_parsed_at":"2022-08-27T03:50:32.615Z","dependency_job_id":"b0ec4e33-a6a3-4553-bfd6-edb3b941fab4","html_url":"https://github.com/BurntSushi/suffix","commit_stats":{"total_commits":160,"total_committers":5,"mean_commits":32.0,"dds":0.03749999999999998,"last_synced_commit":"c9f982ee28181f94a1b2b6b64a843585e7e624e1"},"previous_names":[],"tags_count":56,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BurntSushi%2Fsuffix","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BurntSushi%2Fsuffix/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BurntSushi%2Fsuffix/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BurntSushi%2Fsuffix/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/BurntSushi","download_url":"https://codeload.github.com/BurntSushi/suffix/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244585803,"owners_count":20476815,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:19.606Z","updated_at":"2025-03-20T09:31:43.811Z","avatar_url":"https://github.com/BurntSushi.png","language":"Rust","funding_links":[],"categories":["Libraries","库 Libraries","库"],"sub_categories":["Text processing","文本处理 Text processing","文本处理"],"readme":"suffix\n======\nFast linear time \u0026 space suffix arrays for Rust. Supports Unicode!\n\n[![Build status](https://github.com/BurntSushi/suffix/workflows/ci/badge.svg)](https://github.com/BurntSushi/suffix/actions)\n[![](http://meritbadge.herokuapp.com/suffix)](https://crates.io/crates/suffix)\n\nDual-licensed under MIT or the [UNLICENSE](http://unlicense.org).\n\n\n### Documentation\n\nhttps://docs.rs/suffix\n\nIf you just want the details on the construction algorithm used, see the\ndocumentation for the `SuffixTable` type. This is where you'll find info on\nexactly how much overhead is required.\n\n\n### Installation\n\nThis crate works with Cargo and is on\n[crates.io](https://crates.io/crates/suffix). The package is regularly updated.\nAdd it to your `Cargo.toml` like so:\n\n```toml\n[dependencies]\nsuffix = \"1.2\"\n```\n\n\n### Examples\n\nUsage is simple. 
Just create a suffix array and search:\n\n```rust\nuse suffix::SuffixTable;\n\nfn main() {\n  let st = SuffixTable::new(\"the quick brown fox was quick.\");\n  assert_eq!(st.positions(\"quick\"), \u0026[4, 24]);\n}\n```\n\nThere is also a command line program, `stree`, that can be used to visualize\nsuffix trees:\n\n```bash\ngit clone https://github.com/BurntSushi/suffix\ncd suffix/stree_cmd\ncargo build --release\n./target/release/stree \"banana\" | dot -Tpng | xv -\n```\n\nAnd here's what it looks like:\n\n![\"banana\" suffix tree](http://burntsushi.net/stuff/banana.png)\n\n\n### Status of implementation\n\nThe big thing missing at the moment is a generalized suffix array. I started\nout with the intention to build them into the construction algorithm, but this\nhas proved more difficult than I thought.\n\nA kind-of-sort-of compromise is to append your distinct texts together, and\nseparate them with a character that doesn't appear in your document. (This is\ntechnically incorrect, but maybe your documents don't contain any `NUL`\ncharacters.) During construction of this one giant string, you should record\nthe offsets of where each document starts and stops. Then build a `SuffixTable`\nwith your giant string. After searching with the `SuffixTable`, you can find\nthe original document by doing a binary search on your list of document offsets.\n\nI'm currently experimenting with different techniques to do this.\n\n\n### Benchmarks\n\nHere are some very rough benchmarks that compare suffix table searching with\nsearching using standard library functions. Note that these benchmarks\nexplicitly do not include the construction of the suffix table. The premise of\na suffix table is that you can afford to do that once---but you hope to gain\nmuch faster queries once you do.\n\n```\ntest search_scan_exists_many            ... bench:       2,964 ns/iter (+/- 180)\ntest search_scan_exists_one             ... 
bench:          19 ns/iter (+/- 1)\ntest search_scan_not_exists             ... bench:      84,645 ns/iter (+/- 3,558)\ntest search_suffix_exists_many          ... bench:         228 ns/iter (+/- 65)\ntest search_suffix_exists_many_contains ... bench:         102 ns/iter (+/- 10)\ntest search_suffix_exists_one           ... bench:         162 ns/iter (+/- 13)\ntest search_suffix_exists_one_contains  ... bench:           8 ns/iter (+/- 0)\ntest search_suffix_not_exists           ... bench:         177 ns/iter (+/- 21)\ntest search_suffix_not_exists_contains  ... bench:          50 ns/iter (+/- 6)\n```\n\nThe \"many\" benchmarks test repeated queries that match. The \"one\" benchmarks\ntest a single query that matches. The \"not_exists\" benchmarks test a single\nquery that does *not* match. Finally, the \"contains\" benchmarks test existence\nrather than finding all positions.\n\nOne thing you might take away from here is that you'll get a very large\nperformance boost if many of your queries don't match. A linear scan takes a\nlong time to fail!\n\nAnd here are some completely useless benchmarks on suffix array construction.\nThey compare the linear time algorithm with the naive construction algorithm\n(call `sort` on all suffixes, which is `O(n^2 * logn)`).\n\n```\ntest naive_dna_medium                   ... bench:  22,307,313 ns/iter (+/- 939,557)\ntest naive_dna_small                    ... bench:   1,785,734 ns/iter (+/- 43,401)\ntest naive_small                        ... bench:         228 ns/iter (+/- 10)\ntest sais_dna_medium                    ... bench:   7,514,327 ns/iter (+/- 280,544)\ntest sais_dna_small                     ... bench:     712,938 ns/iter (+/- 34,730)\ntest sais_small                         ... bench:       1,038 ns/iter (+/- 58)\n```\n\nThese benchmarks might make you say, \"Whoa, the special algorithm isn't that\nmuch faster.\" That's because the data just isn't big enough. And when it *is*\nbig enough, a micro benchmark is useless. Why? 
Because using the `naive`\nalgorithm will just burn your CPUs until the end of time.\n\nIt would be more useful to compare this to other suffix array implementations,\nbut I haven't had time yet. Moreover, most (all?) don't support Unicode and\ninstead operate on bytes, which means they aren't paying the overhead of\ndecoding UTF-8.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBurntSushi%2Fsuffix","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBurntSushi%2Fsuffix","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBurntSushi%2Fsuffix/lists"}