Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tianyishi2001/suffix-array-li2016
An implementation of (part of) the suffix array construction algorithm developed by Zhize Li (2016)
- Host: GitHub
- URL: https://github.com/tianyishi2001/suffix-array-li2016
- Owner: TianyiShi2001
- License: mit
- Created: 2021-07-06T15:44:17.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-07-07T09:24:51.000Z (over 3 years ago)
- Last Synced: 2023-03-08T20:27:57.645Z (over 1 year ago)
- Language: Rust
- Size: 31.3 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# suffix-array-li2016
An implementation of (part of) the suffix array construction algorithm developed by Zhize Li *et al.* (2016). The authors claim the algorithm is in-place (i.e. **O(1)** extra space) and runs in linear time (**O(n)**, where `n` is the number of characters in the string). In practice, however, the running time does not appear to be linear, as is evident both from the experimental results in their own paper and from experiments with my implementations.
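A suffix array is just the list of a string's suffix start positions, sorted in the lexicographic order of those suffixes; the naive baseline used for comparison below builds and sorts that list directly. Here is a minimal Rust sketch over an integer alphabet (the function name is mine, not this crate's API):

```rust
/// Naive suffix array construction: sort all suffix start indices by
/// comparing the suffixes themselves. Apart from the output index array,
/// it needs essentially no auxiliary space, but each of the O(n log n)
/// comparisons can touch up to O(n) symbols.
fn naive_suffix_array(text: &[u32]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_unstable_by(|&i, &j| text[i..].cmp(&text[j..]));
    sa
}

fn main() {
    // "banana" mapped onto a u32 alphabet, matching the integer-alphabet
    // setting used in the benchmarks below.
    let text: Vec<u32> = "banana".bytes().map(u32::from).collect();
    let sa = naive_suffix_array(&text);
    assert_eq!(sa, vec![5, 3, 1, 0, 4, 2]); // a, ana, anana, banana, na, nana
    println!("{:?}", sa);
}
```

The appeal of Li's algorithm is that it promises to avoid both the super-linear comparison cost and any extra working space; the experiments below probe how much of that holds up in practice.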
## Performance
I provide both C++ and Rust implementations, which have similar performance. With a 32-bit unsigned integer alphabet, Li's algorithm does not become faster than the naive algorithm (naive sorting of all suffixes, **which is also in-place**) until `n` exceeds roughly 800,000. Here I am referring to the Rust implementation of the algorithm in section 3 of the paper, which requires a mutable string and has the stronger constraint `n >= alphabet_size`. The algorithm in section 4 (the main contribution of the paper, which operates on immutable strings and allows `alphabet_size > n`) is expected to be slower.
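Judging by the output format, the numbers below were produced with Rust's nightly `cargo bench` harness. As a rough sketch (not this crate's actual benchmark code), one of the naive-baseline benchmarks might look like the following, assuming the `500` in the benchmark names is the alphabet size and using a stand-in data generator:

```rust
#![feature(test)] // nightly-only; run with `cargo +nightly bench`
extern crate test;

#[cfg(test)]
mod tests {
    use test::Bencher;

    #[bench]
    fn bench_naive_500_1000(b: &mut Bencher) {
        // A deterministic length-1,000 string over a 500-symbol integer
        // alphabet (stand-in for whatever generator the real benchmarks use).
        let text: Vec<u32> = (0..1_000u32).map(|i| (i * 7919) % 500).collect();
        b.iter(|| {
            // Naive baseline: sort all suffix start positions.
            let mut sa: Vec<usize> = (0..text.len()).collect();
            sa.sort_unstable_by(|&i, &j| text[i..].cmp(&text[j..]));
            sa
        });
    }
}
```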
```
test tests::bench_li_500_1000       ... bench:     168,711 ns/iter (+/- 27,549)
test tests::bench_li_500_10000      ... bench:   1,356,838 ns/iter (+/- 79,133)      x 8.0   x    8.0
test tests::bench_li_500_100000     ... bench:  14,359,321 ns/iter (+/- 1,307,696)   x10.6   x   85.1
test tests::bench_li_500_1000000    ... bench: 180,594,903 ns/iter (+/- 13,631,112)  x12.6   x 1070.4
test tests::bench_naive_500_1000    ... bench:      90,554 ns/iter (+/- 5,723)
test tests::bench_naive_500_10000   ... bench:   1,262,911 ns/iter (+/- 58,702)      x13.9   x   10.3
test tests::bench_naive_500_100000  ... bench:  16,251,237 ns/iter (+/- 661,166)     x12.9   x  179.5
test tests::bench_naive_500_1000000 ... bench: 220,851,474 ns/iter (+/- 6,532,465)   x13.6   x 2438.9
```

Curiously, when testing with 10,000,000 or more characters, Li's algorithm becomes slower than the naive algorithm (the C++ implementation has similar performance). What could be the culprit?
```
test tests::bench_li_500_10000000 ... bench: 4,796,887,136 ns/iter (+/- 20,770,245) x26.56 x28433
test tests::bench_naive_500_10000000 ... bench: 3,082,987,837 ns/iter (+/- 63,887,953) x13.95 x34045
```
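To poke at the 10,000,000-character case without the nightly bench harness, a plain wall-clock harness like the sketch below can be used. The data generator is my own stand-in, and the call into this crate's Li (2016) routine is left as a commented placeholder because its exact API is not shown in this README:

```rust
use std::time::Instant;

/// Naive baseline, as in the sketch above.
fn naive_suffix_array(text: &[u32]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_unstable_by(|&i, &j| text[i..].cmp(&text[j..]));
    sa
}

/// Deterministic pseudo-random string over a `u32` alphabet. A simple LCG is
/// used so the sketch has no external dependencies; this is not the generator
/// behind the numbers above.
fn random_text(n: usize, alphabet_size: u32) -> Vec<u32> {
    let mut state: u64 = 0x2545_F491_4F6C_DD1D;
    (0..n)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            ((state >> 33) as u32) % alphabet_size
        })
        .collect()
}

fn main() {
    let text = random_text(10_000_000, 500);

    let start = Instant::now();
    let _sa = naive_suffix_array(&text);
    println!("naive,  n = 10,000,000: {:?}", start.elapsed());

    // let start = Instant::now();
    // let _sa = li2016_suffix_array(&mut text.clone()); // hypothetical entry point
    // println!("li2016, n = 10,000,000: {:?}", start.elapsed());
}
```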