https://github.com/kampersanda/fast_succinct_trie
String map implementation through Fast Succinct Trie
https://github.com/kampersanda/fast_succinct_trie
map succinct-data-structure trie
Last synced: 6 months ago
JSON representation
String map implementation through Fast Succinct Trie
- Host: GitHub
- URL: https://github.com/kampersanda/fast_succinct_trie
- Owner: kampersanda
- License: apache-2.0
- Created: 2019-07-31T08:04:02.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-07-09T15:57:47.000Z (over 4 years ago)
- Last Synced: 2025-03-30T04:11:22.193Z (7 months ago)
- Topics: map, succinct-data-structure, trie
- Language: C++
- Size: 639 KB
- Stars: 21
- Watchers: 4
- Forks: 7
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# fast\_succinct\_trie
This library provides a string map through Fast Succinct Trie (FST), proposed in [SIGMOD 2018](http://www.cs.cmu.edu/~huanche1/publications/surf_paper.pdf). The library is implemented by modifying the original FST implementation [efficient/SuRF](https://github.com/efficient/SuRF) and applying a compact minimal-prefix trie form to simulate a trie-based string map (c.f. Section 2.2 of [KAIS 2017](https://kampersanda.github.io/pdf/KAIS2017.pdf)).
## What is FST?
FST is a succinct trie data structure proposed in the paper,
> Zhang, Lim, Leis, Andersen, Kaminsky, Keeton and Pavlo: **SuRF: Practical Range Query Filtering with Fast Succinct Trie,** In *SIGMOD 2018*, pp. 323-336.
Briefly, FST is a practical variant of [LOUDS-trie](https://bitbucket.org/vsmirnov/memoria/wiki/LabeledTree). FST uses two LOUDS implementations: one is fast and the other is space-efficient. FST partitions a trie into two layers at a level and applies the fast one to the top layer and the space-efficient one to the bottom layer. More specific explanations can be found in the [slide](http://www.cs.cmu.edu/~huanche1/slides/FST.pdf) by the author.
Since FST was developed for succinct range query filtering, [the original implementation](https://github.com/efficient/SuRF) allows us to include false positives in query solutions. [This library](https://github.com/kampersanda/fast_succinct_trie) modifies it and provides a string map based on FST.
## Install
This library consists of only header files. Please through the path to the directory [`include`](https://github.com/kampersanda/fast_succinct_trie/tree/master/include).
## Build instructions
You can download and compile this library as the following commands:
```sh
$ git clone https://github.com/kampersanda/fast_succinct_trie.git
$ cd fast_succinct_trie
$ mkdir build
$ cd build
$ cmake ..
$ make -j
```## Sample usage
```cpp
#include
#include#include
int main() {
std::vector keys = {
"ACML", "AISTATS", "DS", "DSAA", "ICDM", "ICML", //
"PAKDD", "SDM", "SIGIR", "SIGKDD", "SIGMOD",
};// a trie-index constructed from string keys sorted
fst::Trie trie(keys);// keys are mapped to unique integers in the range [0,#keys)
std::cout << "[searching]" << std::endl;
for (size_t i = 0; i < keys.size(); ++i) {
fst::position_t key_id = trie.exactSearch(keys[i]);
std::cout << " - " << keys[i] << ": " << key_id << std::endl;
}std::cout << "[statistics]" << std::endl;
std::cout << " - number of keys: " << trie.getNumKeys() << std::endl;
std::cout << " - number of nodes: " << trie.getNumNodes() << std::endl;
std::cout << " - number of suffix bytes: " << trie.getSuffixBytes() << std::endl;
std::cout << " - memory usage in bytes: " << trie.getMemoryUsage() << std::endl;
std::cout << " - output file size in bytes: " << trie.getSizeIO() << std::endl;std::cout << "[configure]" << std::endl;
trie.debugPrint(std::cout);// write the trie-index to a file
{
std::ofstream ofs("fst.idx");
trie.save(ofs);
}// read the trie-index from a file
{
fst::Trie other;
std::ifstream ifs("fst.idx");
other.load(ifs);
}std::remove("fst.idx");
return 0;
}
```
The output will be```
[searching]
- ACML: 1
- AISTATS: 2
- DS: 4
- DSAA: 5
- ICDM: 6
- ICML: 7
- PAKDD: 0
- SDM: 3
- SIGIR: 8
- SIGKDD: 9
- SIGMOD: 10
[statistics]
- number of keys: 11
- number of nodes: 19
- number of suffix bytes: 24
- memory usage in bytes: 587
- output file size in bytes: 312
[configure]
-- LoudsDense (heigth=1) --
LABEL: A D I P S |
CHILD: 1 1 1 0 1 |
PREFX: 0 |
-- LoudsSparse --
LABEL: C I S C D I ? A D M G I K M ?
CHILD: 0 0 1 1 0 1 0 0 0 0 1 0 0 0
LOUDS: 1 0 1 1 1 0 1 0 1 0 1 1 0 0
-- Suffixes --
POINTERS: 17 11 1 9 0 22 9 12 7 19 14
SUFFIXES: ? S T A T S ? R ? M ? M L ? O D ? A K D D ? A ?
```## Todo
- Support more operations
- Implement a normal trie form also## Licensing
This library is free software provided under [Apache License 2.0](https://github.com/kampersanda/fast_succinct_trie/blob/master/LICENSE), following the License of [efficient/SuRF](https://github.com/efficient/SuRF).
The modifications are shown in each source file.