# The PGM-index

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes while providing the same worst-case query time guarantees.


Website | Documentation | Paper | Slides | Python wrapper | A³ Lab



## Quickstart

This is a header-only library. It does not need to be installed. Just clone the repo with

```bash
git clone https://github.com/gvinciguerra/PGM-index.git
cd PGM-index
```

and copy the `include/pgm` directory to your system's or project's include path.

The `examples/simple.cpp` file shows how to index and query a vector of random integers with the PGM-index:

```cpp
#include <vector>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include "pgm/pgm_index.hpp"

int main() {
    // Generate some random data
    std::vector<int> data(1000000);
    std::generate(data.begin(), data.end(), std::rand);
    data.push_back(42);
    std::sort(data.begin(), data.end());

    // Construct the PGM-index
    const int epsilon = 128; // space-time trade-off parameter
    pgm::PGMIndex<int, epsilon> index(data);

    // Query the PGM-index: search() returns an approximate range [lo, hi)
    auto q = 42;
    auto range = index.search(q);
    auto lo = data.begin() + range.lo;
    auto hi = data.begin() + range.hi;
    // Finish with a binary search restricted to that range
    std::cout << *std::lower_bound(lo, hi, q);

    return 0;
}
```

[Run and edit this and other examples on Repl.it](https://repl.it/github/gvinciguerra/PGM-index). Or run it locally via:

```bash
g++ examples/simple.cpp -std=c++17 -I./include -o simple
./simple
```
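The example above relies on `index.search(q)` returning a small range `[range.lo, range.hi)` that is then finished off with `std::lower_bound`. The self-contained sketch below illustrates why such a range can be small: the sorted keys are covered by linear segments whose predicted position is off by at most `eps`, so a query only has to binary-search about `2 * eps` slots. It is not the library's code. It swaps in a simpler greedy "shrinking cone" segmentation instead of the optimal piecewise linear approximation the PGM-index computes, it locates the segment with a plain binary search instead of the recursive structure, and the names `Segment`, `build_segments` and `contains` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <vector>

// Hypothetical linear model covering a run of consecutive keys:
// predicted position = intercept + slope * (key - first_key).
struct Segment {
    std::int64_t first_key;
    double slope;
    double intercept;  // exact position of first_key in the array
};

// Greedy "shrinking cone" segmentation (a simpler, non-optimal stand-in):
// extend the current segment while some slope keeps every covered key
// within eps positions of its true position. Assumes sorted, distinct keys.
std::vector<Segment> build_segments(const std::vector<std::int64_t>& keys, double eps) {
    assert(eps >= 0);
    std::vector<Segment> segs;
    std::size_t start = 0;
    while (start < keys.size()) {
        double lo = -std::numeric_limits<double>::infinity();
        double hi = std::numeric_limits<double>::infinity();
        std::size_t i = start + 1;
        for (; i < keys.size(); ++i) {
            double dx = double(keys[i]) - double(keys[start]);
            double dy = double(i - start);
            // Slopes that keep key i within eps of position i.
            double new_lo = std::max(lo, (dy - eps) / dx);
            double new_hi = std::min(hi, (dy + eps) / dx);
            if (new_lo > new_hi) break;  // key i cannot join: close the segment
            lo = new_lo;
            hi = new_hi;
        }
        double slope = (i - start > 1) ? (lo + hi) / 2 : 0.0;
        segs.push_back({keys[start], slope, double(start)});
        start = i;
    }
    return segs;
}

// Exact-match lookup: find the segment, predict a position, then binary
// search only the window around it (slightly widened to absorb rounding).
bool contains(const std::vector<std::int64_t>& keys, const std::vector<Segment>& segs,
              double eps, std::int64_t q) {
    if (segs.empty() || q < segs.front().first_key) return false;
    // Last segment whose first key is <= q.
    auto it = std::upper_bound(segs.begin(), segs.end(), q,
        [](std::int64_t k, const Segment& s) { return k < s.first_key; });
    const Segment& s = *std::prev(it);
    double n = double(keys.size());
    double pos = s.intercept + s.slope * (double(q) - double(s.first_key));
    pos = std::clamp(pos, 0.0, n);
    double lo = std::max(0.0, std::floor(pos - eps) - 1.0);
    double hi = std::min(n, std::ceil(pos + eps) + 2.0);
    auto first = keys.begin() + static_cast<std::ptrdiff_t>(lo);
    auto last = keys.begin() + static_cast<std::ptrdiff_t>(hi);
    auto found = std::lower_bound(first, last, q);
    return found != last && *found == q;
}
```

Raising `eps` generally yields fewer segments (less space) but a wider final search window (more time), which is the trade-off the `epsilon` parameter controls in the quickstart.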

## Classes overview

In addition to the `pgm::PGMIndex` class used in the example above, the library provides the following classes:

- `pgm::DynamicPGMIndex` supports insertions and deletions.
- `pgm::MultidimensionalPGMIndex` stores points in k dimensions and supports orthogonal range queries.
- `pgm::MappedPGMIndex` stores data on disk and uses a `pgm::PGMIndex` for fast search operations.
- `pgm::CompressedPGMIndex` compresses the segments to reduce the space usage of the index.
- `pgm::OneLevelPGMIndex` uses a binary search on the segments rather than a recursive structure.
- `pgm::BucketingPGMIndex` uses a top-level lookup table to speed up the search on the segments.
- `pgm::EliasFanoPGMIndex` uses a top-level succinct structure to speed up the search on the segments.
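Several of these variants differ mainly in how they find the segment responsible for a key. As a rough illustration of the lookup-table idea mentioned for `pgm::BucketingPGMIndex`, the sketch below builds a top-level table over the sorted first keys of the segments, so each query binary-searches only the slice belonging to one bucket. The class `BucketedSegments`, its equal-width buckets and its interface are assumptions made for this example, not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical top-level lookup table over the sorted, distinct first keys of
// the segments. Bucket b covers keys in [min + b * width, min + (b + 1) * width).
// Assumes keys far enough from UINT64_MAX that the arithmetic cannot overflow.
class BucketedSegments {
public:
    BucketedSegments(std::vector<std::uint64_t> first_keys, std::size_t buckets)
        : keys_(std::move(first_keys)) {
        assert(!keys_.empty() && buckets > 0);
        min_ = keys_.front();
        std::uint64_t span = keys_.back() - min_ + 1;
        width_ = (span + buckets - 1) / buckets;  // ceil(span / buckets)
        table_.resize(buckets + 1);
        // table_[b] = number of first keys smaller than the start of bucket b.
        for (std::size_t b = 0; b < buckets; ++b) {
            std::uint64_t bucket_start = min_ + b * width_;
            table_[b] = static_cast<std::size_t>(
                std::lower_bound(keys_.begin(), keys_.end(), bucket_start) - keys_.begin());
        }
        table_[buckets] = keys_.size();
    }

    // Index of the last segment whose first key is <= q,
    // or nullopt if q precedes every segment.
    std::optional<std::size_t> find(std::uint64_t q) const {
        if (q < min_) return std::nullopt;
        if (q >= keys_.back()) return keys_.size() - 1;
        std::size_t b = static_cast<std::size_t>((q - min_) / width_);
        // Only the slice of q's bucket can contain the first key greater than q.
        auto first = keys_.begin() + static_cast<std::ptrdiff_t>(table_[b]);
        auto last = keys_.begin() + static_cast<std::ptrdiff_t>(table_[b + 1]);
        auto it = std::upper_bound(first, last, q);
        return static_cast<std::size_t>(it - keys_.begin()) - 1;
    }

private:
    std::vector<std::uint64_t> keys_;
    std::uint64_t min_ = 0;
    std::uint64_t width_ = 1;
    std::vector<std::size_t> table_;
};
```

The design trades the memory of `buckets + 1` table entries for a shorter binary search per query; with skewed keys, equal-width buckets can be unevenly filled, which a real implementation may handle differently.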

The full documentation is available [here](https://pgm.di.unipi.it/docs/).

## Compile the tests and the tuner

After cloning the repository, build the project with

```bash
cmake . -DCMAKE_BUILD_TYPE=Release
make -j8
```

The test runner will be placed in `test/`. The [tuner](https://pgm.di.unipi.it/docs/tuner/) executable will be placed in `tuner/`. The [benchmark](https://pgm.di.unipi.it/docs/benchmark/) executable will be placed in `benchmark/`.

## License

This project is licensed under the terms of the Apache License 2.0.

If you use the library please put a link to the [website](https://pgm.di.unipi.it) and cite the following paper:

> Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB, 13(8): 1162-1175, 2020.

```tex
@article{Ferragina:2020pgm,
Author = {Paolo Ferragina and Giorgio Vinciguerra},
Title = {The {PGM-index}: a fully-dynamic compressed learned index with provable worst-case bounds},
Year = {2020},
Volume = {13},
Number = {8},
Pages = {1162--1175},
Doi = {10.14778/3389133.3389135},
Url = {https://pgm.di.unipi.it},
Issn = {2150-8097},
Journal = {{PVLDB}}}
```