https://github.com/weaversa/mphf-wbpm
A minimal perfect hash function generator that uses weighted bipartite perfect matching and XORSAT dictionaries.
https://github.com/weaversa/mphf-wbpm
Last synced: about 1 month ago
JSON representation
A minimal perfect hash function generator that uses weighted bipartite perfect matching and XORSAT dictionaries.
- Host: GitHub
- URL: https://github.com/weaversa/mphf-wbpm
- Owner: weaversa
- License: bsd-3-clause
- Created: 2020-02-10T13:05:26.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-08-04T15:11:11.000Z (almost 3 years ago)
- Last Synced: 2025-02-17T08:34:20.867Z (4 months ago)
- Language: C
- Size: 22.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MPHF-WBPM
Minimal Perfect Hash Functions using weighted bipartite perfect matching and XORSAT.
# Description
This project supports building minimal perfect hash functions
for large sets at roughly 1.83 bits per element.An MPHF is a bijective function that maps a set of keywords `W = {w_0,
..., w_{n-1}}` to the integers `{0, ..., n-1}`. Details of the
encoding used here are presented in this paper: [Constructing Minimal
Perfect Hash Functions Using SAT
Technology](https://www.cs.cmu.edu/~mheule/publications/AAAI-WeaverS.1625.pdf).# Dependencies
This project relies on pthreads and some git submodules. To get these modules, clone
the repository by doing either
```
git clone --recursive [email protected]:weaversa/MPHF-WBPM.git
```
or
```
git clone [email protected]:weaversa/MPHF-WBPM.git
cd MPHF-WBPM
git submodule update --init --recursive
```# Install
Run `make` in the project root directory. The library file
`libmphfwbpm.a` will (assuming successful compilation) be
created in this package's `lib` directory# Building an MPFH
An MPHF is built in two separate phases. The first involves adding
elements to a builder object. The second involves creating a querier
object from a builder object. Once completed, the querier object may
be queried ad infinitum.A builder is first allocated using `MPHFBuilderAlloc`, like so:
```
MPHFBuilder *mphfb = MPHFBuilderAlloc(0);
```Here, the first and only argument `0` is the number of expected
elements the MPHF will encode. It is safe to leave this number as `0`,
but will decrease calls to malloc if the actual number is given ahead
of time.Elements are added to the builder, like so:
```
if(MPHFBuilderAddElement(mphfb, pElement, nElementBytes) != 0) {
fprintf(stderr, "Element insertion failed...exiting\n");
return -1;
}
```Here, `pElement` is a pointer to at least `nElementBytes` number of
bytes. This element will be copied into the builder.After all elements have been stored, the querier is ready to be
created:```
MPHFQuerier *mphfq = MPHFBuilderFinalize(mphfb, MPHFPaperParameters, nThreads);
```The first argument is the builder. The second argument is a structure
consisting of two parameters: the number of elements per block (the initial split to parallelize over) and the XORSAT filter parameters.
For example,```
MPHFParameters MPHFPaperParameters =
{ .nEltsPerBlock = 12288,
.xsfp =
{ .nLitsPerRow = 0,
.nSolutions = 1,
.nEltsPerBlock = 4608,
.fEfficiency = 1.00 },
};
```Feel free to define your own parameters to meet the needs of your
application.The returned querier (`mphfq`) will be `NULL` on error.
When finalizing, you will notice that some progress is printed to
stderr. These print statements can be turned off by commenting out the
following line in mphf.h and recompiling the package.```
#define MPHF_PRINT_BUILD_PROCESS
```After creating the querier, it is suggested that the builder be
free'd, like so:```
MPHFBuilderFree(mphfb);
```# Querying an MPFH
The MPHF can be queried against an element, like so:
```
uint32_t key = MPHFQuery(mphfq, pElement, nElementBytes);
```Here, `pElement` is a pointer to `nElementBytes` number of bytes. The
key unique to this element is returned.When querying is finished, the querier can be freed, like so:
```
MPHFQuerierFree(mphfq);
```# Serialization
Queriers can be serialized (written to a file) in the following way:
```
FILE *fout = fopen("mphf.out", "w");
if(MPHFSerialize(fout, mphfq) != 0) {
fprintf(stderr, "Serialization failed...exiting\n");
return -1;
}
fclose(fout);
```Here, `fout` is of type `FILE *`. `ret` will be `0` on failure and `1`
on success.A querier can be deserialized (read from a file) in the following way:
```
fout = fopen("mphf.out", "r");
mphfq = MPHFDeserialize(fout);
if(mphfq == NULL) {
fprintf(stderr, "Deserialization failed...exiting\n");
return -1;
}
fclose(fout);
```Here, `fout` is of type `FILE *`. `mphfq` will be `NULL` on error.
# Linking
To use, simply link against `lib/libmphfwbpm.a` and include
`include/mphf.h`.# Test
A sample interface is given in the `test` directory. The test builds
an MPHF for 2^20 random 10-byte elements and then queries the MPHF
against the original elements (for a consistency check) and prints
statistics. To run the test type:```
$ make test/test && test/test
```# Further Information
A paper about MPHF and WBPM is available here: [Constructing Minimal
Perfect Hash Functions Using SAT
Technology](https://www.cs.cmu.edu/~mheule/publications/AAAI-WeaverS.1625.pdf).