# Cuckoo Filter

A thread-safe probabilistic filter that performs set membership tests. A lookup returns either __might be in set__ or __definitely not in set__.

[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.ikelin/cuckoofilter/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.ikelin/cuckoofilter)
[![Build Status](https://travis-ci.org/ikelin/cuckoofilter.svg?branch=master)](https://travis-ci.org/ikelin/cuckoofilter)
[![Coverage Status](https://coveralls.io/repos/github/ikelin/cuckoofilter/badge.svg?branch=master)](https://coveralls.io/github/ikelin/cuckoofilter?branch=master)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/c6ef9f24101546b9afe9f06b7751eec8)](https://www.codacy.com/app/ikelin/cuckoofilter?utm_source=github.com&utm_medium=referral&utm_content=ikelin/cuckoofilter&utm_campaign=Badge_Grade)

## Usage

Maven `pom.xml`:

```xml
<dependency>
  <groupId>com.ikelin</groupId>
  <artifactId>cuckoofilter</artifactId>
  <version>{VERSION}</version>
</dependency>
```

### Creating a Filter

```java
CuckooFilter filter = CuckooFilter.create(10000)
    .withFalsePositiveProbability(0.001)
    .withConcurrencyLevel(8)
    .build();
```

This creates a cuckoo filter with an expected max capacity of `10,000`, a false positive probability of `0.001` (0.1%), and a concurrency level of `8`.

*Expected Max Capacity* is the maximum number of items you expect the filter to hold.

*False Positive Probability* is the probability that a lookup returns a false positive, that is, `mightContain()` returns `true` for an item that was never put into the filter.
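The capacity and false positive probability can be related to a table layout. The sketch below uses the general cuckoo filter bound from Fan et al. (2014): with `b` slots per bucket and `f`-bit fingerprints, the false positive rate is at most about `2b / 2^f`, so `f = ceil(log2(2b / ε))`. The bucket size of 4, the 0.95 target load (roughly what 4-slot buckets reach in practice), and the `CuckooSizing` class are assumptions for illustration; this library may size its tables differently.

```java
/** Hypothetical sizing helper; not part of the com.ikelin.cuckoofilter API. */
final class CuckooSizing {
    private CuckooSizing() {}

    /** Smallest fingerprint width f (in bits) such that 2b / 2^f <= fpp. */
    static int fingerprintBits(double fpp, int bucketSize) {
        if (fpp <= 0 || fpp >= 1) {
            throw new IllegalArgumentException("fpp must be in (0, 1)");
        }
        return (int) Math.ceil(Math.log(2.0 * bucketSize / fpp) / Math.log(2));
    }

    /** Buckets needed to hold `capacity` items at the target load, rounded up to a power of two. */
    static long bucketCount(long capacity, int bucketSize, double loadFactor) {
        long needed = Math.max(1, (long) Math.ceil(capacity / (bucketSize * loadFactor)));
        long pow2 = Long.highestOneBit(needed);
        return pow2 == needed ? pow2 : pow2 << 1;
    }
}
```

For the `10,000` / `0.001` example above, this gives 13-bit fingerprints and 4,096 buckets of 4 slots (16,384 slots, about 26 KB of fingerprint storage).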

*Concurrency Level* guides how much concurrency is allowed among read and write operations.
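One common way to turn a concurrency level into allowed parallelism is lock striping: a fixed number of read/write locks, each guarding a subset of buckets, so lookups on the same stripe can run in parallel while writers take turns. This README does not say whether the library works this way; `StripedBucketLocks` is a hypothetical sketch built on the JDK's `ReentrantReadWriteLock`.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical lock-striping sketch: `concurrencyLevel` locks shared by all buckets. */
final class StripedBucketLocks {
    private final ReadWriteLock[] stripes;

    StripedBucketLocks(int concurrencyLevel) {
        if (concurrencyLevel < 1) {
            throw new IllegalArgumentException("concurrencyLevel must be >= 1");
        }
        stripes = new ReadWriteLock[concurrencyLevel];
        for (int i = 0; i < concurrencyLevel; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    /** Buckets that map to the same stripe share one lock. */
    int stripeFor(long bucketIndex) {
        return (int) Math.floorMod(bucketIndex, (long) stripes.length);
    }

    /** Runs a read-only action (such as a lookup) under the stripe's shared read lock. */
    <T> T read(long bucketIndex, Supplier<T> action) {
        ReadWriteLock lock = stripes[stripeFor(bucketIndex)];
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs a mutating action (such as put or remove) under the stripe's exclusive write lock. */
    <T> T write(long bucketIndex, Supplier<T> action) {
        ReadWriteLock lock = stripes[stripeFor(bucketIndex)];
        lock.writeLock().lock();
        try {
            return action.get();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A cuckoo insert touches two buckets (more when it evicts), so an implementation along these lines would need to take both stripes' write locks in a fixed order, for example lower stripe index first, to avoid deadlock.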

### Lookup Item

To look up an item in the filter, use the `mightContain()` method:

```java
CuckooFilter tables = CuckooFilter.create(100).build();
boolean tableMightExist = tables.mightContain(tableHash);
if (!tableMightExist) {
    // table definitely does not exist, do not query the database
}
```

If `mightContain()` returns `true`, the item might be in the set, or the result might be a false positive. If `mightContain()` returns `false`, the item is definitely not in the set.
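A lookup only ever inspects two buckets. In the partial-key cuckoo hashing scheme from the original cuckoo filter paper, an item's fingerprint and first bucket come from its hash, and the second bucket is the first XOR a hash of the fingerprint, so either bucket can be recovered from the other plus the stored fingerprint. The sketch below assumes a power-of-two bucket count, buckets stored as `int[][]` with `0` meaning empty, and the SplitMix64 finalizer as the fingerprint hash; `PartialKeyIndexing` is hypothetical and may not match this library's internals.

```java
/** Hypothetical partial-key cuckoo hashing helper; a slot value of 0 means "empty". */
final class PartialKeyIndexing {
    private final int bucketMask;      // bucketCount - 1 (bucketCount must be a power of two)
    private final int fingerprintBits; // 1..16

    PartialKeyIndexing(int bucketCount, int fingerprintBits) {
        if (Integer.bitCount(bucketCount) != 1) {
            throw new IllegalArgumentException("bucketCount must be a power of two");
        }
        if (fingerprintBits < 1 || fingerprintBits > 16) {
            throw new IllegalArgumentException("fingerprintBits must be in 1..16");
        }
        this.bucketMask = bucketCount - 1;
        this.fingerprintBits = fingerprintBits;
    }

    /** Fingerprint from the high bits of the item hash, remapped so it is never 0. */
    int fingerprint(long itemHash) {
        int fp = (int) (itemHash >>> (64 - fingerprintBits));
        return fp == 0 ? 1 : fp;
    }

    /** First candidate bucket from the low bits of the item hash. */
    int primaryIndex(long itemHash) {
        return (int) (itemHash & bucketMask);
    }

    /** Second candidate bucket; applying it twice with the same fingerprint returns the original index. */
    int alternateIndex(int index, int fingerprint) {
        return (index ^ (int) mix(fingerprint)) & bucketMask;
    }

    /** "Might be in set" if either candidate bucket holds the fingerprint, otherwise "definitely not". */
    boolean mightContain(int[][] buckets, long itemHash) {
        int fp = fingerprint(itemHash);
        int i1 = primaryIndex(itemHash);
        int i2 = alternateIndex(i1, fp);
        return holds(buckets[i1], fp) || holds(buckets[i2], fp);
    }

    private static boolean holds(int[] bucket, int fp) {
        for (int slot : bucket) {
            if (slot == fp) {
                return true;
            }
        }
        return false;
    }

    /** SplitMix64 finalizer, used here as the fingerprint hash. */
    private static long mix(long x) {
        x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L;
        x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL;
        return x ^ (x >>> 31);
    }
}
```

Because different items can share a fingerprint and a bucket, a match is only "might be in set"; an absent fingerprint in both buckets is a definite "not in set".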

### Put Item

To put an item into the filter, use the `put()` method:

```java
CuckooFilter blacklistedWebsites = CuckooFilter.create(3000000).build();
boolean success = blacklistedWebsites.put(websiteHash);
if (!success) {
    // expected max capacity reached...
}
```

Always check the `boolean` returned by `put()`. If `put()` returns `true`, the item was inserted. If `put()` returns `false`, the filter has reached its capacity; in that case, create a new filter with a larger capacity.
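In a cuckoo filter, insertion can fail before every slot is occupied, because inserting relocates existing entries. The sketch below shows the standard insert loop from the cuckoo filter paper: try both candidate buckets, otherwise evict a random fingerprint and move it to its alternate bucket, giving up after a bounded number of kicks. `TinyCuckooFilter`, the 16-bit fingerprints, `MAX_KICKS = 500`, and the single overflow ("victim") slot, an approach the paper authors' C++ reference implementation also uses so a failed relocation never silently drops a stored fingerprint, are all assumptions. It is single-threaded and is not this library's code.

```java
import java.util.Random;

/** Hypothetical single-threaded cuckoo filter: 4 slots per bucket, 16-bit fingerprints, 0 = empty. */
final class TinyCuckooFilter {
    private static final int SLOTS = 4;
    private static final int MAX_KICKS = 500;

    private final int[][] buckets;
    private final int mask;
    private final Random random = new Random(42); // fixed seed keeps the sketch deterministic
    private boolean hasVictim;                     // one fingerprint left homeless by eviction
    private int victimFp;
    private int victimIndex;

    TinyCuckooFilter(int bucketCount) {
        if (Integer.bitCount(bucketCount) != 1) {
            throw new IllegalArgumentException("bucketCount must be a power of two");
        }
        buckets = new int[bucketCount][SLOTS];
        mask = bucketCount - 1;
    }

    boolean mightContain(long itemHash) {
        int fp = fingerprint(itemHash);
        int i1 = index(itemHash);
        int i2 = alternate(i1, fp);
        if (hasVictim && victimFp == fp && (victimIndex == i1 || victimIndex == i2)) {
            return true;
        }
        return holds(buckets[i1], fp) || holds(buckets[i2], fp);
    }

    /** Returns false if the item was not stored; the caller should then build a larger filter. */
    boolean put(long itemHash) {
        if (hasVictim) {
            return false; // overflow slot already in use: treat the filter as full
        }
        int fp = fingerprint(itemHash);
        int i1 = index(itemHash);
        int i2 = alternate(i1, fp);
        if (tryAdd(buckets[i1], fp) || tryAdd(buckets[i2], fp)) {
            return true;
        }
        int i = random.nextBoolean() ? i1 : i2;
        for (int kick = 0; kick < MAX_KICKS; kick++) {
            int slot = random.nextInt(SLOTS);
            int evicted = buckets[i][slot];
            buckets[i][slot] = fp; // place the carried fingerprint
            fp = evicted;          // carry the evicted one onward
            i = alternate(i, fp);  // to its other candidate bucket
            if (tryAdd(buckets[i], fp)) {
                return true;
            }
        }
        // Every fingerprint, including the new one, is now in the table or in the victim slot.
        hasVictim = true;
        victimFp = fp;
        victimIndex = i;
        return true;
    }

    private static boolean tryAdd(int[] bucket, int fp) {
        for (int s = 0; s < bucket.length; s++) {
            if (bucket[s] == 0) {
                bucket[s] = fp;
                return true;
            }
        }
        return false;
    }

    private static boolean holds(int[] bucket, int fp) {
        for (int v : bucket) {
            if (v == fp) {
                return true;
            }
        }
        return false;
    }

    private static int fingerprint(long h) {
        int fp = (int) (h >>> 48); // top 16 bits of the item hash
        return fp == 0 ? 1 : fp;
    }

    private int index(long h) {
        return (int) (h & mask);
    }

    private int alternate(int i, int fp) {
        return (i ^ (int) mix(fp)) & mask;
    }

    private static long mix(long x) { // SplitMix64 finalizer
        x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L;
        x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL;
        return x ^ (x >>> 31);
    }
}
```

Once `put()` returns `false` here, every earlier successful `put()` is still found by `mightContain()`, which mirrors the README's advice to move to a larger filter rather than keep inserting.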

### Remove Item

To remove an item from the filter, use the `remove()` method:

```java
CuckooFilter cdnCachedContents = CuckooFilter.create(100000000).build();
cdnCachedContents.remove(purgedCdnCachedContentHash);
```
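In cuckoo filters generally, deletion clears one copy of the item's fingerprint from either of its two candidate buckets. Because fingerprints are not unique, removing an item that was never put can erase a different item's fingerprint and cause a false negative, so it is safest to remove only items known to have been inserted. The sketch below illustrates this at the bucket level, with the fingerprint and the two bucket indices passed in directly rather than computed from an item hash; `CuckooRemoval` is hypothetical and not this library's implementation.

```java
/** Hypothetical bucket-level deletion sketch; a slot value of 0 means "empty". */
final class CuckooRemoval {
    private CuckooRemoval() {}

    /** Clears one slot holding fp in bucket i1 or i2; returns false if neither bucket holds it. */
    static boolean remove(int[][] buckets, int fp, int i1, int i2) {
        return clearOne(buckets[i1], fp) || clearOne(buckets[i2], fp);
    }

    static boolean mightContain(int[][] buckets, int fp, int i1, int i2) {
        return holds(buckets[i1], fp) || holds(buckets[i2], fp);
    }

    private static boolean clearOne(int[] bucket, int fp) {
        for (int s = 0; s < bucket.length; s++) {
            if (bucket[s] == fp) {
                bucket[s] = 0; // free the slot; only one copy is removed
                return true;
            }
        }
        return false;
    }

    private static boolean holds(int[] bucket, int fp) {
        for (int v : bucket) {
            if (v == fp) {
                return true;
            }
        }
        return false;
    }
}
```

For example, if item A is stored as fingerprint `77` in bucket 2 (candidates 2 and 5) and a never-inserted item B happens to map to the same fingerprint and buckets, `remove(buckets, 77, 2, 5)` for B clears A's slot, and A then reads as definitely not in set.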

## Item Hash Value

Use a performant hashing library that generates a well-distributed `long` hash value for each item, for example OpenHFT's [Zero-Allocation Hashing](https://github.com/OpenHFT/Zero-Allocation-Hashing) or Google Guava's [Hashing](https://github.com/google/guava/wiki/HashingExplained).

```java
import net.openhft.hashing.LongHashFunction;

// xxHash64 from OpenHFT Zero-Allocation Hashing
LongHashFunction hashFunction = LongHashFunction.xx();
long itemHash = hashFunction.hashChars("item-foo");

if (!filter.mightContain(itemHash)) {
    // item definitely does not exist in filter...
}
```
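If adding a hashing dependency is not an option, a dependency-free stand-in can work for experiments. The sketch below is an assumption, not a recommendation from this README: it hashes the item's UTF-8 bytes with 64-bit FNV-1a and then applies the SplitMix64 finalizer to spread the bits. `ItemHashing` is hypothetical; the libraries named above remain the suggested choice.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical dependency-free item hashing; not part of com.ikelin.cuckoofilter. */
final class ItemHashing {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    private ItemHashing() {}

    /** 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime. */
    static long fnv1a64(byte[] data) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    /** FNV-1a over the UTF-8 bytes, then the SplitMix64 finalizer for better bit mixing. */
    static long itemHash(String item) {
        long x = fnv1a64(item.getBytes(StandardCharsets.UTF_8));
        x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L;
        x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL;
        return x ^ (x >>> 31);
    }
}
```

Usage follows the same pattern as above: `long itemHash = ItemHashing.itemHash("item-foo");` and then `filter.mightContain(itemHash)`.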