Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ikelin/cuckoofilter
Cuckoo Filter is a thread-safe probabilistic filter that performs set membership tests.
- Host: GitHub
- URL: https://github.com/ikelin/cuckoofilter
- Owner: ikelin
- License: apache-2.0
- Created: 2019-01-11T06:33:27.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-04-30T05:12:41.000Z (over 5 years ago)
- Last Synced: 2024-07-01T17:15:21.138Z (6 months ago)
- Topics: cuckoo-filter, set-membership
- Language: Java
- Homepage:
- Size: 364 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Cuckoo Filter
A thread-safe probabilistic filter that performs set membership tests. A lookup returns either __might be in set__ or __definitely not in set__.
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.ikelin/cuckoofilter/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.ikelin/cuckoofilter)
[![Build Status](https://travis-ci.org/ikelin/cuckoofilter.svg?branch=master)](https://travis-ci.org/ikelin/cuckoofilter)
[![Coverage Status](https://coveralls.io/repos/github/ikelin/cuckoofilter/badge.svg?branch=master)](https://coveralls.io/github/ikelin/cuckoofilter?branch=master)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/c6ef9f24101546b9afe9f06b7751eec8)](https://www.codacy.com/app/ikelin/cuckoofilter?utm_source=github.com&utm_medium=referral&utm_content=ikelin/cuckoofilter&utm_campaign=Badge_Grade)

## Usage
Maven `pom.xml`:
```xml
<dependency>
  <groupId>com.ikelin</groupId>
  <artifactId>cuckoofilter</artifactId>
  <version>{VERSION}</version>
</dependency>
```
### Creating a Filter
```java
CuckooFilter filter = CuckooFilter.create(10000)
.withFalsePositiveProbability(0.001)
.withConcurrencyLevel(8)
.build();
```

This creates a cuckoo filter with an expected max capacity of `10,000`, a false positive probability of `0.001` (0.1%), and a concurrency level of `8`.
*Expected Max Capacity* specifies the expected maximum number of items the filter will hold.
*False Positive Probability* is the probability that a lookup returns a false positive, that is, reports __might be in set__ for an item that was never put.
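To make these two parameters concrete, the sketch below applies the sizing bound from Fan et al., "Cuckoo Filter: Practically Better Than Bloom" (CoNEXT 2014): with `b` entries per bucket and `f`-bit fingerprints, the false positive rate is at most about `2b / 2^f`, so `f` has to be at least `log2(2b / ε)`. The bucket size of 4 and the 95% load factor are assumptions taken from that paper, not confirmed settings of this library, and `FilterSizing` is a hypothetical helper.

```java
/** Hypothetical sizing helper; bucket size and load factor are assumptions, not library settings. */
final class FilterSizing {
    private FilterSizing() {}

    /** Smallest fingerprint width f such that 2b / 2^f <= falsePositiveProbability. */
    static int fingerprintBits(double falsePositiveProbability, int bucketSize) {
        if (!(falsePositiveProbability > 0.0 && falsePositiveProbability < 1.0)) {
            throw new IllegalArgumentException("falsePositiveProbability must be in (0, 1)");
        }
        double bits = Math.log(2.0 * bucketSize / falsePositiveProbability) / Math.log(2.0);
        return (int) Math.ceil(bits);
    }

    /** Buckets needed at the given load factor, rounded up to a power of two for bit-mask indexing. */
    static int bucketCount(long expectedMaxCapacity, int bucketSize, double loadFactor) {
        long needed = (long) Math.ceil(expectedMaxCapacity / (bucketSize * loadFactor));
        long buckets = 1;
        while (buckets < needed) {
            buckets <<= 1;
        }
        return Math.toIntExact(buckets);
    }

    public static void main(String[] args) {
        int f = fingerprintBits(0.001, 4);          // 13 bits
        int buckets = bucketCount(10_000, 4, 0.95); // 4096 buckets
        long bytes = (long) buckets * 4 * f / 8;    // 26,624 bytes of fingerprint storage
        System.out.printf("fingerprint=%d bits, buckets=%d, ~%d bytes%n", f, buckets, bytes);
    }
}
```

Under these assumptions, each tenfold reduction in false positive probability costs roughly 3.3 extra bits per stored item.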
*Concurrency Level* guides the allowed concurrency among read and write operations.
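One common way to honor such a setting, used for example by Guava's `CacheBuilder.concurrencyLevel`, is lock striping: a fixed number of locks guards the table and each bucket maps to one of them. This README does not say how the library applies the setting, so the sketch below only assumes that design; `StripedLocks` and `withBuckets` are hypothetical names. Because a lookup reads two candidate buckets, the sketch acquires both stripes in ascending order so two threads can never wait on each other's locks.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical lock striping sketch: concurrencyLevel read/write locks shared by all buckets. */
final class StripedLocks {
    private final ReadWriteLock[] stripes;

    StripedLocks(int concurrencyLevel) {
        if (concurrencyLevel < 1) {
            throw new IllegalArgumentException("concurrencyLevel must be at least 1");
        }
        stripes = new ReadWriteLock[concurrencyLevel];
        for (int i = 0; i < concurrencyLevel; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    /** Maps a bucket index to the stripe that guards it. */
    int stripeFor(int bucketIndex) {
        return Math.floorMod(bucketIndex, stripes.length);
    }

    /** Runs action while holding the read or write locks that guard both candidate buckets. */
    <T> T withBuckets(int bucket1, int bucket2, boolean write, Supplier<T> action) {
        int s1 = stripeFor(bucket1);
        int s2 = stripeFor(bucket2);
        Lock first = lockOf(Math.min(s1, s2), write);                      // lower stripe first
        Lock second = s1 == s2 ? null : lockOf(Math.max(s1, s2), write);  // same stripe: lock once
        first.lock();
        try {
            if (second != null) {
                second.lock();
            }
            try {
                return action.get();
            } finally {
                if (second != null) {
                    second.unlock();
                }
            }
        } finally {
            first.unlock();
        }
    }

    private Lock lockOf(int stripe, boolean write) {
        return write ? stripes[stripe].writeLock() : stripes[stripe].readLock();
    }
}
```

With more stripes, operations on unrelated buckets are less likely to contend. A put that has to evict entries touches more than two buckets, so a write path built on this sketch would need a broader lock (for example, all stripes) while it relocates fingerprints.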
### Lookup Item
To look up an item in the filter, use the `mightContain()` method:
```java
CuckooFilter tables = CuckooFilter.create(100).build();
boolean tableMightExist = tables.mightContain(tableHash); // tableHash: long hash of the table name
if (!tableMightExist) {
    // table definitely does not exist, do not query database
}
```

If `mightContain()` returns `true`, the item might be in the set, or the answer might be a false positive. If `mightContain()` returns `false`, the item is definitely not in the set.
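Why a `false` answer is definitive follows from how a cuckoo filter is usually laid out (partial-key cuckoo hashing, as described by Fan et al.): each item hash yields a short fingerprint and two candidate buckets, and the fingerprint is stored in one of them. A lookup checks only those two buckets. If neither holds the fingerprint, the item was never inserted; a match may come from a different item with the same fingerprint. The sketch below is a hypothetical single-threaded layout (4-slot buckets, fingerprint taken from the high 32 bits, an assumed golden-ratio multiplier for the alternate index), not this library's internals; `putWithoutEviction` is a simplified helper with no cuckoo eviction, used only to populate the table.

```java
/** Hypothetical lookup sketch; the layout is an assumption, not com.ikelin.cuckoofilter internals. */
final class LookupSketch {
    private static final int BUCKET_SIZE = 4;
    private final int[][] buckets;        // fingerprint 0 marks an empty slot
    private final int indexMask;
    private final int fingerprintMask;

    LookupSketch(int numBuckets, int fingerprintBits) {
        if (Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of two");
        }
        buckets = new int[numBuckets][BUCKET_SIZE];
        indexMask = numBuckets - 1;
        fingerprintMask = (1 << fingerprintBits) - 1;
    }

    int fingerprint(long itemHash) {
        int fp = (int) ((itemHash >>> 32) & fingerprintMask); // high bits become the fingerprint
        return fp == 0 ? 1 : fp;                              // 0 is reserved for "empty"
    }

    int primaryIndex(long itemHash) {
        return (int) (itemHash & indexMask);                  // low bits pick the first bucket
    }

    int alternateIndex(int index, int fp) {
        // XOR with a hash of the fingerprint: applying it twice returns the original index,
        // so the other bucket can be found from either bucket plus the stored fingerprint.
        long h = fp * 0x9E3779B97F4A7C15L;
        return index ^ (int) ((h ^ (h >>> 32)) & indexMask);
    }

    boolean mightContain(long itemHash) {
        int fp = fingerprint(itemHash);
        int i1 = primaryIndex(itemHash);
        int i2 = alternateIndex(i1, fp);
        return bucketHas(i1, fp) || bucketHas(i2, fp);        // in neither bucket: definitely absent
    }

    /** Test helper: first free slot in either candidate bucket, without cuckoo eviction. */
    boolean putWithoutEviction(long itemHash) {
        int fp = fingerprint(itemHash);
        int i1 = primaryIndex(itemHash);
        return placeIn(i1, fp) || placeIn(alternateIndex(i1, fp), fp);
    }

    private boolean bucketHas(int index, int fp) {
        for (int slot : buckets[index]) {
            if (slot == fp) {
                return true;
            }
        }
        return false;
    }

    private boolean placeIn(int index, int fp) {
        int[] bucket = buckets[index];
        for (int s = 0; s < BUCKET_SIZE; s++) {
            if (bucket[s] == 0) {
                bucket[s] = fp;
                return true;
            }
        }
        return false;
    }
}
```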
### Put Item

To put an item into the filter, use the `put()` method:
```java
CuckooFilter blacklistedWebsites = CuckooFilter.create(3000000).build();
boolean success = blacklistedWebsites.put(websiteHash);
if (!success) {
    // expected max capacity reached: create a new filter with a larger capacity
}
```
Always check the `boolean` returned from the `put()` method. If `put()` returns `true`, the item was inserted. If `put()` returns `false`, the filter has reached its capacity; in this case, create a new filter with a larger capacity.
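In a cuckoo filter, `put()` can fail because of eviction. When both candidate buckets of a new fingerprint are full, an existing fingerprint is moved to its alternate bucket, possibly displacing another, and the insert gives up after a bounded number of moves. The sketch below shows that loop under hypothetical settings (4-slot buckets, a 500-move budget, a fixed random seed); it is not this library's code, and `EvictingTable` is an illustrative name. As a design choice of the sketch, the last displaced fingerprint goes into a one-entry stash instead of being dropped, so the overflowing `put` still succeeds and every later `put` returns `false`.

```java
import java.util.Random;

/** Hypothetical insert-with-eviction sketch; bucket size, kick budget and seed are assumptions. */
final class EvictingTable {
    private static final int BUCKET_SIZE = 4;
    private static final int MAX_KICKS = 500;
    private final int[][] buckets;                // 0 marks an empty slot
    private final int indexMask;
    private final int fingerprintMask;
    private final Random random = new Random(42); // fixed seed keeps runs reproducible
    private int stashedFp = 0;                    // displaced fingerprint that found no slot
    private int stashedIndex = -1;                // one of its two candidate buckets

    EvictingTable(int numBuckets, int fingerprintBits) {
        if (Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of two");
        }
        buckets = new int[numBuckets][BUCKET_SIZE];
        indexMask = numBuckets - 1;
        fingerprintMask = (1 << fingerprintBits) - 1;
    }

    boolean put(long itemHash) {
        if (stashedFp != 0) {
            return false;                         // an earlier put overflowed: treat the table as full
        }
        int fp = fingerprint(itemHash);
        int i1 = (int) (itemHash & indexMask);
        int i2 = alternateIndex(i1, fp);
        if (place(i1, fp) || place(i2, fp)) {
            return true;
        }
        // Both buckets are full: swap fp with a random resident, then relocate the evicted one.
        int index = random.nextBoolean() ? i1 : i2;
        for (int kick = 0; kick < MAX_KICKS; kick++) {
            int slot = random.nextInt(BUCKET_SIZE);
            int evicted = buckets[index][slot];
            buckets[index][slot] = fp;
            fp = evicted;
            index = alternateIndex(index, fp);    // the evicted fingerprint's other bucket
            if (place(index, fp)) {
                return true;
            }
        }
        stashedFp = fp;                           // keep it so no stored item is lost
        stashedIndex = index;
        return true;
    }

    boolean mightContain(long itemHash) {
        int fp = fingerprint(itemHash);
        int i1 = (int) (itemHash & indexMask);
        int i2 = alternateIndex(i1, fp);
        if (stashedFp == fp && (stashedIndex == i1 || stashedIndex == i2)) {
            return true;
        }
        return contains(i1, fp) || contains(i2, fp);
    }

    private int fingerprint(long itemHash) {
        int fp = (int) ((itemHash >>> 32) & fingerprintMask);
        return fp == 0 ? 1 : fp;                  // 0 is reserved for "empty"
    }

    private int alternateIndex(int index, int fp) {
        long h = fp * 0x9E3779B97F4A7C15L;
        return index ^ (int) ((h ^ (h >>> 32)) & indexMask);
    }

    private boolean place(int index, int fp) {
        int[] bucket = buckets[index];
        for (int s = 0; s < BUCKET_SIZE; s++) {
            if (bucket[s] == 0) {
                bucket[s] = fp;
                return true;
            }
        }
        return false;
    }

    private boolean contains(int index, int fp) {
        for (int slot : buckets[index]) {
            if (slot == fp) {
                return true;
            }
        }
        return false;
    }
}
```

Because a full cuckoo filter keeps only fingerprints, it cannot be resized from its own contents; the original items have to be put into the new, larger filter again.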
### Remove Item
To remove an item from the filter, use the `remove()` method:
```java
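CuckooFilter cdnCachedContents = CuckooFilter.create(100000000).build();
cdnCachedContents.remove(purgedCdnCachedContentHash);
```

As with cuckoo filters in general, only remove items that were previously put into the filter. The filter stores short fingerprints, so removing an item that was never inserted can delete a colliding fingerprint that belongs to a different item, which is then wrongly reported as __definitely not in set__.

## Item Hash Value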
Use a performant hashing library that generates well-distributed `long` hash values for items, for example OpenHFT's [Zero-Allocation Hashing](https://github.com/OpenHFT/Zero-Allocation-Hashing) or Google Guava's [Hashing](https://github.com/google/guava/wiki/HashingExplained):
```java
import net.openhft.hashing.LongHashFunction;

LongHashFunction hashFunction = LongHashFunction.xx();
long itemHash = hashFunction.hashChars("item-foo");
if (!filter.mightContain(itemHash)) {
    // item definitely does not exist in filter...
}
```
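If adding a hashing dependency is not an option, a standard-library-only stand-in can be sketched by running 64-bit FNV-1a over the item's UTF-8 bytes and finishing with the MurmurHash3 `fmix64` finalizer. `ItemHashing.hash64` is a hypothetical helper, not part of this library, and the libraries above are faster and better tested; treat this as a fallback sketch only.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stdlib-only item hash: 64-bit FNV-1a followed by the MurmurHash3 fmix64 finalizer. */
final class ItemHashing {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    private ItemHashing() {}

    static long hash64(String item) {
        long h = FNV_OFFSET_BASIS;
        for (byte b : item.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);   // mix in one unsigned byte
            h *= FNV_PRIME;
        }
        return fmix64(h);
    }

    /** MurmurHash3 finalizer: folds the high bits into the low bits, which FNV-1a alone mixes weakly. */
    private static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }
}
```

The finalizer matters here because a filter typically derives bucket indices from the low bits of the hash. Usage mirrors the example above: `long itemHash = ItemHashing.hash64("item-foo");` followed by `filter.mightContain(itemHash)`.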