https://github.com/davidmoten/bplustree

B+-tree in java that stores to disk using memory mapped files, supports range queries and duplicate keys

Host: GitHub
URL: https://github.com/davidmoten/bplustree
Owner: davidmoten
License: apache-2.0
Created: 2019-09-20T10:08:34.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2024-10-11T16:08:20.000Z (about 1 month ago)
Last Synced: 2024-10-14T07:45:50.437Z (30 days ago)
Language: Java
Homepage:
Size: 417 KB
Stars: 45
Watchers: 5
Forks: 11
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE


# bplustree

[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.davidmoten/bplustree/badge.svg?style=flat)](https://maven-badges.herokuapp.com/maven-central/com.github.davidmoten/bplustree)

[![codecov](https://codecov.io/gh/davidmoten/bplustree/branch/master/graph/badge.svg)](https://codecov.io/gh/davidmoten/bplustree)

**Status:** beta

Disk-based B+-tree in Java using memory-mapped files (size limited only by available disk space).

## Features
* size only limited by available disk
* supports range queries
* optionally supports duplicate keys
* much faster reads and writes than the H2 file-based database (because there are no transactions and the persistence model is different)

## Requirements

* fast read time for range queries by time and key
* fast insert time
* single node implementation (not distributed)
* use memory-mapped files for speed
* fixed size keys
* variable size values
* very large size storage (>2GB of keys or values)
* optimized for insert in approximate index order
* single threaded
* no transactions
* delete not supported (?)

## Getting started
Add this to your pom.xml:

```xml

<dependency>
  <groupId>com.github.davidmoten</groupId>
  <artifactId>bplustree</artifactId>
  <version>VERSION_HERE</version>
</dependency>

```

## Example

Let's create a file-based index of timestamped strings (for example, lines from a log). Timestamps don't have to be unique.

```java
// keys are Long timestamps, values are UTF-8 strings
BPlusTree<Long, String> tree =
    BPlusTree
        .file()
        .directory(indexDirectory)
        .maxLeafKeys(32)
        .maxNonLeafKeys(8)
        .segmentSizeMB(1)
        .keySerializer(Serializer.LONG)
        .valueSerializer(Serializer.utf8())
        .naturalOrder();

// insert some values
tree.insert(1000L, "hello");
tree.insert(2000L, "there");

// search the tree for values with keys between 0 and 3000
// and print out key value pairs
tree.findEntries(0L, 3000L).forEach(System.out::println);

// search the tree for values with keys between 0 and 3000
// and print out values only
tree.find(0L, 3000L).forEach(System.out::println);
```
## Duplicate keys
Duplicate keys are allowed by default. You can force overwrite of keyed values by setting `.unique(true)` in the builder.

Note that, for efficiency, values with duplicate keys are entered into the tree in reverse insert order, so extracting the values in insert order requires a special method:

```java
tree.findOrderPreserving(0L, 3000L);
```
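
One way to picture why reverse order is cheap: if the values for a key form a singly linked chain, a new value can be prepended in constant time without walking the chain. The sketch below is only an illustration of that ordering consequence. `DuplicateChain` is a hypothetical in-memory class, and whether the library stores duplicates this way is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of the bplustree API.
final class DuplicateChain {

    // Singly linked node: the newest value points at the previous head.
    private static final class Node {
        final String value;
        final Node next;

        Node(String value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private final Map<Long, Node> heads = new HashMap<>();

    // O(1): prepend without walking the existing chain.
    void insert(long key, String value) {
        heads.put(key, new Node(value, heads.get(key)));
    }

    // Natural traversal yields newest first (reverse insert order).
    List<String> find(long key) {
        List<String> out = new ArrayList<>();
        for (Node n = heads.get(key); n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }

    // Restoring insert order costs an extra reversal pass.
    List<String> findOrderPreserving(long key) {
        Deque<String> stack = new ArrayDeque<>();
        for (Node n = heads.get(key); n != null; n = n.next) {
            stack.push(n.value);
        }
        return new ArrayList<>(stack);
    }
}
```

Inserting `"a"`, `"b"`, `"c"` under one key makes `find` return them newest first, while `findOrderPreserving` pays for one more pass to return them as inserted.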

## Using bplustree for String keys
Suppose you want to create a B+-tree with String keys, and those keys can have effectively arbitrary length. Keys are stored as fixed-size records (unlike values, which can be arbitrary in length). You can use hashes to get good find performance and keep the keys small (4 bytes of hash code) by making a tree of type:

```java
BPlusTree<Integer, StringAndValue> tree = ...
```
You insert the String's hash code as the key and combine the String with the value. To find records, look up the hash code of the String key and then filter the results for an exact match on the String component of `StringAndValue`, because different Strings can share a hash code.
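
The hash-then-filter lookup can be sketched as follows. This is a minimal illustration, not the library's code: a `TreeMap` stands in for the on-disk tree, and `HashedStringIndex` and the fields of `StringAndValue` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: a TreeMap stands in for the on-disk tree.
final class HashedStringIndex {

    // Value record that carries the original String key alongside the payload.
    static final class StringAndValue {
        final String key;
        final String value;

        StringAndValue(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Fixed-size 4-byte key (the hash code) -> all records sharing that hash.
    private final Map<Integer, List<StringAndValue>> index = new TreeMap<>();

    void insert(String key, String value) {
        index.computeIfAbsent(key.hashCode(), h -> new ArrayList<>())
             .add(new StringAndValue(key, value));
    }

    // Look up by hash, then discard collisions with an exact String comparison.
    List<String> find(String key) {
        List<String> out = new ArrayList<>();
        List<StringAndValue> candidates =
            index.getOrDefault(key.hashCode(), Collections.emptyList());
        for (StringAndValue sv : candidates) {
            if (sv.key.equals(key)) {
                out.add(sv.value);
            }
        }
        return out;
    }
}
```

The exact-match filter matters in practice: `"Aa"` and `"BB"` have the same `hashCode()` in Java, so both land under one integer key and only the String comparison tells them apart.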

## Design
The B+-tree index is stored across multiple fixed-size files. The tree holds pointers to the values, and the values themselves are stored across a separate set of fixed-size files.
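
The split between fixed-size index entries and variable-size values can be sketched as below. This is an assumption-laden illustration: two in-memory `ByteBuffer`s stand in for the two sets of files, the 16-byte entry layout and length prefix are invented for the example, and `PointerLayoutSketch` is a hypothetical name. Entries are simply appended, so no tree structure is shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: in-memory buffers stand in for the two sets of files.
final class PointerLayoutSketch {

    private static final int ENTRY_SIZE = 16; // 8-byte key + 8-byte value position

    private final ByteBuffer values = ByteBuffer.allocate(1024); // variable-size values
    private final ByteBuffer index = ByteBuffer.allocate(1024);  // fixed-size entries

    // Append the length-prefixed value, then record key + position in the index.
    void insert(long key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long position = values.position();
        values.putInt(bytes.length).put(bytes);
        index.putLong(key).putLong(position);
    }

    // Read the key of the n-th index entry.
    long keyOfEntry(int n) {
        return index.getLong(n * ENTRY_SIZE);
    }

    // Read the n-th index entry, then follow its pointer into the value storage.
    String valueOfEntry(int n) {
        int position = (int) index.getLong(n * ENTRY_SIZE + 8);
        int length = values.getInt(position);
        byte[] bytes = new byte[length];
        ByteBuffer view = values.duplicate();
        view.position(position + 4);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because every index entry has the same size, entry `n` is found by multiplication alone, while values of any length live elsewhere and are reached through the stored position.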

A `LargeByteBuffer` abstracts access, via memory-mapped files, to a set of files. `ByteBuffer` only offers `int` positions, which restricts its size to 2GB. `LargeByteBuffer` offers `long` positions, so its size has no effective limit apart from available disk.
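
The idea of addressing many mapped files with one `long` position can be sketched as follows. This is not the library's `LargeByteBuffer`: `SegmentedBuffer`, its file naming, and its byte-at-a-time API are hypothetical, and only standard `java.nio` calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: not the library's LargeByteBuffer.
final class SegmentedBuffer {

    private final Path directory;
    private final int segmentSize;
    private final List<MappedByteBuffer> segments = new ArrayList<>();

    SegmentedBuffer(Path directory, int segmentSize) {
        this.directory = directory;
        this.segmentSize = segmentSize;
    }

    // Map segment files lazily; each file is addressed with int offsets only.
    private MappedByteBuffer segment(int n) {
        while (segments.size() <= n) {
            Path file = directory.resolve("segment-" + segments.size());
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping stays valid after the channel is closed.
                segments.add(ch.map(FileChannel.MapMode.READ_WRITE, 0, segmentSize));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return segments.get(n);
    }

    // A long position splits into (segment number, offset within segment).
    void put(long position, byte b) {
        segment((int) (position / segmentSize)).put((int) (position % segmentSize), b);
    }

    byte get(long position) {
        return segment((int) (position / segmentSize)).get((int) (position % segmentSize));
    }
}
```

With a segment size of 16 bytes, position 40 resolves to offset 8 of the third file. The same arithmetic with large segments lets total capacity exceed what a single `ByteBuffer` can address.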