https://github.com/timboudreau/storage
Libraries for building high-performance, single-purpose micro-databases with indexes over memory-mapped files in Java
- Host: GitHub
- URL: https://github.com/timboudreau/storage
- Owner: timboudreau
- Created: 2022-04-24T22:06:21.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-16T06:19:15.000Z (over 2 years ago)
- Last Synced: 2025-01-10T15:51:01.943Z (about 1 year ago)
- Language: Java
- Size: 68.4 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Storage
=======
A set of libraries for working with memory-mapped data and fixed-record-length storage, to build single-purpose,
high-performance, persistent micro-databases.
Originally written for building efficient indexes into Java heap dumps, this library provides a general-purpose
set of tools for working with arbitrarily sized memory-mapped binary data, and for building indexes over that data.
The Storage Library
-------------------
A `Storage` gives read and (optionally) write access to a sequence of bytes. It has a fixed record size,
which is the number of bytes in one record, and may contain many records.
Several implementations with different characteristics are included:
* A `FileChannel`-based implementation (slow, but with a low memory footprint)
* A single-memory-mapping implementation, for files whose size you can guarantee will remain below the
maximum size the operating system or available memory can accommodate in a single memory mapping
* A multiple-memory-mapping implementation, which maintains as many memory mappings as are needed to have
the entire storage memory-mapped
The data in the backing file of a `Storage` is entirely record-based - there are no headers or other
data, so the byte position of a record is a simple function of its index: the number of records that
precede it in the file, multiplied by the record size.
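
The offset arithmetic described above can be sketched with plain `java.nio`. This is a minimal illustration, not the library's API: `RecordFileSketch`, `offsetOf` and `readRecord` are hypothetical names, and the 16-byte record of two `long` fields is an assumption made for the demo.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of the Storage library.
final class RecordFileSketch {

    /** Byte offset of the record at the given index in a header-less, fixed-record-size file. */
    static long offsetOf(long recordIndex, int recordSize) {
        return recordIndex * recordSize;
    }

    /** Reads one record using positional reads, which do not move the channel's own position. */
    static ByteBuffer readRecord(FileChannel channel, long recordIndex, int recordSize) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(recordSize);
        long start = offsetOf(recordIndex, recordSize);
        while (buffer.hasRemaining()) {
            if (channel.read(buffer, start + buffer.position()) < 0) {
                throw new IOException("No complete record at index " + recordIndex);
            }
        }
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        int recordSize = Long.BYTES * 2; // assumed layout: two long fields per record
        Path file = Files.createTempFile("records", ".bin");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Write three records: (0, 0), (1, 10), (2, 20)
            for (long i = 0; i < 3; i++) {
                ByteBuffer record = ByteBuffer.allocate(recordSize).putLong(i).putLong(i * 10);
                record.flip();
                long start = offsetOf(i, recordSize);
                while (record.hasRemaining()) {
                    channel.write(record, start + record.position());
                }
            }
            ByteBuffer second = readRecord(channel, 1, recordSize);
            System.out.println(second.getLong() + ", " + second.getLong()); // prints "1, 10"
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because there is no header, appending a record amounts to writing at `offsetOf(recordCount, recordSize)`, and the record count can be derived from the file length divided by the record size.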
Records can be written, rewritten or appended, and looked up by index, and a `Storage` can be treated as an
`Iterable`.
The two less prosaic features of `Storage` are:
* The contents can be sorted in-place using whatever sort function you choose
* Once sorted, the contents can be efficiently binary-searched
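
As a rough sketch of these two features, the following sorts fixed-size records in place inside a `ByteBuffer` and then binary-searches them by key. Everything here is an assumption for illustration: the class and method names are hypothetical, the record is taken to be a `long` key followed by a `long` value, and a simple quicksort stands in for whatever sort the library actually uses.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; not the library's implementation.
final class SortedRecordsSketch {
    static final int RECORD_SIZE = Long.BYTES * 2; // assumed layout: long key, long value

    /** Builds a buffer of records whose value is key * 100 (demo data only). */
    static ByteBuffer fromKeys(long... keys) {
        ByteBuffer buf = ByteBuffer.allocate(keys.length * RECORD_SIZE);
        for (int i = 0; i < keys.length; i++) {
            buf.putLong(i * RECORD_SIZE, keys[i]);
            buf.putLong(i * RECORD_SIZE + Long.BYTES, keys[i] * 100);
        }
        return buf;
    }

    static long key(ByteBuffer buf, int index) {
        return buf.getLong(index * RECORD_SIZE);
    }

    static long value(ByteBuffer buf, int index) {
        return buf.getLong(index * RECORD_SIZE + Long.BYTES);
    }

    /** Exchanges two whole records, one long at a time, using absolute reads and writes. */
    static void swap(ByteBuffer buf, int a, int b) {
        if (a == b) {
            return;
        }
        for (int field = 0; field < RECORD_SIZE; field += Long.BYTES) {
            int pa = a * RECORD_SIZE + field;
            int pb = b * RECORD_SIZE + field;
            long tmp = buf.getLong(pa);
            buf.putLong(pa, buf.getLong(pb));
            buf.putLong(pb, tmp);
        }
    }

    /** In-place quicksort (Lomuto partition) over record indexes lo..hi, inclusive. */
    static void sort(ByteBuffer buf, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        long pivot = key(buf, hi);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (key(buf, j) < pivot) {
                swap(buf, i++, j);
            }
        }
        swap(buf, i, hi);
        sort(buf, lo, i - 1);
        sort(buf, i + 1, hi);
    }

    /** Returns the index of a record with the given key, or -1 if there is none. */
    static int binarySearch(ByteBuffer buf, int recordCount, long target) {
        int lo = 0;
        int hi = recordCount - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long k = key(buf, mid);
            if (k < target) {
                lo = mid + 1;
            } else if (k > target) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer buf = fromKeys(42, 7, 19, 3);
        sort(buf, 0, 3);
        int found = binarySearch(buf, 4, 19);
        System.out.println(found + " -> " + value(buf, found)); // prints "2 -> 1900"
    }
}
```

The same code would work on a `MappedByteBuffer`, since only absolute `getLong` and `putLong` calls are used.
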
That makes possible...
The Indexes Library
-------------------
An `Index` uses sorting and binary search over a `Storage` to allow fast lookup of records. The typical
pattern is that you write some data you want to look up by multiple sortable fields; either as you write it
or afterwards, you create an index that allows you to look up elements by field.
An index has a schema; the schema is expressed as a Java `enum`, where the fields of a record appear in the
order in which the enum constants are declared. Each field has a (primitive) type, a byte offset into the
record, and an `IndexKind` that describes whether that field is unique and whether it should be treated as
the canonical ordering - equivalent to a _primary key_ in SQL.
Unique and many-to-many indexes are supported - while keys are sorted, there can be runs of any number
of records that use the same key.
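
For a non-unique key, a lookup has to return the whole run of matching records rather than a single hit. One common way to do that, sketched here over a plain sorted `long[]` rather than the library's own types, is a lower-bound binary search followed by a forward scan. `KeyRunSketch` and its methods are hypothetical names.

```java
// Hypothetical illustration of finding a run of equal keys in sorted data.
final class KeyRunSketch {

    /** Index of the first element >= target in a sorted array, or the array length if there is none. */
    static int lowerBound(long[] sortedKeys, long target) {
        int lo = 0;
        int hi = sortedKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Returns {start, endExclusive} of the run of target; start == endExclusive if the key is absent. */
    static int[] runOf(long[] sortedKeys, long target) {
        int start = lowerBound(sortedKeys, target);
        int end = start;
        while (end < sortedKeys.length && sortedKeys[end] == target) {
            end++;
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        long[] keys = {1, 4, 4, 4, 9};
        int[] run = runOf(keys, 4);
        System.out.println(run[0] + ".." + run[1]); // prints "1..4"
    }
}
```

A plain binary search could land anywhere inside the run, which is why the sketch looks for the first matching position before scanning forward.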
Typically the canonical ordering is the position of the referenced record in the original data you're
creating an index over - that way, your index only needs to contain the file offset and the one field
you want to index, and is stored in that order.
Example:
```java
public enum Classes implements SchemaItem {
    // Record fields, in order: offset, classId, nameId, superclassId, instanceSize
    FILE_OFFSET, CLASS_ID, NAME_ID, SUPERCLASS_ID, INSTANCE_SIZE;

    // Every field in this schema is a long
    @Override
    public ValueType type() {
        return ValueType.LONG;
    }

    // Each field starts Integer.BYTES into the record, plus one Long.BYTES per preceding field
    @Override
    public int byteOffset() {
        switch (this) {
            case FILE_OFFSET:
                return Integer.BYTES;
            case CLASS_ID:
                return Integer.BYTES + Long.BYTES;
            case NAME_ID:
                return Integer.BYTES + Long.BYTES * 2;
            case SUPERCLASS_ID:
                return Integer.BYTES + Long.BYTES * 3;
            case INSTANCE_SIZE:
                return Integer.BYTES + Long.BYTES * 4;
            default:
                throw new AssertionError(this);
        }
    }

    // FILE_OFFSET is the canonical ordering, CLASS_ID is unique, the remaining fields are not indexed
    @Override
    public IndexKind indexKind() {
        switch (this) {
            case FILE_OFFSET:
                return IndexKind.CANONICAL_ORDERING;
            case CLASS_ID:
                return IndexKind.UNIQUE;
            default:
                return IndexKind.NONE;
        }
    }
}
```
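
To show what a per-field byte offset is for, here is a small self-contained sketch that reads and writes one field of one record in a `ByteBuffer`. It does not use the library: `FieldReadSketch` and its nested `Field` enum are hypothetical stand-ins, and the leading 4 bytes of each record are an assumption taken from the `Integer.BYTES` base offset in the schema example.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; Field is a stand-in, not the library's SchemaItem.
final class FieldReadSketch {

    enum Field {
        FILE_OFFSET, CLASS_ID, NAME_ID;

        // Assumed layout: 4 leading bytes, then one long per field in declaration order
        int byteOffset() {
            return Integer.BYTES + Long.BYTES * ordinal();
        }
    }

    static final int RECORD_SIZE = Integer.BYTES + Long.BYTES * Field.values().length;

    /** Position of a field = start of its record + the field's offset within the record. */
    static int positionOf(int recordIndex, Field field) {
        return recordIndex * RECORD_SIZE + field.byteOffset();
    }

    static long read(ByteBuffer records, int recordIndex, Field field) {
        return records.getLong(positionOf(recordIndex, field));
    }

    static void write(ByteBuffer records, int recordIndex, Field field, long value) {
        records.putLong(positionOf(recordIndex, field), value);
    }

    public static void main(String[] args) {
        ByteBuffer records = ByteBuffer.allocate(RECORD_SIZE * 2);
        write(records, 1, Field.CLASS_ID, 77L);
        System.out.println(read(records, 1, Field.CLASS_ID)); // prints "77"
    }
}
```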
Notes / Caveats
---------------
This library generally assumes the user knows the state of the files they are
memory mapping and passes in correct arguments - in particular, if you want
to use write methods, you need to pass in a read-write `FileChannel`, etc.
Unlike most of the `com.mastfrog` libraries, this library requires JDK 16 or greater,
since it relies on fixed-position read and write methods on buffers and channels, which
allow buffers and channels to be used in an effectively stateless way.
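
The idea of fixed-position access can be illustrated with a tiny sketch: absolute reads and writes on a `ByteBuffer` take an explicit index and leave the buffer's position untouched, so callers share no cursor state. The class name is hypothetical, and the accessors used here are long-standing ones chosen only to show the idea; which newer methods the library depends on is not stated above.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of absolute (fixed-position) buffer access.
final class AbsoluteAccessSketch {

    /** Writes and reads a long at an explicit index, then reports the buffer's position. */
    static int positionAfterAbsoluteAccess() {
        ByteBuffer buf = ByteBuffer.allocate(32);
        buf.putLong(8, 1234L);       // absolute write: position is not advanced
        long value = buf.getLong(8); // absolute read: position is not advanced
        if (value != 1234L) {
            throw new IllegalStateException("unexpected value " + value);
        }
        return buf.position();
    }

    /** The relative equivalents move the position as a side effect. */
    static int positionAfterRelativeAccess() {
        ByteBuffer buf = ByteBuffer.allocate(32);
        buf.putLong(1234L); // relative write: position advances by Long.BYTES
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println(positionAfterAbsoluteAccess()); // prints "0"
        System.out.println(positionAfterRelativeAccess()); // prints "8"
    }
}
```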