https://github.com/komozoi/libexcessive
C++ On-Disk Datastructure library for performance and reliability, with lots of other goodies included.
bigint biginteger-library btree btree-implementation btree-indexes concurrency concurrent cpp lib library logging mmap parallel persistence persistent-memory persistent-storage raii range-search
Last synced: 15 days ago
- Host: GitHub
- URL: https://github.com/komozoi/libexcessive
- Owner: komozoi
- License: apache-2.0
- Created: 2026-03-27T13:56:46.000Z (2 months ago)
- Default Branch: master
- Last Pushed: 2026-05-19T02:23:29.000Z (15 days ago)
- Last Synced: 2026-05-19T03:53:12.007Z (15 days ago)
- Topics: bigint, biginteger-library, btree, btree-implementation, btree-indexes, concurrency, concurrent, cpp, lib, library, logging, mmap, parallel, persistence, persistent-memory, persistent-storage, raii, range-search
- Language: C++
- Homepage:
- Size: 489 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
# LibExcessive
- High-performance, parallel file I/O with RAII semantics
- High-performance on-disk data structures
- A bunch of utility functions and classes
## Try it in 5 minutes
Create a CMakeLists.txt:
```cmake
cmake_minimum_required(VERSION 3.25)  # the SYSTEM option of FetchContent_Declare needs CMake 3.25+
project(BTreeDemo)

# Adds excessive to the project
include(FetchContent)
FetchContent_Declare(
    excessive
    GIT_REPOSITORY https://gitea.com/komozoi/excessive.git
    GIT_TAG v0.3.0
    GIT_SHALLOW TRUE
    GIT_PROGRESS ON
    SYSTEM
)
FetchContent_MakeAvailable(excessive)

# Create the demo executable and link excessive into it
add_executable(demo demo.cpp)
target_link_libraries(demo PRIVATE excessive)
```
And create a file `demo.cpp`:
```c++
#include <cstdio>
#include <random>
#include <fcntl.h>
#include "fs/FdHandle.h"
#include "fs/BTree.h"  // assumed header path for BTree

struct btree_entry_s {
    int key;
    int value;

    static int compare(const btree_entry_s &a, const btree_entry_s &b) {
        return a.key - b.key;
    }
};

int main() {
    FdHandle file = FdHandle::open("btree.bin", O_RDWR | O_CREAT, 0644);
    if (!file) {
        printf("Failed to open file!\n");
        return 1;
    }

    // It is easy to check if the file already existed or was just created
    printf("File is %s.\n", file.isNew() ? "newly created" : "existing");

    BTree<btree_entry_s> tree(file, 0, btree_entry_s::compare);

    // Add 100 random elements to the tree, with keys in [0, 5000)
    std::random_device rd;
    for (int i = 0; i < 100; ++i) {
        btree_entry_s new_entry{(int) (rd() % 5000), (int) rd()};
        tree.insert(new_entry);
    }

    // See what the next highest values are for given inputs
    // The results change from run to run as the BTree grows
    for (int i = 0; i < 5000; i += 500) {
        btree_entry_s result{i, 0};
        if (tree.findNext(result)) {
            printf("Next highest entry from %i is (%i, %i)\n", i, result.key, result.value);
        } else {
            printf("No next highest entry found for %i\n", i);
        }
    }

    // All data is already written to the file (although not necessarily flushed)
    // For this reason, no cleanup is needed here for the BTree.
    // File automatically closes when all references go out of scope,
    // but can be closed manually with:
    // file.close();
    return 0;
}
```
If you are using an IDE, this may be enough to import the project and run it - very convenient!
If not, run these commands to compile and run:
```bash
# Setup
mkdir build && cd build && cmake ..
# Compile
make
# Run the demo to create the data file
./demo
# Run the demo again to add more to the data file and see the effects
./demo
```
And that's it - efficient and persistent data storage in fewer than 60 lines of code. No extra installation steps
or complex APIs. It just works.
## Overview
LibExcessive is intended for large, data-heavy backend
applications such as servers and data processing tools where speed and reliability matter. The
design goals and features are:
* Provide familiar, Java-like APIs and richer helper types, especially containers
  * With less verbosity than Java APIs, thankfully.
* Provide rich, threadsafe, and extremely efficient utilities for interacting with files
  * Threadsafe file handles and transactions
  * Mmap handles
  * Open file reference counting
* Utilities for keeping data on-disk
  * BTree for indexing various sortable datatypes
  * DiskBytestringSearchTree for handling sorted bytestrings and anything they can encode
  * Files with dynamically allocated regions
* ThreadPool for efficient parallel task execution
* (planned) Utilities for building on-disk indexes and databases
* Favor explicit memory and performance control. Many components are designed to be friendly to
  custom allocators and memory pools.
  * Tracking memory separately for different components of an application, which helps
    to find memory hogs
  * Safer allocation and memory management with less heap fragmentation
* Small, focused algorithms and helpers for string handling, byte buffers, serialization, and more
* Support modern C++ compilers from C++17 and up.
Much of the code was originally written to run on the Teensy 4.1, which is extremely memory
constrained compared to our desktop computers, having a mere 1MiB of RAM. For this reason,
there is a lot of code for tightly controlling memory usage and performance.
Everything is currently tested on Debian Linux, although
any Unix flavor should work. I do not plan on supporting Windows.
The main repository is on Gitea at https://gitea.com/komozoi/excessive, and it is mirrored to GitHub
at https://github.com/komozoi/libexcessive.
## Building
I develop this with CLion, which imports the CMake project for me, but if you prefer to build on
the terminal, it's not hard:
```bash
git clone https://gitea.com/komozoi/excessive.git
cd excessive
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
```
## Usage Examples
Tests are included for almost everything, which you can use as a larger reference if needed.
### File Access with Mmap
```cpp
#include <fcntl.h>
#include "fs/FdHandle.h"

// Example payload, defined here only so the snippet is complete
struct my_struct_t {
    int id;
    bool enabled;
    char name[16];
};

int main() {
    const char* temp_filename = "mmap_test.tmp";
    // All file handles are smart pointers!
    FdHandle write_handle = FdHandle::open(temp_filename, O_RDWR | O_CREAT, 0660);
    MmapHandle write_mmap = write_handle.getMmapHandle(0, sizeof(my_struct_t));

    // Writing structs with mmap is easy and safe
    my_struct_t value{1, true, {}};
    write_mmap.write(value);

    // File automatically closes when all references go out of scope,
    // but can be closed manually with:
    // write_handle.close();
}
```
### On-Disk Search Tree with Bytestring Keys
For on-disk indexing by variable-length keys (like strings), `DiskBytestringSearchTree` provides an efficient O(log(n))
lookup in the average case.
```cpp
#include "fs/DiskBytestringSearchTree.h"
#include "fs/FreeSpaceFile.h"
#include <cstdio>

int main() {
    FdHandle file = FdHandle::open("search_tree.bin", O_RDWR | O_CREAT, 0644);
    FreeSpaceFile fss(file);

    uint64_t rootOffset;
    if (file.isNew()) {
        rootOffset = DiskBytestringSearchTree::initialize(fss);
    } else {
        // Root offset should be stored and retrieved from a known location.
        // For this example, we assume it's right after the FreeSpaceFile header.
        rootOffset = fss.getHeaderEnd();
    }

    DiskBytestringSearchTree tree(fss, rootOffset);

    // Insert keys
    tree.insert(Bytestring("user_123"), 0xDEADBEEF);

    // Find keys
    uint64_t value = tree.find(Bytestring("user_123"));
    if (value != 0) {
        printf("Found value: %llx\n", (unsigned long long)value);
    }

    // FdHandle closes on its own
    return 0;
}
```
### Bigint
Unlike most bigint libraries, this implementation uses a fixed width. The bigint code was originally designed for use
in an EVM implementation, where most values are 256-bit. It is designed to behave like a register or any
built-in fixed-width integer type, just wider, so results that do not fit are truncated as you would expect.
There are plenty of great libraries that implement variable-width bigint; there is no reason to add that to this
library. Originally I wasn't going to include my own bigint implementation as I figured existing libraries were
sufficient, but I changed my mind when I saw that they were all variable-length, which does not work for my
typical applications.
Examples:
```c++
#include <cstdio>
#include "bigint.h"

int main() {
    uint256_t a = uint256_t("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF");
    uint256_t b("0x1234567890ABCDEF1234567890ABCDEF");

    uint256_t sum = a + b;
    uint256_t prod = a * b;
    uint256_t pow = b.pow(4);

    printf("a = %s\n", a.toHexString().c_str());
    printf("b = %s\n", b.toHexString().c_str());
    printf("a + b = %s\n", sum.toHexString().c_str());
    printf("a * b = %s\n", prod.toHexString().c_str());
    printf("b**4 = %s\n", pow.toHexString().c_str());

    // Values (the leading-zero padding printed by toHexString() may differ):
    // a     = ffffffffffffffffffffffffffffffff
    // b     = 1234567890abcdef1234567890abcdef
    // a + b = 11234567890abcdef1234567890abcdee
    // a * b = 1234567890abcdef1234567890abcdeeedcba9876f543210edcba9876f543211
    // b**4  = only the low 256 bits; the full result needs about 500 bits, so it is truncated

    // Default sizes include uint128_t, uint192_t, and uint256_t. Custom sizes are also possible.
    uint192_t threeWordValue = "0x6935282358963433459348abcdef1ee7";
    UnsignedFixedWidthBigInt<7> sevenWordValue = "0x8b20159b1c579b1088048f054bedebfd02de6b23919371be36d872ec46fe9cebe684edd2675ab1101262b78877b3c09966366c07df0fcccf";

    // It is also possible to multiply directly with doubles
    // The result of the multiplication is floored and almost exact
    uint192_t productWithDouble = threeWordValue * 0.7311;
    // productWithDouble would be something like 0x4cead4b7be06ccb79c0e92711d09028c

    uint256_t x3("0x783924abc37678847777fcba");
    x3 = x3.root(3); // 0xc6fc718e
}
```
### Smart Pointer with Copy on Write Behavior
```c++
// Include the LibExcessive header that declares sp and SpPointerType here.
#include <utility>  // std::move

struct Data {
    int value;
};

int main() {
    // UNIQUE (default-style ownership)
    sp<Data> x(SpPointerType::UNIQUE, Data{10});

    // Copying a UNIQUE pointer does NOT immediately copy the data
    sp<Data> y = x;
    // x stays UNIQUE
    // y becomes COPY_ON_WRITE
    // both point to the same underlying object (for now)

    // First write triggers a deep copy
    y.mut().value = 20;
    // now:
    // x->value == 10
    // y->value == 20
    // they no longer share memory

    // You can keep copying before mutation is needed
    sp<Data> z = x;
    // still sharing with x
    z.mut().value = 30;
    // z detaches and becomes independent
    // x is still unchanged

    // SHARED mode = always shared, no copy-on-write
    sp<Data> sharedA(SpPointerType::SHARED, Data{100});
    sp<Data> sharedB = sharedA;
    sharedB.mut().value = 200;
    // both see the change:
    // sharedA->value == 200
    // sharedB->value == 200

    // Move = transfer ownership, no copies
    sp<Data> moved = std::move(sharedA);
    // sharedA is now null
    // moved owns the data

    // Scoped lifetime (RAII)
    {
        sp<Data> temp(SpPointerType::UNIQUE, Data{5});
        sp<Data> alias = temp;
        // alias is COPY_ON_WRITE
        // object is destroyed exactly once when both go out of scope
    }

    // Polymorphic support (New in v0.3.0)
    // Seamlessly convert from sp<Derived> to sp<Base>
    // sp<Derived> derived(SpPointerType::UNIQUE, Derived{});
    // sp<Base> base = derived;
    return 0;
}
```
### Thread Pool
Efficient parallel task execution using a pool of worker threads.
```cpp
#include "parallel/ThreadPool.h"
#include <cstdio>

int main() {
    // Create a pool with 8 worker threads
    ThreadPool pool(8);

    // Submit a lambda
    pool.submit([]() {
        printf("Parallel task running\n");
    });

    // Submit a function with arguments
    pool.submit([](int x, int y) {
        printf("Result: %d\n", x + y);
    }, 10, 20);

    // Graceful shutdown
    pool.shutdown();
    return 0;
}
```
### Simple Containers
Simple examples showing the style of usage that matches the library design:
```cpp
#include "ds/ArrayList.h"
#include <cstdio>

int main() {
    ArrayList<int> list;
    list.add(1);
    list.add(2);
    list.addCopies(5, 3); // add three copies of 5

    for (int element: list)
        printf("value %d\n", element);
    return 0;
}
```
```cpp
#include "ds/ArrayList.h"
#include <cstdio>

int main() {
    // Initialize list as {2, 3, 4}
    ArrayList<int> list{2, 3, 4};

    // Add 1 to the beginning
    list.addFirst(1);

    // 1, 2, 3, 4
    for (int element: list)
        printf("value %d\n", element);
    return 0;
}
```