https://github.com/wtarreau/cebtree
Compact Elastic Binary Trees: only require two pointers, like a doubly-linked list, to build a tree. Duplicates not implemented for now, but algorithmically supported.
https://github.com/wtarreau/cebtree
binary-search-tree ebtree
Last synced: about 1 year ago
JSON representation
Compact Elastic Binary Trees: only require two pointers, like a doubly-linked list, to build a tree. Duplicates not implemented for now, but algorithmically supported.
- Host: GitHub
- URL: https://github.com/wtarreau/cebtree
- Owner: wtarreau
- Created: 2023-11-26T23:13:27.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2025-05-04T09:19:32.000Z (about 1 year ago)
- Last Synced: 2025-05-04T09:31:36.773Z (about 1 year ago)
- Topics: binary-search-tree, ebtree
- Language: C
- Homepage:
- Size: 928 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Compact Elastic Binary Tree (cebtree)
## Abstract
CEBTree is a form of very compact binary search tree that only requires two
pointers (like a list) to index data. The structure is a much more compact
variant of the [EBTree](https://github.com/wtarreau/ebtree) structure, so no
allocation is needed when inserting a node. There are no upward pointers so
some operations are slower as they will require a preliminary lookup. For
example a `next()` operation requires a first descent to identify the closest
fork point, then a second descent from that optimal point.
But the structure provides a number of other benefits. The first one being the
memory usage: **the tree uses the same storage as a list**, thus can be
installed anywhere a list would be used. This can be particularly interesting
for read-mostly data (configuration, append-only indexes etc). It preserves
structure alignment, thus does not require to contain the data itself, the data
may be appended just after the pointer nodes, which saves the need for typed
trees thus typed operations. It may also make the code a bit cleaner, because
with EBTree it's often tempting to touch node->key from the main code, without
always realizing the impacts (namely with signed values).
It should also be easier to implement variants (e.g. case insensitive strings
lookups, or faster memory lookups matching one word at a time, etc) thanks to
the unified data types.
## Properties
Just like EBTrees, duplicate keys are supported and are visited in insertion
order. This allows for duplicate detection and graceful handling for example
in configuration files, as well as timer management (though ebtrees are much
faster for timer management, albeit bigger).
#### Comparison of costs by model
For more info please consult the [EB vs CEB benchmark](results/bench-eb-ceb/)
that was run using the [bench utility](tests/bench.c).
| model | list | hash (B buckets) | rbtree | cebtree | ebtree |
|-------------------|:---------------:|:------------------:|:---------------:|:---------------:|:-----------------:|
|__operation__ | min / avg / max | min / avg / max | min / avg / max | min / avg / max | min / avg / max |
|lookup ops | - / O(N) / - | 1 / N/B / O(N) | - / O(logN) / - | - / O(logN) / - | - / O(logN) / - |
|insert ordered | - / O(N) / - | 1 / N/B / O(N) | - / O(logN) / - | - / O(logN) / - | - / O(logN) / - |
|append (unordered) | - / O(1) / - | - / O(1) / - | - / O(logN) / - | - / O(logN) / - | - / O(logN) / - |
|delete | - / O(1) / - | - / O(1) / - | - / O(logN) / - | - / O(logN) / - | - / O(1) / - |
|first/last | - / O(1) / - | 1 / N/B / O(N) | - / O(logN) / - | - / O(logN) / - | - / O(logN) / - |
|next/prev | - / O(1) / - | 1 / O(1) / O(N/B) | O(1) / O(1) / O(logN) | - / O(logN) / - | O(1) / O(1) / O(logN) |
||
|__Costs per operation__|
|string lookup cost| N*strcmp() | N/B*strcmp() | ~logN*strcmp() | ~1*strcmp() | ~1*strcmp() |
|visited nodes | N | N/B | 2*logN | 2*logN | 1*logN |
#### Synthetic performance comparison
- enumeration in insertion order: lists > ebtree = rbtree > cebtree
- enumeration in key order: ebtree > cebtree = rbtree > lists
- random lookups: ebtree > cebtree = rbtree > lists
- random deletion: lists > ebtree > cebtree = rbtree
- total purge: lists > ebtree > cebtree = rbtree
The tagged pointers permit the string lookup cost to remain low. Without tagged
pointers (i.e. version 0.2), the string lookup cost becomes logN*strcmp() since
a complete string needs to be compared at each layer (like in other non-radix
trees such as rbtree).
## API
The application integration and API are documented in [this document](doc/API.md).
## Limitations and future improvements
Relative addressing is not yet implemented but is in progress. This is handy to
manipulate data in memory areas shared between multiple processes, where no
pointer is stored.
Performance is a bit lower than EBTrees even for small keys due to the need to
read both branches at each node to figure whether to stop or continue the
descent. It effectively doubles the number of visited nodes (hence is less TLB
friendly), though it does not necessarily increase the memory bandwidth since
the nodes are much smaller.
It was verified that an almost lockless approach could be implemented: lookups
and insertion could be done without locking but deletion requires locking. As
such, an approach would require [rwlocks](https://github.com/wtarreau/plock)
with shared locking for insertion and lookup, and exclusive locking for
deletion. This might come as an advantage over EBTrees for highly contended
environments.