https://github.com/naasking/arraymappedset
Fast, Compact Immutable Sets
- Host: GitHub
- URL: https://github.com/naasking/arraymappedset
- Owner: naasking
- License: lgpl-2.1
- Created: 2015-09-22T13:27:03.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-09-22T15:34:35.000Z (over 9 years ago)
- Last Synced: 2023-02-28T10:32:06.561Z (almost 2 years ago)
- Language: C#
- Homepage:
- Size: 156 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Fast, Compact Immutable Set
This provides a hash array mapped set, based on the well-known hash array mapped trie.
It utilizes a struct encoding for compactness:

    public struct SimpleSet<T> : IEnumerable<T>
    {
        // 1. when children != null, then either a collision node or an internal set node
        //    a. collision nodes have bitmap == 0
        //    b. internal nodes have bitmap != 0
        // 2. when children is null, then either an empty tree or a leaf node
        //    a. empty trees have bitmap == 0
        //    b. leaf nodes have bitmap == IS_LEAF (or really, any non-zero value)
        uint bitmap;
        T value;
        SimpleSet<T>[] children;
        ...
    }

A unique property of this struct encoding is that single-element sets are just as fast
and compact as inlining the value itself into its enclosing context, at least when T
is a reference type. For instance, if you have a field of type T, a field of
type SimpleSet\<T\> containing a single value is just as compact and efficient.

# Future Work
I plan to add a few variants and benchmark them against each other to see which one
is truly more compact and efficient overall:

1. Change the "T value" field to "T[] values". In this encoding, if children and values
are both null, then the set is empty; if children is not null, then bitmap encodes
the entries in children; if values is not null, then the tree is height 1
and bitmap encodes the entries in values, and when we add an element to an index
that has an entry, we promote the whole node into the children array.
2. Take the encoding in #1, and note that we can inline the T[] array one level *up* the
tree if we keep separate bitmaps for values and children. So subtrees with only a single
value are kept in the values array, until a new value is added at which point we promote
it to the children array. This is essentially the representation which Jules Jacobs devised
that is used in Sasa's trie [1]. It's much more compact for large trees because there
are many more leaves than internal nodes, and leaves are encoded as T[] instead of
Set\<T\>[], which has many unused headers. With a suitable encoding using generics, we
can compact this even further for the final level of the tree.

[1] https://sourceforge.net/p/sasa/code/ci/default/tree/Sasa.Collections/Trie.cs
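
To make the second variant more concrete, here is a minimal sketch of a node with separate bitmaps for inlined values and for children. It is not code from this repository or from Sasa: the type name `TwoBitmapNode`, the 5-bit fan-out, and every member name are assumptions chosen for illustration. It also assumes a runtime that provides `System.Numerics.BitOperations.PopCount` (.NET Core 3.0 or later), and it deliberately leaves full hash collisions unhandled.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical illustration only: TwoBitmapNode and its members are invented
// names, not types from this repository or from Sasa.
public sealed class TwoBitmapNode<T>
{
    const int Bits = 5;       // assumed fan-out: 32 slots per node
    const int MaxShift = 30;  // last level that still has unused hash bits

    readonly uint valueBitmap;             // slots whose entry is an inlined value
    readonly uint childBitmap;             // slots whose entry is a subtree
    readonly T[] values;                   // dense: one entry per set bit in valueBitmap
    readonly TwoBitmapNode<T>[] children;  // dense: one entry per set bit in childBitmap

    public static readonly TwoBitmapNode<T> Empty =
        new TwoBitmapNode<T>(0, 0, new T[0], new TwoBitmapNode<T>[0]);

    TwoBitmapNode(uint valueBitmap, uint childBitmap, T[] values, TwoBitmapNode<T>[] children)
    {
        this.valueBitmap = valueBitmap;
        this.childBitmap = childBitmap;
        this.values = values;
        this.children = children;
    }

    static uint Hash(T item) => (uint)EqualityComparer<T>.Default.GetHashCode(item);

    // The bit for the slot selected by the 5 hash bits at this level.
    static uint Bit(uint hash, int shift) => 1u << (int)((hash >> shift) & 31);

    // Position in a dense array = number of occupied slots below this one.
    static int Dense(uint bitmap, uint bit) => BitOperations.PopCount(bitmap & (bit - 1));

    public bool Contains(T item) => Contains(item, Hash(item), 0);

    bool Contains(T item, uint hash, int shift)
    {
        uint bit = Bit(hash, shift);
        if ((valueBitmap & bit) != 0)
            return EqualityComparer<T>.Default.Equals(values[Dense(valueBitmap, bit)], item);
        if ((childBitmap & bit) != 0)
            return children[Dense(childBitmap, bit)].Contains(item, hash, shift + Bits);
        return false;
    }

    public TwoBitmapNode<T> Add(T item) => Add(item, Hash(item), 0);

    TwoBitmapNode<T> Add(T item, uint hash, int shift)
    {
        uint bit = Bit(hash, shift);
        if ((childBitmap & bit) != 0)
        {
            // Slot already holds a subtree: recurse and swap in the new child.
            int i = Dense(childBitmap, bit);
            var child = children[i].Add(item, hash, shift + Bits);
            if (ReferenceEquals(child, children[i])) return this;
            var copy = (TwoBitmapNode<T>[])children.Clone();
            copy[i] = child;
            return new TwoBitmapNode<T>(valueBitmap, childBitmap, values, copy);
        }
        if ((valueBitmap & bit) == 0)
        {
            // Empty slot: keep the single value inlined in this node.
            return new TwoBitmapNode<T>(valueBitmap | bit, childBitmap,
                Insert(values, Dense(valueBitmap, bit), item), children);
        }
        int v = Dense(valueBitmap, bit);
        T existing = values[v];
        if (EqualityComparer<T>.Default.Equals(existing, item)) return this;
        if (shift >= MaxShift)
            throw new NotSupportedException("full hash collisions are out of scope for this sketch");
        // Promote: the slot moves from the values array to the children array.
        var sub = Empty.Add(existing, Hash(existing), shift + Bits).Add(item, hash, shift + Bits);
        return new TwoBitmapNode<T>(valueBitmap & ~bit, childBitmap | bit,
            Remove(values, v), Insert(children, Dense(childBitmap, bit), sub));
    }

    static A[] Insert<A>(A[] src, int index, A item)
    {
        var dst = new A[src.Length + 1];
        Array.Copy(src, 0, dst, 0, index);
        dst[index] = item;
        Array.Copy(src, index, dst, index + 1, src.Length - index);
        return dst;
    }

    static A[] Remove<A>(A[] src, int index)
    {
        var dst = new A[src.Length - 1];
        Array.Copy(src, 0, dst, 0, index);
        Array.Copy(src, index + 1, dst, index, src.Length - index - 1);
        return dst;
    }
}
```

Starting from `TwoBitmapNode<int>.Empty`, adding 1 and then 33 puts both in slot 1 of the root (an `int` hashes to itself, and both share the same low 5 bits), so the second `Add` exercises the promotion step. The sketch uses a class rather than a struct purely to keep it short; whether this layout is actually more compact than the struct encoding above is exactly what the planned benchmarks would have to show.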
# License
LGPL v2.1