https://github.com/naasking/arraymappedset

Fast, Compact Immutable Sets

Host: GitHub
URL: https://github.com/naasking/arraymappedset
Owner: naasking
License: lgpl-2.1
Created: 2015-09-22T13:27:03.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2015-09-22T15:34:35.000Z (over 9 years ago)
Last Synced: 2023-02-28T10:32:06.561Z (almost 2 years ago)
Language: C#
Homepage:
Size: 156 KB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Fast, Compact Immutable Set

This library provides a hash array mapped set, based on the well-known hash array mapped trie (HAMT).
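For readers unfamiliar with hash array mapped tries, the core indexing trick can be sketched roughly as follows. This is a minimal illustration and not code from this repository: the `HamtIndex` class, its method names, and the choice of 5 hash bits per level are assumptions, and `BitOperations.PopCount` requires .NET Core 3.0 or later.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of this repository: shows how a 32-bit bitmap
// maps a 5-bit slice of a hash code to a position in a compact child array.
public static class HamtIndex
{
    const int BitsPerLevel = 5;                        // assumption: 2^5 = 32 slots per node
    const uint LevelMask = (1u << BitsPerLevel) - 1;   // 0b11111

    // Logical slot (0..31) selected by the hash at the given depth.
    // Only depths 0..6 are meaningful for a 32-bit hash.
    public static int Slot(int hash, int depth) =>
        (int)((unchecked((uint)hash) >> (depth * BitsPerLevel)) & LevelMask);

    // True when the node has an entry for the logical slot.
    public static bool Contains(uint bitmap, int slot) =>
        (bitmap & (1u << slot)) != 0;

    // Physical index into the compact array: the number of occupied slots below `slot`.
    public static int ArrayIndex(uint bitmap, int slot) =>
        BitOperations.PopCount(bitmap & ((1u << slot) - 1));
}
```

With this scheme a node that uses only three of its 32 logical slots stores a three-element array, and the bitmap alone decides both membership and position.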

The set uses a struct encoding for compactness:

    public struct SimpleSet<T> : IEnumerable<T>
    {
        // 1. when children != null, then either a collision node or an internal set node
        //   a. collision nodes have bitmap == 0
        //   b. internal nodes have bitmap != 0
        // 2. when children is null, then either an empty tree or a leaf node
        //   a. empty trees have bitmap == 0
        //   b. leaf nodes have bitmap == IS_LEAF  (or really, any non-zero value)
        uint bitmap;
        T value;
        SimpleSet<T>[] children;
        ...
    }

A unique property of this struct encoding is that single-element sets are just as fast
and compact as inlining the value itself into its enclosing context, at least when T
is a reference type. For instance, if you have a field of type T, a field of
type SimpleSet\<T> containing a single value is just as compact and efficient.
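The four node states listed in the struct comments can be told apart with two checks. The following is a small sketch under stated assumptions: `SketchSet<T>`, `NodeKind`, the factory methods, and the value `IS_LEAF = 1` are all hypothetical stand-ins, and only the three fields and the numbered rules come from the README.

```csharp
using System;

public enum NodeKind { Empty, Leaf, Internal, Collision }

// Hypothetical stand-in for SimpleSet<T>; everything beyond the three fields
// and the classification rules is illustrative.
public struct SketchSet<T>
{
    const uint IS_LEAF = 1;   // assumption: the README only requires a non-zero value

    uint bitmap;
    T value;
    SketchSet<T>[] children;

    // A single-element set is just the struct itself: no array is allocated.
    public static SketchSet<T> Singleton(T item) =>
        new SketchSet<T> { bitmap = IS_LEAF, value = item };

    // Illustrative factories for the two array-backed states.
    // `map` must be non-zero, otherwise the node would read as a collision node.
    public static SketchSet<T> Internal(uint map, SketchSet<T>[] nodes) =>
        new SketchSet<T> { bitmap = map, children = nodes };

    public static SketchSet<T> Collision(SketchSet<T>[] colliding) =>
        new SketchSet<T> { bitmap = 0, children = colliding };

    // Rules 1a, 1b, 2a and 2b from the struct comments, in that order.
    public NodeKind Kind =>
        children != null
            ? (bitmap == 0 ? NodeKind.Collision : NodeKind.Internal)
            : (bitmap == 0 ? NodeKind.Empty : NodeKind.Leaf);

    // The stored element for a leaf, or default(T) for any other state.
    public T SingleOrDefault => Kind == NodeKind.Leaf ? value : default(T);
}
```

One consequence of this layout is that `default(SketchSet<T>)` (all fields zero or null) already reads as the empty set, so no sentinel instance is needed.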

# Future Work

I plan to add a few variants and benchmark them against each other to see which one
is truly more compact and efficient overall:

 1. Change the "T value" field to "T[] values". In this encoding, if children and values
    are both null, then the set is empty. If children is not null, then bitmap encodes
    the entries in children. If values is not null, then the tree has height 1 and
    bitmap encodes the entries in values; when we add an element at an index that
    already has an entry, we promote the whole node into the children array.

 2. Take the encoding in #1, and note that we can inline the T[] array one level *up* the
    tree if we keep separate bitmaps for values and children. Subtrees with only a single
    value are kept in the values array until a new value is added, at which point we promote
    the entry to the children array. This is essentially the representation that Jules Jacobs
    devised and that is used in Sasa's trie [1]. It's much more compact for large trees because
    there are many more leaves than internal nodes, and leaves are encoded as T[] instead of
    Set\<T>[], which has many unused headers. With a suitable encoding using generics, we
    can compact this even further for the final level of the tree.

[1] https://sourceforge.net/p/sasa/code/ci/default/tree/Sasa.Collections/Trie.cs
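To make variant 2 more concrete, here is a rough sketch of a lookup over a node that keeps separate bitmaps for inline values and for subtrees. It is not taken from this repository or from Sasa: the `TwoBitmapNode<T>` name, the constructor, and the 5-bits-per-level split are assumptions, and insertion and hash-collision handling are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical node layout for variant 2; names are illustrative only.
public sealed class TwoBitmapNode<T>
{
    readonly uint valueMap;                  // slots whose single element is stored inline
    readonly uint childMap;                  // slots that point to a subtree
    readonly T[] values;                     // compact: one entry per set bit in valueMap
    readonly TwoBitmapNode<T>[] children;    // compact: one entry per set bit in childMap

    public TwoBitmapNode(uint valueMap, T[] values,
                         uint childMap, TwoBitmapNode<T>[] children)
    {
        this.valueMap = valueMap;
        this.values = values ?? Array.Empty<T>();
        this.childMap = childMap;
        this.children = children ?? Array.Empty<TwoBitmapNode<T>>();
    }

    // Compact array index: number of set bits below `bit`.
    static int Index(uint map, uint bit) => BitOperations.PopCount(map & (bit - 1));

    // Membership test. The caller supplies the hash so the sketch stays independent
    // of any particular hashing policy. Depths beyond 6 are not handled here.
    public bool Contains(T item, int hash, int depth = 0)
    {
        uint bit = 1u << (int)((unchecked((uint)hash) >> (depth * 5)) & 31);
        if ((valueMap & bit) != 0)
            return EqualityComparer<T>.Default.Equals(values[Index(valueMap, bit)], item);
        if ((childMap & bit) != 0)
            return children[Index(childMap, bit)].Contains(item, hash, depth + 1);
        return false;
    }
}
```

In this layout a slot holding exactly one element costs one entry in a T[] rather than a whole child node, which is where the space saving described in item 2 would come from.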

# License

LGPL v2.1