https://github.com/davidrusu/hashseq

A BFT Sequence CRDT suitable for Permisonless Networks with Unbounded Number of participants
https://github.com/davidrusu/hashseq

Last synced: about 1 year ago
JSON representation

A BFT Sequence CRDT suitable for Permisonless Networks with Unbounded Number of participants

Host: GitHub
URL: https://github.com/davidrusu/hashseq
Owner: davidrusu
Created: 2022-03-07T17:19:35.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2023-11-28T18:40:31.000Z (over 2 years ago)
Last Synced: 2025-04-20T13:41:07.904Z (over 1 year ago)
Language: Rust
Size: 171 KB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          

# HashSeq

A Byzantine-Fault-Tolerant (BFT) Sequence CRDT suitable for unpermissioned networks with unbounded number of collaborators.

## Merge Semantics

### Concurrent Inserts are not interleaved:

| Site 1 | Site 2  |

|--------|---------|

|  hello | goodbye |

On merge we see:

`hellogoodbye` OR `goodbyehello`

### Common Prefix is Deduplicated:

| Site 1 | Site 2  |

|--------|---------|

|  hello earth | hello mars |

On merge we see:

`hello earthmars` OR `hello marsearth`

(i.e. hello is not duplicated even though Site 1 and Site 2 both inserted it.)

### Stable Ordering

let _S_,_R_ be HashSeq instances on Site 1, Site 2 respectively.

Both _S_ and _R_ form a montonic sub-sequence of _Q_ = merge(_S_, _R_).

Stated differently, for sequence elements _a_,_b_ ∈ _S_, if _a_ comes before _b_ in _S_, and _a_,_b_ ∈ _R_, then _a_ comes before _b_ in _R_.

## Current Complexity:

Assuming you are using the Cursor interface:

|   op   | time | space |

|--------|------|-------|

| insert | O(1) | O(1)  |

| remove | O(n) | O(n)  |

| seek   | O(n) | O(n)  |

These are still WIP, we should be able to get `remove` and `seek` down to O(log(n)) once we have a secondary position index into the ordering tree.

## Design

Each edit produces a HashNode containing an Op and some extra dependencies:

```rust

pub enum Op {

    InsertRoot(char),

    InsertAfter(Id, char),

    InsertBefore(Id, char),

    Remove(Id),

}

pub struct HashNode {

    extra_dependenciess: BTreeSet,

    op: Op,

}

impl HashNode {

    fn id(&self) -> Id;

}

```

* `InsertRoot` is used when the HashSeq is empty.

* `InsertAfter(id, char) is used to constrain this `HashNode` to appear after the node with id `id`.

* `InsertBefore(id, char)` is used to constrain this HashNode to appear before the node with id `id`.

* `Remove(id)` is used to removing the node with id `id`.

#### Example 1. Writing "hello" by appending end

```

InsertRoot('h')       -- id = 0x0

InsertAfter(0x0, 'e') -- id = 0x1

InsertAfter(0x1, 'l') -- id = 0x2

InsertAfter(0x2, 'l') -- id = 0x3

InsertAfter(0x3, 'o') -- id = 0x4

  h <- e <- l <- l <- o

-- "hello"

```

Each insert produces a Node holding a value, the hashes of the immediate nodes to the left, and the immediate nodes to the right:

s

```

struct Node {

   value: V,

   lefts: Set,

   rights: Set,

}

```

E.g.

Inserting 'a', 'b', 'c' in sequential order produces the graph:

```

 a <- b <- c

```

Inserting 'd' between 'a' and 'b'

```

a <- d -> b <- c

   \_____/

```

We linearize these Hash Graphs by performing a biased topological sort.

The bias is used to decide a canonical ordering in cases where multiple linearizations satisfy the left/right constraints.

E.g.

```

            s - a - m

           /         \

h - i - ' '           ! - !

           \         /

            d - a - n

```

The above hash-graph can serialize to `hi samdan!!` or `hi dansam` or even any interleaving of sam/dan: `hi sdaamn`, `hi sdanam`, ... . We need a canonical ordering that preserves some semantic information, (i.e. no interleaving of concurrent runs)

The choice we make is: in a fork, we choose the branch whose starting element has the smaller hash, then to avoid interleaving of concurrent runs, our topological sort runs depth first rather than the traditional breadth first.

So in the above example, assuming `hash(s)` < `hash(d)`, we'd get is: `hi samdan!!`.

## Optimizations:

If we detect hash-chains, we can collabse them to just the first left hashes and the right hashes:

```rust

struct Run {

   run: Vec

   lefts: Set

   rights: Set

}

```

i.e. in the first example, a,b,c are sequential, they all have a common right hand (empty set), and their left hand is the previous element in the sequence.

So we could represent this as:

```rust

// a <- b <- c == RUN("abc")

Run {

  run: "abc",

  lefts: {},

  rights: {}

}

```

Inserting 'd' splits the run:

```

a <- d -> RUN("bc")

   \_____/

```

And the fork example:

```

           RUN("sam")

          /          \

RUN("hi ")            RUN("!!")

          \          /

           RUN("dan")

```

This way we only store hashes at forks, the rest can be recomputed when necessary.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/davidrusu/hashseq

Awesome Lists containing this project

README