Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/junjizhi/lww-element-set

An Python implementation of Last-Writer-Wins Element Set.
https://github.com/junjizhi/lww-element-set

Last synced: 12 days ago
JSON representation

An Python implementation of Last-Writer-Wins Element Set.

Awesome Lists containing this project

README

        

# lww-element-set

This is a Last-Writer-Wins Element Set(lww-set) data
structure implemented in Python language. Conceptually, lww-set is an
Operation-based [Conflict-Free Replicated Date
Type](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type)(CRDT). It
can achieve strong eventual consistency and
monotonicy (i.e., no rollbacks).

The repo includes lww_python, an implementation based on Python
dicitonaries (non-distributed) and lww_redis, implemented based on
Redis sorted set (ZSET). Both implementations are
thread-safe. lww_redis can be a building block for a time-series event
storage layer like [Roshi](https://github.com/soundcloud/roshi). Below
we generally refer to both implementations as "lww-set".

### Operations
The lww-set provides the following API:

- add(element, timestamp): Adds an *element* associated with its *timestamp* in the set.
- remove(element, timestamp): Removes an *element* associated with its
*timestamp* in the set.
- exist(element): returns True if the *element* exists, False
otherwise. The criteria is whether the *element*'s most recent
operation was an add.
- get(): returns a list of existing elements in *lww-set*.

lww-set consists of two separate underlying sets: *add_set* and
*remove_set*. Each set records entries of (element, timestamp). When
inserting an element to add_set or remove_set, create a new entry if
the element does not exist. Otherwise, update the corresponding
entry's timestamp if it is more recent (i.e., the timestamp
argument is larger).

In order to satisfy the CRDT properties, i.e., *Associativity*,
*Commutativity* and *Idempotence*, lww-set defines the following
combinations of operations and their resulting states.

| Original state | Operation | Resulting state |
|----------------|-------------|-----------------|
| A(a,1) R() | add(a,0) | A(a,1) R() |
| A(a,1) R() | add(a,1) | A(a,1) R() |
| A(a,1) R() | add(a,2) | A(a,2) R() |
| A() R(a,1) | add(a,0) | A(a,0) R(a,1) |
| A() R(a,1) | add(a,1) | A(a,1) R(a,1) |
| A() R(a,1) | add(a,2) | A(a,2) R(a,1) |
| A() R(a,1) | remove(a,0) | A() R(a,1) |
| A() R(a,1) | remove(a,1) | A() R(a,1) |
| A() R(a,1) | remove(a,2) | A() R(a,2) |
| A(a,1) R() | remove(a,0) | A(a,1) R(a,0) |
| A(a,1) R() | remove(a,1) | A(a,1) R(a,1) |
| A(a,1) R() | remove(a,2) | A(a,1) R(a,2) |

Note: The above table is different from the one in
[Roshi](https://github.com/soundcloud/roshi) which does instant
garbage collection. The above table makes
sure lww-set meets strictly the *Commutativity* property, i.e., a+b
= b+a, the order of applying operations does not matter. However,
because there is no garbage collection, there can be redundant copies
of elements in both add_set and remove_sets, which wastes space.

### Semantics of lww-set add/remove operations

Unlike any other usual
set data structure, lww-set's add/remove operation semantic is
slightly different. When you call add() or remove(), what actually
happens internally under the hood is record the add/remove operations
in the underlying sets. The below two cases are possible

- A remove() operation of (element, timestamp) is recorded in
remove_set before the corresponding element exists in add_set.
- A previously existing element has been deleted in lww-set, i.e., the
element in remove_set has a higher timestamp than in
add_set. However, a client can still issue the remove() on that
element multiple times and the timestamp of that element in
remove_set will be updated multiple times.

These above scanarios are allowed in lww-set because we position
lww-set as a layer where operations come in asynchronously. Because of
the CRDT properties, the order of the operation arrival time does not
matter, so all replicated lww-set will eventually converge.

#### Return Values of add/remove()

In the API, what value should lww-set remove operation return? We are
facing the dilemma of using usual set semantic vs. lww-set state
changes.

For example, when add set and remove set are both empty, i.e., A()
R(), we perform Remove(a,1) operation on the lww-set. From an external
view, we are deleting a non-existing element in lww-set, so it should
return False in this semantic. But what actually happens is remove set
will record an R operation and the internal result is changed to: A()
R(a,1). So it indeed **changes** the state of lww-set and it should
return True.

If the return value is based on the usual set add/remove semantic,
then it may confuse the users because some other processes calling the
same method concurrently may have overwritten the element (with a more
recent timestamp).

If the return value is based on whether lww-set internal state has
been changed, then we probably should we enumerate all return values
for all possible scenarios(the table above). However, exposing the
internal states of lww-set violates the encapsulation.

In this API, the return value of these two functions is boolean
value. If it returns True, it only indicates whether the request is
acknowledged and processed by the data structure according to CRDT
semantic, but there is no guarantee it has take effect. If it returns
False, it could be due to a network connection problem and a retry may
solve the problem.

#### Eventuality
For add() or remove(), if the timestamp argument is
the most recent (compared with all other elements), the operation will
eventually succeed. Otherwise, when other processes or threads are
invoking add() or remove() operations concurrently , the timestamp of
an element may be overwritten by another newer timestamp. Therefore,
each client won't know whether its add() or remove() operations
succeeds because it does not have the ''big picture'' of whether there
is more recent operation. The return value is always none and does
not indicate the success of this operation. The only way to check is
by calling exist() or get().

However, when calling exist() or get(), if no other processes/threads
is calling add() or remove() concurrently, it is guaranteed that
exist() or get() always returns the most recent result. But when there
are concurrent operations, there is no such guarantees and it is
possible to get an out-dated result. After all on-the-fly operations
complete, calling exist() or get() will eventually return the
up-to-date result.

## Implementation Details

### Element and timestamp value type
All ``element`` arguments are converted to strings and used as keys
to index into the set. The code will simply call ``str(element)`` to try
to convert it into a string.
It is the users' responsibility to design
the element-to-string conversion schema. If the users
decides to use built-in types such as
int, long, float as element, the conversion is natural. However, for more
complex object types, users may need to overwrite
the ``__str__`` functions to return the object identity string. Similar to Redis, we limit the element string to be [512 Megabytes in length](http://redis.io/topics/data-types).

On the other hand, ``timestamp`` is stored as a float value.

### Synchronization considerations

#### Underlying sets and locking in add() and remove()

For lww_python, the two underlying sets are implemented in Python dictionaries. Since
Python's GIL guarantees that only one thread runs at a time,
individual dictionary operations, for example, D[x] = y, are [atomic
and
thread-safe](http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm).

However, there is no atomic test & set operations in Python
dictionary. For lww-set add/remove operations which is to insert
operations to add_sets/remove_sets, we need to protect the critical
region (like the one below) from data race.

```
# Element test & set region
# race condition could happen in below snippet without locking
if element in target_set:
current_timestamp = target_set[element]
if current_timestamp < timestamp:
target_set[element] = timestamp
else:
target_set[element] = timestamp
```

Similar design is used to lww_redis. The operations used are mainly
ZADD, ZSCORE and ZRANGE. The Redis client used is
[redis.py](https://pypi.python.org/pypi/redis). Since ZADD simply
updates the score of an existing number, it has the same semantic as
Python dictionaries. For test & set semantics in lww_redis, we still
need to use locks to protect add()/remove() operations.

#### No lock in exist() or get()

For exist() or get() implementations, they only need to read values
from the Python dictionaries, i.e.,add_set and remove_set. Because
each dictionary read operation in Python is atomic (dicussed above), there is
no need to protect them with a lock.

The similar observation applies to Redis. Up to the current
version(3.0.7), Redis instance is single threaded and each individual
command is atomic, just like Python dictionaries. Therefore,
we do not protect the ZSET read operations, i.e., ZSCORE, ZRANGE, with
locks.

The implication is that there is no guarantee that exist() or get()
returns the most up-to-date results when there is concurrent write
operations on the fly. But the results will be up-to-date eventually.

## Usage Examples
```
import redis
r = redis.StrictRedis(host='localhost', port=6379, db=0)
lww = LWW_redis(r)
#lww = LWW_python() # lww based on python dict
a = 1
lww.add(a,1)
lww.exist(str(a)) # Should return True
lww.get() # Should return ['1']
lww.remove(a,2)
lww.exist(str(a)) # Should return False
lww.get() # Should return []
```

## Future work

lww_redis can be a building block for a time-series event storage
layer like [Roshi](https://github.com/soundcloud/roshi). It is a thin
layer that supports fast, scalable and reliable writes to a popular
value (e.g., a twitter update from a celebrity) whose updates will
propagate to a large number of followers. To do so, lww_redis needs
the help of the following components:

- Data sharding that partitions dataset horizontally to achieve better
load balancing.
- Data repair in case of replica instance or network failures. Because of the CRDT properties,
defining policies to update out-dated replicas is straight-forwrad.
- A server component that exposes proper APIs (e.g., HTTP REST) for clients.

## References
1. [Conflict-Free Replicated Date Type](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) Wikipedia page.
2. [Redis Replication](http://redis.io/topics/replication)
3. [Roshi](https://github.com/soundcloud/roshi), an LWW-element-set CRDT implemented in Go by SoundCloud.