https://github.com/josephg/editing-traces

Real world text editing traces for benchmarking CRDT and Rope data structures

benchmarking collaborative-editing crdt crdts rope-string text-editor

# What is this?

This repository contains editing histories recorded character by character from real-world editing sessions. The goal is to provide standard benchmarks for comparing the performance of rope libraries and various OT / CRDT implementations.

## Where is the data?

This repository stores 2 kinds of data, in 2 subdirectories:

#### [Sequential Traces](sequential_traces/)

The [sequential_traces](sequential_traces/) folder contains a set of simple editing traces where all the edits can be applied in sequence to produce a final text document.

Most of these data sets come from individual users typing into text documents. Each editing event (keystroke) has been recorded so they can be replayed later.

Some of these traces are generated by linearizing ("flattening") the concurrent traces (below). Regardless, the data format is the same.

These traces are super simple to replay - just apply each change, one by one, into an empty document and you'll get the expected output.
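As a rough illustration, replay can be sketched in Python as below. The patch shape used here, a `(position, delete_count, inserted_text)` tuple, and the names `apply_patch` and `replay` are assumptions made for this example only; the actual data format is described in the sequential traces README.

```python
from typing import Iterable, Tuple

# Hypothetical patch shape for illustration: (position, delete_count, inserted_text).
Patch = Tuple[int, int, str]


def apply_patch(doc: str, patch: Patch) -> str:
    """Delete `delete_count` characters at `position`, then insert the text there."""
    pos, del_count, ins_text = patch
    return doc[:pos] + ins_text + doc[pos + del_count:]


def replay(patches: Iterable[Patch], start: str = "") -> str:
    """Apply every patch in order, starting from `start` (empty by default)."""
    doc = start
    for patch in patches:
        doc = apply_patch(doc, patch)
    return doc


# A tiny hand-written trace (not taken from the repository's data).
example = [(0, 0, "helo"), (3, 0, "l"), (5, 0, " world"), (0, 1, "H")]
result = replay(example)
print(result)
```

Python strings are indexed by Unicode code point, so character offsets can be used directly. Rebuilding the string on every patch is only suitable for checking correctness; a rope or similar structure would be the thing actually being benchmarked.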

See [sequential_traces/README.md](sequential_traces/README.md) for details on the data format and other notes.

These traces are useful for benchmarking how CRDTs behave when only a single user is making changes to a text document. They are also useful for benchmarking rope libraries.

These data sets describe their editing positions using Unicode character offsets. If you don't want to think about Unicode offsets while benchmarking, use the [`ascii_only`](sequential_traces/ascii_only) variants of these traces. In the ASCII variants, all non-ASCII inserts have been replaced with the underscore character.
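To show why the offset unit matters, here is a small Python sketch that converts a Unicode character offset into a UTF-8 byte offset and a UTF-16 code unit offset. The helper names are made up for this example, and it assumes "character offset" means a count of Unicode code points. A library that indexes by bytes or by UTF-16 units would need a conversion along these lines before applying a position.

```python
def char_to_utf8_offset(text: str, char_pos: int) -> int:
    """Number of UTF-8 bytes that precede the character at offset `char_pos`."""
    return len(text[:char_pos].encode("utf-8"))


def char_to_utf16_offset(text: str, char_pos: int) -> int:
    """Number of UTF-16 code units that precede the character at `char_pos`."""
    return len(text[:char_pos].encode("utf-16-le")) // 2


# "naïve 😀 text", written with escapes so the source stays unambiguous.
sample = "na\u00efve \U0001F600 text"
char_pos = sample.index("t")  # character offset of the word "text"

utf8_pos = char_to_utf8_offset(sample, char_pos)
utf16_pos = char_to_utf16_offset(sample, char_pos)
print(char_pos, utf8_pos, utf16_pos)
```

For text that is entirely ASCII, all three offsets are equal.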

#### [Concurrent Traces](concurrent_traces/)

The [concurrent_traces](concurrent_traces/) folder contains editing traces where multiple users typed into a shared text document concurrently (that is, they were typing at the same time).

These traces are much harder to replay, because each editing position listed in the file is relative to the version of the document on that user's computer at the time they typed. This complexity is, unfortunately, necessary to replay a collaborative editing session between multiple users, which is what we need when benchmarking text-based CRDTs.
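The toy Python example below illustrates why such positions cannot simply be applied in sequence. It does not use the repository's data format and it is not a CRDT. The `Insert` type and the `transform_insert` helper are hypothetical, and the transform is a minimal OT-style sketch that handles only two concurrent inserts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # position relative to the document version the author saw
    text: str
    agent: str  # hypothetical author id, used only to break ties


def apply_insert(doc: str, op: Insert) -> str:
    """Insert op.text into doc at op.pos."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform_insert(op: Insert, other: Insert) -> Insert:
    """Shift `op` so it can be applied after the concurrent insert `other`."""
    if other.pos < op.pos or (other.pos == op.pos and other.agent < op.agent):
        return Insert(op.pos + len(other.text), op.text, op.agent)
    return op


base = "hello world"
alice = Insert(0, ">> ", "alice")  # prepends a marker to the base text
bob = Insert(11, "!", "bob")       # appends at the end of the base text

# Applying bob's position as-is after alice's edit lands in the wrong place.
naive = apply_insert(apply_insert(base, alice), bob)

# Adjusting bob's position for alice's concurrent insert gives the intended result.
merged = apply_insert(apply_insert(base, alice), transform_insert(bob, alice))

# Applying the edits in the opposite order converges to the same document.
merged_other_order = apply_insert(apply_insert(base, bob), transform_insert(alice, bob))
print(naive, merged, merged_other_order, sep="\n")
```

A CRDT solves the same problem differently, typically by giving each inserted item a stable identity rather than rewriting positions.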

See [concurrent_traces/README.md](concurrent_traces/README.md) for details on the data format and other notes.