https://github.com/dinosaure/art
Adaptive Radix Tree in OCaml
- Host: GitHub
- URL: https://github.com/dinosaure/art
- Owner: dinosaure
- License: mit
- Created: 2019-11-19T17:11:40.000Z (about 6 years ago)
- Default Branch: main
- Last Pushed: 2024-08-06T08:01:32.000Z (over 1 year ago)
- Last Synced: 2025-02-27T01:53:00.623Z (11 months ago)
- Language: OCaml
- Size: 693 KB
- Stars: 48
- Watchers: 5
- Forks: 1
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- License: LICENSE.md
## Adaptive Radix Tree (ART) in OCaml
This is an implementation in OCaml of [ART][ART]. An Adaptive Radix Tree behaves
like a simple `Hashtbl` that keeps its keys ordered:
```ocaml
# let tree = Art.make () ;;
# Art.insert tree (Art.key "foo") 42 ;;
# Art.insert tree (Art.key "bar") 21 ;;
# Art.find tree (Art.key "foo") ;;
- : int = 42
```
Operations like `minimum` or `maximum` are available (they don't exist for a
simple `Hashtbl.t`):
```ocaml
# let tree = Art.make () ;;
# Art.insert tree (Art.key "0") 0 ;;
# Art.insert tree (Art.key "1") 1 ;;
# Art.insert tree (Art.key "2") 2 ;;
# Art.minimum tree ;;
- : int = 0
# Art.maximum tree ;;
- : int = 2
```
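To see why this matters, here is a stdlib-only sketch (it does not use Art, and
the helper names are illustrative): an ordered `Map` reaches its smallest or
largest binding directly, while a `Hashtbl.t` keeps no order and has to scan
every binding to find the smallest key.

```ocaml
module SMap = Map.Make (String)

(* Ordered structure: the smallest/largest binding is found directly. *)
let map_minimum m = snd (SMap.min_binding m)
let map_maximum m = snd (SMap.max_binding m)

(* Hashtbl.t keeps no order: fold over every binding (O(n)) and keep the
   one with the smallest key. Raises Not_found on an empty table. *)
let tbl_minimum tbl =
  match
    Hashtbl.fold
      (fun k v acc ->
        match acc with
        | Some (k', _) when String.compare k' k <= 0 -> acc
        | _ -> Some (k, v))
      tbl None
  with
  | Some (_, v) -> v
  | None -> raise Not_found
```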
If you want both ordering and the speed of `Hashtbl.t`, Art is your library:
- Benchmark on [`find`][find-bechamel]
- Benchmark on [`insert`][insert-bechamel]
The function `prefix_iter` is also available if you want to fold over the subset
of your tree whose keys share a given prefix:
```ocaml
# let t = Art.make () ;;
# Art.insert t (Art.key "Dalton Joe") 0 ;;
# Art.insert t (Art.key "Dalton Jack") 1 ;;
# Art.insert t (Art.key "Dalton William") 2 ;;
# Art.insert t (Art.key "Dalton Averell") 3 ;;
# Art.insert t (Art.key "Rantanplan") 4 ;;
# let dalton = Art.prefix_iter ~prefix:(Art.key "Dalton")
(fun k _ a -> (k :> string) :: a) [] t ;;
val dalton : string list = [ "Dalton Joe"
; "Dalton Jack"
; "Dalton William"
; "Dalton Averell" ]
```
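For reference, the semantics of a prefix fold can be reproduced with the
standard library's ordered `Map`. This sketch is not how Art walks its tree; it
only illustrates what is being computed. Keys sharing a prefix are contiguous in
lexicographic order and start at the first key greater than or equal to the
prefix, so the fold seeks there and stops at the first key that no longer
matches. `prefix_fold` is a hypothetical name, and the code needs OCaml 4.13 or
later for `String.starts_with`.

```ocaml
module SMap = Map.Make (String)

(* Fold [f] over the bindings whose key starts with [prefix], in key order.
   [to_seq_from prefix] jumps to the first key >= prefix; matching keys are
   contiguous from there, so we stop at the first non-matching key. *)
let prefix_fold ~prefix f acc m =
  let rec go acc seq =
    match seq () with
    | Seq.Cons ((k, v), rest) when String.starts_with ~prefix k ->
        go (f k v acc) rest
    | _ -> acc
  in
  go acc (SMap.to_seq_from prefix m)
```

Because the accumulator conses each key as it is visited, the resulting list
comes out in reverse key order: on the bindings of the session above,
`prefix_fold ~prefix:"Dalton" (fun k _ a -> k :: a) [] m` yields
`["Dalton William"; "Dalton Joe"; "Dalton Jack"; "Dalton Averell"]`.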
## Read Optimised Write Exclusion (ROWEX) in OCaml
ROWEX is a second implementation of ART, built on atomic operations. It is a
_functor_ that expects an implementation of atomic operations such as `load` and
`store`.
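As a rough sketch of what "a functor expecting atomic operations" means, the
code below defines a hypothetical `ATOMIC` signature (the names `cell`, `make`,
`load`, `store` and `cas` are assumptions, not ROWEX's actual functor argument),
an in-memory implementation backed by the standard `Atomic` module (OCaml 4.12 or
later), and a toy functor that only touches memory through that signature.

```ocaml
(* Hypothetical signature: ROWEX's real functor parameter may differ. *)
module type ATOMIC = sig
  type 'a cell
  val make : 'a -> 'a cell
  val load : 'a cell -> 'a
  val store : 'a cell -> 'a -> unit
  val cas : 'a cell -> 'a -> 'a -> bool (* compare-and-swap *)
end

(* In-memory implementation on top of the stdlib Atomic module. *)
module Mem : ATOMIC = struct
  type 'a cell = 'a Atomic.t
  let make = Atomic.make
  let load = Atomic.get
  let store = Atomic.set
  let cas = Atomic.compare_and_set
end

(* Toy functor: a counter that only accesses memory through [A]. *)
module Counter (A : ATOMIC) = struct
  let create () = A.make 0
  let reset c = A.store c 0
  (* Retry until no other writer changed the value between load and cas. *)
  let rec incr c =
    let v = A.load c in
    if not (A.cas c v (v + 1)) then incr c
  let get c = A.load c
end

module C = Counter (Mem)
```

Swapping `Mem` for another module with the same signature (for instance, one
operating on a memory area shared between processes) changes where the memory
lives without touching `Counter`; that separation is what the functor makes
possible.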
### Parallelism, atomic operations & OCaml
OCaml without multicore support (before OCaml 5.0) has a global runtime lock,
tied to the GC. Because of it, ROWEX operations (`find`/`insert`) cannot run with
true parallelism within a single OCaml runtime: even with Lwt or Async, jobs only
run concurrently.
However, ROWEX aims to provide an implementation where `find`/`insert` can run in
parallel without problems such as race conditions or the ABA problem. To that
end, ROWEX provides an implementation, `persistent`, which implements atomic
operations on a memory area. Then, like [`parmap`][parmap], we can obtain true
parallelism as long as each operation runs in its own process, created with
[`fork()`][fork].
The goals of this library are to provide:
- the easiest way to switch the implementation to
  [ocaml-multicore][ocaml-multicore]
- a first step towards letting several processes manipulate a file in parallel
  (consumers with `find`, producers with `insert`)
ROWEX is based on two main papers:
- The initial description of [ROWEX][ROWEX]
- A **persistent** derivation of it: [PART][PART]
### Tools
The distribution comes with some tools to manipulate an _index_:
```sh
$ opam pin add -y https://github.com/dinosaure/art
$ opam install rowex
$ part.make index.idx
$ ls -lh
-rw-r--r-- 1 user user 8,0M ----- -- --:-- index.idx
prw------- 1 user user 0 ----- -- --:-- index.idx.socket
prw------- 1 user user 0 ----- -- --:-- index.idx-truncate.socket
$ part.insert index.idx foo 1
$ part.find index.idx foo
1
```
On the OCaml side, a `Part` module provides these functions:
```ocaml
type 'a t constraint 'a = [< `Rd | `Wr ]
val create : ?len:int -> string -> unit
val insert : [> `Rd | `Wr ] t -> string -> int -> unit
val lookup : [> `Rd ] t -> string -> int
```
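The type parameter `'a`, constrained to `` [< `Rd | `Wr ] ``, acts as a phantom
capability: `insert` requires both the `Rd` and `Wr` tags, while `lookup` only
requires `Rd`. A self-contained toy with the same type pattern shows how the
type checker enforces this. The module `Toy` and its `reader`/`writer`
constructors are hypothetical, and bindings live in a `Hashtbl`, not in a file.

```ocaml
(* Hypothetical toy mimicking the capability pattern of Part's signature. *)
module Toy : sig
  type 'a t constraint 'a = [< `Rd | `Wr ]
  val reader : (string, int) Hashtbl.t -> [ `Rd ] t
  val writer : (string, int) Hashtbl.t -> [ `Rd | `Wr ] t
  val insert : [> `Rd | `Wr ] t -> string -> int -> unit (* needs `Wr *)
  val lookup : [> `Rd ] t -> string -> int (* `Rd is enough *)
end = struct
  (* The phantom parameter does not appear in the representation. *)
  type 'a t = (string, int) Hashtbl.t constraint 'a = [< `Rd | `Wr ]
  let reader tbl = tbl
  let writer tbl = tbl
  let insert t k v = Hashtbl.replace t k v
  let lookup t k = Hashtbl.find t k
end
```

With ``r : [ `Rd ] Toy.t``, the call `Toy.insert r "bar" 2` is rejected at
compile time because the type of `r` lacks the `Wr` tag, while `Toy.lookup r`
type-checks.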
`part` is Unix-dependent (it needs a **Unix named pipe**). Through the internal
mechanisms described below, it supports multiple readers and a single writer:
- The _writer_ can take exclusive ownership of the index file and its named
  pipe
- _Readers_ don't need to take ownership, but they must signal the _writer_
  through the named pipe when they start to introspect the index
For _readers_, some functions exist to signal their presence to the _writer_:
```ocaml
val append_reader : Ipc.t -> unit
val delete_reader : Ipc.t -> unit
val ipc : _ t -> Ipc.t
```
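Since a reader announces itself before introspecting the index and should
withdraw once done, even when an exception escapes, it is natural to pair the
two calls. The helper below is a hypothetical sketch: `with_reader` is not part
of the library, and `append`/`delete` are taken as parameters (presumably
`append_reader` and `delete_reader` applied to the result of `ipc`) so that the
block compiles on its own.

```ocaml
(* Hypothetical helper, not part of rowex: registers a reader, runs [f],
   and always deregisters it, even if [f] raises. *)
let with_reader ~append ~delete ipc f =
  append ipc;
  Fun.protect ~finally:(fun () -> delete ipc) f
```

Assuming these functions live in `Part`, a call would presumably look like
`with_reader ~append:Part.append_reader ~delete:Part.delete_reader (Part.ipc t)
(fun () -> Part.lookup t "foo")`.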
### Status: experimental
This part of the distribution is **experimental**. Even though the distribution
comes with several tests to ensure that the implementation works, ROWEX is
fragile! It still needs a synchronization mechanism (`fsync()`), which is
currently added piecemeal in some parts of the code, in response to observed
errors.
[ART]: https://db.in.tum.de/~leis/papers/ART.pdf
[ROWEX]: https://db.in.tum.de/~leis/papers/artsync.pdf
[PART]: https://arxiv.org/pdf/1909.13670.pdf
[find-bechamel]: https://dinosaure.github.io/art/bench/find.html
[insert-bechamel]: https://dinosaure.github.io/art/bench/insert.html
[parmap]: https://github.com/rdicosmo/parmap
[fork]: https://man7.org/linux/man-pages/man2/fork.2.html
[ocaml-multicore]: https://github.com/ocaml-multicore/ocaml-multicore