{"id":13568499,"url":"https://github.com/datacrypt-project/hitchhiker-tree","last_synced_at":"2025-04-12T16:35:31.434Z","repository":{"id":143638790,"uuid":"52108968","full_name":"datacrypt-project/hitchhiker-tree","owner":"datacrypt-project","description":"Functional, persistent, off-heap, high performance data structure","archived":false,"fork":false,"pushed_at":"2018-07-22T21:01:33.000Z","size":393,"stargazers_count":1192,"open_issues_count":17,"forks_count":65,"subscribers_count":53,"default_branch":"master","last_synced_at":"2025-04-03T16:13:53.587Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datacrypt-project.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-02-19T18:49:16.000Z","updated_at":"2025-03-24T07:59:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"f3d36e50-fc94-4c31-a337-7356e359e62e","html_url":"https://github.com/datacrypt-project/hitchhiker-tree","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacrypt-project%2Fhitchhiker-tree","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacrypt-project%2Fhitchhiker-tree/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacrypt-project%2Fhitchhiker-tree/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacrypt-project%2Fhitchhiker-tree/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datacrypt-pr
oject","download_url":"https://codeload.github.com/datacrypt-project/hitchhiker-tree/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248597139,"owners_count":21130818,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T14:00:26.765Z","updated_at":"2025-04-12T16:35:31.414Z","avatar_url":"https://github.com/datacrypt-project.png","language":"Clojure","funding_links":[],"categories":["Clojure","Advanced datastructures","others"],"sub_categories":[],"readme":"# Hitchhiker Tree\n\nHitchhiker trees are a newly invented (by @dgrnbrg) datastructure, synthesizing fractal trees and functional data structures, to create fast, snapshottable, massively scalable databases.\n\n[Watch the talk from Strange Loop](https://www.youtube.com/watch?v=jdn617M3-P4) to learn more, especially about the concept!\n\n## What's in this Repository?\n\nThe hitchhiker namespaces contain a complete implementation of a persistent, serializable, lazily-loaded hitchhiker tree.\nThis is a sorted key-value datastructure, like a scalable `sorted-map`.\nIt can incrementally persist and automatically lazily load itself from any backing store which implements a simple protocol.\n\nOutboard is a sample application for the hitchhiker tree.\nIt includes an implementation of the IO subsystem backed by Redis, and it manages all of the incremental serialization and flushing.\n\nThe hitchhiker tree is designed very similarly to how Datomic's backing trees must work--I would love to see integration with 
[DataScript](https://github.com/tonsky/datascript) for a fully open source [Datomic](http://www.datomic.com).\n\n## Outboard\n\nOutboard is a simple API for your Clojure applications that enables you to make use of tens of gigabytes of local memory, far beyond what the JVM can manage.\nOutboard also allows you to restart your application and reuse all of that in-memory data, which dramatically reduces startup times due to data loading.\n\nOutboard has a simple API, which may be familiar if you've ever used Datomic.\nUnlike Datomic, however, Outboard trees can be \"forked\" like git repositories, not just transacted upon.\nOnce you've created a tree, you can open a connection to it.\nThe connection mediates all interactions with the outboard data:\nit can accept transactions, provide snapshots for querying, and be cloned.\n\n### API Usage Example\n\n```clojure\n(require '[hitchhiker.outboard :as ob])\n\n;; First, we'll create a connection to a new outboard\n(def my-outboard (ob/create \"first-outboard-tree\"))\n\n;; We'll get a snapshot of the outboard's current state, which is empty for now\n;; Note that snapshots are only valid for 5 seconds, but making a new snapshot is free\n;; It would be easy to write an \"extend-life\" function for snapshots\n(def first-snapshot (ob/snapshot my-outboard))\n\n;; This will insert the pair \"hello\" \"world\" only into the snapshot\n(-\u003e first-snapshot\n    (ob/insert \"hello\" \"world\")\n    (ob/lookup \"hello\"))\n;;=\u003e \"world\"\n\n;; Inserts must be done in a transaction to persist\n(-\u003e (ob/snapshot my-outboard)\n    (ob/lookup \"hello\"))\n;;=\u003e nil\n\n;; We can insert some data into it via a transaction\n;; The update! function is atomic, just like swap! for atoms\n;; update! will pass its transaction function a snapshot of the outboard\n(ob/update! 
my-outboard (fn [snapshot] (ob/insert snapshot \"goodbye\" \"moon\")))\n\n;; Since the insert was transacted, it persists\n(-\u003e (ob/snapshot my-outboard)\n    (ob/lookup \"goodbye\"))\n;;=\u003e \"moon\"\n\n;; If you'd like, you can \"fork\" an outboard. Let's fork our outboard.\n;; To fork, you just save a snapshot under a new name\n(def forked-outboard (ob/save-as (ob/snapshot my-outboard) \"forked-outboard\"))\n\n;; Now, we can transact into the snapshot, which will not affect other forks\n(ob/update! forked-outboard (fn [snapshot] (ob/insert snapshot \"goodbye\" \"sun\")))\n\n;; As we can see:\n(-\u003e (ob/snapshot my-outboard)\n    (ob/lookup \"goodbye\"))\n;;=\u003e \"moon\"\n(-\u003e (ob/snapshot forked-outboard)\n    (ob/lookup \"goodbye\"))\n;;=\u003e \"sun\"\n```\n\nYou should check out the docstrings/usage of these functions, too:\n\n- `close` will gracefully shut down an outboard connection\n- `open` will reopen an outboard (you can only create outboards which don't exist)\n- `destroy` will delete all data related to the closed, named outboard\n- `lookup` and `lookup-fwd-iter` provide single and ordered sequence access to snapshots\n\n## Background\n\nOutboard is an off-heap functionally persistent sorted map.\nThis map allows your applications to retain huge data structures in memory across process restarts.\n\nOutboard is the first library to make use of hitchhiker trees.\nHitchhiker trees are a functionally persistent, serializable, off-heap fractal B tree.\nThey can be extended to contain a mechanism to make statistical analytics blazingly fast, and to support column-store facilities.\n\nDetails about hitchhiker trees, including related work, can be found in `doc/hitchhiker.adoc`.\n\n## Testing\n\nYou'll need a local Redis instance running to run the tests. 
Once you have it, just run  \n \n    lein test\n\n\n## Benchmarking\n\nThis library includes a detailed, instrumented benchmarking suite.\nIt's built to enable comparative benchmarks between different parameters or code changes, so that improvements to the structure can be correctly categorized as such, and bottlenecks can be reproduced and fixed.\n\nTo try it, just run\n\n    lein bench\n\nThe benchmark tool supports testing with different parameters, such as:\n\n- The tree's branching factor\n- Whether to enable fractal tree features, just use the B-tree features, or compare to a vanilla Clojure sorted map\n- Reordering of delete operations (to stress certain workloads)\n- Whether to use the in-memory or Redis-backed implementation\n\nThe benchmarking tool is designed to make it convenient to run several benchmarks;\neach benchmark's parameters can be separated by a `--`.\nThis makes it easy to understand the characteristics of the hitchhiker tree over a variety of settings for a parameter.\n\nYou can run a more sophisticated experiment benchmark by doing\n\n    lein bench OUTPUT_DIR options -- options-for-2nd-experiment -- options-for-3rd-experiment\n\nThis generates an Excel workbook called \"analysis.xlsx\" with benchmark results.\nFor instance, if you'd like to run experiments to understand the performance difference between various values of B (the branching factor), you can do:\n\n    lein bench perf_diff_experiment -b 10 -- -b 20 -- -b 40 -- -b 80 -- -b 160 -- -b 320 -- -b 640\n\nAnd it will generate lots of data and the Excel workbook for analysis.\n\nIf you'd like to see the options for the benchmarking tool, just run `lein bench`.\n\n## Technical details\n\nSee the `doc/` folder for technical details of the hitchhiker tree and Redis garbage collection system.\n\n## Gratitude\n\nThanks to the early reviewers, Kovas Boguta \u0026 Leif Walsh.\nAlso, thanks to Tom Faulhaber for making the Excel analysis awesome!\n\n## License\n\nCopyright © 2016 David 
Greenberg\n\nDistributed under the Eclipse Public License version 1.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacrypt-project%2Fhitchhiker-tree","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatacrypt-project%2Fhitchhiker-tree","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacrypt-project%2Fhitchhiker-tree/lists"}