https://github.com/greglook/blocks
Clojure content-addressable data storage.
clojure content-addressable-storage storage
- Host: GitHub
- URL: https://github.com/greglook/blocks
- Owner: greglook
- License: unlicense
- Created: 2015-09-25T05:15:22.000Z (over 10 years ago)
- Default Branch: main
- Last Pushed: 2024-03-22T17:14:23.000Z (almost 2 years ago)
- Last Synced: 2024-04-24T05:50:30.558Z (almost 2 years ago)
- Topics: clojure, content-addressable-storage, storage
- Language: Clojure
- Homepage:
- Size: 730 KB
- Stars: 110
- Watchers: 9
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
Block Storage
=============
[CircleCI](https://circleci.com/gh/greglook/blocks)
[codecov](https://codecov.io/gh/greglook/blocks)
[cljdoc](https://cljdoc.org/d/mvxcvi/blocks/)
This library implements [content-addressable storage](https://en.wikipedia.org/wiki/Content-addressable_storage)
types and protocols for Clojure. Content-addressable storage has several useful properties:
- Data references are abstracted away from the knowledge of where and how the
blocks are stored, and so can never be 'stale'.
- Blocks are immutable, so there's no concern over having the 'latest version'
of something - you either have it, or you don't.
- References are _secure_, because a client can re-compute the digest to ensure
they have received the original data unaltered.
- Synchronizing data between stores only requires enumerating the stored blocks
in each and exchanging missing ones.
- Data can be structurally shared by different higher-level constructs. For
example, a file's contents can be referenced by different versions of
metadata without duplicating the file data.
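
The properties above can be illustrated with a small sketch that uses only the JDK, not this library. The helper names `sha256-hex`, `verify`, and `missing-ids` are hypothetical, and stores are modelled as plain maps of id to bytes. The sketch derives an identifier from content, re-computes it to check integrity, and finds the blocks one store lacks.

```clojure
(import '(java.security MessageDigest))
(require '[clojure.set :as set])

(defn sha256-hex
  "Returns the SHA-256 digest of a byte array as a lowercase hex string."
  [^bytes data]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256") data)]
    (apply str (map #(format "%02x" %) digest))))

(defn verify
  "True when `data` hashes to the expected `id`."
  [id ^bytes data]
  (= id (sha256-hex data)))

(defn missing-ids
  "Ids present in `source` but absent from `dest`.
  Both arguments are maps of id -> bytes standing in for stores."
  [source dest]
  (set/difference (set (keys source)) (set (keys dest))))

(def content (.getBytes "hello, blocks!" "UTF-8"))
(def id (sha256-hex content))

;; Identical bytes always produce the same id:
(verify id content)
;; => true

;; Altered bytes no longer match the id:
(verify id (.getBytes "hello, world!" "UTF-8"))
;; => false

;; Synchronizing only needs the ids each side holds:
(missing-ids {id content} {})
```

Because the id is derived purely from the bytes, the receiver needs nothing beyond the id itself to detect tampering or corruption.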
## Installation
Library releases are published on Clojars. To use the latest version with
Leiningen, add the `mvxcvi/blocks` dependency shown on its
[Clojars page](http://clojars.org/mvxcvi/blocks) to your project definition.
## Block Values
A _block_ is a sequence of bytes identified by the cryptographic digest of its
content. All blocks have an `:id` and a `:size` - the block identifier is a
[multihash](//github.com/greglook/clj-multiformats) value, and the size is the
number of bytes in the block content. Blocks may also have a `:stored-at`
value, which is the instant the backing store received the block.
```clojure
=> (require '[blocks.core :as block])
;; Read a block into memory:
=> (def hello (block/read! "hello, blocks!"))
#'user/hello
=> hello
#blocks.data.Block
{:id #multi/hash "hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684",
:size 14,
:stored-at #inst "2019-02-18T07:02:28.751Z"}
=> (:id hello)
#multi/hash "hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684"
=> (:size hello)
14
;; Write a block to some output stream:
=> (let [baos (java.io.ByteArrayOutputStream.)]
     (block/write! hello baos)
     (String. (.toByteArray baos)))
"hello, blocks!"
```
Internally, blocks either have a buffer holding the data in memory, or a reader
which can be invoked to create new input streams for the block content. A block
with in-memory content is a _loaded block_ while a block with a reader is a
_lazy block_.
```clojure
=> (block/loaded? hello)
true
;; Create a block from a local file:
=> (def readme (block/from-file "README.md"))
#'user/readme
;; Block is lazily backed by the file on disk:
=> (block/loaded? readme)
false
=> (block/lazy? readme)
true
```
To abstract over the loaded/lazy divide, you can create an input stream over a
block's content using `open`:
```clojure
=> (slurp (block/open hello))
"hello, blocks!"
;; You can also provide a start/end index to get a range of bytes:
=> (with-open [content (block/open readme {:start 0, :end 32})]
     (slurp content))
"Block Storage\n=============\n\n[..."
```
Block store operations are asynchronous and return deferred values. If you want
to treat them synchronously, deref the responses immediately.
```clojure
;; Create a new memory store:
=> (require 'blocks.store.memory)
=> (def store (block/->store "mem:-"))
#'user/store
=> store
#blocks.store.memory.MemoryBlockStore {:memory #}
;; Initially, the store is empty:
=> (block/list-seq store)
()
;; Let's put our blocks in the store so they don't get lost:
=> @(block/put! store hello)
#blocks.data.Block
{:id #multi/hash "hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684",
:size 14,
:stored-at #inst "2019-02-18T07:06:43.655Z"}
=> @(block/put! store readme)
#blocks.data.Block
{:id #multi/hash "hash:sha2-256:94d0eb8d13137ebced045b1e7ef48540af81b2abaf2cce34e924ce2cde7cfbaa",
:size 8597,
:stored-at #inst "2019-02-18T07:07:06.458Z"}
;; We can `stat` block ids to get metadata without content:
=> @(block/stat store (:id hello))
{:id #multi/hash "hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684",
 :size 14,
 :stored-at #inst "2019-02-18T07:06:43.655Z"}
;; `list-seq` returns the stored blocks, and has some basic filtering options:
=> (block/list-seq store :algorithm :sha2-256)
(#blocks.data.Block
{:id #multi/hash "hash:sha2-256:94d0eb8d13137ebced045b1e7ef48540af81b2abaf2cce34e924ce2cde7cfbaa",
:size 8597,
:stored-at #inst "2019-02-18T07:07:06.458Z"}
#blocks.data.Block
{:id #multi/hash "hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684",
:size 14,
:stored-at #inst "2019-02-18T07:06:43.655Z"})
;; Use `get` to fetch blocks from the store:
=> @(block/get store (:id readme))
#blocks.data.Block
{:id #multi/hash "hash:sha2-256:94d0eb8d13137ebced045b1e7ef48540af81b2abaf2cce34e924ce2cde7cfbaa",
:size 8597,
:stored-at #inst "2019-02-18T07:07:06.458Z"}
;; You can also store them directly from a byte source like a file:
=> (require '[clojure.java.io :as io])
=> @(block/store! store (io/file "project.clj"))
#blocks.data.Block
{:id #multi/hash "hash:sha2-256:95344c6acadde09ecc03a7899231001455690f620f31cf8d5bbe330dcda19594",
:size 2013,
:stored-at #inst "2019-02-18T07:11:12.879Z"}
=> (def project-hash (:id *1))
#'user/project-hash
;; Use `delete!` to remove blocks from a store:
=> @(block/delete! store project-hash)
true
;; Checking with stat reveals the block is gone:
=> @(block/stat store project-hash)
nil
```
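
To show what a store does with each call, here is a toy content-addressed store built on an atom. It is a sketch for illustration only, not the library's implementation. All function names here (`new-store`, `put-block!`, `stat-block`, `get-block`, `delete-block!`) are hypothetical, ids are plain hex strings rather than multihash values, and every operation is synchronous rather than returning a deferred value.

```clojure
(import '(java.security MessageDigest))

(defn- sha256-hex
  "Hex-encoded SHA-256 digest of a byte array."
  [^bytes data]
  (apply str (map #(format "%02x" %)
                  (.digest (MessageDigest/getInstance "SHA-256") data))))

(defn new-store
  "Creates an empty store: an atom holding a sorted map of id -> entry."
  []
  (atom (sorted-map)))

(defn put-block!
  "Stores `data` under its digest and returns the metadata without content.
  Storing the same bytes twice keeps the existing entry."
  [store ^bytes data]
  (let [id    (sha256-hex data)
        entry {:id id, :size (alength data), :content data}]
    (-> (swap! store update id #(or % entry))
        (get id)
        (dissoc :content))))

(defn stat-block
  "Returns metadata for `id` without the content, or nil if absent."
  [store id]
  (some-> (get @store id) (dissoc :content)))

(defn get-block
  "Returns the full entry for `id`, or nil if absent."
  [store id]
  (get @store id))

(defn delete-block!
  "Removes `id` from the store; true if a block was actually removed."
  [store id]
  (let [[old _] (swap-vals! store dissoc id)]
    (contains? old id)))

(def store (new-store))
(def info (put-block! store (.getBytes "hello, blocks!" "UTF-8")))

(:size info)
;; => 14
(String. ^bytes (:content (get-block store (:id info))) "UTF-8")
;; => "hello, blocks!"
(delete-block! store (:id info))
;; => true
(stat-block store (:id info))
;; => nil
```

Keying entries by digest is what makes `put-block!` idempotent: writing identical bytes a second time cannot create a second copy.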
### Implementations
This library comes with a few block store implementations built in:
- `blocks.store.memory` provides an in-memory map of blocks for transient
block storage.
- `blocks.store.file` provides a simple one-file-per-block store in a local
directory.
- `blocks.store.buffer` holds blocks in one store, then flushes them to another.
- `blocks.store.replica` stores blocks in multiple backing stores for
durability.
- `blocks.store.cache` manages two backing stores to provide an LRU cache that
will stay under a certain size limit.
Other storage backends are provided by separate libraries:
- [blocks-s3](//github.com/greglook/blocks-s3) backed by a bucket in Amazon S3.
These storage backends exist but aren't compatible with 2.X yet:
- [blocks-adl](//github.com/amperity/blocks-adl) backed by Azure DataLake store.
- [blocks-blob](//github.com/amperity/blocks-blob) backed by Azure Blob Storage.
- [blocks-monger](//github.com/20centaurifux/blocks-monger) backed by MongoDB.
## Block Metrics
The `blocks.meter` namespace provides instrumentation for block stores to
measure data flows, call latencies, and other metrics. These measurements are
built around the notion of a _metric event_ and an associated _recording
function_ on the store which the events are passed to. Each event has a
namespaced `:type` keyword, a `:label` associated with the store, and a numeric
`:value`. The store currently measures the call latencies of the storage methods
as well as the flow of bytes into or out of a store's blocks.
To enable metering, set a `::meter/recorder` function on the store. The function
will be called with the store itself and each metric event. The `:label` on each
event is derived from the store - it will use the store's class name or an
explicit `::meter/label` value if available.
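
A recorder might look like the following sketch. It assumes `::meter/recorder` resolves to `:blocks.meter/recorder` (that is, `meter` aliases `blocks.meter`). The store is a stand-in map so the example runs without the library, and the `:example/io-bytes` event type is hypothetical, not one the library is known to emit.

```clojure
(def totals
  "Running sum of metric values, keyed by [label type]."
  (atom {}))

(defn record-event!
  "Recorder function: called with the store and one metric event map.
  Adds the event's :value to the running total for its label and type."
  [_store event]
  (swap! totals update [(:label event) (:type event)] (fnil + 0) (:value event)))

;; Attach the recorder under the key described above. A plain map stands in
;; for a real store here; this is an assumption made for illustration.
(def metered-store
  (assoc {} :blocks.meter/recorder record-event!))

;; Simulate two events with a hypothetical type and label:
(record-event! metered-store {:type :example/io-bytes, :label "demo", :value 14})
(record-event! metered-store {:type :example/io-bytes, :label "demo", :value 8597})

@totals
;; => {["demo" :example/io-bytes] 8611}
```

Aggregating by `[label type]` keeps measurements from differently labelled stores separate, which helps when several stores share one recorder.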
## License
This is free and unencumbered software released into the public domain.
See the UNLICENSE file for more information.