{"id":20780769,"url":"https://github.com/greglook/blocks","last_synced_at":"2025-04-07T11:09:44.592Z","repository":{"id":57713866,"uuid":"43111008","full_name":"greglook/blocks","owner":"greglook","description":"Clojure content-addressable data storage.","archived":false,"fork":false,"pushed_at":"2024-03-22T17:14:23.000Z","size":748,"stargazers_count":110,"open_issues_count":1,"forks_count":6,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-04-24T05:50:30.558Z","etag":null,"topics":["clojure","content-addressable-storage","storage"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greglook.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-09-25T05:15:22.000Z","updated_at":"2024-03-22T17:04:52.000Z","dependencies_parsed_at":"2024-11-18T00:31:28.334Z","dependency_job_id":null,"html_url":"https://github.com/greglook/blocks","commit_stats":{"total_commits":500,"total_committers":5,"mean_commits":100.0,"dds":"0.21999999999999997","last_synced_commit":"c962d4431c05ac686a0cd1145e05bc0a0e0c4d7d"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greglook%2Fblocks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greglook%2Fblocks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greglook%2Fblocks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greglook%2F
blocks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/greglook","download_url":"https://codeload.github.com/greglook/blocks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247640465,"owners_count":20971557,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","content-addressable-storage","storage"],"created_at":"2024-11-17T13:39:08.592Z","updated_at":"2025-04-07T11:09:44.564Z","avatar_url":"https://github.com/greglook.png","language":"Clojure","funding_links":[],"categories":["大数据"],"sub_categories":["Spring Cloud框架"],"readme":"Block Storage\n=============\n\n[![CircleCI](https://circleci.com/gh/greglook/blocks.svg?style=shield\u0026circle-token=d652bef14116ac200c225d12b6c7af33933f4c26)](https://circleci.com/gh/greglook/blocks)\n[![codecov](https://codecov.io/gh/greglook/blocks/branch/master/graph/badge.svg)](https://codecov.io/gh/greglook/blocks)\n[![cljdoc lib](https://img.shields.io/badge/cljdoc-lib-blue.svg)](https://cljdoc.org/d/mvxcvi/blocks/)\n\nThis library implements [content-addressable storage](https://en.wikipedia.org/wiki/Content-addressable_storage)\ntypes and protocols for Clojure. 
Content-addressable storage has several useful properties:\n\n- Data references are abstracted away from the knowledge of where and how the\n  blocks are stored, and so can never be 'stale'.\n- Blocks are immutable, so there's no concern over having the 'latest version'\n  of something - you either have it, or you don't.\n- References are _secure_, because a client can re-compute the digest to ensure\n  they have received the original data unaltered.\n- Synchronizing data between stores only requires enumerating the stored blocks\n  in each and exchanging missing ones.\n- Data can be structurally shared by different higher-level constructs. For\n  example, a file's contents can be referenced by different versions of\n  metadata without duplicating the file data.\n\n\n## Installation\n\nLibrary releases are published on Clojars. To use the latest version with\nLeiningen, add the following dependency to your project definition:\n\n[![Clojars Project](http://clojars.org/mvxcvi/blocks/latest-version.svg)](http://clojars.org/mvxcvi/blocks)\n\n\n## Block Values\n\nA _block_ is a sequence of bytes identified by the cryptographic digest of its\ncontent. All blocks have an `:id` and a `:size` - the block identifier is a\n[multihash](//github.com/greglook/clj-multiformats) value, and the size is the\nnumber of bytes in the block content. Blocks may also have a `:stored-at`\nvalue, which is the instant the backing store received the block.\n\n```clojure\n=\u003e (require '[blocks.core :as block])\n\n;; Read a block into memory:\n=\u003e (def hello (block/read! 
\"hello, blocks!\"))\n#'user/hello\n\n=\u003e hello\n#blocks.data.Block\n{:id #multi/hash \"hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684\",\n :size 14,\n :stored-at #inst \"2019-02-18T07:02:28.751Z\"}\n\n=\u003e (:id hello)\n#multi/hash \"hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684\",\n\n=\u003e (:size hello)\n14\n\n;; Write a block to some output stream:\n=\u003e (let [baos (java.io.ByteArrayOutputStream.)]\n     (block/write! hello baos)\n     (String. (.toByteArray baos)))\n\"hello, blocks!\"\n```\n\nInternally, blocks either have a buffer holding the data in memory, or a reader\nwhich can be invoked to create new input streams for the block content.  A block\nwith in-memory content is a _loaded block_ while a block with a reader is a\n_lazy block_.\n\n```clojure\n=\u003e (block/loaded? hello)\ntrue\n\n;; Create a block from a local file:\n=\u003e (def readme (block/from-file \"README.md\"))\n#'user/readme\n\n;; Block is lazily backed by the file on disk:\n=\u003e (block/loaded? readme)\nfalse\n\n=\u003e (block/lazy? readme)\ntrue\n```\n\nTo abstract over the loaded/lazy divide, you can create an input stream over a\nblock's content using `open`:\n\n```clojure\n=\u003e (slurp (block/open hello))\n\"hello, blocks!\"\n\n;; You can also provide a start/end index to get a range of bytes:\n=\u003e (with-open [content (block/open readme {:start 0, :end 32})]\n     (slurp content))\n\"Block Storage\\n=============\\n\\n[![\"\n```\n\nA block's properties and content cannot be changed after construction, but\nblocks do support metadata. In order to guard against the content changing in\nthe underlying storage layer, blocks can be validated by re-reading their\ncontent:\n\n```clojure\n;; In-memory blocks will never change:\n=\u003e (block/validate! hello)\nnil\n\n;; But if the README file backing the second block is changed:\n=\u003e (block/validate! 
readme)\n; IllegalStateException Block hash:sha2-256:515c169aa0d95... has mismatched content\n;   blocks.core/validate! (core.clj:115)\n\n;; Metadata can be set and queried:\n=\u003e (meta (with-meta readme {:baz 123}))\n{:baz 123}\n```\n\n\n## Storage Interface\n\nA _block store_ is a system which saves and retrieves block data. Block stores\nhave a very simple interface: they must store, retrieve, and enumerate the\ncontained blocks. The simplest type of block storage is a memory store, which is\nbacked by a map in memory. Another basic example is a store backed by a local\nfilesystem, where blocks are stored as files in a directory.\n\nThe block storage protocol consists of five methods:\n- `list` - enumerate the stored blocks as a stream\n- `stat` - get metadata about a stored block\n- `get` - retrieve a block from the store\n- `put!` - add a block to the store\n- `delete!` - remove a block from the store\n\nThese methods are asynchronous operations which return\n[manifold](https://github.com/ztellman/manifold) deferred values. If you want\nto treat them synchronously, deref the responses immediately.\n\n```clojure\n;; Create a new memory store:\n=\u003e (require 'blocks.store.memory)\n=\u003e (def store (block/-\u003estore \"mem:-\"))\n#'user/store\n\n=\u003e store\n#blocks.store.memory.MemoryBlockStore {:memory #\u003cRef@2573332e {}\u003e}\n\n;; Initially, the store is empty:\n=\u003e (block/list-seq store)\n()\n\n;; Let's put our blocks in the store so they don't get lost:\n=\u003e @(block/put! store hello)\n#blocks.data.Block\n{:id #multi/hash \"hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684\",\n :size 14,\n :stored-at #inst \"2019-02-18T07:06:43.655Z\"}\n\n=\u003e @(block/put! 
store readme)\n#blocks.data.Block\n{:id #multi/hash \"hash:sha2-256:94d0eb8d13137ebced045b1e7ef48540af81b2abaf2cce34e924ce2cde7cfbaa\",\n :size 8597,\n :stored-at #inst \"2019-02-18T07:07:06.458Z\"}\n\n;; We can `stat` block ids to get metadata without content:\n=\u003e @(block/stat store (:id hello))\n{:id #multi/hash \"hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684\",\n :size 14,\n :stored-at #inst \"2019-02-18T07:06:43.655Z\"}\n\n;; `list` returns the blocks, and has some basic filtering options:\n=\u003e (block/list-seq store :algorithm :sha2-256)\n(#blocks.data.Block\n {:id #multi/hash \"hash:sha2-256:94d0eb8d13137ebced045b1e7ef48540af81b2abaf2cce34e924ce2cde7cfbaa\",\n  :size 8597,\n  :stored-at #inst \"2019-02-18T07:07:06.458Z\"}\n #blocks.data.Block\n {:id #multi/hash \"hash:sha2-256:d2eef339d508c69fb6e3e99c11c11fc4fc8c035d028973057980d41c7d162684\",\n  :size 14,\n  :stored-at #inst \"2019-02-18T07:06:43.655Z\"})\n\n;; Use `get` to fetch blocks from the store:\n=\u003e @(block/get store (:id readme))\n#blocks.data.Block\n{:id #multi/hash \"hash:sha2-256:94d0eb8d13137ebced045b1e7ef48540af81b2abaf2cce34e924ce2cde7cfbaa\",\n :size 8597,\n :stored-at #inst \"2019-02-18T07:07:06.458Z\"}\n\n;; You can also store them directly from a byte source like a file\n;; (this assumes `clojure.java.io` has been required with the alias `io`):\n=\u003e @(block/store! store (io/file \"project.clj\"))\n#blocks.data.Block\n{:id #multi/hash \"hash:sha2-256:95344c6acadde09ecc03a7899231001455690f620f31cf8d5bbe330dcda19594\",\n :size 2013,\n :stored-at #inst \"2019-02-18T07:11:12.879Z\"}\n\n=\u003e (def project-hash (:id *1))\n#'user/project-hash\n\n;; Use `delete!` to remove blocks from a store:\n=\u003e @(block/delete! 
store project-hash)\ntrue\n\n;; Checking with stat reveals the block is gone:\n=\u003e @(block/stat store project-hash)\nnil\n```\n\n### Implementations\n\nThis library comes with a few block store implementations built in:\n\n- `blocks.store.memory` provides an in-memory map of blocks for transient\n  block storage.\n- `blocks.store.file` provides a simple one-file-per-block store in a local\n  directory.\n- `blocks.store.buffer` holds blocks in one store, then flushes them to another.\n- `blocks.store.replica` stores blocks in multiple backing stores for\n  durability.\n- `blocks.store.cache` manages two backing stores to provide an LRU cache that\n  will stay under a certain size limit.\n\nOther storage backends are provided by separate libraries:\n\n- [blocks-s3](//github.com/greglook/blocks-s3) backed by a bucket in Amazon S3.\n\nThese storage backends exist but aren't compatible with 2.X yet:\n\n- [blocks-adl](//github.com/amperity/blocks-adl) backed by Azure DataLake store.\n- [blocks-blob](//github.com/amperity/blocks-blob) backed by Azure Blob Storage.\n- [blocks-monger](//github.com/20centaurifux/blocks-monger) backed by MongoDB.\n\n\n## Block Metrics\n\nThe `blocks.meter` namespace provides instrumentation for block stores to\nmeasure data flows, call latencies, and other metrics. These measurements are\nbuilt around the notion of a _metric event_ and an associated _recording\nfunction_ on the store which the events are passed to. Each event has a\nnamespaced `:type` keyword, a `:label` associated with the store, and a numeric\n`:value`. The store currently measures the call latencies of the storage methods\nas well as the flow of bytes into or out of a store's blocks.\n\nTo enable metering, set a `::meter/recorder` function on the store. The function\nwill be called with the store itself and each metric event. 
The `:label` on each\nevent is derived from the store - it will use the store's class name or an\nexplicit `::meter/label` value if available.\n\n\n## License\n\nThis is free and unencumbered software released into the public domain.\nSee the UNLICENSE file for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreglook%2Fblocks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreglook%2Fblocks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreglook%2Fblocks/lists"}