{"id":22868618,"url":"https://github.com/diennea/blobit","last_synced_at":"2025-08-22T06:09:45.164Z","repository":{"id":21959733,"uuid":"94549931","full_name":"diennea/blobit","owner":"diennea","description":"BlobIt - a Distributed Large Object Storage","archived":false,"fork":false,"pushed_at":"2023-09-05T14:55:41.000Z","size":1136,"stargazers_count":37,"open_issues_count":3,"forks_count":11,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-07-09T12:49:24.354Z","etag":null,"topics":["blob-storage","bookkeeper","distributed","java","storage"],"latest_commit_sha":null,"homepage":"https://blobit.org","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/diennea.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-16T14:07:09.000Z","updated_at":"2025-03-22T17:23:06.000Z","dependencies_parsed_at":"2024-06-19T22:47:05.103Z","dependency_job_id":"c9dde0f9-0144-49d4-b4bd-828236d20f82","html_url":"https://github.com/diennea/blobit","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/diennea/blobit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diennea%2Fblobit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diennea%2Fblobit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diennea%2Fblobit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diennea%2Fblobit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/owners/diennea","download_url":"https://codeload.github.com/diennea/blobit/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diennea%2Fblobit/sbom","scorecard":{"id":341548,"data":{"date":"2025-08-11","repo":{"name":"github.com/diennea/blobit","commit":"4371f3b8a3fac708a795716f07555e89e6eb8c0d"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.1,"checks":[{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":6,"reason":"Found 16/26 approved changesets -- score normalized to 6","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow 
tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/pr-validation.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: third-party GitHubAction not pinned by hash: .github/workflows/pr-validation.yml:25: update your workflow using https://app.stepsecurity.io/secureworkflow/diennea/blobit/pr-validation.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/pr-validation.yml:28: update your workflow using https://app.stepsecurity.io/secureworkflow/diennea/blobit/pr-validation.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/pr-validation.yml:30: update your workflow using https://app.stepsecurity.io/secureworkflow/diennea/blobit/pr-validation.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/pr-validation.yml:34: update your workflow using 
https://app.stepsecurity.io/secureworkflow/diennea/blobit/pr-validation.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/pr-validation.yml:43: update your workflow using https://app.stepsecurity.io/secureworkflow/diennea/blobit/pr-validation.yml/master?enable=pin","Info:   0 out of   4 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 third-party GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v0.5.1 not signed: 
https://api.github.com/repos/diennea/blobit/releases/119963658","Warn: release artifact v0.6.0 not signed: https://api.github.com/repos/diennea/blobit/releases/110091691","Warn: release artifact v0.5.0 not signed: https://api.github.com/repos/diennea/blobit/releases/58987632","Warn: release artifact v0.3.1 not signed: https://api.github.com/repos/diennea/blobit/releases/24845716","Warn: release artifact v0.3.0 not signed: https://api.github.com/repos/diennea/blobit/releases/24668107","Warn: release artifact v0.5.1 does not have provenance: https://api.github.com/repos/diennea/blobit/releases/119963658","Warn: release artifact v0.6.0 does not have provenance: https://api.github.com/repos/diennea/blobit/releases/110091691","Warn: release artifact v0.5.0 does not have provenance: https://api.github.com/repos/diennea/blobit/releases/58987632","Warn: release artifact v0.3.1 does not have provenance: https://api.github.com/repos/diennea/blobit/releases/24845716","Warn: release artifact v0.3.0 does not have provenance: https://api.github.com/repos/diennea/blobit/releases/24668107"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 20 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":0,"reason":"35 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-vmq6-5m68-f53m","Warn: Project is vulnerable to: GHSA-6v67-2wr5-gvf4","Warn: Project is vulnerable to: GHSA-pr98-23f8-jwxv","Warn: Project is vulnerable to: GHSA-h46c-h94j-95f3","Warn: Project is vulnerable to: GHSA-5mg8-w23w-74h3","Warn: Project is vulnerable to: GHSA-7g45-4rm6-3mm3","Warn: Project is vulnerable to: GHSA-735f-pc8j-v9w8","Warn: Project is vulnerable to: GHSA-mm8h-8587-p46h","Warn: Project is vulnerable to: GHSA-pvp8-3xj6-8c6x","Warn: Project is vulnerable to: GHSA-78wr-2p64-hpwj","Warn: Project is vulnerable to: GHSA-j288-q9x7-2f5v","Warn: Project is vulnerable to: GHSA-5jpm-x58v-624v","Warn: Project is vulnerable to: GHSA-prj3-ccx8-p6x4","Warn: Project is vulnerable to: GHSA-xpw8-rcwv-8f8p","Warn: Project is vulnerable to: GHSA-389x-839f-4rhx","Warn: Project is vulnerable to: GHSA-xq3w-v528-46rv","Warn: Project is vulnerable to: GHSA-4g8c-wm8x-jfhw","Warn: Project is vulnerable to: GHSA-6mjq-h674-j845","Warn: Project is vulnerable to: GHSA-2qrg-x229-3v8q","Warn: Project is vulnerable to: GHSA-65fg-84f6-3jq3","Warn: Project is vulnerable to: GHSA-f7vh-qwp3-x37m","Warn: Project is vulnerable to: GHSA-fp5r-v3w9-4333","Warn: Project is vulnerable to: GHSA-w9p3-5cr8-m3jj","Warn: Project is vulnerable to: GHSA-w7f5-jrpr-5c2m","Warn: Project is vulnerable to: GHSA-fj2m-w3wv-x9pr","Warn: Project is vulnerable to: GHSA-7286-pgfv-vxvh","Warn: Project is vulnerable to: GHSA-r978-9m6m-6gm6","Warn: Project is vulnerable to: GHSA-67mf-3cr5-8w23","Warn: Project is vulnerable to: GHSA-68m8-v89j-7j2p","Warn: Project is vulnerable to: GHSA-8xfc-gm6g-vgpv","Warn: Project is vulnerable to: GHSA-2qp4-g3q3-f92w","Warn: Project is vulnerable to: GHSA-55g7-9cwv-5qfv","Warn: Project is vulnerable to: 
GHSA-fjpj-2g6w-x25r","Warn: Project is vulnerable to: GHSA-pqr6-cmr2-h8hf","Warn: Project is vulnerable to: GHSA-qcwq-55hx-v3vh"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-18T06:00:11.547Z","repository_id":21959733,"created_at":"2025-08-18T06:00:11.548Z","updated_at":"2025-08-18T06:00:11.548Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271594377,"owners_count":24786707,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-22T02:00:08.480Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blob-storage","bookkeeper","distributed","java","storage"],"created_at":"2024-12-13T12:36:38.994Z","updated_at":"2025-08-22T06:09:45.138Z","avatar_url":"https://github.com/diennea.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BlobIt\n\nBlobIt is a distributed storage for binary large objects (BLOBs) built upon Apache BookKeeper.\n\n# Overview\n\nBlobIt stores *BLOBS* (binary large objects) in *buckets*; a bucket is like a namespace.\nMultitenancy is fundamental to the BlobIt architecture and it is expected that each\n*tenant* uses its own bucket.\n\nData is stored on an [Apache BookKeeper](https://bookkeeper.apache.org) 
cluster, \nand this automatically enables BlobIt to scale horizontally: the more *Bookies* you have,\nthe more data you will be able to store.\n\nBlobIt needs a *metadata service* in order to store references to the data; it ships\nby default with [HerdDB](https://herddb.org), which is also built upon BookKeeper.\n\n# Architectural overview\n\nBlobIt is designed for performance and especially low latency in this scenario:\nthe *writer* stores one BLOB and *readers* immediately read that BLOB (usually from different machines).\nThis is the most common path in [EmailSuccess](https://emailsuccess.com), as BlobIt\nis the core datastore for it.\nBLOBs are expected to be retained for a couple of weeks rather than for the long term,\nbut nothing in the design of BlobIt prevents you from storing data for\nyears.\n\nBlobIt clients talk directly to Bookies for both reads and writes; this way\nwe directly exploit all of the BookKeeper optimizations on the write and read paths.\nThis architecture is totally decentralized; there is no BlobIt server.\nYou can use the convenience binaries in [BlobIt service](blobit-service), which is simply a pre-packaged bundle\nable to run ZooKeeper, BookKeeper, HerdDB and a REST API.\n\nYou can see BlobIt as a simple extension to BookKeeper, with a metadata layer which makes it simple to:\n- reference data using a user-supplied name (in the form bucketId/name)\n- organize data efficiently in BookKeeper, and allow deletion of BLOBs.\n\n![Write path](docs/writepath.png)\n\n# Writes\n\nBatches of BLOBs are stored in BookKeeper ledgers (using the WriteHandleAdv API);\nmore than one BLOB is stored inside one BookKeeper ledger.\nBlobIt will collect unused ledgers and delete them.\n\nWhen a writer stores a BLOB it immediately receives a unique ID for the BLOB;\nthis ID is unique in the whole cluster, not only in the scope of the bucket.\nThis ID is a \"smart id\": it contains all of the information needed to retrieve\ndata without 
using the metadata service.\n\nThe ID contains information such as:\n- the ID of the ledger (64-bit)\n- first entry id (64-bit)\n- number of entries (32-bit)\n- size of the entry (64-bit)\n\nWith this information it is possible to read the whole BLOB or only parts of it.\nAn object is immutable: it cannot be modified.\n\nThe client can assign a custom name, unique within the bucket;\nreaders will be able to access the object using this key.\nYou can assign the same key to another object, this way \n\nIf you are using custom keys, the writer and the reader have to perform an additional RPC\nto the metadata service.\n\n![Write path](docs/writerflow.png)\n\nThe BookKeeper client stores data in immutable ledgers and performs writes to a \nquorum of Bookies, which are only data storage nodes.\nEach ledger is written to several Bookies, and all the information\nneeded for data retrieval is stored on ZooKeeper.\nZooKeeper also stores data for Bookie discovery.\n\nSo the normal write flow is:\n* create a new ledger:\n  * choose a set of available Bookies using ZooKeeper\n  * write new ledger metadata to ZooKeeper\n* write each part of the BLOB directly to the Bookies\n* record on the metadata service (HerdDB) which entries of the ledger contain the data\n  * perform an RPC to the HerdDB tablespace ledger for the bucket to write metadata\n  * the database will perform a write on BookKeeper (still to a quorum of Bookies)\n\nThe metadata service is decentralized: each bucket will have a dedicated *tablespace* on HerdDB,\nthis leader will be independent from the ledgers of other tablespaces of other buckets,\nthis way the system will scale horizontally with the number of buckets.\n\n# Reads\n\nBlobIt clients read data directly from Bookies. 
This is possible because the objectId\ncontains all of the information needed to access the data.\nFor a lookup by custom key, a lookup on the metadata service is needed.\n\nThe reader can read the full BLOB or parts of it.\nBlobIt supports a streaming API for reads, suitable for very large objects:\nas soon as data comes from BookKeeper it is streamed to the application. Think about\nan HTTP service which retrieves an object and serves it directly to the client.\n\n# Buckets and data locality\n\nYou can use buckets to store\ndata near the writer or the reader.\nBlobIt is able to use a *HerdDB tablespace* for each bucket; this way all of the metadata\nof the bucket will be handled using the placement policies configured in the system.\n\nThis is very important, because each bucket will be able to survive and work\nindependently of the others.\n\nA typical scenario is to move readers, writers and the primary copy of metadata and data \nnext to each other, and have replicas on other machines/racks.\nFor both the metadata service (HerdDB) and the data service (BookKeeper), replicas\nare activated as soon as the reference machines are no longer available,\nwithout any service outage.\n\n# Deleting data\n\nData deletion is the trickiest part, because BlobIt stores more than\none BLOB inside the same ledger, so you can delete a ledger only when there is\nno live BLOB stored in it.\nA garbage collector performs maintenance of the bucket and \ndeletes data from BookKeeper when it is no longer needed.\nBookies in turn do their own garbage collection, depending on the configuration,\nso disk space won't be reclaimed as soon as a BLOB is deleted.\n\nBlobIt garbage collection is totally decentralized: any client can run the\nprocedure, and it runs per bucket.\nEven in this case it is expected that services which operate on a bucket\nare co-located and take care of running the garbage collection in a timely manner.\nUsually it 
makes sense to run the GC of a bucket after deleting a batch of BLOBs of the same bucket.\n\n# Java Client example\n\nA typical writer looks like this:\n\n```\nString BUCKET_ID = \"test\";\nbyte[] TEST_DATA = \"foo\".getBytes();\nHerdDBDataSource datasource = new HerdDBDataSource();\ndatasource.setUrl(\"jdbc:herddb:localhost\");\nConfiguration configuration\n                = new Configuration()\n                    .setType(Configuration.TYPE_BOOKKEEPER)\n                    .setConcurrentWriters(10)\n                    .setUseTablespaces(true)\n                    .setZookeeperUrl(env.getAddress());\ntry (ObjectManager manager = ObjectManagerFactory.createObjectManager(configuration, datasource);) {      \n      manager.createBucket(BUCKET_ID, BUCKET_ID, BucketConfiguration.DEFAULT).get();\n\n      BucketHandle bucket = manager.getBucket(BUCKET_ID);\n      String id = bucket.put(null, TEST_DATA).get();\n}\n```\n\nA typical reader looks like this:\n\n```\nString BUCKET_ID = \"test\";\nbyte[] TEST_DATA = \"foo\".getBytes();\nHerdDBDataSource datasource = new HerdDBDataSource();\ndatasource.setUrl(\"jdbc:herddb:localhost\");\nConfiguration configuration\n                = new Configuration()\n                    .setType(Configuration.TYPE_BOOKKEEPER)\n                    .setConcurrentWriters(10)\n                    .setUseTablespaces(true)\n                    .setZookeeperUrl(env.getAddress());\ntry (ObjectManager manager = ObjectManagerFactory.createObjectManager(configuration, datasource);) {      \n      manager.createBucket(BUCKET_ID, BUCKET_ID, BucketConfiguration.DEFAULT).get();\n\n      BucketHandle bucket = manager.getBucket(BUCKET_ID);\n      \n      // id is the object ID returned by bucket.put(...) in the writer\n      byte[] data = bucket.get(id).get();\n}\n```\n\nMost of the APIs are async and they are based on CompletableFuture.\n\n# REST API\n\nWe deliver a REST service which is (almost) compatible with the [OpenStack Swift API](https://docs.openstack.org/swift/latest/api/object_api_v1_overview.html).\nThis 
service is still in the ALPHA phase and is currently used to perform\nbenchmarks and comparisons with other products.\n\n# Security\n\nAs BlobIt is mostly a layer on top of BookKeeper, ZooKeeper and HerdDB, all of the security\naspects are handled directly by the low-level clients.\nOur suggestion is that you enable SASL/Kerberos authentication on all of these services.\nThere is no support for other security features, because we expect the client application\nto be in charge of handling the semantics of buckets/BLOBs.\n\n## Getting in touch\n\nFeel free to create issues in order to interact with the community.\n\nDocumentation will come soon; start with the examples inside the test cases\nto better understand how it works.\n\nPlease let us know if you are trying out this project; we will be happy to hear about your\ncase and help you.\n\n## License\n\nBlobIt is released under the [Apache 2 license](http://www.apache.org/licenses/LICENSE-2.0.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiennea%2Fblobit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdiennea%2Fblobit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiennea%2Fblobit/lists"}