{"id":13509953,"url":"https://github.com/tsileo/blobstash","last_synced_at":"2025-03-17T01:31:28.819Z","repository":{"id":16353017,"uuid":"19103001","full_name":"tsileo/blobstash","owner":"tsileo","description":"You personal database. Mirror of https://git.sr.ht/~tsileo/blobstash","archived":false,"fork":false,"pushed_at":"2020-07-19T20:35:19.000Z","size":73603,"stargazers_count":103,"open_issues_count":0,"forks_count":8,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-02-27T16:17:30.533Z","etag":null,"topics":["backup","blob-store","blobstash","content-addressed","deduplication","document-store","go","storage"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tsileo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-04-24T09:29:02.000Z","updated_at":"2025-02-14T20:34:40.000Z","dependencies_parsed_at":"2022-09-10T14:40:32.961Z","dependency_job_id":null,"html_url":"https://github.com/tsileo/blobstash","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsileo%2Fblobstash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsileo%2Fblobstash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsileo%2Fblobstash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsileo%2Fblobstash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tsileo","download_url":"https://codeload.github.com/tsileo/blobstash/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243835959,"owners_count":20355613,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backup","blob-store","blobstash","content-addressed","deduplication","document-store","go","storage"],"created_at":"2024-08-01T02:01:18.570Z","updated_at":"2025-03-17T01:31:23.803Z","avatar_url":"https://github.com/tsileo.png","language":"Go","funding_links":[],"categories":["Go","go"],"sub_categories":[],"readme":"BlobStash\n=========\n\n\u003cp align=\"center\"\u003e\n  \u003cimg \n    src=\"https://sos-ch-dk-2.exo.io/hexaninja/blobstash.png\" \n    width=\"192\" height=\"192\" border=\"0\" alt=\"BlobStash\"\u003e\n\u003c/p\u003e\n\n[![builds.sr.ht status](https://builds.sr.ht/~tsileo/blobstash.svg)](https://builds.sr.ht/~tsileo/blobstash?)\n\u0026nbsp; \u0026nbsp;[![License](http://img.shields.io/badge/license-MIT-red.svg?style=flat)](https://git.sr.ht/~tsileo/blobstash/tree/master/LICENSE)\n\nYour personal database.\n\n**Still in early development.**\n\n## Manifesto\n\nBlobStash is primarily a database: you can store raw blobs, key-value pairs, JSON documents, and files/directories. 
\n\nIt can also act as a web server/reverse proxy.\n\nThe web server supports HTTP/2 and can generate TLS certificates for you on the fly using Let's Encrypt.\nYou can proxy other applications and give them free certificates at the same time. You can also write apps (using Lua) that let\nyou interact with BlobStash's database.\nHosting static content is also an option.\nIt also lets you easily add authentication to any app/proxied service.\n\n### Blobs\n\nThe content-addressed blob store (the identifier of a blob is its own hash; the chosen hash function is [BLAKE2b](https://blake2.net/)) is at the heart of everything in BlobStash. Everything permanently stored in BlobStash ends up in a blob.\n\nBlobStash has its own storage engine, [BlobsFile](https://github.com/tsileo/blobsfile): data is stored in an append-only flat file.\nAll data is immutable, stored with error-correcting codes for bit-rot protection, and indexed in a temporary index for fast access: only two seek operations are needed to access any blob.\n\nThe blob store supports real-time replication via an Oplog (powered by Server-Sent Events) to another BlobStash instance (or any other system), and also supports efficient synchronisation between instances, using a Merkle tree to speed up operations.\n\n### Key-values\n\nKey-value pairs let you keep a mutable reference to an internal or external object; the value can be a hash and/or any sequence of bytes.\n\nEach key-value has an associated timestamp, its version. 
You can easily list all the versions; by default, the latest version is returned.\nInternally, each \"version\" is stored as a separate blob, with a specific format, so it can be detected and re-indexed.\n\nKey-values are indexed in a temporary database (which can be rebuilt at any time by scanning all the blobs) and stored as blobs.\n\n### Files, tree of files\n\nFiles and trees of files are first-class citizens in BlobStash.\n\nFiles are split into multiple chunks (stored as blobs, using content-defined chunking, giving deduplication at the file level), and everything is stored in a kind of Merkle tree where the hash of the JSON file containing the file metadata is the final identifier (this JSON file is also stored as a blob).\n\nThe JSON format can also model directories. A regular HTTP multipart endpoint can convert files to BlobStash's internal format for you, or you can do it locally to avoid sending blobs that are already present.\n\nFiles can be streamed easily: range requests are supported, EXIF metadata is automatically extracted and served, and images can be resized on the fly (with caching).\n\nYou can also enable an S3-compatible gateway to manage your files.\n\n### Role Based Access Control (RBAC)\n\nBlobStash supports fine-grained permissions, with a model similar to AWS roles.\n\n#### Predefined roles\n\n - `admin`: full access to everything\n   - `action:*`/`resource:*`\n\n## Document Store\n\nThe _Document Store_ stores JSON documents (think MongoDB or CouchDB) and exposes them over an HTTP API.\n\nDocuments are stored in a collection. All collections are stored in a single namespace.\n\nEvery document version is kept (and always accessible via temporal queries, i.e. querying the state of a collection at an instant `t`).\n\nThe _Document Store_ supports ETags, conditional requests (`If-Match`...) 
and [JSON Patch](http://jsonpatch.com/) for partial/consistent updates.\n\nDocuments are queried with Lua functions, like:\n\n```Lua\nlocal docstore = require('docstore')\nreturn function(doc)\n  if doc.subdoc.counter \u003e 10 and docstore.text_search(doc, \"query\", {\"content\"}) then\n    return true\n  end\n  return false\nend\n```\n\nIt also implements a basic MapReduce framework (also powered by Lua).\n\nLastly, a document can hold pointers to files/nodes stored in the _FileTree Store_.\n\nInternally, a JSON document \"version\" is stored as a \"versioned key-value\" entry.\nDocument IDs encode the creation version and sort lexicographically by creation date (an 8-byte nanosecond timestamp followed by 4 random bytes).\nThe _Versioned Key-Value Store_ is the default index for listing/sorting documents.\n\n### Collections\n\n#### GET /api/docstore\n\nList all the collections.\n\n##### HTTP Request\n\n```shell\n$ http --auth :apikey GET https://instance.com/api/docstore\n```\n\n##### HTTP Response\n\n```json\n{\n    \"data\": [\n        \"mycollection\"\n    ], \n    \"pagination\": {\n        \"count\": 1, \n        \"cursor\": \"\", \n        \"has_more\": false, \n        \"per_page\": 50\n    }\n}\n```\n\n##### blobstash-python\n\n```python\nfrom blobstash.docstore import DocStoreClient\n\nclient = DocStoreClient(\"https://instance.com\", api_key=\"apikey\")\n\nclient.collections()\n# [blobstash.docstore.Collection(name='mycollection')]\n```\n\n### Inserting documents\n\nCollections are created on the fly when a document is inserted.\n\n#### POST /api/docstore/{collection}\n\n##### HTTP Request\n\n```shell\n$ http --auth :apikey post https://instance.com/api/docstore/{collection} content=lol\n```\n\n##### HTTP Response\n\n```json\n{\n    \"_created\": \"2020-02-23T15:28:06Z\", \n    \"_id\": \"15f6119d6dddd68fa986d4c7\", \n    \"_version\": \"1582471686918100623\"\n}\n```\n\n##### blobstash-python\n\n```python\nfrom blobstash.docstore import DocStoreClient\n\nclient = 
DocStoreClient(\"https://instance.com\", api_key=\"apikey\")\n\n# or `client[\"mycol\"]` or `client.collection(\"mycol\")`\ncol = client.mycol\n\ndoc = {\"content\": \"lol\"}\n\ncol.insert(doc)\n# blobstash.docstore.ID(_id='15f611f032ae804d668dd855')\n\n# the `dict` will be updated with its `_id`\ndoc\n# {'content': 'lol',\n#  '_id': blobstash.docstore.ID(_id='15f611f032ae804d668dd855')}\n```\n\n### Updating a document (by replacing it)\n\n#### POST /api/docstore/{collection}/{id}\n\n##### HTTP Request\n\n```shell\n$ http --auth :apikey post https://instance.com/api/docstore/{collection}/{id} content=lol\n```\n\n##### HTTP Response\n\n```json\n{\n    \"_created\": \"2020-02-23T15:28:06Z\", \n    \"_id\": \"15f6119d6dddd68fa986d4c7\", \n    \"_version\": \"1582471686918100623\"\n}\n```\n\n#### PATCH /api/docstore/{collection}/{id}\n\n##### HTTP Request\n\n##### HTTP Response\n\n##### blobstash-python\n\n### Deleting documents\n\n#### DELETE /api/docstore/{collection}/{id}\n\n##### HTTP Request\n\n```shell\n$ http --auth :apikey delete https://instance.com/api/docstore/{collection}/{id}\n```\n\n##### HTTP Response\n\n`204 No Content`\n\n##### blobstash-python\n\n```python\nfrom blobstash.docstore import DocStoreClient\n\nclient = DocStoreClient(\"https://instance.com\", api_key=\"apikey\")\n\n# or `client[\"mycol\"]` or `client.collection(\"mycol\")`\ncol = client.mycol\n\n# Accepts an ID as a `str`, an `ID` object, or a document (with an `_id` key)\ncol.delete(\"15f611f032ae804d668dd855\")\n```\n\n### Retrieving documents\n\n### Querying documents\n\n#### GET /api/docstore/{collection}{?sort_index,as_of}\n\n##### HTTP Request\n\n```shell\n$ http --auth :apikey get https://instance.com/api/docstore/{collection}\n```\n\n##### HTTP Response\n\n```json\n{\n    \"data\": [\n        {\n            \"_created\": \"2020-02-23T15:50:24Z\", \n            \"_id\": \"15f612d4f7715bdb28c93fd9\", \n            \"_updated\": \"2020-02-23T15:55:15Z\", \n            \"_version\": 
\"1582473315736447008\", \n            \"content\": \"lol2\"\n        }\n    ], \n    \"pagination\": {\n        \"count\": 1, \n        \"cursor\": \"ZG9jc3RvcmU6Y29sMToxNWY2MTJkNGY3NzE1YmRiMjhjOTNmZDg=\", \n        \"has_more\": false, \n        \"per_page\": 50\n    }, \n    \"pointers\": {}\n}\n```\n\n##### blobstash-python\n\n```python\nfrom blobstash.docstore import DocStoreClient\n\nclient = DocStoreClient(\"https://instance.com\", api_key=\"apikey\")\n\n# or `client[\"mycol\"]` or `client.collection(\"mycol\")`\ncol = client.mycol\n\ncol.query()\n```\n\n### Sorting/indexes\n\nSorting can only be done through indexes.\n\n### MapReduce framework\n\n## BlobStash Use Cases\n\n### Backups from external servers\n\nSet up an API key with limited permissions (in `blobstash.yaml`), just enough to save a snapshot of a tree:\n\n```yaml\n# [...]\nauth:\n - id: 'my_backup_key'\n   password: 'my_api_key'\n   roles: 'backup_server1'\nroles:\n - name: 'backup_server1'\n   perms:\n    - action: 'action:stat:blob'\n      resource: 'resource:blobstore:blob:*'\n    - action: 'action:write:blob'\n      resource: 'resource:blobstore:blob:*'\n    - action: 'action:snapshot:fs'\n      resource: 'resource:filetree:fs:server1'\n    - action: 'action:write:kv'\n      resource: 'resource:kvstore:kv:_filetree:fs:server1'\n    - action: 'action:gc:namespace'\n      resource: 'resource:stash:namespace:server1'\n```\n\nThen on \"server1\":\n\n```bash\n$ export BLOBS_API_HOST=https://my-blobstash-instance.com BLOBS_API_KEY=my_api_key\n$ blobstash-uploader server1 /path/to/data\n```\n\n### Lua API\n\n#### Extra module\n\n- [`extra.glob(pattern, name)`](#extraglobpattern-name)\n\n##### extra.glob(pattern, name)\n\nParses the shell file name pattern (glob) and reports whether the file name matches it.\n\nUses Go's [filepath.Match](https://godoc.org/path/filepath#Match).\n\n**Attributes**\n\n| Name    | Type   | Description |\n| ------- | ------ | ----------- |\n| pattern | String | Glob pattern 
|\n| name    | String | File name |\n\n**Returns**\n\nBoolean\n\n## Contribution\n\nPull requests are welcome, but please open an issue to start a discussion before working on anything substantial.\n\nFeel free to open an issue if you have any ideas/suggestions!\n\n## License\n\nCopyright (c) 2014-2018 Thomas Sileo and contributors. Released under the MIT license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftsileo%2Fblobstash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftsileo%2Fblobstash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftsileo%2Fblobstash/lists"}