{"id":24355525,"url":"https://github.com/bobvawter/cacheroach","last_synced_at":"2025-04-10T03:13:26.701Z","repository":{"id":52891427,"uuid":"318343937","full_name":"bobvawter/cacheroach","owner":"bobvawter","description":"Cacheroach is a multi-tenant, multi-region, multi-cloud file store built using CockroachDB.","archived":false,"fork":false,"pushed_at":"2021-06-21T14:11:46.000Z","size":665,"stargazers_count":24,"open_issues_count":7,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-06-19T11:30:14.825Z","etag":null,"topics":["cockroachdb"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bobvawter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-03T23:09:31.000Z","updated_at":"2024-01-14T00:01:55.000Z","dependencies_parsed_at":"2022-08-24T05:31:59.704Z","dependency_job_id":null,"html_url":"https://github.com/bobvawter/cacheroach","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bobvawter%2Fcacheroach","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bobvawter%2Fcacheroach/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bobvawter%2Fcacheroach/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bobvawter%2Fcacheroach/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bobvawter","download_url":"https://codeload.github.com/bobvawter/cacheroach/tar.gz/refs/heads/main","host":{"name":
"GitHub","url":"https://github.com","kind":"github","repositories_count":234545180,"owners_count":18850165,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cockroachdb"],"created_at":"2025-01-18T17:57:40.054Z","updated_at":"2025-01-18T17:57:41.520Z","avatar_url":"https://github.com/bobvawter.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cacheroach\n\n\u003e I just want to serve five terabytes of data, across multiple cloud providers.\n\n## TL;DR\n\nCacheroach may be of interest to you if you need a multi-tenant, multi-region, multi-cloud,\nfile-storage abstraction built on top of [CockroachDB](https://github.com/cockroachdb/cockroach). It\nhas both an HTTP and a gRPC API, a robust security model, and can be deployed within your favorite\nserverless or containerized runtime. 
For users with data-domiciling needs, Cacheroach's database\nschema is built to take advantage of CockroachDB's support for geo-partitioned workloads.\n\n![Hero image showing cacheroach deployment architecture](./doc/hero.png)\n\n## Quickstart\n\n```shell\ngit clone git@github.com:bobvawter/cacheroach\ncd cacheroach\n# To build the cacheroach CLI tool\ngo install\n# Launch demo stack consisting of a CRDB node and cacheroach\ndocker-compose -f docker-compose.quickstart.yml up\n# Create a root user, writing access data to root.cfg\ncacheroach -v -c root.cfg bootstrap --hmacKey @./cacheroach-data/hmac http://root@localhost:13013/\n# List tenants, should show the default scratchpad\ncacheroach -v -c root.cfg tenant ls\n# Create a working user, this will write a new configuration file.\n# You can add a --password flag to enable password-based login.\ncacheroach -v -c root.cfg principal create $USER --out $HOME/.cacheroach/config\n# Grant the new user read/write access to a tenant.\ncacheroach -v -c root.cfg session delegate --for \u003cPRINCIPAL\u003e --on tenant --id \u003cTENANT\u003e --capabilities read,write\n# Use the real user to list files\ncacheroach -v file ls -t \u003cTENANT\u003e\n# Set a default tenant id to minimize typing\ncacheroach tenant default \u003cTENANT\u003e\n# Upload a file\necho \"Hello World.\" \u003e hello.txt\ncacheroach -v file put / hello.txt\n# Look at HTTP vhost mapping\ncacheroach -v -c root.cfg vhost ls\n# Fetch a file using the default HTTP VHost endpoint\n# Use -H \"Authorization: Bearer \u003cJWT TOKEN\u003e\" for non-public tenants.\ncurl -O http://127.0.0.1:13013/cacheroach\n# Generate a signed retrieval URL.\ncacheroach file sign /hello.txt\n```\n\nSee the [CLI docs](./doc/cacheroach.md) for additional information.\n\n## Implementation details\n\n### Data model\n\nA CockroachDB cluster can easily hold many terabytes of data; however, it is fundamentally designed\nto service \"Online Transaction Processing\" (OLTP) and 
System-of-Record (SoR) use-cases. The\nimplementation choices necessary to excel in providing consistent and reliable transactions at scale\nimpose [certain limitations](https://www.cockroachlabs.com/docs/stable/known-limitations.html) on\nthe queries that can be effectively processed. The chief limitations to overcome when building a\nfile store are the maximum row (512 MiB) and transaction (64 MiB) sizes.\n\nCacheroach breaks files into content-addressable chunks of 512 KiB which are assembled into\ncontent-addressable ropes. Ropes are then referenced by a filesystem abstraction which provides the\nmetadata needed by clients. In this context, \"content-addressable\" means that we use the\ncryptographic hash of the contents of a chunk or of a rope in order to identify it.\n\nCacheroach forgoes the typical approach of \"one-request, one-transaction\" due to the aforementioned\nlimits on the size of any given data transaction. Rather, we use an idempotent approach to most\ndata-storage operations. Writing the same chunk twice is effectively a no-op. We build on this when\nmanipulating ropes or performing operations on filesystem manifests. The use of\nsingle-SQL-statement, implicit transactions allows us to take advantage of CockroachDB's automatic\n[transaction retry](https://www.cockroachlabs.com/docs/stable/advanced-client-side-transaction-retries.html)\nmechanisms in cases where there is a transactional serialization conflict.\n\nOne extra layer of data organization is applied to chunks, ropes, and files: Tenancy. This allows a\nsingle Cacheroach service to serve multiple use-cases. 
The database schema also allows CockroachDB\nzone configurations to be applied to better control where any given tenant's data lives.\n\n## Security model\n\nCacheroach uses a \"capability, delegate, target\" approach to authorization.\nA [Principal](./api/principal.proto) may have zero or more durable [Sessions](./api/session.proto)\nwhich grant the principal the permission to perform operations within the system.\n\nAutomatic principal provisioning can be enabled through OIDC integration. Cacheroach will request\nOIDC credentials with an offline scope. Principals are periodically re-validated using the OIDC\nrefresh token. A whitelist of email domains is provided as part of cacheroach's configuration to\nlimit access to specified users.\n\nSessions are exposed as signed [JWT tokens](https://jwt.io). Active sessions are maintained in a\ntable to facilitate occasional invalidation checks.\n\nThe API surface area uses a [declarative model](./api/capabilities.proto) to implement ACL checks in\na [centralized](./pkg/enforcer) manner. All access checks will have been performed by the time an\nRPC method has been invoked. The return values are also checked and elided. An RPC call will be\nrejected if a client \"says\" something that it's not allowed to \"say,\" and it cannot \"hear\" anything\nthat it could not \"say\" later.\n\n### OpenID Connect integration\n\nCacheroach supports using an OIDC provider that provides a discovery URL.  Pass\nan OAuth client id, secret, and the OIDC discovery URL to the cacheroach server\nto enable automatic principal provisioning. An email-domain principal can\nprovide automatic access delegation to newly-created principals.\n\nUsing your OIDC provider of choice, create a new OAuth web-server\nintegration. Here's how you'd do it using [Google\nAccounts](https://developers.google.com/identity/protocols/oauth2/web-server),\nbut any OIDC implementation ought to work. 
You will need the discovery URL, the\nOAuth2 client id and secret, and a list of user email domains for which you\nwant to automatically provision accounts. You'll also need to configure an\nOAuth redirect URL, such as `https://your.cacheroach.server/_/oidc/receive`.\n\n```\ncacheroach start \n  --oidcIssuer https://accounts.google.com\n  --oidcDomains yourcompany.com\n  --oidcClientID xyzzy\n  --oidcClientSecret soupOrSecret\n  ....\n```\n\nOn the client side, run `cacheroach auth login\nhttps://your.cacheroach.server/`. You'll be prompted with a URL to open in a\nbrowser to authenticate with the OIDC provider. If all goes well, your\nbrowser will connect to the running cacheroach instance to be redirected to the\nOIDC provider.  The provider will authenticate your browser and redirect back\nto the cacheroach server to complete the handoff. The cacheroach server will\nthen redirect you to an ephemeral webserver started by the local cacheroach CLI\nprocess to transfer the cacheroach session data.\n\nUsing a superuser token, you can create a cacheroach principal that represents\nall users with a given OIDC domain using `cacheroach principal create\n--emailDomain yourcompany.com \"Your Company\"`. Any grants provided to the\ndomain principal will be inherited by OIDC-provisioned principals.\n\n### Signed URLs\n\nCacheroach can generate durable signed URLs that allow an otherwise-unauthenticated client to\nretrieve a file through Cacheroach's HTTP endpoint.\n\n## Virtual Hosts\n\nA tenant's filesystem can be bound to a virtual hostname and served over a regular HTTP endpoint.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbobvawter%2Fcacheroach","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbobvawter%2Fcacheroach","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbobvawter%2Fcacheroach/lists"}