https://github.com/bobvawter/cacheroach
Cacheroach is a multi-tenant, multi-region, multi-cloud file store built using CockroachDB.
- Host: GitHub
- URL: https://github.com/bobvawter/cacheroach
- Owner: bobvawter
- License: apache-2.0
- Created: 2020-12-03T23:09:31.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2021-06-21T14:11:46.000Z (about 5 years ago)
- Last Synced: 2024-06-19T11:30:14.825Z (about 2 years ago)
- Topics: cockroachdb
- Language: Go
- Homepage:
- Size: 649 KB
- Stars: 24
- Watchers: 3
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Cacheroach
> I just want to serve five terabytes of data, across multiple cloud providers.
## TL;DR
Cacheroach may be of interest to you if you need a multi-tenant, multi-region, multi-cloud
file-storage abstraction built on top of [CockroachDB](https://github.com/cockroachdb/cockroach). It
has both an HTTP and a gRPC API, a robust security model, and can be deployed within your favorite
serverless or containerized runtime. For users with data-domiciling needs, Cacheroach's database
schema is built to take advantage of CockroachDB's support for geo-partitioned workloads.

## Quickstart
```shell
git clone git@github.com:bobvawter/cacheroach
cd cacheroach
# To build the cacheroach CLI tool
go install
# Launch demo stack consisting of a CRDB node and cacheroach
docker-compose -f docker-compose.quickstart.yml up
# Create a root user, writing access data to root.cfg
cacheroach -v -c root.cfg bootstrap --hmacKey @./cacheroach-data/hmac http://root@localhost:13013/
# List tenants, should show the default scratchpad
cacheroach -v -c root.cfg tenant ls
# Create a working user, this will write a new configuration file.
# You can add a --password flag to enable password-based login.
cacheroach -v -c root.cfg principal create $USER --out $HOME/.cacheroach/config
# Grant the new user read/write access to a tenant.
cacheroach -v -c root.cfg session delegate --for <PRINCIPAL_ID> --on tenant --id <TENANT_ID> --capabilities read,write
# Use the real user to list files
cacheroach -v file ls -t <TENANT_ID>
# Set a default tenant id to minimize typing
cacheroach tenant default <TENANT_ID>
# Upload a file
echo "Hello World." > hello.txt
cacheroach -v file put / hello.txt
# Look at HTTP vhost mapping
cacheroach -v -c root.cfg vhost ls
# Fetch a file using the default HTTP VHost endpoint
# Use -H "Authorization: Bearer <TOKEN>" for non-public tenants.
curl -O http://127.0.0.1:13013/cacheroach
# Generate a signed retrieval URL.
cacheroach file sign /hello.txt
```
See the [CLI docs](./doc/cacheroach.md) for additional information.
## Implementation details
### Data model
A CockroachDB cluster can easily hold many terabytes of data; however, it is fundamentally designed
to service "Online Transaction Processing" (OLTP) and System-of-Record (SoR) use-cases. The
implementation choices necessary to excel in providing consistent and reliable transactions at scale
impose [certain limitations](https://www.cockroachlabs.com/docs/stable/known-limitations.html) on
the queries that can be effectively processed. The chief limitations to overcome when building a
file store are the maximum row (512 MiB) and transaction (64 MiB) sizes.
Cacheroach breaks files into content-addressable chunks of 512 KiB which are assembled into
content-addressable ropes. Ropes are then referenced by a filesystem abstraction which provides the
necessary metadata needed by clients. In this context, "content-addressable" means that we use the
cryptographic hash of the contents of a chunk or of a rope in order to identify it.
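The chunk-and-rope layout described above can be sketched in a few lines of Go. This is a minimal illustration, not Cacheroach's actual code: SHA-256 is an assumed hash function (the text only says "cryptographic hash"), and `hashOf`, `chunkIDs`, and `ropeID` are hypothetical helper names. The rope address here is simply a hash over the ordered chunk addresses, which is one plausible scheme among several.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkSize mirrors the 512 KiB chunk size described in the text.
const chunkSize = 512 * 1024

// hashOf returns the hex-encoded SHA-256 digest of data.
// SHA-256 is an assumption; the README does not name the hash.
func hashOf(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// chunkIDs splits data into chunks of at most size bytes and returns
// the content address of each chunk, in order.
func chunkIDs(data []byte, size int) []string {
	var ids []string
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		ids = append(ids, hashOf(data[start:end]))
	}
	return ids
}

// ropeID derives a content address for a rope from its ordered chunk
// addresses (hypothetical scheme: hash of the concatenated ids).
func ropeID(ids []string) string {
	h := sha256.New()
	for _, id := range ids {
		h.Write([]byte(id))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// One full chunk plus a short trailing chunk.
	data := make([]byte, chunkSize+100)
	ids := chunkIDs(data, chunkSize)
	fmt.Println("chunks:", len(ids))
	fmt.Println("rope:", ropeID(ids))
}
```

Because both addresses are derived purely from content, two uploads of the same bytes resolve to the same chunks and the same rope.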
Cacheroach forgoes the typical approach of "one-request, one-transaction" due to the aforementioned
limits on the size of any given data transaction. Rather, we use an idempotent approach to most
data-storage operations. Writing the same chunk twice is effectively a no-op. We build on this when
manipulating ropes or performing operations on filesystem manifests. The use of
single-statement, implicit transactions allows us to take advantage of CockroachDB's automatic
[transaction retry](https://www.cockroachlabs.com/docs/stable/advanced-client-side-transaction-retries.html)
mechanisms in cases where there is a transactional serialization conflict.
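To make the idempotency property concrete, here is a small in-memory sketch in Go. The `chunkStore` type and its `Put` method are hypothetical stand-ins for a chunk table keyed by content hash; in SQL terms the same effect could plausibly be had with a single `INSERT ... ON CONFLICT DO NOTHING` statement, though the source does not say which statement Cacheroach actually issues.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// chunkStore is a hypothetical in-memory stand-in for a chunk table
// whose primary key is the content hash.
type chunkStore struct {
	mu     sync.Mutex
	chunks map[string][]byte
}

func newChunkStore() *chunkStore {
	return &chunkStore{chunks: make(map[string][]byte)}
}

// Put stores data under its content hash and reports whether a new
// entry was written. Storing the same bytes again changes nothing,
// so a retried or duplicated request is harmless.
func (s *chunkStore) Put(data []byte) (id string, inserted bool) {
	sum := sha256.Sum256(data) // SHA-256 is an assumption
	id = hex.EncodeToString(sum[:])

	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.chunks[id]; ok {
		return id, false
	}
	s.chunks[id] = append([]byte(nil), data...) // defensive copy
	return id, true
}

func main() {
	store := newChunkStore()
	id, first := store.Put([]byte("Hello World."))
	_, second := store.Put([]byte("Hello World."))
	fmt.Println(id, first, second)
}
```

Since each write is safe to repeat, a large upload can be split across many small statements and any individual statement can be retried without coordination.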
One extra layer of data organization is applied to chunks, ropes, and files: Tenancy. This allows a
single Cacheroach service to serve multiple use-cases. The database schema also allows CockroachDB
zone configurations to be applied to better control where any given tenant's data lives.
## Security model
Cacheroach uses a "capability, delegate, target" approach to authorization.
A [Principal](./api/principal.proto) may have zero or more durable [Sessions](./api/session.proto)
which grant the principal the permission to perform operations within the system.
Automatic principal provisioning can be enabled through OIDC integration. Cacheroach will request
OIDC credentials with an offline scope. Principals are periodically re-validated using the OIDC
refresh token. A whitelist of email domains is provided as part of cacheroach's configuration to
limit access to specified users.
Sessions are exposed as signed [JWTs](https://jwt.io). Active sessions are maintained in a
table to facilitate occasional invalidation checks.
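The combination of a signed token and a table of active sessions can be illustrated with the standard library alone. The sketch below assumes HS256 (HMAC-SHA256) signing, which the `--hmacKey` flag in the Quickstart hints at but the source does not confirm. The `claims` layout, the `sign` and `verify` helpers, and the `active` map standing in for the sessions table are all hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// claims is a hypothetical, minimal claim set.
type claims struct {
	Subject string `json:"sub"` // principal id
	ID      string `json:"jti"` // session id, used for the table lookup
}

var enc = base64.RawURLEncoding

// sign builds a compact JWT: base64url(header).base64url(claims).base64url(mac).
func sign(c claims, key []byte) (string, error) {
	header := enc.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`))
	body, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	unsigned := header + "." + enc.EncodeToString(body)
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(unsigned))
	return unsigned + "." + enc.EncodeToString(mac.Sum(nil)), nil
}

// verify checks the signature, then consults the set of active
// sessions so that a revoked session is rejected even though its
// signature is still valid. The header is not inspected because this
// sketch only ever accepts HMAC-SHA256.
func verify(token string, key []byte, active map[string]bool) (claims, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return claims{}, errors.New("malformed token")
	}
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(parts[0] + "." + parts[1]))
	sig, err := enc.DecodeString(parts[2])
	if err != nil || !hmac.Equal(sig, mac.Sum(nil)) {
		return claims{}, errors.New("bad signature")
	}
	body, err := enc.DecodeString(parts[1])
	if err != nil {
		return claims{}, err
	}
	var c claims
	if err := json.Unmarshal(body, &c); err != nil {
		return claims{}, err
	}
	if !active[c.ID] {
		return claims{}, errors.New("session is no longer active")
	}
	return c, nil
}

func main() {
	key := []byte("demo-key") // placeholder key for illustration only
	active := map[string]bool{"session-1": true}

	token, err := sign(claims{Subject: "principal-1", ID: "session-1"}, key)
	if err != nil {
		panic(err)
	}
	c, err := verify(token, key, active)
	fmt.Println(c.Subject, err)

	delete(active, "session-1") // simulate invalidating the session
	_, err = verify(token, key, active)
	fmt.Println(err)
}
```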
The API surface area uses a [declarative model](./api/capabilities.proto) to implement ACL checks in
a [centralized](./pkg/enforcer) manner. All access checks will have been performed by the time an
RPC method has been invoked. The return values are also checked and elided. An RPC call will be
rejected if a client "says" something that it's not allowed to "say," and it cannot "hear" anything
that it could not "say" later.
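A declarative, centralized check of this kind might look roughly like the following Go sketch. Everything here is hypothetical: the capability names, the `session` and `rule` types, and the method names in the `rules` table are illustrative and are not taken from `capabilities.proto` or `pkg/enforcer`. The point is only that each RPC method declares what it requires and one function makes the decision before the method runs.

```go
package main

import "fmt"

// capability is a bit set of permissions (hypothetical names).
type capability uint8

const (
	capRead capability = 1 << iota
	capWrite
	capDelegate
)

// session grants a set of capabilities on a single tenant.
type session struct {
	principal    string
	tenant       string
	capabilities capability
}

// rule declares what a method requires of the caller.
type rule struct {
	requires capability
}

// rules is the declarative table; method names are illustrative.
var rules = map[string]rule{
	"file.Get": {requires: capRead},
	"file.Put": {requires: capWrite},
}

// allowed is the single, central access decision. Methods without a
// declared rule are denied by default.
func allowed(s session, method, tenant string) bool {
	r, ok := rules[method]
	if !ok {
		return false
	}
	return s.tenant == tenant && s.capabilities&r.requires == r.requires
}

func main() {
	reader := session{principal: "alice", tenant: "t1", capabilities: capRead}
	fmt.Println(allowed(reader, "file.Get", "t1"))
	fmt.Println(allowed(reader, "file.Put", "t1"))
	fmt.Println(allowed(reader, "file.Get", "t2"))
}
```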
### OpenID Connect integration
Cacheroach supports using an OIDC provider that provides a discovery URL. Pass
an OAuth client id, secret, and the OIDC discovery URL to the cacheroach server
to enable automatic principal provisioning. An email-domain principal can
provide automatic access delegation to newly-created principals.
Using your OIDC provider of choice, create a new OAuth web-server
integration. Here's how you'd do it using [Google
Accounts](https://developers.google.com/identity/protocols/oauth2/web-server),
but any OIDC implementation ought to work. You will need the discovery URL, the
OAuth2 client id and secret, and a list of user email domains for which you
want to automatically provision accounts. You'll also need to configure an
OAuth redirect URL, such as `https://your.cacheroach.server/_/oidc/receive`.
```
cacheroach start \
  --oidcIssuer https://accounts.google.com \
  --oidcDomains yourcompany.com \
  --oidcClientID xyzzy \
  --oidcClientSecret soupOrSecret \
  ....
```
On the client side, run `cacheroach auth login
https://your.cacheroach.server/`. You'll be prompted with a URL to open in a
browser to authenticate with the OIDC provider. If all goes well, your
browser will connect to the running cacheroach instance to be redirected to the
OIDC provider. The provider will authenticate your browser and redirect back
to the cacheroach server to complete the handoff. The cacheroach server will
then redirect you to an ephemeral webserver started by the local cacheroach CLI
process to transfer the cacheroach session data.
Using a superuser token, you can create a cacheroach principal that represents
all users with a given OIDC domain using `cacheroach principal create
--emailDomain yourcompany.com "Your Company"`. Any grants provided to the
domain principal will be inherited by OIDC-provisioned principals.
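The domain check that decides whether a freshly authenticated user may be provisioned can be sketched as below. This is an illustrative guess at the matching logic, not Cacheroach's code: `domainOf` and `allowedDomain` are hypothetical helpers, and the case-insensitive exact match is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// domainOf returns the lower-cased domain part of an email address,
// or "" if the address has no usable domain.
func domainOf(email string) string {
	at := strings.LastIndex(email, "@")
	if at < 0 || at == len(email)-1 {
		return ""
	}
	return strings.ToLower(email[at+1:])
}

// allowedDomain reports whether email belongs to one of the
// configured domains (exact, case-insensitive match).
func allowedDomain(email string, domains []string) bool {
	d := domainOf(email)
	if d == "" {
		return false
	}
	for _, allowed := range domains {
		if d == strings.ToLower(allowed) {
			return true
		}
	}
	return false
}

func main() {
	domains := []string{"yourcompany.com"}
	fmt.Println(allowedDomain("alice@yourcompany.com", domains))
	fmt.Println(allowedDomain("mallory@example.org", domains))
}
```

An exact match is deliberately strict: a suffix match would also admit addresses such as `user@notyourcompany.com`.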
### Signed URLs
Cacheroach can generate durable signed URLs that allow an otherwise-unauthenticated client to
retrieve a file through Cacheroach's HTTP endpoint.
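One common way to build such a URL is to attach a keyed MAC of the path as a query parameter, as in the sketch below. This is an assumed scheme for illustration only: the source does not describe the format Cacheroach uses, and the `sig` parameter name and the `signURL` and `verifyURL` helpers are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// pathMAC computes an HMAC-SHA256 over the file path.
func pathMAC(key []byte, path string) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(path))
	return mac.Sum(nil)
}

// signURL returns base with the given path and a "sig" query
// parameter (hypothetical name) carrying the hex-encoded MAC.
func signURL(base, path string, key []byte) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = path
	q := url.Values{}
	q.Set("sig", hex.EncodeToString(pathMAC(key, path)))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// verifyURL recomputes the MAC for the requested path and compares it
// in constant time against the one presented in the URL.
func verifyURL(raw string, key []byte) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	got, err := hex.DecodeString(u.Query().Get("sig"))
	if err != nil {
		return false
	}
	return hmac.Equal(got, pathMAC(key, u.Path))
}

func main() {
	key := []byte("demo-key") // placeholder key for illustration only
	signed, err := signURL("http://127.0.0.1:13013", "/hello.txt", key)
	if err != nil {
		panic(err)
	}
	fmt.Println(signed)
	fmt.Println(verifyURL(signed, key))
}
```

A URL signed this way carries no expiry, which matches the "durable" wording above. Revoking it would require rotating the key or tracking the link server-side.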
## Virtual Hosts
A tenant's filesystem can be bound to a virtual hostname and served over a regular HTTP endpoint.
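The lookup behind such a binding can be sketched as a table from hostname to tenant that is consulted with the request's `Host` header. The `tenantFor` helper and the map-based table are hypothetical. Cacheroach stores its vhost mappings in the database, and the exact lookup rules are not described in the source.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// tenantFor resolves the Host header of a request to a tenant id
// using a vhost table. The port, if present, is ignored.
func tenantFor(host string, vhosts map[string]string) (string, bool) {
	// SplitHostPort fails when there is no port; in that case the
	// host is used as-is.
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h
	}
	tenant, ok := vhosts[strings.ToLower(host)]
	return tenant, ok
}

func main() {
	vhosts := map[string]string{
		"127.0.0.1":         "scratchpad", // illustrative tenant ids
		"files.example.com": "tenant-a",
	}
	fmt.Println(tenantFor("127.0.0.1:13013", vhosts))
	fmt.Println(tenantFor("Files.Example.com", vhosts))
	fmt.Println(tenantFor("unknown.example.com", vhosts))
}
```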