Leaderless distributed cache for productive software teams
https://github.com/tuist/kura


# Kura

`Kura` is a Rust server for building low-latency cache meshes for tenants, handling distributed cache traffic for binary artifacts and metadata.

> [!NOTE]
> `Kura` comes from the Japanese word `蔵` (`kura`), which refers to a storehouse or warehouse. The name fits the system's role: keeping build artifacts and cache metadata stored durably and close at hand so they can be served with low latency.

## Summary ✨

- ⚡ Hot reads come from local disk
- 🪨 Local metadata, multipart state, and the replication outbox live in RocksDB
- 🔁 Blobs and cache metadata replicate to peer nodes with eventual consistency
- 🔎 Nodes can discover peers through DNS and bootstrap themselves from already-running nodes
- 📦 The HTTP API covers key value entries, Xcode CAS artifacts, Gradle artifacts, multipart module uploads, Nx self-hosted cache artifacts, Metro cache artifacts, and namespace clean
- 🧰 The gRPC API exposes the Bazel Remote Execution cache services used by Bazel and Buck2
- 📊 The local stack includes Grafana, Prometheus, Loki, Promtail, and Tempo traces

## Local stack 🧪

Run:

```bash
docker compose up --build -d
```

Useful endpoints:

- `http://localhost:4101/up`
- `http://localhost:4102/up`
- `http://localhost:4103/up`
- `grpc://localhost:5101` for Bazel/Buck2 REAPI against `kura-us`
- `grpc://localhost:5102` for Bazel/Buck2 REAPI against `kura-eu`
- `grpc://localhost:5103` for Bazel/Buck2 REAPI against `kura-ap`
- `http://localhost:3000` for Grafana with `admin` / `admin`
- `http://localhost:9090` for Prometheus
- `http://localhost:3100` for Loki
- `http://localhost:3200` for Tempo

Supported cache protocols:

- `Bazel` and `Buck2`: Bazel Remote Execution API v2 over gRPC on `KURA_GRPC_PORT`
- `Nx`: self-hosted remote cache API on `GET/PUT /v1/cache/{hash}`
- `React Native Metro`: `HttpStore` / `HttpGetStore` on `GET/PUT /api/metro/cache/{cache_key}`

## Toolchain 🛠️

Install Rust from `mise.toml`:

```bash
mise trust mise.toml
mise install
```

Run tests:

```bash
mise x rust@1.94.1 -- cargo test
mise x shellspec@0.28.1 -- shellspec
```

Runtime configuration is summarized in the table under [Runtime Model And Limits](#-runtime-model-and-limits). Kura now derives sensible defaults for the main FD, memory, and metadata-store budgets at startup when you do not set them explicitly.

## 🗺️ Project Areas

Kura is easier to read by subsystem than by tutorial step. The sections below group the project by the main areas you operate or extend.

- 🔌 [Protocol surfaces](#-protocol-surfaces)
- 🗄️ [Storage and replication](#-storage-and-replication)
- ⚙️ [Runtime model and limits](#-runtime-model-and-limits)
- 📊 [Observability](#-observability)
- 📣 [Runtime analytics](#-runtime-analytics)
- ☸️ [Deployment options](#-deployment-options)
- 🧩 [Extensions and policy](#-extensions-and-policy)

## 🔌 Protocol Surfaces

Kura exposes multiple cache protocols behind one service.

The following HTTP surfaces are Tuist-specific client protocols. They exist so Tuist clients can talk to Kura without changing their cache behavior:

- 🍎 `Xcode CAS`: `POST/GET /api/cache/cas/{id}?tenant_id=...&namespace_id=...`
- 🗂️ `Keyvalue / action-cache style entries`: `PUT /api/cache/keyvalue?tenant_id=...&namespace_id=...`, `GET /api/cache/keyvalue/{cas_id}?tenant_id=...&namespace_id=...`
- 🐘 `Gradle`: `PUT/GET /api/cache/gradle/{cache_key}?tenant_id=...&namespace_id=...`
- 📦 `Multipart module cache uploads`: `POST /api/cache/module/start?...`, `POST /api/cache/module/part?...`, `POST /api/cache/module/complete?...`, `HEAD/GET /api/cache/module/{id}?...`

Kura also exposes broader ecosystem protocols that are not specific to Tuist:

- 🧱 `Nx`: `PUT/GET /v1/cache/{hash}`
- 📱 `Metro`: `PUT/GET /api/metro/cache/{cache_key}`
- 🛠️ `Bazel` and `Buck2`: REAPI over gRPC on `KURA_GRPC_PORT`

The local compose stack is still the quickest way to exercise all of those surfaces together:

```bash
docker compose up --build -d
```

Example Xcode artifact round trip:

```bash
curl -X POST \
"http://localhost:4101/api/cache/cas/artifact-1?tenant_id=acme&namespace_id=ios" \
-H "content-type: application/octet-stream" \
--data-binary "xcode-binary"

curl \
"http://localhost:4102/api/cache/cas/artifact-1?tenant_id=acme&namespace_id=ios"
```

Example keyvalue entry round trip:

```bash
curl -X PUT \
"http://localhost:4101/api/cache/keyvalue?tenant_id=acme&namespace_id=ios" \
-H "content-type: application/json" \
-d '{"cas_id":"cas-1","entries":[{"value":"hello"},{"value":"world"}]}'

curl \
"http://localhost:4103/api/cache/keyvalue/cas-1?tenant_id=acme&namespace_id=ios"
```
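
The Tuist-specific surfaces share one URL shape: a path that names the artifact, plus `tenant_id` and `namespace_id` query parameters. As a minimal sketch of how a Rust client might assemble those URLs, the helpers below mirror the paths listed in this section. `Scope`, `cas_url`, `keyvalue_put_url`, `keyvalue_get_url`, and `gradle_url` are hypothetical names that do not come from Kura, and the code assumes identifiers are already URL-safe because it does no percent-encoding.

```rust
/// Tenant and namespace scoping, sent as query parameters.
/// Hypothetical helper type; not part of Kura.
struct Scope<'a> {
    tenant_id: &'a str,
    namespace_id: &'a str,
}

impl Scope<'_> {
    /// Renders the shared query string.
    /// Assumes both identifiers are URL-safe (no percent-encoding is applied).
    fn query(&self) -> String {
        format!("tenant_id={}&namespace_id={}", self.tenant_id, self.namespace_id)
    }
}

/// `POST/GET /api/cache/cas/{id}`
fn cas_url(base: &str, id: &str, scope: &Scope) -> String {
    format!("{}/api/cache/cas/{}?{}", base, id, scope.query())
}

/// `PUT /api/cache/keyvalue`
fn keyvalue_put_url(base: &str, scope: &Scope) -> String {
    format!("{}/api/cache/keyvalue?{}", base, scope.query())
}

/// `GET /api/cache/keyvalue/{cas_id}`, as used in the curl example above.
fn keyvalue_get_url(base: &str, cas_id: &str, scope: &Scope) -> String {
    format!("{}/api/cache/keyvalue/{}?{}", base, cas_id, scope.query())
}

/// `PUT/GET /api/cache/gradle/{cache_key}`
fn gradle_url(base: &str, cache_key: &str, scope: &Scope) -> String {
    format!("{}/api/cache/gradle/{}?{}", base, cache_key, scope.query())
}
```

With `Scope { tenant_id: "acme", namespace_id: "ios" }`, `cas_url("http://localhost:4101", "artifact-1", &scope)` yields the same URL as the Xcode curl example.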

## 🗄️ Storage And Replication

Kura splits storage into two planes:

- 🪨 RocksDB stores metadata, keyvalue payloads, multipart state, tombstones, segment lifecycle state, and the replication outbox.
- 📦 Segment files store large immutable binary artifacts for the hot path.

Replication is leaderless and eventually consistent:

- 🔁 local writes become durable together with their outbox work
- 🌍 peers bootstrap by pulling manifests, tombstones, and artifact bodies
- 🔎 DNS discovery can expand the peer set automatically
- 🧠 the outbox is processed incrementally so queue depth does not blow up heap usage during backlog

Peer-to-peer mTLS is available for the internal plane:

- `KURA_INTERNAL_PORT`
- `KURA_INTERNAL_TLS_CA_CERT_PATH`
- `KURA_INTERNAL_TLS_CERT_PATH`
- `KURA_INTERNAL_TLS_KEY_PATH`

When peer mTLS is enabled:

- 🔒 `KURA_NODE_URL` and every value in `KURA_PEERS` must use `https://...:`
- 🌍 the public API still stays on `KURA_PORT`
- 🧱 `/_internal/*` is only served on the internal mTLS listener
- 🪪 the certificate configured through `KURA_INTERNAL_TLS_CERT_PATH` should be valid for both server and client auth
- 🏷️ the certificate SANs must cover the hostname used in `KURA_NODE_URL`
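
The `https` requirement is easy to get wrong when switching an existing mesh to peer mTLS. The sketch below shows one way to check a configuration before rollout. It is not Kura's own validation: `non_https_peer_urls` is a hypothetical helper, and it assumes `KURA_PEERS` is a comma-separated list, which this README does not state.

```rust
/// Returns every URL that does not use the `https` scheme.
/// Hypothetical pre-flight check; assumes `peers` is comma-separated.
fn non_https_peer_urls<'a>(node_url: &'a str, peers: &'a str) -> Vec<&'a str> {
    std::iter::once(node_url)
        .chain(peers.split(','))
        .map(str::trim)
        // Skip empty entries such as a trailing comma or an unset peer list.
        .filter(|url| !url.is_empty() && !url.starts_with("https://"))
        .collect()
}
```

An empty result means the node URL and all seed peers are compatible with the internal mTLS listener; anything returned still points at a plain `http` address.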

## ⚙️ Runtime Model And Limits

Kura is designed around explicit resource budgets instead of relying on ambient process limits.

When `Optional` is `Yes`, the `Default` column shows what Kura uses today. `auto` means Kura derives the value at startup from detected file-descriptor limits, memory limits, or CPU count.

| Name | Description | Optional | Default |
| --- | --- | --- | --- |
| `KURA_PORT` | Public HTTP port. | No | `-` |
| `KURA_GRPC_PORT` | gRPC port for REAPI. | No | `-` |
| `KURA_TENANT_ID` | Default tenant identifier for the node. | No | `-` |
| `KURA_REGION` | Region label advertised in metrics and replication state. | No | `-` |
| `KURA_TMP_DIR` | Temporary directory for staged request bodies and multipart assembly. | No | `-` |
| `KURA_DATA_DIR` | Persistent directory for metadata state and segment files. | No | `-` |
| `KURA_NODE_URL` | Canonical URL other peers use to reach this node. | No | `-` |
| `KURA_PEERS` | Seed peer list used before discovery converges. | Yes | `KURA_NODE_URL` |
| `KURA_DISCOVERY_DNS_NAME` | DNS name to probe for automatic peer discovery. | Yes | disabled |
| `KURA_FILE_DESCRIPTOR_POOL_SIZE` | App-managed file-descriptor budget for request and background I/O. | Yes | auto |
| `KURA_FILE_DESCRIPTOR_ACQUIRE_TIMEOUT_MS` | How long a request waits before FD backpressure fails the checkout. | Yes | `5000` |
| `KURA_SEGMENT_HANDLE_CACHE_SIZE` | Maximum number of pinned segment read handles; must stay below the FD pool size. | Yes | auto |
| `KURA_MEMORY_SOFT_LIMIT_BYTES` | Soft watermark where Kura starts shedding optional memory use. | Yes | auto |
| `KURA_MEMORY_HARD_LIMIT_BYTES` | Hard watermark where Kura pauses replication work and trims hot caches aggressively. | Yes | auto |
| `KURA_MANIFEST_CACHE_MAX_BYTES` | Maximum size of the in-memory manifest hot cache. | Yes | auto |
| `KURA_MAX_KEYVALUE_BYTES` | Maximum per-request keyvalue payload size on public and replication APIs. | Yes | `1048576` |
| `KURA_METADATA_STORE_MAX_OPEN_FILES` | Descriptor budget reserved for the metadata store itself. | Yes | auto |
| `KURA_METADATA_STORE_MAX_BACKGROUND_JOBS` | Background flush and compaction concurrency for the metadata store. | Yes | auto |
| `KURA_METADATA_STORE_READ_CACHE_BYTES` | Capacity of the metadata-store read cache. | Yes | auto |
| `KURA_METADATA_STORE_WRITE_BUFFER_POOL_BYTES` | Total memory budget reserved for metadata write buffering. | Yes | auto |
| `KURA_METADATA_STORE_WRITE_BUFFER_BYTES` | Size of each metadata write buffer before flush. | Yes | auto |
| `KURA_METADATA_STORE_MAX_WRITE_BUFFERS` | Maximum number of metadata write buffers kept in memory. | Yes | auto |
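
Taken together, `KURA_FILE_DESCRIPTOR_POOL_SIZE` and `KURA_FILE_DESCRIPTOR_ACQUIRE_TIMEOUT_MS` describe a bounded pool whose checkout fails after a timeout instead of waiting forever. The sketch below illustrates that backpressure pattern with the standard library only. It is not Kura's implementation: `FdPool`, `acquire`, and `release` are hypothetical names, and a real pool would more likely hand out a guard that releases on drop.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Counting pool of file-descriptor permits (hypothetical sketch).
struct FdPool {
    available: Mutex<usize>,
    released: Condvar,
}

impl FdPool {
    fn new(size: usize) -> Self {
        Self {
            available: Mutex::new(size),
            released: Condvar::new(),
        }
    }

    /// Waits up to `timeout` for a free permit.
    /// Returns `false` when the wait times out with no permit available.
    fn acquire(&self, timeout: Duration) -> bool {
        let guard = self.available.lock().unwrap();
        // Sleep while no permit is free, waking on release or on timeout.
        let (mut guard, result) = self
            .released
            .wait_timeout_while(guard, timeout, |available| *available == 0)
            .unwrap();
        if result.timed_out() {
            return false;
        }
        *guard -= 1;
        true
    }

    /// Returns a permit and wakes one waiter.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.released.notify_one();
    }
}
```

With the documented default of `5000` ms, a request that cannot obtain a permit within five seconds would fail its checkout rather than pile up behind slow I/O.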

Auto-derived defaults currently follow these rules:

- `file_descriptor_limit` comes from `RLIMIT_NOFILE` when available, otherwise Kura falls back to a conservative host default.
- `memory_limit_bytes` comes from the cgroup memory limit when available, otherwise Kura falls back to physical host memory.
- `cpu_count` comes from detected parallelism via the runtime.
- `KURA_FILE_DESCRIPTOR_POOL_SIZE` is `usable_fds / 8`, clamped to `[64, 256]`, where `usable_fds` is the detected FD limit minus reserved headroom.
- `KURA_SEGMENT_HANDLE_CACHE_SIZE` is `KURA_FILE_DESCRIPTOR_POOL_SIZE / 4`, clamped to `[16, 64]`, and then capped below the FD pool so transient work keeps headroom.
- `KURA_MEMORY_SOFT_LIMIT_BYTES` is `70%` of detected memory, rounded down to MiB boundaries, with a minimum of `128 MiB`.
- `KURA_MEMORY_HARD_LIMIT_BYTES` is `85%` of detected memory, rounded down to MiB boundaries, and always at least `64 MiB` above the soft limit.
- `KURA_MANIFEST_CACHE_MAX_BYTES` is `KURA_MEMORY_SOFT_LIMIT_BYTES / 16`, rounded down to MiB boundaries and clamped to `[8 MiB, 64 MiB]`.
- `KURA_METADATA_STORE_MAX_OPEN_FILES` is `usable_fds / 2`, clamped to `[128, 1024]`.
- `KURA_METADATA_STORE_MAX_BACKGROUND_JOBS` is `cpu_count`, clamped to `[1, 8]`.
- `KURA_METADATA_STORE_READ_CACHE_BYTES` is `memory_limit_bytes / 32`, rounded down to MiB boundaries and clamped to `[16 MiB, 128 MiB]`.
- `KURA_METADATA_STORE_WRITE_BUFFER_POOL_BYTES` follows the same `memory_limit_bytes / 32` rule as the metadata-store read cache.
- `KURA_METADATA_STORE_WRITE_BUFFER_BYTES` is `KURA_METADATA_STORE_WRITE_BUFFER_POOL_BYTES / 4`, rounded down to MiB boundaries and clamped to `[4 MiB, 32 MiB]`.
- `KURA_METADATA_STORE_MAX_WRITE_BUFFERS` is `KURA_METADATA_STORE_WRITE_BUFFER_POOL_BYTES / KURA_METADATA_STORE_WRITE_BUFFER_BYTES`, clamped to `[2, 8]`.
- `KURA_MAX_KEYVALUE_BYTES` defaults to `1048576`, and `KURA_FILE_DESCRIPTOR_ACQUIRE_TIMEOUT_MS` defaults to `5000`.
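
To see what a given host would end up with, the rules above can be transcribed into a short function. This is a sketch written from the list in this README, not Kura's source: `DerivedBudgets`, `derive_budgets`, and `floor_mib` are hypothetical names, the reserved FD headroom is not documented so `usable_fds` is taken as an input, and "capped below the FD pool" is read as "at most pool size minus one".

```rust
const MIB: u64 = 1024 * 1024;

/// Rounds a byte count down to a whole number of MiB.
fn floor_mib(bytes: u64) -> u64 {
    bytes / MIB * MIB
}

/// Integer percentage without overflow on very large limits.
fn percent(bytes: u64, pct: u64) -> u64 {
    (bytes as u128 * pct as u128 / 100) as u64
}

/// Budgets derived at startup (hypothetical struct, field names shortened).
#[derive(Debug, PartialEq)]
struct DerivedBudgets {
    fd_pool_size: u64,
    segment_handle_cache_size: u64,
    memory_soft_limit_bytes: u64,
    memory_hard_limit_bytes: u64,
    manifest_cache_max_bytes: u64,
    metadata_max_open_files: u64,
    metadata_max_background_jobs: u64,
    metadata_read_cache_bytes: u64,
    metadata_write_buffer_pool_bytes: u64,
    metadata_write_buffer_bytes: u64,
    metadata_max_write_buffers: u64,
}

/// `usable_fds` is the detected FD limit minus reserved headroom.
fn derive_budgets(usable_fds: u64, memory_limit_bytes: u64, cpu_count: u64) -> DerivedBudgets {
    let fd_pool_size = (usable_fds / 8).clamp(64, 256);
    // Assumption: "capped below the FD pool" means strictly smaller than it.
    let segment_handle_cache_size = (fd_pool_size / 4).clamp(16, 64).min(fd_pool_size - 1);

    let soft = floor_mib(percent(memory_limit_bytes, 70)).max(128 * MIB);
    let hard = floor_mib(percent(memory_limit_bytes, 85)).max(soft + 64 * MIB);

    // Read cache and write buffer pool share the same memory / 32 rule.
    let store_cache = floor_mib(memory_limit_bytes / 32).clamp(16 * MIB, 128 * MIB);
    let write_buffer = floor_mib(store_cache / 4).clamp(4 * MIB, 32 * MIB);

    DerivedBudgets {
        fd_pool_size,
        segment_handle_cache_size,
        memory_soft_limit_bytes: soft,
        memory_hard_limit_bytes: hard,
        manifest_cache_max_bytes: floor_mib(soft / 16).clamp(8 * MIB, 64 * MIB),
        metadata_max_open_files: (usable_fds / 2).clamp(128, 1024),
        metadata_max_background_jobs: cpu_count.clamp(1, 8),
        metadata_read_cache_bytes: store_cache,
        metadata_write_buffer_pool_bytes: store_cache,
        metadata_write_buffer_bytes: write_buffer,
        metadata_max_write_buffers: (store_cache / write_buffer).clamp(2, 8),
    }
}
```

For example, `derive_budgets(1024, 4096 * MIB, 4)` gives an FD pool of 128, a segment handle cache of 32, a soft limit of 2867 MiB, and a hard limit of 3481 MiB under these assumptions.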

A minimal direct-binary deployment still looks like:

```bash
KURA_PORT=4000 \
KURA_GRPC_PORT=50051 \
KURA_TENANT_ID=default \
KURA_REGION=eu-central \
KURA_TMP_DIR=/tmp/kura \
KURA_DATA_DIR=/var/cache/kura \
KURA_NODE_URL=http://cache-1.internal:4000 \
KURA_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel-collector:4318/v1/traces \
KURA_OTEL_SERVICE_NAME=kura-eu-central \
KURA_OTEL_DEPLOYMENT_ENVIRONMENT=production \
./target/release/kura
```

## 📊 Observability

The local stack ships with an observability setup that covers metrics, dashboards, logs, and traces:

- 📈 Prometheus metrics
- 📉 Grafana dashboards
- 🪵 Loki and Promtail logs
- 🧭 Tempo traces

Prometheus exposes live metadata-store memory gauges:

- `kura_rocksdb_block_cache_usage_bytes`
- `kura_rocksdb_block_cache_pinned_usage_bytes`
- `kura_rocksdb_block_cache_capacity_bytes`
- `kura_rocksdb_write_buffer_usage_bytes`
- `kura_rocksdb_write_buffer_capacity_bytes`

Kura also exports:

- 📦 artifact read and write counters by `kind`, `client`, `artifact_class`, and `result`
- 🔁 replication latency and result metrics
- 💾 file descriptor pool pressure metrics
- 🧠 manifest cache occupancy and admission metrics

## 📣 Runtime Analytics

Analytics webhooks are a separate optional subsystem that mirrors the older Tuist cache contract for Xcode and Gradle traffic.

When enabled:

- 🍎 Xcode upload and download events are sent to `/webhooks/cache`
- 🐘 Gradle upload and download events are sent to `/webhooks/gradle-cache`
- ✍️ requests are signed with `x-cache-signature`
- 🧭 requests also include `x-cache-endpoint`
- 🪶 delivery stays in-memory and best-effort, so analytics never block the hot path
- 🧯 a per-pipeline circuit breaker opens after repeated delivery failures so Kura sheds analytics instead of backing up under a misbehaving upstream

Configure it with:

- `KURA_ANALYTICS_SERVER_URL`
- `KURA_ANALYTICS_SIGNING_KEY`
- optional `KURA_ANALYTICS_BATCH_SIZE` default `100`
- optional `KURA_ANALYTICS_BATCH_TIMEOUT_MS` default `5000`
- optional `KURA_ANALYTICS_QUEUE_CAPACITY` default `1000`
- optional `KURA_ANALYTICS_REQUEST_TIMEOUT_MS` default `5000`
- optional `KURA_ANALYTICS_CIRCUIT_BREAKER_FAILURE_THRESHOLD` default `5`
- optional `KURA_ANALYTICS_CIRCUIT_BREAKER_OPEN_MS` default `30000`

It also exposes analytics-specific runtime metrics for:

- 📣 queue depth and drops
- 📦 batch sizes and flush outcomes
- 🧯 circuit-breaker state and open events
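
The circuit-breaker settings are easier to reason about with a concrete model. The sketch below is one plausible reading of `KURA_ANALYTICS_CIRCUIT_BREAKER_FAILURE_THRESHOLD` and `KURA_ANALYTICS_CIRCUIT_BREAKER_OPEN_MS`, not Kura's implementation: `CircuitBreaker` and its methods are hypothetical, and it assumes the threshold counts consecutive failures and that the breaker closes again once the open window has elapsed.

```rust
use std::time::{Duration, Instant};

/// Consecutive-failure circuit breaker (hypothetical sketch).
struct CircuitBreaker {
    failure_threshold: u32,
    open_for: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, open_for: Duration) -> Self {
        Self {
            failure_threshold,
            open_for,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Returns `true` when a delivery attempt may proceed at `now`.
    fn allow(&mut self, now: Instant) -> bool {
        match self.opened_at {
            // Still inside the open window: shed the batch.
            Some(opened) if now.duration_since(opened) < self.open_for => false,
            // Window elapsed: close the breaker and let the next attempt through.
            Some(_) => {
                self.opened_at = None;
                self.consecutive_failures = 0;
                true
            }
            None => true,
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    /// Counts a failed delivery and opens the breaker at the threshold.
    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

With the documented defaults (`5` failures, `30000` ms), five failed deliveries in a row would cause analytics batches to be dropped for the next thirty seconds, which keeps a slow upstream from filling the in-memory queue.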

## ☸️ Deployment Options

### Helm And Kubernetes

The repository includes a Helm chart at `ops/helm/kura` that deploys Kura as a `StatefulSet` with:

- 💾 one PVC per pod for metadata-state and segment storage
- 🧭 a headless service for stable pod DNS and peer discovery
- 🌐 a regular service exposing both HTTP and gRPC
- 🚪 optional ingress for the HTTP API
- 🧩 optional inline extension script mounting through a `ConfigMap`
- 🔐 optional peer mTLS for `/_internal/*` traffic via a mounted Kubernetes `Secret`

Lint and render the chart:

```bash
helm lint ops/helm/kura
helm template kura ops/helm/kura --namespace kura
```

Install it on a generic cluster:

```bash
helm upgrade --install kura ./ops/helm/kura \
--namespace kura \
--create-namespace \
--set image.repository=ghcr.io/tuist/kura \
--set image.tag=latest \
--set config.region=fr-par \
--set config.telemetry.otlpTracesEndpoint=http://otel-collector.monitoring.svc.cluster.local:4318/v1/traces
```

For a local kind smoke test, the repo includes:

```bash
./test/e2e/kura_helm_kind.sh
```

To enable peer mTLS in Kubernetes, set:

- `peerTls.enabled=true`
- `peerTls.internalPort=`
- `peerTls.secretName=`

The referenced secret should contain the files configured by:

- `peerTls.caCertFileName`
- `peerTls.certFileName`
- `peerTls.keyFileName`

When enabled, the chart advertises peer URLs over `https` on the internal port and mounts the secret into `/etc/kura/peer-tls`.

### Scaleway Kapsule

For Scaleway, start from the bundled overrides in `ops/helm/kura/values-scaleway.yaml`:

```bash
helm upgrade --install kura ./ops/helm/kura \
--namespace kura \
--create-namespace \
-f ./ops/helm/kura/values-scaleway.yaml \
--set image.repository=ghcr.io/tuist/kura \
--set image.tag=latest \
--set config.region=fr-par \
--set config.telemetry.otlpTracesEndpoint=http://otel-collector.monitoring.svc.cluster.local:4318/v1/traces
```

That values file does two important things:

- 🚪 uses a `LoadBalancer` service, which is the simplest way to expose Kura on Kapsule
- 💾 pins persistence to `scw-bssd`, which Scaleway documents as the default block storage class for Kapsule multi-AZ clusters

## 🧩 Extensions And Policy

Kura can load one operator-provided extension script at startup to customize authentication, authorization, and response headers without recompiling the binary.

Core env vars:

- `KURA_EXTENSION_ENABLED=true`
- `KURA_EXTENSION_SCRIPT_PATH=/etc/kura/extensions/hooks.lua`
- `KURA_EXTENSION_HOOK_TIMEOUT_MS=25`
- `KURA_EXTENSION_AUTH_CACHE_ALLOW_TTL_SECONDS=600`
- `KURA_EXTENSION_AUTH_CACHE_DENY_TTL_SECONDS=3`
- `KURA_EXTENSION_FAIL_CLOSED_AUTHENTICATE=true`
- `KURA_EXTENSION_FAIL_CLOSED_AUTHORIZE=true`
- `KURA_EXTENSION_FAIL_OPEN_RESPONSE_HEADERS=true`

Generic host resources are also env-driven:

- ✍️ signers:
  - `KURA_EXTENSION_SIGNER__ALGORITHM`
  - `KURA_EXTENSION_SIGNER__SECRET`
- 🪪 JWT verifiers:
  - `KURA_EXTENSION_JWT_VERIFIER__ALGORITHM`
  - `KURA_EXTENSION_JWT_VERIFIER__SECRET`
  - `KURA_EXTENSION_JWT_VERIFIER__ISSUER`
  - `KURA_EXTENSION_JWT_VERIFIER__AUDIENCES`
- 🌐 HTTP clients:
  - `KURA_EXTENSION_HTTP_CLIENT__BASE_URL`
  - `KURA_EXTENSION_HTTP_CLIENT__CONNECT_TIMEOUT_MS`
  - `KURA_EXTENSION_HTTP_CLIENT__REQUEST_TIMEOUT_MS`

The script may define these hooks:

- `authenticate(ctx)`
- `authorize(ctx, principal)`
- `response_headers(ctx, principal)`

The runtime keeps decision caching, metrics, timeouts, and cryptographic primitives in Rust, while the script supplies policy.
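
The split between `KURA_EXTENSION_AUTH_CACHE_ALLOW_TTL_SECONDS` and `KURA_EXTENSION_AUTH_CACHE_DENY_TTL_SECONDS` suggests a cache that remembers allows much longer than denies. The sketch below models that idea only; it is not Kura's cache. `Decision`, `DecisionCache`, and the string cache key are hypothetical, and how Kura actually keys or evicts entries is not described in this README.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

/// Decision cache with separate TTLs for allow and deny (hypothetical sketch).
struct DecisionCache {
    allow_ttl: Duration,
    deny_ttl: Duration,
    /// Maps a cache key to the decision and the instant it expires.
    entries: HashMap<String, (Decision, Instant)>,
}

impl DecisionCache {
    fn new(allow_ttl: Duration, deny_ttl: Duration) -> Self {
        Self {
            allow_ttl,
            deny_ttl,
            entries: HashMap::new(),
        }
    }

    /// Stores a hook result, picking the TTL from the decision itself.
    fn insert(&mut self, key: &str, decision: Decision, now: Instant) {
        let ttl = match decision {
            Decision::Allow => self.allow_ttl,
            Decision::Deny => self.deny_ttl,
        };
        self.entries.insert(key.to_string(), (decision, now + ttl));
    }

    /// Returns a cached decision, or `None` when the hook must run again.
    fn get(&mut self, key: &str, now: Instant) -> Option<Decision> {
        let cached = self.entries.get(key).copied();
        match cached {
            Some((decision, expires_at)) if now < expires_at => Some(decision),
            Some(_) => {
                // Expired: drop the entry so the next call re-evaluates policy.
                self.entries.remove(key);
                None
            }
            None => None,
        }
    }
}
```

With the values shown above (`600` seconds for allows, `3` seconds for denies), a rejected caller would be re-evaluated almost immediately after fixing their credentials, while accepted callers skip the hook for ten minutes.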