https://github.com/rocky-data/rocky
Rust SQL transformation engine with branches, replay, column-level lineage, compile-time type safety, and per-model cost attribution. Single static binary; adapters for Databricks, Snowflake, BigQuery, DuckDB. Apache 2.0.
column-lineage dagster data-contracts data-engineering data-lineage data-pipeline data-platform data-quality dbt-alternative rust schema-drift sql
- Host: GitHub
- URL: https://github.com/rocky-data/rocky
- Owner: rocky-data
- License: apache-2.0
- Created: 2026-04-13T22:23:28.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-16T00:43:26.000Z (8 days ago)
- Last Synced: 2026-05-16T00:52:48.206Z (8 days ago)
- Topics: column-lineage, dagster, data-contracts, data-engineering, data-lineage, data-pipeline, data-platform, data-quality, dbt-alternative, rust, schema-drift, sql
- Language: Rust
- Homepage: https://rocky-data.github.io/rocky/
- Size: 11.1 MB
- Stars: 252
- Watchers: 0
- Forks: 10
- Open Issues: 10
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
**Rocky** is a typed-program layer above the warehouse: branches, replay, column-level lineage, compile-time type safety, per-model cost attribution. Storage and compute stay with your warehouse — Databricks, Snowflake, BigQuery, or DuckDB. Apache 2.0.
## Try it in 60 seconds
```bash
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash
# Windows (PowerShell)
irm https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.ps1 | iex
```
```bash
rocky playground my-first-project
cd my-first-project
rocky compile && rocky test && rocky run
```
No credentials needed — the playground runs end-to-end on local DuckDB.
## See it in action
Each demo below is a self-contained POC in [`examples/playground/pocs/`](examples/playground/) — `cd` in, run `./run.sh`, reproduce locally.
### Detects schema drift the moment it happens
A source column type changes upstream. On the next run, Rocky diffs source vs. target, drops the target, and recreates it. No silent data corruption, no dbt-style quiet divergence.
[POC — `02-performance/06-schema-drift-recover`](examples/playground/pocs/02-performance/06-schema-drift-recover/)
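The drift check described above can be pictured as a comparison of two column-to-type maps. The following is a minimal Python sketch, not Rocky's implementation: the function name, the `ColumnDrift` type, and both schemas are hypothetical, and it only reports drift rather than dropping or recreating anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDrift:
    column: str
    source_type: str
    target_type: str


def detect_type_drift(source: dict, target: dict) -> list:
    """Return columns present on both sides whose declared types differ."""
    shared = sorted(set(source) & set(target))
    return [
        ColumnDrift(name, source[name], target[name])
        for name in shared
        if source[name].upper() != target[name].upper()
    ]


# Hypothetical schemas: `amount` changed type upstream.
source_schema = {"id": "BIGINT", "amount": "VARCHAR", "created_at": "TIMESTAMP"}
target_schema = {"id": "BIGINT", "amount": "DECIMAL(10,2)", "created_at": "TIMESTAMP"}

drift = detect_type_drift(source_schema, target_schema)
if drift:
    # Per the text above, Rocky would drop and recreate the target here;
    # this sketch only surfaces the decision.
    print("drift detected:", [d.column for d in drift])
```

Comparing types case-insensitively is a design choice of the sketch; a real engine would presumably compare normalized type representations instead of raw strings.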
### Enforces data contracts at compile time
Missing required columns, protected columns being removed, or unsafe type changes surface as diagnostic codes (`E010`, `E013`) before a single row is written.
[POC — `01-quality/01-data-contracts-strict`](examples/playground/pocs/01-quality/01-data-contracts-strict/)
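To make the three rule families concrete, here is a small Python sketch of a contract check that runs before any data is written. It is an illustration under assumptions: the function, its parameters, and the message strings are hypothetical, and it deliberately does not map rules to Rocky's diagnostic codes.

```python
def check_contract(old_schema, new_schema, required, protected, safe_changes):
    """Collect contract violations for a proposed schema (hypothetical rules)."""
    old_cols, new_cols = set(old_schema), set(new_schema)
    problems = []
    for col in sorted(set(required) - new_cols):
        problems.append(f"missing required column: {col}")
    for col in sorted((set(protected) & old_cols) - new_cols):
        problems.append(f"protected column removed: {col}")
    for col in sorted(old_cols & new_cols):
        old, new = old_schema[col], new_schema[col]
        # Any type change not explicitly allow-listed is treated as unsafe.
        if old != new and (old, new) not in safe_changes:
            problems.append(f"unsafe type change on {col}: {old} -> {new}")
    return problems


old = {"id": "BIGINT", "email": "VARCHAR", "amount": "INT"}
new = {"id": "BIGINT", "amount": "VARCHAR"}

violations = check_contract(
    old,
    new,
    required={"id", "email"},
    protected={"email"},
    safe_changes={("INT", "BIGINT")},
)
for line in violations:
    print(line)
```

Because the check needs only schemas, it can fail a build without touching the warehouse, which is the property the section describes.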
### Named branches for risk-free experiments
Create a branch, run against it in an isolated schema, inspect, then drop or promote. Column-level lineage shows the downstream blast radius before you ship.
[POC — `00-foundations/06-branches-replay-lineage`](examples/playground/pocs/00-foundations/06-branches-replay-lineage/)
### Column-level lineage, not table-level
Trace a single column from a downstream fact back through its aggregations, all the way to the seed. Blast-radius analysis without reading every model.
[POC — `06-developer-experience/01-lineage-column-level`](examples/playground/pocs/06-developer-experience/01-lineage-column-level/)
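Tracing one column back to its seed amounts to walking a graph whose nodes are columns rather than tables. The sketch below shows that walk in Python; the `LINEAGE` map and every model and column name in it are made up for illustration and do not come from the POC.

```python
# Hypothetical lineage: each output column maps to the columns it reads from.
LINEAGE = {
    "fct_revenue.total": ["int_orders.amount"],
    "int_orders.amount": ["stg_orders.amount_cents"],
    "stg_orders.amount_cents": ["seed_orders.amount_cents"],
    "fct_revenue.day": ["int_orders.ordered_at"],
    "int_orders.ordered_at": ["stg_orders.ordered_at"],
    "stg_orders.ordered_at": ["seed_orders.ordered_at"],
}


def trace_upstream(column, lineage):
    """Depth-first walk from a column back to columns with no recorded inputs."""
    seen, order, stack = set(), [], [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        # Reverse so inputs are visited in their declared order.
        stack.extend(reversed(lineage.get(current, [])))
    return order


path = trace_upstream("fct_revenue.total", LINEAGE)
print(" <- ".join(path))
```

Note that `fct_revenue.day` never appears in the result: with table-level lineage both columns would share the same upstream tables, whereas the column-level walk follows only the inputs of `total`.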
### AI model generation with a compile-validate loop
Describe what you want in plain English. Rocky generates a Rocky DSL model, compiles it, and retries on parse failure — the `Attempts: 2` line shows the loop catching a first-pass error invisibly.
[POC — `03-ai/01-model-generation`](examples/playground/pocs/03-ai/01-model-generation/)
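The generate, compile, retry cycle can be sketched as a bounded loop that feeds the previous compiler error back into the generator. This Python sketch uses stand-ins throughout: `generate_with_retries`, `fake_generate`, and `fake_compile` are hypothetical, and the fake compiler checks a SQL-looking string instead of real Rocky DSL.

```python
def generate_with_retries(prompt, generate, compile_model, max_attempts=3):
    """Call generate(prompt, previous_error) until compile_model(source) returns None."""
    error = None
    for attempt in range(1, max_attempts + 1):
        source = generate(prompt, error)
        error = compile_model(source)
        if error is None:
            return source, attempt
    raise RuntimeError(f"no valid model after {max_attempts} attempts: {error}")


def fake_generate(prompt, previous_error):
    # Stand-in for a model call: the first draft has a typo, the retry fixes it.
    return "SELECT 1 AS answer" if previous_error else "SELEC 1"


def fake_compile(source):
    # Stand-in for the compiler: returns an error message, or None on success.
    return None if source.upper().startswith("SELECT ") else "parse error: expected SELECT"


source, attempts = generate_with_retries("one-row model", fake_generate, fake_compile)
print(f"Attempts: {attempts}")
```

Capping the attempts keeps a persistently failing prompt from looping forever, and returning the attempt count is what makes a line like `Attempts: 2` possible to report.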
### PR-time blast-radius with `rocky lineage-diff`
Compare two git refs and get a per-changed-column readout of downstream consumers — pre-rendered Markdown drops straight into a GitHub PR comment. CODEOWNERS-style review tooling can't reach this granularity without a compiled engine.
[POC — `06-developer-experience/11-lineage-diff`](examples/playground/pocs/06-developer-experience/11-lineage-diff/)
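As a rough picture of what a per-changed-column readout involves, the Python sketch below takes a list of changed columns, finds their transitive downstream consumers, and renders a Markdown table. It assumes the changed columns have already been determined from the two git refs; the edge list, function names, and table layout are hypothetical and are not the output format of `rocky lineage-diff`.

```python
from collections import defaultdict, deque


def downstream_of(changed, edges):
    """For each changed column, collect its transitive consumers via BFS.

    edges is a list of (upstream_column, downstream_column) pairs.
    """
    children = defaultdict(list)
    for upstream, downstream in edges:
        children[upstream].append(downstream)
    report = {}
    for column in changed:
        seen, queue = set(), deque(children[column])
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(children[node])
        report[column] = sorted(seen)
    return report


def to_markdown(report):
    """Render the report as a Markdown table suitable for a PR comment."""
    lines = ["| Changed column | Downstream consumers |", "|---|---|"]
    for column, consumers in report.items():
        lines.append(f"| `{column}` | {', '.join(consumers) or 'none'} |")
    return "\n".join(lines)


# Hypothetical column-level edges.
edges = [
    ("stg.amount", "int.amount"),
    ("int.amount", "fct.total"),
    ("stg.note", "int.note"),
]
report = downstream_of(["stg.amount"], edges)
print(to_markdown(report))
```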
### Classify columns, mask by environment, gate CI
Tag PII columns in the model sidecar; bind tags to mask strategies in `[mask]` / `[mask.]`. `rocky compliance --env prod --fail-on exception` exits 1 the moment a classified column has no resolved strategy — a one-line CI gate against accidentally-unmasked data.
[POC — `04-governance/05-classification-masking-compliance`](examples/playground/pocs/04-governance/05-classification-masking-compliance/)
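The gate logic reduces to one question: does every classified column resolve to a mask strategy in the target environment? The Python sketch below illustrates that check under assumptions. The tag names, strategy names, and the idea that environment bindings override defaults are hypothetical readings of the paragraph above, not Rocky's configuration schema.

```python
def resolve_strategies(defaults, env_overrides):
    """Assumed precedence: environment-specific bindings win over defaults."""
    return {**defaults, **env_overrides}


def unmasked_columns(classified, strategies):
    """Return classified columns whose tag has no mask strategy bound."""
    return sorted(col for col, tag in classified.items() if tag not in strategies)


def compliance_exit_code(classified, strategies):
    """Mimic a CI gate: 1 if any classified column is unmasked, else 0."""
    return 1 if unmasked_columns(classified, strategies) else 0


# Hypothetical classification tags and strategy names.
classified = {"users.email": "pii", "users.ssn": "sensitive", "users.phone": "contact"}
defaults = {"pii": "hash"}
prod = resolve_strategies(defaults, {"sensitive": "redact"})

print(unmasked_columns(classified, prod))
print(compliance_exit_code(classified, prod))
```

In this example the `contact` tag has no bound strategy, so the gate fails until a binding is added.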
### Incremental loads with persistent watermark state
`strategy = "incremental"` plus a `timestamp_column` is all it takes. Rocky writes the high-water mark to the embedded state store; subsequent runs only `INSERT … WHERE timestamp > watermark`. Append 25 rows after a 500-row load — run 2 still finishes in 0.2s.
[POC — `02-performance/01-incremental-watermark`](examples/playground/pocs/02-performance/01-incremental-watermark/)
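The watermark pattern can be reproduced in a few lines with Python's built-in `sqlite3` module. This is a sketch of the general technique, not Rocky's state store: the table names, the integer `updated_at` column standing in for a timestamp, and the `incremental_load` helper are all assumptions.

```python
import sqlite3


def incremental_load(conn, watermark):
    """Copy only rows newer than the watermark; return (rows_copied, new_watermark)."""
    before = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.execute(
        "INSERT INTO target SELECT id, updated_at FROM source WHERE updated_at > ?",
        (watermark,),
    )
    after = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    # The new high-water mark is the largest value now present in the target.
    new_mark = conn.execute(
        "SELECT COALESCE(MAX(updated_at), ?) FROM target", (watermark,)
    ).fetchone()[0]
    return after - before, new_mark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, updated_at INTEGER)")
conn.execute("CREATE TABLE target (id INTEGER, updated_at INTEGER)")

# Run 1: initial 500-row load with an empty watermark.
conn.executemany("INSERT INTO source VALUES (?, ?)", [(i, i) for i in range(1, 501)])
first_copied, mark = incremental_load(conn, watermark=0)

# Run 2: 25 new rows arrive; only those are copied.
conn.executemany("INSERT INTO source VALUES (?, ?)", [(i, i) for i in range(501, 526)])
second_copied, mark = incremental_load(conn, mark)

print(first_copied, second_copied, mark)
```

The second run scans for rows past the stored mark instead of reloading the table, which is why its cost tracks the number of new rows rather than the table size.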
## Subprojects
| Path | Artifact | Language | Description |
|---|---|---|---|
| [`engine/`](engine/) | `rocky` CLI binary | Rust | Core SQL transformation engine — 21-crate Cargo workspace |
| [`integrations/dagster/`](integrations/dagster/) | `dagster-rocky` PyPI wheel | Python | Dagster resource and component wrapping the Rocky CLI |
| [`editors/vscode/`](editors/vscode/) | Rocky VSIX | TypeScript | VS Code extension — LSP client + commands for AI features |
| [`examples/playground/`](examples/playground/) | (config only) | TOML / SQL | Self-contained DuckDB sample pipeline used for smoke tests and benchmarks |
Each subproject has its own README with detailed usage. The [`engine/README.md`](engine/README.md) is the canonical product reference for the Rocky CLI.
## Adapters
| Role | Adapter | Status | Notes |
|------|---------|--------|-------|
| Warehouse | Databricks | Production | SQL Statement API · Unity Catalog · `SHALLOW CLONE` for branches |
| Warehouse | Snowflake | Beta | REST connector · zero-copy `CLONE` for branches · masking policies |
| Warehouse | BigQuery | Beta | REST connector · `CREATE TABLE … COPY` for branches |
| Warehouse | DuckDB | Local / Testing | Embedded · powers `rocky playground` (no credentials needed) |
| Warehouse | Trino | Beta | REST `/v1/statement` polling client · Basic + JWT auth · Docker conformance harness behind `trino-conformance` feature |
| Source | Fivetran | Production | REST connector + table discovery |
| Source | Airbyte | Beta | Catalog discovery |
| Source | Iceberg | Beta | REST catalog discovery of namespaces and tables |
| Source | Manual | Production | Schema/table lists inline in `rocky.toml` |
Need an adapter for a warehouse Rocky doesn't ship in-tree (ClickHouse, Redshift, …)? See the [Adapter SDK guide](https://rocky-data.dev/guides/adapter-sdk/) and the [Rust-native skeleton POC](examples/playground/pocs/07-adapters/06-rust-native-adapter-skeleton/).
## Building from source
```bash
git clone https://github.com/rocky-data/rocky.git
cd rocky
just build # builds engine + dagster wheel + vscode extension
just test # runs all test suites
just lint # cargo clippy/fmt + ruff + eslint
```
`just` is optional — you can also build each subproject directly. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for per-subproject build commands.
## Releases
Each artifact is released independently using a tag-namespaced scheme:
- `engine-v*` → Rocky CLI binary (cross-compiled, on GitHub Releases)
- `dagster-v*` → `dagster-rocky` wheel
- `vscode-v*` → Rocky VSIX
See [`CONTRIBUTING.md`](CONTRIBUTING.md#releases) for the full release flow.
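For tooling that needs to route a pushed tag to the right artifact, the prefix scheme above can be decoded as follows. This is a hypothetical Python helper for illustration only; the version strings are made up, and the real release flow lives in `CONTRIBUTING.md`.

```python
# Prefixes taken from the list above; the helper itself is hypothetical.
TAG_PREFIXES = {
    "engine-v": "Rocky CLI binary",
    "dagster-v": "dagster-rocky wheel",
    "vscode-v": "Rocky VSIX",
}


def artifact_for_tag(tag):
    """Return (artifact, version) for a namespaced tag, or None if unrecognized."""
    for prefix, artifact in TAG_PREFIXES.items():
        if tag.startswith(prefix) and len(tag) > len(prefix):
            return artifact, tag[len(prefix):]
    return None


print(artifact_for_tag("engine-v1.2.3"))   # made-up version number
print(artifact_for_tag("v1.2.3"))          # un-namespaced tags match nothing
```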
## Documentation
Full documentation: **[rocky-data.dev](https://rocky-data.dev)** — concepts, guides, CLI reference, Dagster integration, adapter SDK.
## Contributing
See [`CONTRIBUTING.md`](CONTRIBUTING.md). Before opening a PR, please read the cross-project change guidance — schema and DSL changes must update consumers atomically.
## Sponsoring
Rocky is free and open source. If it saves your team time, consider [sponsoring the project](https://github.com/sponsors/hugocorreia90) so development can continue.
## License
[Apache 2.0](LICENSE)