{"id":48830094,"url":"https://github.com/rocky-data/rocky","last_synced_at":"2026-06-06T22:01:02.266Z","repository":{"id":351377240,"uuid":"1209895914","full_name":"rocky-data/rocky","owner":"rocky-data","description":"The typed graph between your code and whichever warehouse, table format, or query engine you've chosen — typed compiler, branches, replay, column-level lineage, compile-time contracts, per-model cost. Adapters: Databricks, Snowflake, BigQuery, DuckDB. Single static Rust binary. Apache 2.0.","archived":false,"fork":false,"pushed_at":"2026-06-01T01:54:56.000Z","size":25887,"stargazers_count":264,"open_issues_count":9,"forks_count":12,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-01T02:07:12.205Z","etag":null,"topics":["column-lineage","dagster","data-contracts","data-engineering","data-lineage","data-pipeline","data-platform","data-quality","dbt-alternative","rust","schema-drift","sql"],"latest_commit_sha":null,"homepage":"https://rocky-data.github.io/rocky/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rocky-data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["hugocorreia90"]}},"created_at":"2026-04-13T22:23:28.000Z","updated_at":"2026-06-01T01:47:27.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/rocky-data/rocky","commit_stats":null,"previous_na
mes":["rocky-data/rocky"],"tags_count":176,"template":false,"template_full_name":null,"purl":"pkg:github/rocky-data/rocky","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rocky-data%2Frocky","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rocky-data%2Frocky/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rocky-data%2Frocky/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rocky-data%2Frocky/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rocky-data","download_url":"https://codeload.github.com/rocky-data/rocky/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rocky-data%2Frocky/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34001197,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-06T02:00:07.033Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["column-lineage","dagster","data-contracts","data-engineering","data-lineage","data-pipeline","data-platform","data-quality","dbt-alternative","rust","schema-drift","sql"],"created_at":"2026-04-14T20:00:51.553Z","updated_at":"2026-06-06T22:01:02.230Z","avatar_url":"https://github.com/rocky-data.png","language":"Rust","funding_links":["https://github.com/spons
ors/hugocorreia90"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs/rocky-readme-dark.svg\" /\u003e\n    \u003cimg src=\"docs/rocky-readme-light.svg\" alt=\"Rocky\" /\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n[![Engine CI](https://github.com/rocky-data/rocky/actions/workflows/engine-ci.yml/badge.svg)](https://github.com/rocky-data/rocky/actions/workflows/engine-ci.yml)\n[![Dagster CI](https://github.com/rocky-data/rocky/actions/workflows/dagster-ci.yml/badge.svg)](https://github.com/rocky-data/rocky/actions/workflows/dagster-ci.yml)\n[![VS Code CI](https://github.com/rocky-data/rocky/actions/workflows/vscode-ci.yml/badge.svg)](https://github.com/rocky-data/rocky/actions/workflows/vscode-ci.yml)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)\n\n**Rocky is the typed graph between your code and whichever warehouse, table format, or query engine you've chosen.**\n\nIt is a typed compiler that runs over your existing Databricks, Snowflake, BigQuery, or DuckDB and owns the graph between your code and your data: named branches, content-addressed run records, column-level lineage, compile-time contracts, and per-model cost. Storage and compute stay where they are, and Rocky works on the SQL you already have. The `.rocky` DSL is there when you want it. Apache 2.0.\n\nThe failures that cost data teams the most are invisible to the warehouse and out of scope for the templating layer above it: schema drift, column-rename blast radius, dialect divergence, cost spikes nobody can attribute. 
Rocky turns them into compile errors and blocked PRs.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-quickstart.gif\" alt=\"Rocky quickstart — create a project, compile, and run 3 models in under 15s\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n## Try it in 60 seconds\n\n```bash\n# macOS / Linux\ncurl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash\n\n# Windows (PowerShell)\nirm https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.ps1 | iex\n```\n\n```bash\nrocky playground my-first-project\ncd my-first-project\nrocky compile \u0026\u0026 rocky test \u0026\u0026 rocky run\n```\n\nNo credentials needed; the playground runs end-to-end on local DuckDB.\n\n`rocky run` is the one-step path for local iteration and automation. For production or PR-gated deploys, split it into `rocky plan` (builds and persists an auditable plan to `.rocky/plans/\u003cid\u003e.json`) followed by `rocky apply \u003cplan-id\u003e`.\n\n## Who Rocky is for\n\nRocky is built first for **data platform engineers running production-critical, multi-tenant pipelines on Databricks**, where silent failures cost real money and Dagster is already the orchestrator. That is the launch wedge, and where Rocky is most battle-tested.\n\nThe next ring out: **Snowflake and BigQuery shops** evaluating SQLMesh, who want correctness in the compiler rather than the planner and prefer SQL by default. Adapters are Beta today; see [Where Rocky is today](#where-rocky-is-today) below.\n\n## See it in action\n\nEach demo below is a self-contained POC in [`examples/playground/pocs/`](examples/playground/): `cd` in, run `./run.sh`, and reproduce it locally.\n\n### Detects schema drift the moment it happens\n\nA source column type changes upstream. On the next run, Rocky diffs source against target, drops the target, and recreates it. 
No silent data corruption.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-drift-recover.gif\" alt=\"rocky run detects source type change and recreates the target\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `02-performance/06-schema-drift-recover`](examples/playground/pocs/02-performance/06-schema-drift-recover/)\n\n### Enforces data contracts at compile time\n\nMissing required columns, protected columns being removed, or unsafe type changes surface as diagnostic codes (`E010`, `E013`) before a single row is written.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-data-contracts.gif\" alt=\"rocky compile flags E010 and E013 contract violations on broken_metrics\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `01-quality/01-data-contracts-strict`](examples/playground/pocs/01-quality/01-data-contracts-strict/)\n\n### Named branches for risk-free experiments\n\nCreate a branch, run against it in an isolated schema, inspect, then drop or promote. Column-level lineage shows the downstream blast radius before you ship.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-branches-replay.gif\" alt=\"rocky branch create, run on branch, and trace column lineage downstream\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `00-foundations/06-branches-replay-lineage`](examples/playground/pocs/00-foundations/06-branches-replay-lineage/)\n\n### Column-level lineage, not table-level\n\nTrace a single column from a downstream fact back through its aggregations, all the way to the seed. 
Blast-radius analysis without reading every model.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-column-lineage.gif\" alt=\"rocky lineage --column traces fct_revenue.total back to seeds.orders.amount\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `06-developer-experience/01-lineage-column-level`](examples/playground/pocs/06-developer-experience/01-lineage-column-level/)\n\n### AI model generation with a compile-validate loop\n\nDescribe what you want in plain English. Rocky generates a Rocky DSL model, compiles it, and retries on parse failure. The `Attempts: 2` line shows the loop catching a first-pass error.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-ai-model-generation.gif\" alt=\"rocky ai generates a .rocky model from natural language intent, Attempts: 2\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `03-ai/01-model-generation`](examples/playground/pocs/03-ai/01-model-generation/)\n\n### PR-time blast-radius with `rocky lineage-diff`\n\nCompare two git refs and get a per-changed-column readout of downstream consumers; the pre-rendered Markdown drops straight into a GitHub PR comment. CODEOWNERS-style review tooling can't reach this granularity without a compiled engine.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-lineage-diff.gif\" alt=\"rocky lineage-diff main lists added and removed columns across two models with downstream consumers per change\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `06-developer-experience/11-lineage-diff`](examples/playground/pocs/06-developer-experience/11-lineage-diff/)\n\n### Classify columns, mask by environment, gate CI\n\nTag PII columns in the model sidecar, and bind tags to mask strategies in `[mask]` / `[mask.\u003cenv\u003e]`. 
`rocky compliance --env prod --fail-on exception` exits 1 the moment a classified column has no resolved strategy: a one-line CI gate against accidentally unmasked data.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-classification-masking.gif\" alt=\"rocky compliance rolls up classification tags to mask strategies; --fail-on exception exits 1, gating CI on unmasked PII\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `04-governance/05-classification-masking-compliance`](examples/playground/pocs/04-governance/05-classification-masking-compliance/)\n\n### Incremental loads with persistent watermark state\n\n`strategy = \"incremental\"` plus a `timestamp_column` is all it takes. Rocky writes the high-water mark to the embedded state store, and subsequent runs only `INSERT … WHERE timestamp \u003e watermark`. Append 25 rows after a 500-row load, and run 2 still finishes in 0.2s.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-incremental-watermark.gif\" alt=\"rocky run with incremental strategy: run 1 copies 500 rows; appended 25 rows; run 2 only copies the delta in 0.2s\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `02-performance/01-incremental-watermark`](examples/playground/pocs/02-performance/01-incremental-watermark/)\n\n### Native BigQuery: materialize live, cost to the byte\n\nSwap the adapter to BigQuery and the same project materializes a full-refresh `CREATE TABLE AS` against the live warehouse. Rocky's run receipt reports `bytes_scanned` and `cost_usd`, and a cross-check confirms that `bytes_scanned` matches BigQuery's own `totalBytesBilled` for the job, to the byte. 
The same models run against Snowflake or Databricks by changing only the adapter.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/public/demo-bigquery.gif\" alt=\"rocky run materializes a full-refresh table in BigQuery with a cost receipt, then a cross-check shows bytes_scanned equals BigQuery's totalBytesBilled\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[POC: `07-adapters/05-bigquery-native-queries`](examples/playground/pocs/07-adapters/05-bigquery-native-queries/) — the live path requires BigQuery credentials\n\n## In your editor\n\nThe same compiler runs as a language server inside VS Code, so you catch drift, type errors, and contract violations where you write the code, not just in CI.\n\nYour `.rocky` models compile to SQL live as you type, with type-aware hover, inline column types, and go-to-definition across the graph.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"editors/vscode/media/demo-compiledSql.gif\" alt=\"A Rocky DSL model on the left and its compiled SQL on the right, updating live as you type\" width=\"900\" /\u003e\n\u003c/p\u003e\n\nThe Inspector turns any model into a trust dashboard: schema, column-level lineage, tests, per-model cost, and a governance card that flags classified columns and unmasked PII.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"editors/vscode/media/demo-inspector.gif\" alt=\"The Rocky Inspector's Overview as a model trust dashboard, its Governance card flagging two classified columns with one left unmasked\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n[Install the VS Code extension →](https://marketplace.visualstudio.com/items?itemName=rocky-data.rocky)\n\n## Where Rocky is today\n\nThe trust primitives (compiler, branches, replay, lineage, contracts, cost attribution) are production-grade on Databricks. 
The rest is in progress:\n\n- **Databricks is the production target for 2026.** Snowflake, BigQuery, and Trino adapters are Beta: connection, execution, and the core run loop work, but conformance coverage is still growing. If your enterprise warehouse is Snowflake or BigQuery and you need it production-grade today, talk to us.\n- **AI is a growing surface, not a finished product.** The compile-validate loop (generate, type-check, auto-fix, then land) is shipped. The broader story (mass refactor across the DAG, auto-migration from a column-type change, schema-aware assertion generation) is on the roadmap.\n- **Iceberg.** REST-catalog source discovery is Beta. Content-addressed writes round-trip as Iceberg through Delta UniForm, shipped end-to-end. First-class Iceberg-native writes without the Delta intermediate are on the 2026 roadmap.\n- **No built-in semantic layer.** Rocky's typed IR is the right home for one. Today, integrate with Cube, the dbt Semantic Layer, or your existing metric store.\n- **Orchestration: Dagster is first-class.** A `rocky serve` standalone path exists; native Airflow and Prefect integrations are not yet shipped, so those orchestrators invoke the `rocky` CLI like any other binary.\n\nIf those gaps are blockers for your team, [open a discussion](https://github.com/rocky-data/rocky/discussions). 
The roadmap is shaped by where production pipelines are actually getting hurt.\n\n## How it compares to dbt Core\n\n| Disaster | What dbt Core does | What Rocky does |\n|---|---|---|\n| Upstream changes a column type | Silent; surfaces as a downstream failure later | `E013` at compile, blocks the PR |\n| Required column dropped from a contract | Caught at build time via `contract: enforced` | `E010` at compile, blocks the PR |\n| Column rename with unknown blast radius | `dbt docs` is post-hoc and table-level; dbt Cloud Enterprise adds column lineage in the UI, also post-hoc and not PR-blocking | `rocky lineage-diff` at PR time, column-level, downstream consumers listed, blocks the merge |\n| `SELECT *` pulls a new column you didn't expect | Silent | `P002` warning, downstream consumers named |\n| Snowflake-only function written for a Databricks project | No dialect-portability lint; runs in dev, fails in prod | `P001` dialect-portability lint at compile |\n| Run cost doubles, no one knows which model | No per-model cost attribution; reconstruct it from warehouse query history | `RunOutput.cost_summary` per model, every run |\n| Auditor asks: who changed `fct_revenue.amount`, when, and why? | Run history in dbt Cloud, but no content-addressed record of code and output | `rocky replay \u003crun_id\u003e`: a content-addressed record of the exact code and the output it produced |\n| Sev-2 at 3 AM, half the pipeline already ran | `dbt retry` resumes from the failed model; no within-run checkpoint or circuit breaker | `rocky run --resume-latest`: checkpoint, three-state circuit breaker, skip what succeeded |\n\nEach row is a real failure mode and a Rocky command that turns it into a non-event. The same primitives back every row: typed compiler, content-addressed state, column-level lineage, per-model cost.\n\ndbt Core defined this category, and `rocky import-dbt` converts a vanilla dbt project to Rocky in one command. 
In June 2026 dbt Labs open-sourced the Fusion runtime as dbt Core v2.0 (Rust, Apache 2.0, alpha); the recommended **Fusion** distribution adds SQL type-checking and column-level lineage on top, though it still templates with Jinja and its build-failing strict checks are opt-in. Neither dbt Core v2.0 nor Fusion ships named branches, a content-addressed run record, per-model cost as a first-class column, a cross-warehouse dialect lint, or declarative RBAC and masking; dbt's governance and cost features live in its paid platform, while Rocky's are open source under Apache 2.0.\n\n## Subprojects\n\n| Path | Artifact | Language | Description |\n|---|---|---|---|\n| [`engine/`](engine/) | `rocky` CLI binary | Rust | Core SQL transformation engine, 23-crate Cargo workspace |\n| [`integrations/dagster/`](integrations/dagster/) | `dagster-rocky` PyPI wheel | Python | Dagster resource and component wrapping the Rocky CLI |\n| [`editors/vscode/`](editors/vscode/) | Rocky VSIX | TypeScript | VS Code extension; LSP client + commands for AI features |\n| [`examples/playground/`](examples/playground/) | (config only) | TOML / SQL | Self-contained DuckDB sample pipeline used for smoke tests and benchmarks |\n\nEach subproject has its own README with detailed usage. 
The [`engine/README.md`](engine/README.md) is the canonical product reference for the Rocky CLI.\n\n## Adapters\n\n| Role | Adapter | Status | Notes |\n|------|---------|--------|-------|\n| Warehouse | Databricks | Production | SQL Statement API · Unity Catalog · schema-prefix branches (`SHALLOW CLONE` is a follow-up) |\n| Warehouse | Snowflake | Beta | REST connector · GRANT/REVOKE reconciliation · schema-prefix branches (zero-copy `CLONE` is a follow-up) |\n| Warehouse | BigQuery | Beta | REST connector · schema-prefix branches |\n| Warehouse | DuckDB | Local / Testing | Embedded · powers `rocky playground` (no credentials needed) |\n| Warehouse | Trino | Beta | REST `/v1/statement` polling client · Basic + JWT auth · Docker conformance harness behind `trino-conformance` feature |\n| Source | Fivetran | Production | REST connector + table discovery |\n| Source | Airbyte | Beta | Catalog discovery |\n| Source | Iceberg | Beta | REST catalog discovery of namespaces and tables |\n| Source | Manual | Production | Schema/table lists inline in `rocky.toml` |\n\nBuilding an adapter for a warehouse Rocky doesn't ship in-tree (ClickHouse, Redshift, …)? See the [Adapter SDK guide](https://rocky-data.dev/guides/adapter-sdk/) and the [Rust-native skeleton POC](examples/playground/pocs/07-adapters/06-rust-native-adapter-skeleton/).\n\n## Building from source\n\n```bash\ngit clone https://github.com/rocky-data/rocky.git\ncd rocky\njust build       # builds engine + dagster wheel + vscode extension\njust test        # runs all test suites\njust lint        # cargo clippy/fmt + ruff + eslint\n```\n\n`just` is optional; you can also build each subproject directly. 
See [`CONTRIBUTING.md`](CONTRIBUTING.md) for per-subproject build commands.\n\n## Releases\n\nEach artifact is released independently using a tag-namespaced scheme:\n\n- `engine-v*` → Rocky CLI binary (cross-compiled, on GitHub Releases)\n- `dagster-v*` → `dagster-rocky` wheel\n- `vscode-v*` → Rocky VSIX\n\nSee [`CONTRIBUTING.md`](CONTRIBUTING.md#releases) for the full release flow.\n\n## Documentation\n\nFull documentation lives at **[rocky-data.dev](https://rocky-data.dev)**: concepts, guides, CLI reference, Dagster integration, and the adapter SDK.\n\n## Contributing\n\nSee [`CONTRIBUTING.md`](CONTRIBUTING.md). Before opening a PR, please read the cross-project change guidance: schema and DSL changes must update consumers atomically.\n\n## Sponsoring\n\nRocky is free and open source. If it saves your team time, consider [sponsoring the project](https://github.com/sponsors/hugocorreia90) so development can continue.\n\n## License\n\n[Apache 2.0](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocky-data%2Frocky","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frocky-data%2Frocky","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocky-data%2Frocky/lists"}