https://github.com/padak/juncture-engine
- Host: GitHub
- URL: https://github.com/padak/juncture-engine
- Owner: padak
- License: other
- Created: 2026-04-17T07:37:50.000Z (13 days ago)
- Default Branch: main
- Last Pushed: 2026-04-17T22:02:23.000Z (13 days ago)
- Last Synced: 2026-04-17T22:34:44.091Z (13 days ago)
- Language: Python
- Size: 108 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Roadmap: docs/ROADMAP.md
# Juncture
> Multi-backend SQL + Python transformation engine. Local-first,
> DuckDB-native.
**Status:** `v0.40.0`, early beta. The engine runs real workloads on
DuckDB end-to-end (pilot migration: 208 parquet seeds × 374 SQL
statements). Snowflake / BigQuery / JDBC adapters are Phase 2 and
exist only as stubs today.
One engine that replaces Keboola's four legacy transformation
components (`snowflake-transformation`, `python-transformation`,
`duckdb-transformation`, `dbt-transformation`). Code lives in git, SQL
and Python share one DAG, data tests are first-class, and every
workflow is callable from a stable JSON CLI built for agents.
*The name: a juncture is a meeting point. SQL meets Python in one
DAG; local DuckDB meets production warehouses via SQLGlot; four
legacy Keboola components collapse into one engine.*
*The DAG tab: shape encodes kind (seed / SQL / Python), border encodes last-run status, and the pink ring propagates PII from flagged seeds. The sidebar shows Metadata + Source + Schema + Tests for the selected model.*
## Install
```bash
git clone git@github.com:padak/juncture-engine.git juncture
cd juncture
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,pandas]"
```
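If the editable install succeeded, the `juncture` entry point should be
on your PATH. A quick sanity check (assuming the package registers a
standard console script, so `--help` behaves as usual):
```bash
juncture --help
```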
## See it work first (60 seconds)
The fastest way to understand what Juncture does is to run an example
and open the browser UI. No scaffolding, no warehouse credentials.
```bash
# Build all models + run data tests against DuckDB.
juncture run --project examples/tutorial_shop --test
# Open the DAG + source viewer + run history.
juncture web --project examples/tutorial_shop
# -> http://127.0.0.1:8765
```
Then try a runtime parameter override — no SQL edits:
```bash
juncture run --project examples/tutorial_shop \
  --var as_of=2026-01-20 --var lookback_days=7
```
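Inside a SQL model those values arrive through `{{ var('key') }}` (the
Python example near the end shows the `ctx.vars` equivalent). A minimal
sketch, assuming Jinja is enabled and that `var()` accepts a dbt-style
default as its second argument; the model and columns are illustrative:
```sql
-- models/orders_recent.sql (hypothetical model)
SELECT order_id, order_date, amount
FROM {{ ref('stg_orders') }}
-- Both values come from --var overrides or juncture.yaml vars:.
WHERE order_date > DATE '{{ var("as_of") }}' - INTERVAL '{{ var("lookback_days", 7) }} days'
  AND order_date <= DATE '{{ var("as_of") }}'
```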
### Or: let Claude drive (Claude Code plugin)
This repo ships a **Claude Code plugin** that teaches any Claude session
every Juncture mechanic: project shape, all five materializations, the
migration repair loop (`continue-on-error → diagnostics → sanitize`),
profile-based dev/staging/prod, the full `juncture.yaml` + `schema.yml`
reference, and a troubleshooting recipe book. Progressive disclosure
under the hood — lean `SKILL.md` as a navigation hub plus five
references the agent loads only when the task touches them. No giant
prompt dump in your context window.
Install once, use from any directory:
```bash
# Add this repo as a plugin marketplace, then install the plugin:
claude plugin marketplace add padak/juncture-engine
claude plugin install juncture@juncture-engine
```
Then ask Claude things like:
- *"Scaffold a Juncture project for an e-commerce shop with daily revenue and cohort retention."*
- *"I have a Snowflake transformation in `kbagent sync pull` format. Migrate it to DuckDB and fix the EXECUTE errors."*
- *"Add a `prod` profile that targets Snowflake while keeping `dev` on local DuckDB."*
Plugin source under [`plugins/juncture/`](plugins/juncture/),
marketplace manifest in [`.claude-plugin/marketplace.json`](.claude-plugin/marketplace.json).
When you clone this repo, the same skill auto-loads via the
project-level `.claude/skills/juncture` symlink — no install needed
inside the repo itself.
## Build your own project
Open [`docs/TUTORIAL.md`](docs/TUTORIAL.md). Four levels, each adds one
idea on top of the previous:
| Level | What you learn |
|---|---|
| **L1** | `juncture init` → drop CSVs → first `ref()` → `juncture run` |
| **L2** | `@transform` Python models in the same DAG as SQL |
| **L3** | `macros/*.sql` (shared expressions, sketched below the table) + ephemeral models (shared dimensions) |
| **L4** | External parameters via `--var` and `juncture.yaml vars:` |
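For a taste of the L3 mechanic: with `jinja: true`, any `{% macro %}`
defined under `macros/` is callable from any model without an
`{% import %}`. A sketch; file, macro, and column names are illustrative:
```sql
-- macros/money.sql (hypothetical)
{% macro to_eur(amount_col, rate_col) %}
round({{ amount_col }} * {{ rate_col }}, 2)
{% endmacro %}

-- models/revenue_eur.sql (hypothetical)
SELECT country, {{ to_eur('amount', 'fx_rate') }} AS amount_eur
FROM {{ ref('stg_orders') }}
```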
The tutorial's companion project lives at
[`examples/tutorial_shop/`](examples/tutorial_shop/) so you can
copy-paste-compare while you read.
## What ships today
### Engine (runs on DuckDB)
- **SQL + Python in one DAG.** A Python `@transform` can `ref()` a SQL
model and vice versa. One `juncture run` builds everything.
- **Materializations:** `table`, `view`, `incremental` (with
`_juncture_state` checkpoint), `ephemeral`, `execute` (multi-statement
as-is — used by migration tooling).
- **Parallelism by default.** Independent models run concurrently, layer
by layer. Intra-script parallel EXECUTE available for migrated bodies
(known race — forced to `parallelism: 1` for now).
- **Data tests are first-class.** `not_null`, `unique`, `relationships`,
`accepted_values`, plus custom SQL tests under `tests/`.
- **Seeds.** Single CSV (`seeds/*.csv`) or parquet directories
(`seeds/bucket/table/*.parquet`) with hybrid full-scan / sample type
inference and a sentinel detector for Keboola-style string columns.
- **Jinja macros** (when `jinja: true`): every `{% macro %}` under
`macros/` is globally available without `{% import %}` — dbt-style UX.
- **Model disable toggle**: `disabled: true` in `schema.yml` or
`--disable a,b` / `--enable-only x,y` at runtime.
- **Governance fields** in `schema.yml`: `owner`, `team`, `criticality`,
`sla.freshness_hours`, `docs`, `consumers`. Seeds carry `pii`,
`retention_days`, `source_system`. See the sketch after this list.
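Putting the testing and governance pieces together, a `schema.yml`
entry might look like this. The field names all come from the bullets
above, but the exact nesting is an assumption; treat this as a sketch
and check [`docs/CONFIGURATION.md`](docs/CONFIGURATION.md) for the
real shape:
```yaml
# schema.yml (sketch; nesting is assumed, names are illustrative)
models:
  stg_orders:
    owner: data-team@example.com   # governance: who gets paged
    team: analytics
    criticality: high
    sla:
      freshness_hours: 24
    columns:
      order_id:
        tests: [not_null, unique]
      country:
        tests:
          - accepted_values: ["CZ", "DE", "FR"]
seeds:
  raw_orders:
    pii: true              # drives the pink ring in the DAG tab
    retention_days: 365
    source_system: shop-db
```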
### Browser UI (`juncture web`)
Stdlib HTTP server, no extras dependency, vendored cytoscape + prism +
markdown-it. Tabs:
- **DAG** — kind-distinguishable shapes (seed = parallelogram, SQL =
blue, Python = green) + border-encoded last-run status + PII ring
propagation from flagged seeds. Click a node for a Metadata / Source
/ Schema / Tests drilldown.
- **Runs** — history table + per-model drawer with every
`statement_errors` entry classified into buckets (type_mismatch /
conversion / missing_object / …).
*The Runs tab: click any model row to expand its drawer with statement-level errors and the data tests filtered to that model.*
- **Seeds** — format, inferred types, sentinel cache, parquet file
count.
- **Portfolio** — model × owner × SLA × 30-day attainment with
freshness/success breach pills.
- **Reliability** — per-tier SLA attainment + slowest-10 + top
failure buckets.
- **Project** — parsed `juncture.yaml`, rendered README, git HEAD.
- Download buttons in the DAG toolbar: `manifest.json`,
`manifest.openlineage.json`, and **`llm-knowledge.json`** — a
single-shot project snapshot (config + full source + seeds + latest
run) for paste-into-Claude workflows.
### Migration tooling
- **`juncture migrate keboola`** — raw Keboola config JSON → Juncture
project.
- **`juncture migrate sync-pull`** — reads the `kbagent sync pull`
filesystem layout (symlinked parquet pools, SQL scripts) and produces
a Juncture project with `EXECUTE` materialization.
- **`juncture sql split-execute`** — rewrites a multi-statement EXECUTE
monolith into one `.sql` per CTAS target with auto-inferred `ref()`
dependencies.
- **`juncture sql translate`** — SQLGlot dialect translation
(Snowflake → DuckDB, etc.) with schema-aware type annotation and AST
passes (`harmonize_case_types`, `harmonize_binary_ops`,
`fix_timestamp_arithmetic`).
- **`juncture debug diagnostics`** — regex-based classifier for DuckDB
error messages; buckets + subcategories + fix hints for the next
repair iteration. A sketch of the full loop follows this list.
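Chained together, these commands form the repair loop the Claude
plugin teaches (`continue-on-error → diagnostics → sanitize`). One
iteration might look like this; the command names are real, but every
flag and path below is an assumption, so check `juncture --help` for
the actual spelling:
```bash
# 1. Import a `kbagent sync pull` tree as an EXECUTE-materialized project.
juncture migrate sync-pull ./kbagent-export --out ./shop   # paths/flags assumed

# 2. Run it all, collecting statement errors instead of stopping at the first.
juncture run --project ./shop --continue-on-error          # flag assumed

# 3. Bucket the DuckDB errors and read the fix hints.
juncture debug diagnostics --project ./shop                # flags assumed

# 4. Translate / sanitize the offending SQL, then re-run until green.
juncture sql translate --project ./shop                    # flags assumed
juncture run --project ./shop --test
```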
### Agent surface
- **Claude Code plugin** at [`plugins/juncture/`](plugins/juncture/) —
install via `claude plugin install juncture@juncture-engine` after
adding `padak/juncture-engine` as a marketplace. Skill content lives
at `plugins/juncture/skills/juncture/` (lean `SKILL.md` + 5
`references/`); also symlinked into `.claude/skills/` for
auto-loading inside this repo.
- **Stable JSON CLI**: `juncture compile --json`, `juncture run
--json`, structured manifest with DAG + columns + tests (example
after this list).
- **MCP server** skeleton under `juncture.mcp.server` (not yet shipping
— Phase 3).
- **`/api/llm-knowledge`** — one JSON with everything a model needs to
reason about the project (see Browser UI above).
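All of this is plain CLI + HTTP, so an agent or a shell script can
drive it directly. For example, with `juncture web` serving on its
default port from earlier (the `--project` flag on `compile` is
assumed to work as it does on `run`):
```bash
# Machine-readable compile: structured manifest with DAG + columns + tests.
juncture compile --project examples/tutorial_shop --json > manifest.json

# One JSON with config + full source + seeds + latest run, ready to
# paste into an LLM context window.
curl http://127.0.0.1:8765/api/llm-knowledge > llm-knowledge.json
```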
## Development plan
Rationale in [`docs/VISION.md`](docs/VISION.md), task list in
[`docs/ROADMAP.md`](docs/ROADMAP.md).
### Phase 2 — adapters and Keboola
- **Warehouse adapters:** Snowflake, BigQuery, JDBC. Stubs exist;
real `materialize_sql`, `MERGE INTO` incrementals, and
Arrow-backed `fetch_ref` are the Phase 2 scope. Unlocks the
one-project-many-backends story below.
- **Keboola component:** Docker wrapper runs today against static
Storage exports; real SAPI output upload + dev/prod branch mapping
land here.
- **OpenLineage runtime emitter:** skeleton in
`juncture.observability.lineage`; static export via
`/api/manifest/openlineage` already works.
### Phase 4 — differentiators
- **Backend arbitrage via dialect translation.** The same project
runs on DuckDB locally and on Snowflake / BigQuery / JDBC in
production. SQLGlot handles the dialect diff; authors write one
SQL and let the engine target whichever backend fits the workload.
Ships with Phase 2 adapters.
- **Virtual data environments.** Hashes of model attributes create
snapshot tables; promoting to prod is a pointer swap. Dev branches
get a real isolated dataset with zero recompute until a model
changes. Combined with Keboola dev branches: every Keboola branch
becomes a real, free environment.
- **AI dialect arbitrage.** The engine auto-switches DuckDB ↔
warehouse based on data size and cost. Small slice fits in RAM
→ run on DuckDB for free. Slice grows → spill to Snowflake /
BigQuery transparently, same project, no user decision. A laptop
stays a laptop until the data genuinely needs a warehouse.
- **Semantic / metrics layer.** Express "active customer",
"monthly recurring revenue", "EU region" once; consume the same
definition from SQL models, Python models, and BI tools. The
metric and the transformation that produces it live in one file.
- **Agentic authoring.** A prompt like _"build me a daily orders
dashboard"_ scaffolds, runs, tests, and iterates a project
end-to-end. Builds on the already-shipping agent surface:
`juncture compile --json`, `/api/llm-knowledge` (one-JSON
project snapshot for LLM context), the Claude Agent Skill, and
the MCP server (Phase 3). Goal: a working pipeline from a
sentence, without opening a Jinja macro.
Each item ships after it's been tested on a real production workload.
## Example minimal Python model
```python
# models/revenue_summary.py
import pandas as pd

from juncture import transform

@transform(depends_on=["stg_orders"])
def revenue_summary(ctx):
    # ctx.ref() returns an Arrow Table; convert it for pandas-style wrangling.
    orders = ctx.ref("stg_orders").to_pandas()
    # Reads the same external params as SQL's {{ var('lookback_days') }}.
    window = int(ctx.vars("lookback_days", 30))
    cutoff = orders["order_date"].max() - pd.Timedelta(days=window)
    return (
        orders[orders["order_date"] >= cutoff]
        .groupby("country")["amount"]
        .sum()
        .reset_index()
    )
```
The function receives a `TransformContext`; `ctx.ref(name)` returns an
Arrow Table, `ctx.vars(key, default)` reads the same external params
as SQL's `{{ var('key') }}`.
## Doc map
- [`docs/VISION.md`](docs/VISION.md) — what + why. Stable reference.
- [`docs/TUTORIAL.md`](docs/TUTORIAL.md) — L1→L4 onboarding narrative.
- [`docs/CONFIGURATION.md`](docs/CONFIGURATION.md) — `juncture.yaml`,
`.env`, `schema.yml`, seeds, macros, parallel EXECUTE.
- [`docs/DESIGN.md`](docs/DESIGN.md) — architecture (Project, DAG,
Adapter, Executor, Testing, Seeds, Migration).
- [`docs/ROADMAP.md`](docs/ROADMAP.md) — phased task list.
## License
Apache 2.0. See [`LICENSE`](LICENSE).