{"id":49007459,"url":"https://github.com/padak/juncture-engine","last_synced_at":"2026-04-18T21:01:09.240Z","repository":{"id":352097602,"uuid":"1213256274","full_name":"padak/juncture-engine","owner":"padak","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-17T22:02:23.000Z","size":111,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-17T22:34:44.091Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/padak.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-17T07:37:50.000Z","updated_at":"2026-04-17T08:10:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/padak/juncture-engine","commit_stats":null,"previous_names":["padak/juncture-engine"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/padak/juncture-engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padak%2Fjuncture-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padak%2Fjuncture-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padak%2Fjuncture-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padak%2Fjuncture-engine/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/padak","download_url":"https://codeload.github.com/padak/juncture-engine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/padak%2Fjuncture-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31984557,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T20:23:30.271Z","status":"ssl_error","status_checked_at":"2026-04-18T20:23:29.375Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-18T21:01:01.960Z","updated_at":"2026-04-18T21:01:09.233Z","avatar_url":"https://github.com/padak.png","language":"Python","readme":"# Juncture\n\n\u003e Multi-backend SQL + Python transformation engine. Local-first,\n\u003e DuckDB-native.\n\n**Status:** `v0.40.0`, early beta. Engine runs real workloads on\nDuckDB end-to-end (pilot migration: 208 parquet seeds × 374 SQL\nstatements). Snowflake / BigQuery / JDBC adapters are Phase 2 —\nstub only today.\n\nOne engine that replaces Keboola's four legacy transformation\ncomponents (`snowflake-transformation`, `python-transformation`,\n`duckdb-transformation`, `dbt-transformation`). 
Code lives in git, SQL\nand Python share one DAG, data tests are first-class, and every\nworkflow is callable from a stable JSON CLI built for agents.\n\n*The name: a juncture is a meeting point. SQL meets Python in one\nDAG; local DuckDB meets production warehouses via SQLGlot; four\nlegacy Keboola components collapse into one engine.*\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/images/web1.png\" alt=\"Juncture web UI — DAG tab with Metadata sidebar showing governance, reliability sparkline, PII ring propagation\" width=\"880\"\u003e\n  \u003cbr\u003e\u003cem\u003eThe DAG tab: shape encodes kind (seed / SQL / Python), border encodes last-run status, the pink ring propagates PII from flagged seeds. Sidebar shows Metadata + Source + Schema + Tests for the selected model.\u003c/em\u003e\n\u003c/p\u003e\n\n## Install\n\n```bash\ngit clone git@github.com:padak/juncture-engine.git juncture\ncd juncture\npython -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -e \".[dev,pandas]\"\n```\n\n## See it work first (60 seconds)\n\nThe fastest way to understand what Juncture does is to run an example\nand open the browser UI. 
No scaffolding, no warehouse credentials.\n\n```bash\n# Build all models + run data tests against DuckDB.\njuncture run --project examples/tutorial_shop --test\n\n# Open the DAG + source viewer + run history.\njuncture web --project examples/tutorial_shop\n# -\u003e http://127.0.0.1:8765\n```\n\nThen try a runtime parameter override — no SQL edits:\n\n```bash\njuncture run --project examples/tutorial_shop \\\n  --var as_of=2026-01-20 --var lookback_days=7\n```\n\n### Or: let Claude drive (Claude Code plugin)\n\nThis repo ships a **Claude Code plugin** that teaches any Claude session\nevery Juncture mechanic: project shape, all five materializations, the\nmigration repair loop (`continue-on-error → diagnostics → sanitize`),\nprofile-based dev/staging/prod, the full `juncture.yaml` + `schema.yml`\nreference, and a troubleshooting recipe book. Progressive disclosure\nunder the hood — lean `SKILL.md` as a navigation hub plus five\nreferences the agent loads only when the task touches them. No giant\nprompt dump in your context window.\n\nInstall once, use from any directory:\n\n```bash\n# Add this repo as a plugin marketplace, then install the plugin:\nclaude plugin marketplace add padak/juncture-engine\nclaude plugin install juncture@juncture-engine\n```\n\nThen ask Claude things like:\n\n- *\"Scaffold a Juncture project for an e-commerce shop with daily revenue and cohort retention.\"*\n- *\"I have a Snowflake transformation in `kbagent sync pull` format. 
Migrate it to DuckDB and fix the EXECUTE errors.\"*\n- *\"Add a `prod` profile that targets Snowflake while keeping `dev` on local DuckDB.\"*\n\nPlugin source under [`plugins/juncture/`](plugins/juncture/),\nmarketplace manifest in [`.claude-plugin/marketplace.json`](.claude-plugin/marketplace.json).\nWhen you clone this repo, the same skill auto-loads via the\nproject-level `.claude/skills/juncture` symlink — no install needed\ninside the repo itself.\n\n## Build your own project\n\nOpen [`docs/TUTORIAL.md`](docs/TUTORIAL.md). Four levels, each adds one\nidea on top of the previous:\n\n| Level | What you learn |\n|---|---|\n| **L1** | `juncture init` → drop CSVs → first `ref()` → `juncture run` |\n| **L2** | `@transform` Python models in the same DAG as SQL |\n| **L3** | `macros/*.sql` (shared expressions) + ephemeral models (shared dimensions) |\n| **L4** | External parameters via `--var` and `juncture.yaml vars:` |\n\nThe tutorial's companion project lives at\n[`examples/tutorial_shop/`](examples/tutorial_shop/) so you can\ncopy-paste-compare while you read.\n\n## What ships today\n\n### Engine (runs on DuckDB)\n\n- **SQL + Python in one DAG.** A Python `@transform` can `ref()` a SQL\n  model and vice versa. One `juncture run` builds everything.\n- **Materializations:** `table`, `view`, `incremental` (with\n  `_juncture_state` checkpoint), `ephemeral`, `execute` (multi-statement\n  as-is — used by migration tooling).\n- **Parallelism by default.** Independent models run concurrently, layer\n  by layer. 
Intra-script parallel EXECUTE available for migrated bodies\n  (known race — forced to `parallelism: 1` for now).\n- **Data tests are first-class.** `not_null`, `unique`, `relationships`,\n  `accepted_values`, plus custom SQL tests under `tests/`.\n- **Seeds.** Single CSV (`seeds/*.csv`) or parquet directories\n  (`seeds/bucket/table/*.parquet`) with hybrid full-scan / sample type\n  inference and a sentinel detector for Keboola-style string columns.\n- **Jinja macros** (when `jinja: true`): every `{% macro %}` under\n  `macros/` is globally available without `{% import %}` — dbt-style UX.\n- **Model disable toggle**: `disabled: true` in `schema.yml` or\n  `--disable a,b` / `--enable-only x,y` at runtime.\n- **Governance fields** in `schema.yml`: `owner`, `team`, `criticality`,\n  `sla.freshness_hours`, `docs`, `consumers`. Seeds carry `pii`,\n  `retention_days`, `source_system`.\n\n### Browser UI (`juncture web`)\n\nStdlib HTTP server, no extras dependency, vendored cytoscape + prism +\nmarkdown-it. Tabs:\n\n- **DAG** — kind-distinguishable shapes (seed = parallelogram, SQL =\n  blue, Python = green) + border-encoded last-run status + PII ring\n  propagation from flagged seeds. 
Click a node for a Metadata / Source\n  / Schema / Tests drilldown.\n- **Runs** — history table + per-model drawer with every\n  `statement_errors` entry classified into buckets (type_mismatch /\n  conversion / missing_object / …).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/images/web2.png\" alt=\"Juncture web UI — Runs tab with history table and per-model drawer expanded to show data tests\" width=\"880\"\u003e\n  \u003cbr\u003e\u003cem\u003eThe Runs tab: click any model row to expand its drawer with statement-level errors and the data tests filtered to that model.\u003c/em\u003e\n\u003c/p\u003e\n\n- **Seeds** — format, inferred types, sentinel cache, parquet file\n  count.\n- **Portfolio** — model × owner × SLA × 30-day attainment with\n  freshness/success breach pills.\n- **Reliability** — per-tier SLA attainment + slowest-10 + top\n  failure buckets.\n- **Project** — parsed `juncture.yaml`, rendered README, git HEAD.\n- Download buttons in the DAG toolbar: `manifest.json`,\n  `manifest.openlineage.json`, and **`llm-knowledge.json`** — a\n  single-shot project snapshot (config + full source + seeds + latest\n  run) for paste-into-Claude workflows.\n\n### Migration tooling\n\n- **`juncture migrate keboola`** — raw Keboola config JSON → Juncture\n  project.\n- **`juncture migrate sync-pull`** — reads the `kbagent sync pull`\n  filesystem layout (symlinked parquet pools, SQL scripts) and produces\n  a Juncture project with `EXECUTE` materialization.\n- **`juncture sql split-execute`** — rewrites a multi-statement EXECUTE\n  monolith into one `.sql` per CTAS target with auto-inferred `ref()`\n  dependencies.\n- **`juncture sql translate`** — SQLGlot dialect translation\n  (Snowflake → DuckDB, etc.) 
with schema-aware type annotation and AST\n  passes (`harmonize_case_types`, `harmonize_binary_ops`,\n  `fix_timestamp_arithmetic`).\n- **`juncture debug diagnostics`** — regex-based classifier for DuckDB\n  error messages; buckets + subcategories + fix hints for the next\n  repair iteration.\n\n### Agent surface\n\n- **Claude Code plugin** at [`plugins/juncture/`](plugins/juncture/) —\n  install via `claude plugin install juncture@juncture-engine` after\n  adding `padak/juncture-engine` as a marketplace. Skill content lives\n  at `plugins/juncture/skills/juncture/` (lean `SKILL.md` + 5\n  `references/`); also symlinked into `.claude/skills/` for\n  auto-loading inside this repo.\n- **Stable JSON CLI**: `juncture compile --json`, `juncture run\n  --json`, structured manifest with DAG + columns + tests.\n- **MCP server** skeleton under `juncture.mcp.server` (not yet shipping\n  — Phase 3).\n- **`/api/llm-knowledge`** — one JSON with everything a model needs to\n  reason about the project (see Browser UI above).\n\n## Development plan\n\nRationale in [`docs/VISION.md`](docs/VISION.md), task list in\n[`docs/ROADMAP.md`](docs/ROADMAP.md).\n\n### Phase 2 — adapters and Keboola\n\n- **Warehouse adapters:** Snowflake, BigQuery, JDBC. Stubs exist;\n  real `materialize_sql`, `MERGE INTO` incrementals, and\n  Arrow-backed `fetch_ref` are the Phase 2 scope. Unlocks the\n  one-project-many-backends story below.\n- **Keboola component:** Docker wrapper runs today against static\n  Storage exports; real SAPI output upload + dev/prod branch mapping\n  land here.\n- **OpenLineage runtime emitter:** skeleton in\n  `juncture.observability.lineage`; static export via\n  `/api/manifest/openlineage` already works.\n\n### Phase 4 — differentiators\n\n- **Backend arbitrage via dialect translation.** The same project\n  runs on DuckDB locally and on Snowflake / BigQuery / JDBC in\n  production. 
SQLGlot handles the dialect diff; authors write one\n  SQL and let the engine target whichever backend fits the workload.\n  Ships with Phase 2 adapters.\n- **Virtual data environments.** Hashes of model attributes create\n  snapshot tables; promoting to prod is a pointer swap. Dev branches\n  get a real isolated dataset with zero recompute until a model\n  changes. Combined with Keboola dev branches: every Keboola branch\n  becomes a real, free environment.\n- **AI dialect arbitrage.** The engine auto-switches DuckDB ↔\n  warehouse based on data size and cost. Small slice fits in RAM\n  → run on DuckDB for free. Slice grows → spill to Snowflake /\n  BigQuery transparently, same project, no user decision. A laptop\n  stays a laptop until the data genuinely needs a warehouse.\n- **Semantic / metrics layer.** Express \"active customer\",\n  \"monthly recurring revenue\", \"EU region\" once; consume the same\n  definition from SQL models, Python models, and BI tools. The\n  metric and the transformation that produces it live in one file.\n- **Agentic authoring.** A prompt like _\"build me a daily orders\n  dashboard\"_ scaffolds, runs, tests, and iterates a project\n  end-to-end. Builds on the already-shipping agent surface:\n  `juncture compile --json`, `/api/llm-knowledge` (one-JSON\n  project snapshot for LLM context), the Claude Agent Skill, and\n  the MCP server (Phase 3). 
Goal: a working pipeline from a\n  sentence, without opening a Jinja macro.\n\nEach item ships after it's been tested on a real production workload.\n\n## Example minimal Python model\n\n```python\n# models/revenue_summary.py\nfrom juncture import transform\n\n\n@transform(depends_on=[\"stg_orders\"])\ndef revenue_summary(ctx):\n    orders = ctx.ref(\"stg_orders\").to_pandas()\n    window = int(ctx.vars(\"lookback_days\", 30))\n    return (\n        orders[orders[\"order_date\"] \u003e= orders[\"order_date\"].max()\n               - pd.Timedelta(days=window)]\n        .groupby(\"country\")[\"amount\"]\n        .sum()\n        .reset_index()\n    )\n```\n\nThe function receives a `TransformContext`; `ctx.ref(name)` returns an\nArrow Table, `ctx.vars(key, default)` reads the same external params\nas SQL's `{{ var('key') }}`.\n\n## Doc map\n\n- [`docs/VISION.md`](docs/VISION.md) — what + why. Stable reference.\n- [`docs/TUTORIAL.md`](docs/TUTORIAL.md) — L1→L4 onboarding narrative.\n- [`docs/CONFIGURATION.md`](docs/CONFIGURATION.md) — `juncture.yaml`,\n  `.env`, `schema.yml`, seeds, macros, parallel EXECUTE.\n- [`docs/DESIGN.md`](docs/DESIGN.md) — architecture (Project, DAG,\n  Adapter, Executor, Testing, Seeds, Migration).\n- [`docs/ROADMAP.md`](docs/ROADMAP.md) — phased task list.\n\n## License\n\nApache 2.0. See [`LICENSE`](LICENSE).\n","funding_links":[],"categories":["🔧 Utilities \u0026 Miscellaneous"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpadak%2Fjuncture-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpadak%2Fjuncture-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpadak%2Fjuncture-engine/lists"}
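
## Appendix: layer-by-layer scheduling, sketched

The engine section above says independent models run concurrently, "layer
by layer." As a rough illustration of that idea only (this is not
Juncture's actual scheduler, and the model names and dependency map below
are invented), the layers can be derived from the DAG with the
standard-library `graphlib`:

```python
# Illustrative sketch: group a model DAG into layers so that everything
# inside one layer has all its dependencies built and can run
# concurrently. Hypothetical model names; not from the Juncture codebase.
from graphlib import TopologicalSorter

# model -> models it ref()s (invented example DAG)
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "revenue_summary": {"stg_orders"},
    "cohorts": {"stg_orders", "stg_customers"},
}

def layers(graph):
    """Return models grouped into layers of mutually independent nodes."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    out = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every model whose deps are done
        out.append(ready)
        ts.done(*ready)
    return out

print(layers(deps))
# → [['stg_customers', 'stg_orders'], ['cohorts', 'revenue_summary']]
```

A real executor would submit each layer to a worker pool and wait for it
to drain before starting the next; the sketch only shows how the layers
fall out of the dependency graph.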