{"id":30749195,"url":"https://github.com/vim89/flowforge","last_synced_at":"2026-04-14T00:02:33.589Z","repository":{"id":310606578,"uuid":"1040213026","full_name":"vim89/flowforge","owner":"vim89","description":"Let's be honest - most data pipeline frameworks treat types as suggestions. Config files are strings. Schemas are \"validated\" at runtime. Data quality is an afterthought. So, let's do differently","archived":false,"fork":false,"pushed_at":"2025-08-19T05:59:05.000Z","size":8,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-19T07:18:04.286Z","etag":null,"topics":["archetype","data","data-contracts","data-engineering","data-pipelines","data-quality","data-science","database","dataengineering","datapipeline","etl","etl-framework","pipelines","scala","scalability","spark","spark-sql","spark-streaming"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vim89.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-18T16:15:54.000Z","updated_at":"2025-08-19T05:59:08.000Z","dependencies_parsed_at":"2025-08-19T07:18:36.421Z","dependency_job_id":"2b52a2ca-0107-415e-96cb-f0a64dc9450c","html_url":"https://github.com/vim89/flowforge","commit_stats":null,"previous_names":["vim89/flowforge"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/vim89/flowforge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vim89%2Fflowforge","tags_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vim89%2Fflowforge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vim89%2Fflowforge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vim89%2Fflowforge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vim89","download_url":"https://codeload.github.com/vim89/flowforge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vim89%2Fflowforge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273561322,"owners_count":25127396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archetype","data","data-contracts","data-engineering","data-pipelines","data-quality","data-science","database","dataengineering","datapipeline","etl","etl-framework","pipelines","scala","scalability","spark","spark-sql","spark-streaming"],"created_at":"2025-09-04T06:02:58.571Z","updated_at":"2026-04-14T00:02:33.584Z","avatar_url":"https://github.com/vim89.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# flowforge - Type‑safe-first Data Engineering\n\n\u003c!-- CI/CD Status 
--\u003e\n![Build](https://img.shields.io/github/actions/workflow/status/vim89/flowforge/ci.yml?branch=main\u0026label=CI\u0026logo=github)\n[![Nightly](https://img.shields.io/github/actions/workflow/status/vim89/flowforge/nightly.yml?branch=main\u0026label=nightly\u0026logo=github)](https://github.com/vim89/flowforge/actions/workflows/nightly.yml)\n[![Security](https://img.shields.io/github/actions/workflow/status/vim89/flowforge/security.yml?branch=main\u0026label=security\u0026logo=github)](https://github.com/vim89/flowforge/actions/workflows/security.yml)\n[![Docs Lint](https://img.shields.io/github/actions/workflow/status/vim89/flowforge/docs-lint.yml?branch=main\u0026label=docs\u0026logo=github)](https://github.com/vim89/flowforge/actions/workflows/docs-lint.yml)\n\n\u003c!-- Code Quality --\u003e\n[![codecov](https://codecov.io/gh/vim89/flowforge/graph/badge.svg)](https://codecov.io/gh/vim89/flowforge)\n[![Core](https://img.shields.io/codecov/c/github/vim89/flowforge?flag=core\u0026label=core\u0026logo=codecov)](https://app.codecov.io/gh/vim89/flowforge/flags/core)\n[![Contracts](https://img.shields.io/codecov/c/github/vim89/flowforge?flag=contracts\u0026label=contracts\u0026logo=codecov)](https://app.codecov.io/gh/vim89/flowforge/flags/contracts)\n[![Connectors](https://img.shields.io/codecov/c/github/vim89/flowforge?flag=connectors\u0026label=connectors\u0026logo=codecov)](https://app.codecov.io/gh/vim89/flowforge/flags/connectors)\n[![Infrastructure](https://img.shields.io/codecov/c/github/vim89/flowforge?flag=infrastructure\u0026label=infrastructure\u0026logo=codecov)](https://app.codecov.io/gh/vim89/flowforge/flags/infrastructure)\n\n\u003c!-- Release \u0026 Distribution --\u003e\n![Release](https://img.shields.io/github/v/release/vim89/flowforge?include_prereleases\u0026label=release\u0026logo=github)\n[![Maven 
Central](https://img.shields.io/maven-central/v/com.flowforge/core_2.13?label=maven)](https://search.maven.org/search?q=g:com.flowforge)\n[![Docker](https://img.shields.io/badge/docker-ghcr.io-blue?logo=docker)](https://github.com/vim89/flowforge/pkgs/container/flowforge)\n\n\u003c!-- Documentation --\u003e\n[![Scaladoc](https://img.shields.io/badge/api-Scaladoc-informational?logo=scala)](https://vim89.github.io/flowforge/api/)\n[![Changelog](https://img.shields.io/badge/changelog-Keep%20a%20Changelog-blue)](CHANGELOG.md)\n[![Docs](https://img.shields.io/badge/docs-start--here-blue)](docs/start-here.md)\n\n\u003c!-- Tech Stack --\u003e\n![Scala](https://img.shields.io/badge/Scala-2.13-red?logo=scala)\n![sbt](https://img.shields.io/badge/sbt-1.9%2B-blue)\n![JDK](https://img.shields.io/badge/JDK-17%2B-orange?logo=openjdk)\n![License](https://img.shields.io/github/license/vim89/flowforge)\n\n\u003e Build pipelines that won’t even compile when contracts drift. Keep transformations pure, put effects at the edges, and run on Spark and Flink.\n\n## Why (beliefs)\n\n- Runtime schema drift burns weekends. We believe failures should move left - into the compiler.\n- Side‑effects inside transforms amplify retries/speculation. We believe effects belong at the edges and must be idempotent.\n- Engineers deserve fast, local feedback. We believe pure transformations and compile‑fail tests make data engineering joyful again.\n\n### _A story:_ \n\"A partner team removed a nullable column late Friday. We couldn’t roll back in time; both teams were up all night. 
If that change had been a compile error, we would have slept.\"\n\n### For Python/ETL folks (dbt/Airflow/Informatica/Talend): \nThink \"contracts like Pydantic/Avro - but enforced before jobs run,\" \"pure functions you can test without a cluster,\" and \"connectors/engines that make IO explicit and safe.\"\n\n### For EMs / Staff Data Architects: \nYou get compile‑time guarantees (not CI or runtime heuristics), a small opinionated surface, and batteries‑included defaults with escape hatches.\n\n## How (principles)\n\n- **Compile‑time contracts:** `SchemaConforms[Out, Contract, Policy]` proves compatibility; policies include **Exact**, **Backward**, **Forward** (+ Ordered/CI/ByPosition). See [docs/how-it-fails.md](docs/how-it-fails.md).\n- **Typestate builder:** `build()` exists only when source, transforms, and sink are present. Incomplete pipelines are unbuildable.\n- **Pure vs effect boundary:** transforms are pure functions; `F[_]` only at IO edges; engines plug into a single algebra.\n- **Pictures over prose:** see [flowchart.svg](docs/diagrams/compile-time-contracts/flowchart.svg) and [optionality.md](docs/diagrams/compile-time-contracts/optionality.md).\n\n## What (The framework)\n\n- Core: contracts, builder, EffectSystem, DataAlgebra.\n- Engines: Spark (primary 1.0), Flink (2.12 only).\n- Connectors: filesystem, JDBC, GCS (more coming).\n- Data Quality: native checks by default; optional Deequ when present.\n- Template: flowforge.g8 for new projects.\n\n## Diagrams (pictures \u003e words)\n\n![Compile‑time contracts flow](docs/diagrams/compile-time-contracts/flowchart.svg)\n\n![Field vs Element Optionality](docs/diagrams/compile-time-contracts/optionality.svg)\n\n![Scala 2 Magnolia UML](docs/diagrams/compile-time-contracts/scala2-uml.svg)\n\n![Scala 3 Mirrors UML](docs/diagrams/compile-time-contracts/scala3-uml.svg)\n\n## Quick links\n- Getting started quick: [docs/getting-started.md](docs/getting-started.md)\n- Full start: 
[docs/start-here.md](docs/start-here.md)\n- Public API: [docs/public-api.md](docs/public-api.md)\n- How it fails (error anatomy): [docs/how-it-fails.md](docs/how-it-fails.md)\n- Framework behaviors (non‑negotiables): [docs/design/framework-behaviors.md](docs/design/framework-behaviors.md)\n- Cut a release: [docs/release/how-to-cut-a-release.md](docs/release/how-to-cut-a-release.md)\n\n### Module status (coverage)\n\n\u003eNightly runs provide broader integration coverage.\n\n- Core: [![Core Coverage](https://img.shields.io/codecov/c/github/vim89/flowforge?flag=core\u0026label=core)](https://app.codecov.io/gh/vim89/flowforge/flags/core)\n- Contracts: [![Contracts Coverage](https://img.shields.io/codecov/c/github/vim89/flowforge?flag=contracts\u0026label=contracts)](https://app.codecov.io/gh/vim89/flowforge/flags/contracts)\n- Connectors: [![Connectors Coverage](https://img.shields.io/codecov/c/github/vim89/flowforge?flag=connectors\u0026label=connectors)](https://app.codecov.io/gh/vim89/flowforge/flags/connectors)\n- Infrastructure: [![Infrastructure Coverage](https://img.shields.io/codecov/c/github/vim89/flowforge?flag=infrastructure\u0026label=infrastructure)](https://app.codecov.io/gh/vim89/flowforge/flags/infrastructure)\n\n## Guarantees (Non‑negotiables)\n\n- Compile‑fail contracts for typed endpoints under the policy lattice\n- Typestate builder: `build()` only when complete - incomplete pipelines can’t compile\n- Pure transforms; effectful edges; idempotent side‑effects by design\n- See: docs/design/framework-behaviors.md\n\n## 10‑Minute quickstart\n\nPrereq: JDK 17+, sbt 1.9+\n\n**1) Clone \u0026 build**\n```bash\ngit clone https://github.com/vim89/flowforge.git \u0026\u0026 cd flowforge\nsbt compile\n```\n\n**2) See a compile‑time contract failure (red → green)**\n```scala\n// Paste in REPL or a scratch test to feel it\nimport com.flowforge.core.contracts._\nfinal case class Out(id: Long)\nfinal case class Contract(id: Long, email: 
String)\nimplicitly[SchemaConforms[Out, Contract, SchemaPolicy.Exact]] // ❌ compile‑time error (missing email)\n```\nRelax the policy to Backward (allows extra producer fields and missing optional/defaults):\n```scala\nimplicitly[SchemaConforms[Out, Contract, SchemaPolicy.Backward]] // ✅\n```\n\n**3) Build a pipeline - typestate forbids incomplete builds**\n```scala\nimport cats.effect.IO\nimport com.flowforge.core.PipelineBuilder\nimport com.flowforge.core.types._\nimport com.flowforge.core.contracts._\n\nfinal case class User(id: Long, email: String)\nval src  = TypedSource[User](LocalDataSource(\"/tmp/in\", DataFormat.Parquet))\nval sink = TypedSink[User](LocalDataSink(\"/tmp/out\", DataFormat.Parquet))\n\nPipelineBuilder[IO](\"demo\")\n  .addTypedSource[User, User, SchemaPolicy.Exact](src, _ =\u003e IO.pure(User(1, \"a@b\")))\n  .noTransform\n  .addTypedSink[User, SchemaPolicy.Exact](sink, (_, _) =\u003e IO.unit)\n  .build() // ✅ build is available only now\n```\n\n**4) Explore diagrams and failure messages**\n- Diagrams: [flowchart.svg](docs/diagrams/compile-time-contracts/flowchart.svg), [optionality.md](docs/diagrams/compile-time-contracts/optionality.md)\n- Failure anatomy: [docs/how-it-fails.md](docs/how-it-fails.md)\n\n### Quickstart paths\n\n| Path | Goal | Commands |\n|------|------|----------|\n| A - Examples | Try locally (no cluster) | `sbt ffDev` (compile + focused tests), `sbt ffRunSpark` (Spark local[*]) |\n| B - Red→Green | See compile‑time error then fix | Use the snippet above; run `sbt compile` |\n| C - New project | Scaffold with g8 | `sbt new flowforge.g8 --name=\"ff-demo\" --organization=\"com.acme\"` then `sbt test` / `sbt run` |\n\n## Compatibility\n\n| Component | Version | Notes |\n|-----------|---------|-------|\n| JDK | 17+ | CI pinned to 17; Spark 3.5.x compatibility |\n| sbt | 1.9+ |  |\n| Scala | 2.13 (primary) | Scala 3 for core only (no Spark deps) |\n| Spark | 3.5.x | Runs on Java 17 |\n| Flink | Scala 2.12 only | Scala API 
constraints |\n\n### Flink (2.12)\n\nFlink’s Scala API is 2.12‑only. The root build excludes Flink from the default aggregate so that `+compile`, `+test:compile`, and `+test` stay green for 2.13 (and Scala 3 where applicable). Build/test Flink explicitly when you need it:\n\n```bash\n# Compile Flink (Scala 2.12)\nsbt \"++2.12.* enginesFlink/compile\"\n\n# Run Flink tests (Scala 2.12)\nsbt \"++2.12.* enginesFlink/test\"\n```\n\nReferences: Flink documents binary incompatibility across Scala lines and the need to select the matching `_2.12` artifacts for the Scala API. See Flink’s docs on Scala versions and sbt cross‑build guidance.\n\n## Architecture (at a glance)\n\nThe diagrams above summarize derivation and policy checks; see also [docs/diagrams/compile-time-contracts/guide.md](docs/diagrams/compile-time-contracts/guide.md) for the narrative.\n\n## Examples \u0026 demos\n\n- Examples module: [modules/examples](modules/examples) (runnable demos)\n- Optional Deequ mode: add `-Dff.quality.mode=deequ` (auto‑enables when Deequ is on the classpath)\n\n## Documentation map\n\n- Start here: [docs/start-here.md](docs/start-here.md); quick: [docs/getting-started.md](docs/getting-started.md)\n- Why compile‑time: [docs/why-compile-time.md](docs/why-compile-time.md)\n- How it fails: [docs/how-it-fails.md](docs/how-it-fails.md)\n- Public API: [docs/public-api.md](docs/public-api.md)\n- ADR index: [docs/adr/INDEX.md](docs/adr/INDEX.md)\n- Evidence: [docs/evidence](docs/evidence) (e.g., [scala3-alignment.md](docs/evidence/scala3-alignment.md))\n- Plan \u0026 Readiness: [docs/plan/v1.0-readiness.md](docs/plan/v1.0-readiness.md), [docs/quality/release-criteria.md](docs/plan/release-criteria.md)\n- Talks: [docs/talks](docs/talks) (WHY→HOW→WHAT outline)\n\n## Release \u0026 versioning\n\n- CHANGELOG: [CHANGELOG.md](CHANGELOG.md)\n- Security: [SECURITY.md](SECURITY.md)\n- v1.0 Plan/Readiness: [docs/plan/v1.0-readiness.md](docs/plan/v1.0-readiness.md), 
[docs/quality/release-criteria.md](docs/plan/release-criteria.md)\n\n## FAQ\n\n- Scala 3?\n  - Core compiles on Scala 3; engines depend on the Spark/Flink ecosystem (Spark 3.x limits Scala 3 today).\n- Why compile‑time vs tests?\n  - Tests are sampled; compile‑time proofs are exhaustive for shapes and policy compatibility.\n- How does this compare to Databricks DLT/Dagster/dbt?\n  - They perform runtime/CI checks; FlowForge enforces compile‑time gates and a typestate builder. See docs/evidence for deeper comparisons.\n\n## Contributing\n\nWe welcome folks from Python/ETL backgrounds and JVM veterans alike. Start with [docs/contributing/HANDBOOK.md](docs/contributing/HANDBOOK.md), then pick an issue. Please run `sbt scalafmtAll` and `sbt \"scalafixAll\"` before submitting.\n\n## License\n\n[Apache 2.0](LICENSE)\n\n---\n### Flowforge Hybrid Licensing Model\n\nFlowforge adopts a hybrid licensing structure combining open innovation and IP protection.\n\n- **Legacy / historical releases** remain under MIT (for transparency and ecosystem continuity).\n- **Active and future releases** (v1.0 and onward) are licensed under **AGPLv3** with additional Flowforge terms (“RESTRICTED COMMERCIAL \u0026 DERIVATIVE TERMS FOR FLOWFORGE” in `LICENSE`).\n- **Commercial usage** (offering as SaaS, embedding in proprietary systems, or internal closed-source deployments) requires a separate **commercial license**. 
See `COMMERCIAL_LICENSE.md` for a template.\n- The **Contributor License Agreement (CLA)** in `CLA.md` governs contribution terms, ensuring compatibility with the hybrid licensing framework.\n- **Commercial exceptions** and **dual-licensing** are handled directly by Vitthal Mirji for partners and enterprise use.\n\nThe goal: protect Flowforge’s compile-time innovation while keeping community use free and open.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvim89%2Fflowforge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvim89%2Fflowforge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvim89%2Fflowforge/lists"}