# agentic-bdd-uc-functions

End-to-end example of BDD testing for Databricks Unity Catalog functions, using [Behave](https://behave.readthedocs.io/) and the [Statement Execution API](https://docs.databricks.com/api/workspace/statementexecution).

Tests run on a standard CI runner (GitHub Actions): no Spark session, no local cluster, no Java. Each scenario makes one Statement Execution API call against a real UC function.

> **Status:** community-contributed reference implementation. Not officially supported by Databricks. Issues and PRs are welcome, but response times are best-effort. Use at your own risk, and review the code against your own security and compliance requirements before using it in production.

📖 **[Read the deep dive blog post →](docs/blog.md)**: the why behind this pattern, where it fits, and where it doesn't.

## How it works

A Unity Catalog SQL function is the single source of truth for the business rule. Both callers reference it, so the rule is never duplicated:

```
UC function (sql/)
    ├── BDD test suite (tests/bdd/)         # validates the contract via Statement Execution API
    └── Lakeflow SDP pipeline (pipelines/)  # calls the same function in production
```

The BDD suite gates pipeline promotion in CI: if a code change breaks a scenario, the pipeline never runs.

## Prerequisites

- Databricks workspace with Unity Catalog enabled
- A running SQL warehouse (Serverless recommended)
- Python 3.10+, [`uv`](https://docs.astral.sh/uv/), and the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) (Go-based, v0.200+)

Verify CLI auth before starting:

```bash
databricks current-user me
```

## Project structure

```
agentic-bdd-uc-functions/
├── src/
│   └── compliance_bdd/         # Python package: BDD utilities
│       ├── spark_rules.py       # Statement Execution API wrapper
│       └── fixtures.py          # Domain → UC function argument translators
│
├── pipelines/
│   └── compliance_pipeline.py  # Lakeflow SDP source file (workspace artifact, not packaged)
│
├── sql/
│   └── check_back_to_back_promo.sql  # UC function definition: the shared contract
│
├── scripts/
│   └── deploy_function.py      # Deploys UC function to target catalog/schema
│
├── tests/
│   └── bdd/
│       ├── environment.py       # behave hooks (.env loading, env var validation)
│       ├── features/
│       │   └── *.feature        # Gherkin scenarios: human-readable rule contracts
│       └── steps/
│           └── *_steps.py       # Step definitions: thin wiring between Gherkin and call_rule()
│
├── resources/
│   ├── pipeline.yml             # DABs pipeline resource (Lakeflow SDP)
│   └── jobs.yml                 # DABs job resources
│
├── .github/
│   └── workflows/
│       └── bdd.yml              # CI: bundle deploy → BDD tests (gate) → pipeline run
│
├── databricks.yml               # Asset Bundle config with dev/staging/prod targets
├── pyproject.toml               # Python project (packages src/ only)
├── behave.ini                   # behave configuration
└── Makefile                     # Command interface
```

### Why `src/` layout?

The `src/compliance_bdd/` package is the only thing that gets built into a wheel and deployed as a library dependency. The `pipelines/` directory holds Lakeflow SDP source files that the pipeline runtime discovers and executes directly; they can't be installed as a package entry point. Keeping the two separate makes the distinction explicit and prevents `setuptools` from accidentally packaging workspace artifacts.

See `pyproject.toml`:

```toml
[tool.setuptools.packages.find]
where = ["src"]  # only package src/; pipelines/ stays out of the wheel
```

## Quick start

```bash
# 1. Clone and install
git clone https://github.com/databricks-solutions/agentic-bdd-uc-functions
cd agentic-bdd-uc-functions
cp .env.example .env          # fill in DATABRICKS_WAREHOUSE_ID and target catalog/schema
make install

# 2. Deploy the UC function
make setup

# 3. Run the BDD suite
make test
```

Expected output on a warm warehouse:

```
Feature: Back-to-Back Promotion Compliance
  Rule: Products must have a minimum 4-week gap between promotions

.........  9 scenarios (9 passed)
Took 0m14.2s
```

## Databricks Asset Bundle integration

The bundle has three targets. `BDD_CATALOG` and `BDD_SCHEMA` are driven by bundle variables, so the test suite always points at the same catalog/schema as the deployed pipeline.

| Target | Catalog | Schema |
|--------|---------|--------|
| `dev` | `dev` | `compliance_<your-username>` |
| `staging` | `staging` | `compliance_staging` |
| `prod` | `main` | `compliance` |

```bash
# Validate the bundle config
make validate

# Deploy to your personal dev schema
make deploy-dev

# Deploy to staging (what CI does)
make deploy-staging
```

### CI/CD sequence

On push to `main`, the workflow enforces a strict gate sequence:

```
bundle deploy --target staging
    ↓  deploys UC function + pipeline definition

behave (BDD gate)
    ↓  calls real UC functions in staging catalog
    ↓  green → proceed  |  red → stop

bundle run compliance_pipeline --target staging
    ↓  pipeline runs against validated functions
```

On pull requests, the workflow runs deploy and the BDD gate only; the pipeline run is skipped.

## GitHub Actions setup

Add these secrets to your repository (`Settings → Secrets and variables → Actions`):

| Secret | Value |
|--------|-------|
| `DATABRICKS_HOST` | Your workspace URL, e.g. `https://adb-xxx.azuredatabricks.net` |
| `DATABRICKS_TOKEN` | PAT for a service principal (not a personal user token) |
| `DATABRICKS_WAREHOUSE_ID` | SQL warehouse ID from the warehouse's Connection Details tab |

Grant the service principal `USE CATALOG` on the target catalogs, `USE SCHEMA` and `EXECUTE` on the target schemas, and `CAN_USE` on the warehouse. If CI deploys the UC function as this principal, it also needs `CREATE FUNCTION` on the target schemas.

## Adding a new rule

1. Write the SQL function in `sql/` and deploy it with `make setup`.
2. Create `tests/bdd/features/<rule_name>.feature` with Gherkin scenarios.
3. If the function has a non-trivial argument shape, add a fixture translator in `src/compliance_bdd/fixtures.py`.
4. Add step definitions in `tests/bdd/steps/<rule_name>_steps.py`.
5. Run `make test`.

To use the rule in production, call the same UC function from `pipelines/compliance_pipeline.py`; the existing BDD gate then covers it.

## Cost

Each scenario is one warehouse query, so the suite's 9 scenarios cost a negligible amount at Serverless SQL pricing. A Scenario Outline with hundreds of example rows adds up on a busy PR queue; if cost becomes a concern, consider batching rows into a single query with a `VALUES` clause.
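The `VALUES` batching suggested under Cost can be sketched roughly as follows. This is a minimal sketch, assuming rows are evaluated through the Statement Execution API (`POST /api/2.0/sql/statements/`) with named parameter markers; `build_batched_statement` and `execute_statement` are hypothetical helpers, not part of `compliance_bdd.spark_rules`, and any column or function names you pass in are placeholders because the real signature of `check_back_to_back_promo` isn't shown here. Each cell value is bound as a parameter rather than interpolated into the SQL text, and a generated `row_idx` column keeps results in input order.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional, Sequence


def build_batched_statement(
    function_name: str,
    columns: Sequence[str],
    rows: Sequence[Sequence[Any]],
    column_types: Optional[Mapping[str, str]] = None,
) -> tuple[str, list[dict[str, str]]]:
    """Hypothetical helper: evaluate a UC function over many rows in one query.

    `function_name` and `columns` are inlined as SQL identifiers, so they must
    come from test code, never from scenario data. Every cell value is bound
    through a named parameter marker (:r0_c0, :r0_c1, ...).
    """
    if not rows:
        raise ValueError("rows must not be empty")
    column_types = column_types or {}
    col_list = ", ".join(columns)
    value_tuples: list[str] = []
    parameters: list[dict[str, str]] = []
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(columns)}")
        # row_idx is a generated integer, so inlining it as a literal is safe.
        markers = [str(i)]
        for j, (column, value) in enumerate(zip(columns, row)):
            name = f"r{i}_c{j}"
            markers.append(f":{name}")
            param = {"name": name}
            if value is not None:  # omitting "value" sends SQL NULL
                param["value"] = str(value)
            if column in column_types:  # omitting "type" defaults to STRING
                param["type"] = column_types[column]
            parameters.append(param)
        value_tuples.append("(" + ", ".join(markers) + ")")
    statement = (
        f"SELECT row_idx, {col_list}, {function_name}({col_list}) AS result "
        f"FROM VALUES {', '.join(value_tuples)} AS t(row_idx, {col_list}) "
        "ORDER BY row_idx"
    )
    return statement, parameters


def execute_statement(
    host: str,
    token: str,
    warehouse_id: str,
    statement: str,
    parameters: list[dict[str, str]],
    wait_timeout: str = "30s",
) -> dict:
    """Hypothetical helper: one POST to the Statement Execution API."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "parameters": parameters,
        # Wait up to this long for the result before returning a PENDING/RUNNING state.
        "wait_timeout": wait_timeout,
    }
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/sql/statements/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A step could, for example, collect the rows of a Gherkin data table, pass them as `rows`, and compare each returned `result` (one entry per input row in `response["result"]["data_array"]` when the statement succeeds inline) against the expected column. If the warehouse doesn't finish within `wait_timeout`, the response comes back in a `PENDING` or `RUNNING` state and must be polled by `statement_id`; this sketch does not handle that case.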