{"id":35310129,"url":"https://github.com/twolffpiggott/databricks-cicd","last_synced_at":"2026-04-14T00:31:36.998Z","repository":{"id":233412017,"uuid":"623827736","full_name":"twolffpiggott/databricks-cicd","owner":"twolffpiggott","description":"Workflows for CICD on the Databricks platform","archived":false,"fork":false,"pushed_at":"2023-04-05T07:10:14.000Z","size":2,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-16T04:48:37.612Z","etag":null,"topics":["automation","cicd","databricks","github","github-actions","workflows"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/twolffpiggott.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-04-05T07:09:15.000Z","updated_at":"2024-04-16T04:48:39.066Z","dependencies_parsed_at":"2024-04-16T04:58:42.433Z","dependency_job_id":null,"html_url":"https://github.com/twolffpiggott/databricks-cicd","commit_stats":null,"previous_names":["twolffpiggott/databricks-cicd"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/twolffpiggott/databricks-cicd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twolffpiggott%2Fdatabricks-cicd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twolffpiggott%2Fdatabricks-cicd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twolffpiggott%2Fdatabricks-cicd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twolffpiggott%2Fdatab
ricks-cicd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/twolffpiggott","download_url":"https://codeload.github.com/twolffpiggott/databricks-cicd/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twolffpiggott%2Fdatabricks-cicd/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31776875,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T00:11:49.126Z","status":"ssl_error","status_checked_at":"2026-04-14T00:10:29.837Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","cicd","databricks","github","github-actions","workflows"],"created_at":"2025-12-30T17:39:45.356Z","updated_at":"2026-04-14T00:31:36.980Z","avatar_url":"https://github.com/twolffpiggott.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Databricks CICD\n\n## Automating custom package installation on Databricks clusters\n\nThis GitHub Actions workflow is designed to automate the process of building, uploading, and installing a Python package on a Databricks cluster, as well as updating a Databricks repository. The high-level purpose of each step is:\n\n1. **Checkout**: Retrieve the source code from the repository.\n2. 
**Set up Python 3.9**: Prepare the runner environment with Python 3.9, and cache the dependencies to speed up future builds.\n3. **Build wheel**: Build the Python package as a wheel file (a distributable package format).\n4. **Install Databricks CLI**: Install the Databricks command-line interface (CLI) to interact with the Databricks environment.\n5. **Copy wheel to DBFS**: Upload the built wheel file to the Databricks File System (DBFS) under the `/libraries` folder.\n6. **Install wheel on cluster**: Install the uploaded wheel file on the specified Databricks cluster.\n7. **Update Databricks repo** (optional): Update the Databricks repository with the latest changes from the main branch.\n\nThe workflow is triggered by a push event or manually using the `workflow_dispatch` event.\n\n```mermaid\nflowchart TD\n    subgraph \"Databricks Environment\"\n        dbfs[(DBFS)]\n        cluster{{Compute Cluster}}\n    end\n\n    subgraph \"GitHub Environment\"\n        subgraph \"Runner\"\n            A[1. Checkout Repository] --\u003e B[2. Set up Python 3.9]\n            B --\u003e C[3. Build wheel]\n            C --\u003e D[4. Install Databricks CLI]\n            D --\u003e E[5. Copy wheel to DBFS]\n            E --\u003e F[6. Install wheel on cluster]\n            F --\u003e G[7. End Workflow]\n        end\n    end\n\n    E -.-\u003e dbfs\n    F -.-\u003e cluster\n```\n\n### Prerequisites\n\n1. Create a well-formed Python package ([for example](https://menziess.github.io/howto/create/python-packages/#1-packaging-setup)) in a repo on GitHub\n2. Generate a [Databricks personal access token (PAT)](https://docs.databricks.com/dev-tools/auth.html#pat); this is the `DATABRICKS_TOKEN` env var\n3. Identify your Databricks workspace URL; this is the `DATABRICKS_HOST` env var (`https://\u003cinstance-name\u003e.cloud.databricks.com`)\n4. Install the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) locally\n5. 
Identify the relevant Databricks cluster ID (where you want to install the package) using the CLI: `databricks clusters list`\n6. (Optional) Identify the relevant Databricks repo ID (corresponding to the repo for which you are building the package) using the CLI: `databricks repos list`\n7. Create the following GitHub [repository secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets) in your repo:\n   1. `DATABRICKS_HOST`\n   2. `DATABRICKS_TOKEN`\n   3. `DATABRICKS_REPO_ID` (optional; remove the \"Update databricks repo\" step if not relevant)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwolffpiggott%2Fdatabricks-cicd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftwolffpiggott%2Fdatabricks-cicd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwolffpiggott%2Fdatabricks-cicd/lists"}