{"id":18400640,"url":"https://github.com/databricks/upload-dbfs-temp","last_synced_at":"2025-04-07T06:33:35.850Z","repository":{"id":40661011,"uuid":"472384164","full_name":"databricks/upload-dbfs-temp","owner":"databricks","description":null,"archived":false,"fork":false,"pushed_at":"2023-06-23T23:31:42.000Z","size":220,"stargazers_count":12,"open_issues_count":0,"forks_count":5,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-03T00:59:00.713Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databricks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-21T14:52:56.000Z","updated_at":"2023-10-06T10:19:22.000Z","dependencies_parsed_at":"2023-01-06T15:45:14.318Z","dependency_job_id":null,"html_url":"https://github.com/databricks/upload-dbfs-temp","commit_stats":{"total_commits":14,"total_committers":2,"mean_commits":7.0,"dds":0.1428571428571429,"last_synced_commit":"eab762b274451d7524461da4468ac7eecd336167"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fupload-dbfs-temp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fupload-dbfs-temp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fupload-dbfs-temp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fupload-dbfs-temp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databricks","do
wnload_url":"https://codeload.github.com/databricks/upload-dbfs-temp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247607764,"owners_count":20965945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T02:35:37.532Z","updated_at":"2025-04-07T06:33:32.910Z","avatar_url":"https://github.com/databricks.png","language":"TypeScript","readme":"# upload-dbfs-temp v0\n\n# Overview\nGiven a file on the local filesystem, this Action uploads the file to a temporary path in \nDBFS (docs:\n[AWS](https://docs.databricks.com/data/databricks-file-system.html) |\n[Azure](https://docs.microsoft.com/en-us/azure/databricks/data/databricks-file-system) |\n[GCP](https://docs.gcp.databricks.com/data/databricks-file-system.html)), returns the\npath of the DBFS tempfile as an Action output, and cleans up the DBFS tempfile at the end of the current\nGitHub Workflow job.\n\nYou can use this Action in combination with [databricks/run-notebook](https://github.com/databricks/run-notebook) to \ntrigger code execution on Databricks for CI (e.g. on pull requests) or CD (e.g. on pushes to master).\n  \n# Prerequisites\nTo use this Action, you need a Databricks REST API token to upload your file to DBFS and delete it at the end of \nworkflow job execution. We recommend that you store the token in [GitHub Actions secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets)\nto pass it into your GitHub Workflow. 
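\n\nFor example, a minimal sketch of a single workflow step that consumes the stored secret (the `DATABRICKS_HOST`/`DATABRICKS_TOKEN` environment variables follow the usage example later in this README; the secret name and file path below are illustrative assumptions):\n\n```yaml\n# Minimal sketch (illustrative): the Action reads DATABRICKS_HOST and\n# DATABRICKS_TOKEN from the environment; the secret name is an assumption\n- name: Upload file to DBFS\n  uses: databricks/upload-dbfs-temp@v0\n  id: upload\n  env:\n    DATABRICKS_HOST: https://adb-XXXX.XX.azuredatabricks.net\n    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}\n  with:\n    local-path: dist/my-file.txt\n  # The uploaded path is then available as\n  # ${{ steps.upload.outputs.dbfs-file-path }}\n```\n\n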
The following section lists recommended approaches for token creation by cloud.\n\n## AWS\nFor security reasons, we recommend creating and using a Databricks service principal API token. You can\n[create a service principal](https://docs.databricks.com/dev-tools/api/latest/scim/scim-sp.html#create-service-principal),\ngrant the service principal\n[token usage permissions](https://docs.microsoft.com/en-us/azure/databricks/administration-guide/access-control/tokens#control-who-can-use-or-create-tokens),\nand [generate an API token](https://docs.databricks.com/dev-tools/api/latest/token-management.html#operation/create-obo-token) on its behalf.\n\n## Azure\nFor security reasons, we recommend using a Databricks service principal AAD token.\n\n### Create an Azure Service Principal\nYou can:\n* Install the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)\n* Run `az login` to authenticate with Azure\n* Run `az ad sp create-for-rbac -n \u003cyour-service-principal-name\u003e --sdk-auth --scopes /subscriptions/\u003cazure-subscription-id\u003e/resourceGroups/\u003cresource-group-name\u003e --role contributor`,\n  specifying the subscription and resource group of your Azure Databricks workspace, to create a service principal and client secret.\n  Store the resulting JSON output as a GitHub Actions secret named e.g. `AZURE_CREDENTIALS`.\n* Get the application ID of your new service principal by running `az ad sp show --id \u003cclientId from previous command output\u003e`, using\n  the `clientId` field from the JSON output of the previous step.\n* [Add your service principal](https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/scim/scim-sp#add-service-principal) to your workspace. 
Use the\n  `appId` output field of the previous step as the `applicationId` of the service principal in the `add-service-principal` payload.\n* **Note**: The generated Azure token has a default life span of **60 minutes**.\n  If you expect your Databricks notebook to take longer than 60 minutes to finish executing, then you must create a [token lifetime policy](https://docs.microsoft.com/en-us/azure/active-directory/develop/configure-token-lifetimes)\n  and attach it to your service principal.\n\n### Use the Service Principal in your GitHub Workflow\n* Add the following steps to the start of your GitHub workflow.\n  This will create a new AAD token and save its value in the `DATABRICKS_TOKEN`\n  environment variable for use in subsequent steps.\n\n  ```yaml\n  - name: Log into Azure\n    uses: Azure/login@v1\n    with:\n      creds: ${{ secrets.AZURE_CREDENTIALS }}\n  - name: Generate and save AAD token\n    id: generate-token\n    run: |\n      echo \"DATABRICKS_TOKEN=$(az account get-access-token \\\n      --resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \\\n      --query accessToken -o tsv)\" \u003e\u003e $GITHUB_ENV\n  ```\n\n## GCP\nFor security reasons, we recommend inviting a service user to your Databricks workspace and using their API token.\nYou can invite a [service user to your workspace](https://docs.gcp.databricks.com/administration-guide/users-groups/users.html#add-a-user),\nlog into the workspace as the service user, and [create a personal access token](https://docs.gcp.databricks.com/dev-tools/api/latest/authentication.html) \nto pass into your GitHub Workflow.\n\n# Usage\n\nSee [action.yml](action.yml) for the latest interface and docs.\n\n### Run a notebook using library dependencies in the current repo\nIn the workflow below, we build Python code in the current repo into a wheel, use ``upload-dbfs-temp`` to upload it to\na tempfile in DBFS, then use the [databricks/run-notebook](https://github.com/databricks/run-notebook) Action to run 
a\nnotebook that depends on the wheel.\n\n```yaml\nname: Upload Python Wheel to DBFS then run notebook using whl.\n\non:\n  pull_request\n\nenv:\n  DATABRICKS_HOST: https://adb-XXXX.XX.azuredatabricks.net\njobs:\n  build:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Check out the repo\n        uses: actions/checkout@v2\n      # Obtain an AAD token and use it to upload to Databricks.\n      # Note: If running on AWS or GCP, you can directly pass your service principal\n      # token via the databricks-token input instead.\n      - name: Log into Azure\n        uses: Azure/login@v1\n        with:\n          creds: ${{ secrets.AZURE_CREDENTIALS }}\n      # Get an AAD token for the service principal,\n      # and store it in the DATABRICKS_TOKEN environment variable for use in subsequent steps.\n      # We set the `resource` parameter to the programmatic ID for Azure Databricks.\n      # See https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-prin-aad-token#--get-an-azure-ad-access-token for details.\n      - name: Generate and save AAD token\n        id: generate-token\n        run: |\n          echo \"DATABRICKS_TOKEN=$(az account get-access-token \\\n          --resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \\\n          --query accessToken -o tsv)\" \u003e\u003e $GITHUB_ENV\n      - name: Set up Python\n        uses: actions/setup-python@v2\n      - name: Build wheel\n        run:\n          python setup.py bdist_wheel\n      - name: Upload Wheel\n        uses: databricks/upload-dbfs-temp@v0\n        id: upload_wheel\n        with:\n          local-path: dist/my-project.whl\n      - name: Trigger model training notebook from PR branch\n        uses: databricks/run-notebook@v0\n        with:\n          local-notebook-path: notebooks/deployments/MainNotebook\n          # Install the wheel built in the previous step as a library\n          # on the cluster used to run our notebook\n          libraries-json: \u003e\n            
[\n              { \"whl\": \"${{ steps.upload_wheel.outputs.dbfs-file-path }}\" }\n            ]\n          # The cluster JSON below is for Azure Databricks. On AWS and GCP, set\n          # node_type_id to an appropriate node type, e.g. \"i3.xlarge\" for\n          # AWS or \"n1-highmem-4\" for GCP\n          new-cluster-json: \u003e\n            {\n              \"num_workers\": 1,\n              \"spark_version\": \"10.4.x-scala2.12\",\n              \"node_type_id\": \"Standard_D3_v2\"\n            }\n          # Grant all users view permission on the notebook results, so that they can\n          # see the result of our CI notebook\n          access-control-list-json: \u003e\n            [\n              {\n                \"group_name\": \"users\",\n                \"permission_level\": \"CAN_VIEW\"\n              }\n            ]\n```\n\n# License\n\nThe scripts and documentation in this project are released under the [Apache License, Version 2.0](LICENSE).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fupload-dbfs-temp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabricks%2Fupload-dbfs-temp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fupload-dbfs-temp/lists"}