{"id":18400709,"url":"https://github.com/databricks/sbt-databricks","last_synced_at":"2025-04-07T06:33:46.029Z","repository":{"id":30462260,"uuid":"34016145","full_name":"databricks/sbt-databricks","owner":"databricks","description":"An sbt plugin for deploying code to Databricks Cloud","archived":false,"fork":false,"pushed_at":"2018-07-08T21:07:07.000Z","size":135,"stargazers_count":71,"open_issues_count":14,"forks_count":27,"subscribers_count":353,"default_branch":"master","last_synced_at":"2025-04-03T00:59:00.041Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://go.databricks.com/register-for-dbc","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databricks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-04-15T20:12:14.000Z","updated_at":"2023-07-18T04:13:38.000Z","dependencies_parsed_at":"2022-09-08T11:01:39.644Z","dependency_job_id":null,"html_url":"https://github.com/databricks/sbt-databricks","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fsbt-databricks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fsbt-databricks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fsbt-databricks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fsbt-databricks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databricks","download_url":"https://codeload.github.com/databricks/sbt-databricks/tar.gz/r
efs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247607783,"owners_count":20965945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T02:36:13.745Z","updated_at":"2025-04-07T06:33:41.017Z","avatar_url":"https://github.com/databricks.png","language":"Scala","readme":"sbt-databricks [![Build Status](https://travis-ci.org/databricks/sbt-databricks.svg)](http://travis-ci.org/databricks/sbt-databricks)\n--------------\n\nsbt plugin to deploy your projects to Databricks!\n\nhttp://go.databricks.com/register-for-dbc\n\nRequirements\n============\n1. An Account on Databricks: [Sign up for a free trial.](https://accounts.cloud.databricks.com/registration.html#signup)\n\nInstallation\n============\n\nJust add the following line to `project/plugins.sbt`:\n\n```\naddSbtPlugin(\"com.databricks\" %% \"sbt-databricks\" % \"0.1.5\")\n```\n\n*If you are running Databricks version 2.18 or greater you must use sbt-databricks version 0.1.5*\n\n*If you are running Databricks version 2.8 or greater you must use sbt-databricks version 0.1.3*\n\n#### Enable sbt-databricks for all your projects\n\n`sbt-databricks` can be enabled as a [global plugin](http://www.scala-sbt.org/0.13/tutorial/Using-Plugins.html#Global+plugins)\n for use in all of your projects in two easy steps:\n\n1. Add the following line to `~/.sbt/0.13/plugins/build.sbt`:\n\n    ```\n    addSbtPlugin(\"com.databricks\" %% \"sbt-databricks\" % \"0.1.5\")\n    ```\n\n2. Set the settings defined [here](#settings) in `~/.sbt/0.13/databricks.sbt`. 
You'll have to add the line\n\n    ```\n    import sbtdatabricks.DatabricksPlugin.autoImport._\n    ```\n\n    to that file in order to import this plugin's settings into that configuration file.\n\nUsage\n=====\n\n### Cluster Controls\n\nThere are three primary cluster-related actions: Create, Resize and Delete.\n\nCreating a cluster\n```scala\ndbcCreateCluster // Attempts to create a cluster on DBC\n// The following parameters must be set when attempting to create a cluster\ndbcNumWorkerContainers := // Integer: The desired size of the cluster (in worker containers). \ndbcSpotInstance := // Boolean for choosing whether to use Spot or On-Demand instances\ndbcSparkVersion := // String: The Spark version to be used e.g. \"1.6.x\"\ndbcZoneId := // String: AWS zone e.g. ap-southeast-2\ndbcClusters := // See notes below regarding this parameter\n```\n\nResizing a cluster\n```scala\ndbcResizeCluster // Attempts to resize a cluster on DBC\n// The following parameters must be set when attempting to resize a cluster\ndbcNumWorkerContainers := // Integer: The desired size of the cluster (in worker containers). \ndbcClusters := // See notes below regarding this parameter\n```\n\nDeleting a cluster\n```scala\ndbcDeleteCluster // Attempts to delete a cluster on DBC\n// The following parameters must be set when attempting to delete a cluster\ndbcClusters := // See notes below regarding this parameter\n```\n\n### Deployment\n\n\nThere are four major commands that can be used. Please check the next section for mandatory\nsettings before running these commands:\n - `dbcDeploy`: Uploads your Library to Databricks Cloud, attaches it to specified clusters,\n  and restarts the clusters if a previous version of the library was attached. This method\n  encapsulates the following commands. Only libraries with `SNAPSHOT` versions will be deleted\n  and re-uploaded as it is assumed that dependencies will not change very frequently. 
If you\n  change the version of one of your dependencies, that dependency must be deleted manually in\n  Databricks Cloud to prevent unexpected behavior.\n - `dbcUpload`: Uploads your Library to Databricks Cloud. Deletes the older version.\n - `dbcAttach`: Attaches your Library to the specified clusters.\n - `dbcRestartClusters`: Restarts the specified clusters.\n\n### Command Execution\n\n```scala\ndbcExecuteCommand // Runs a command on a specified DBC Cluster\n// The context/command language that will be employed when dbcExecuteCommand is called\ndbcExecutionLanguage := // One of DBCScala, DBCPython, DBCSQL\n// The file containing the code that is to be processed on the DBC cluster\ndbcCommandFile := // Type File\n```\n\nAn example using only an sbt invocation:\n```\n$ sbt\n\u003e set dbcClusters := Seq(\"CLUSTER_NAME\")\n\u003e set dbcCommandFile := new File(\"/Path/to/file.py\")\n\u003e set dbcExecutionLanguage := DBCPython\n\u003e dbcExecuteCommand\n```\n\n### Other\n\nOther helpful commands are:\n - `dbcListClusters`: View the states of available clusters.\n\n### \u003ca name=\"settings\"\u003eSettings\u003c/a\u003e\n\nThere are a few configuration settings that need to be made in the build file.\nPlease set the following parameters according to your setup:\n\n```scala\n// Your username to log in to Databricks Cloud\ndbcUsername := // e.g. \"admin\"\n\n// Your password (Can be set as an environment variable)\ndbcPassword := // e.g. \"admin\" or System.getenv(\"DBCLOUD_PASSWORD\")\n\n// The URL to the Databricks Cloud DB API\n// Note: this plugin currently does not support the /api/2.0 endpoint, so values using that\n// endpoint will be automatically rewritten to use /api/1.2.\ndbcApiUrl := // https://organization.cloud.databricks.com/api/1.2\n\n// Add any clusters that you would like to deploy your work to. e.g. 
\"My Cluster\"\n// or run dbcExecuteCommand\ndbcClusters += // Add \"ALL_CLUSTERS\" if you want to attach your work to all clusters\n```\n\nWhen using dbcDeploy, if you wish to upload an assembly jar instead of every library by itself,\nyou may override dbcClasspath as follows:\n\n```scala\ndbcClasspath := Seq(assembly.value)\n```\n\nOther optional parameters are:\n```\n// The location to upload your libraries to in the workspace e.g. \"/Users/alice\"\ndbcLibraryPath := // Default is \"/\"\n\n// Whether to restart the clusters every time a new version is uploaded to Databricks Cloud\ndbcRestartOnAttach := // Default true\n```\n\n### SBT Tips and Tricks (FAQ)\n\nHere are some SBT tips and tricks to improve your experience with sbt-databricks.\n\n1. I have a multi-project build. I don't want to upload the entire project to Databricks Cloud.\nWhat should I do?\n\n    In a multi-project build, you may run an SBT task (such as dbcDeploy, dbcUpload, etc...) just for\n    that project by [*scoping*](http://www.scala-sbt.org/0.13/docs/Tasks.html#Task+Scope) the task.\n    You may *scope* the task by using the project id before that task.\n\n    For example, assume we have a project with sub-projects `core`, `ml`, and `sql`. Assume `ml` depends\n    on `core` and `sql`, `sql` only depends on `core` and `core` doesn't depend on anything. Here is\n    what would happen for the following commands:\n\n    ```scala\n    \u003e dbcUpload          // Uploads core, ml, and sql\n    \u003e core/dbcUpload     // Uploads only core\n    \u003e sql/dbcUpload      // Uploads core and sql\n    \u003e ml/dbcUpload       // Uploads core, ml, and sql\n    ```\n\n2. I want to pass parameters to `dbcDeploy`. For example, in my build file `dbcClusters` is set as\n`clusterA` but I want to deploy to `clusterB` once in a while. 
What should I do?\n\n    In the SBT console, one way of overriding settings for your session is by using the `set` command.\n    Using the example above:\n\n    ```scala\n    \u003e core/dbcDeploy   // Deploys core to clusterA (clusterA was set inside the build file)\n    \u003e set dbcClusters := Seq(\"clusterB\")  // change cluster to clusterB\n    \u003e ml/dbcDeploy     // Deploys core, sql, and ml to clusterB\n    ```\n\n3. I want to upload an assembly jar rather than tens of individual jars. How can I do that?\n\n    You may override `dbcClasspath` as follows:\n\n    ```scala\n    dbcClasspath := Seq(assembly.value)\n    ```\n\n    ... in your build file (or using `set` on the console) in order to upload a single fat jar instead\n    of many individual ones. Beware of dependency conflicts\\!\n\n4. Hey, I followed \\#3, but I'm still uploading `core` and `sql` individually after `sql/dbcUpload`.\n What's going on\\!?\n\n    Remember scoping tasks? You will need to scope both `dbcClasspath` and `assembly` as follows:\n\n    ```scala\n    dbcClasspath in sql := Seq((assembly in sql).value)\n    ```\n\n    Then `sql/dbcUpload` should upload an assembly jar of `core` and `sql`.\n\nTests\n=====\n\nRun tests using:\n```\ndev/run-tests\n```\n\nIf the very last line starts with `[success]`, the tests have passed.\n\nRun scalastyle checks using:\n```\ndev/lint\n```\n\nContributing\n============\n\nIf you encounter bugs or want to contribute, feel free to submit an issue or pull request.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fsbt-databricks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabricks%2Fsbt-databricks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fsbt-databricks/lists"}