{"id":17175677,"url":"https://github.com/saurfang/sbt-spark-submit","last_synced_at":"2026-03-11T13:20:03.202Z","repository":{"id":33864346,"uuid":"37572030","full_name":"saurfang/sbt-spark-submit","owner":"saurfang","description":"sbt plugin for spark-submit","archived":false,"fork":false,"pushed_at":"2017-11-02T00:04:08.000Z","size":39,"stargazers_count":96,"open_issues_count":5,"forks_count":29,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-13T16:49:51.009Z","etag":null,"topics":["sbt","spark"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saurfang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-06-17T04:13:59.000Z","updated_at":"2023-09-08T16:58:46.000Z","dependencies_parsed_at":"2022-09-03T01:07:12.305Z","dependency_job_id":null,"html_url":"https://github.com/saurfang/sbt-spark-submit","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/saurfang/sbt-spark-submit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saurfang%2Fsbt-spark-submit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saurfang%2Fsbt-spark-submit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saurfang%2Fsbt-spark-submit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saurfang%2Fsbt-spark-submit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saurfang","download_url":"https://codeload.github.com/saurfang/sbt-spark-submit/tar.gz/refs/heads/master"
,"sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saurfang%2Fsbt-spark-submit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30382670,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-11T12:49:11.341Z","status":"ssl_error","status_checked_at":"2026-03-11T12:46:41.342Z","response_time":84,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["sbt","spark"],"created_at":"2024-10-14T23:57:19.990Z","updated_at":"2026-03-11T13:20:03.182Z","avatar_url":"https://github.com/saurfang.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sbt-spark-submit\n\n[![Join the chat at https://gitter.im/saurfang/sbt-spark-submit](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/saurfang/sbt-spark-submit?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\n[![Build Status](https://travis-ci.org/saurfang/sbt-spark-submit.svg?branch=master)](https://travis-ci.org/saurfang/sbt-spark-submit)\n\nThis sbt plugin provides customizable sbt tasks to fire Spark jobs against local or remote Spark clusters.\nIt allows you to submit Spark applications without leaving your favorite development environment.\nThe reactive nature of sbt makes it possible to integrate this with your Spark cluster, whether it is a 
standalone\ncluster, a [YARN cluster](examples/sbt-assembly-on-yarn), or a [cluster running on EC2](examples/sbt-assembly-on-ec2).\n\n## Motivation\n\nAs an awesome Scala developer, your Spark development experience is probably as follows:\n```bash\n# create assembly jar upon code change\nsbt assembly\n# coffee break as Scala builds\n# transfer the jar to a cluster co-located host\nscp target/scala-2.10/myproject-version-assembly.jar sparkcluster:myworkspace\n# ssh into that launcher host\nssh sparkcluster\ncd myworkspace\n# fire spark-submit\n$SPARK_HOME/bin/spark-submit --class not.memorable.package.application.class --master yarn --num-executors 10 \\\n  --conf some.crazy.config=xyz --executor-memory=lotsG \\\n  myproject-version-assembly.jar \\\n  \u003cglorious-application-arguments...\u003e\n```\nBut it doesn't have to be that hard. With this plugin you can reduce the above steps to:\n```bash\nsbt \"sparkSubmitMyClass \u003cadditional custom app arguments...\u003e\"\n```\n\n## Feature\n\nThis AutoPlugin automatically adds a `sparkSubmit` task to every project in your build. The usage is as follows:\n```shell\nsbt \"sparkSubmit \u003cspark arguments\u003e -- \u003capplication arguments\u003e\"\n```\nFor example:\n```shell\nsbt \"sparkSubmit --class SparkPi --\"\nsbt \"sparkSubmit --class SparkPi -- 10\"\nsbt \"sparkSubmit --master local[2] --class SparkPi --\"\n```\n\nYou can also define specialized SparkSubmit tasks. We recommend creating a `project/SparkSubmit.scala`:\n```scala\nimport sbtsparksubmit.SparkSubmitPlugin.autoImport._\n\nobject SparkSubmit {\n  lazy val settings =\n    SparkSubmitSetting(\"sparkPi\",\n      Seq(\"--class\", \"SparkPi\")\n    )\n}\n```\nThen in `build.sbt`, import the settings with:\n```scala\nSparkSubmit.settings\n```\nWith that you just gained a new sbt task called `sparkPi` which you can run with `sbt sparkPi`. \nThe task automatically recompiles and repackages the JAR as needed. It starts the SparkPi example in local\nmode. 
You can change the default Spark master by specifying `--master` as you would with *spark-submit*.\nYou can embed default Spark and/or Application arguments in the sbt task to cover your most common\nuse cases. Please see [below](#define-custom-sparksubmit-tasks) for more details on custom spark-submit tasks.\n\n\n## Setup\n\nFor sbt 0.13.6+, add sbt-spark-submit to your `project/plugins.sbt` or `~/.sbt/0.13/plugins/plugins.sbt` file:\n\n```scala\naddSbtPlugin(\"com.github.saurfang\" % \"sbt-spark-submit\" % \"0.0.4\")\n```\n\nNaturally, you will also need a Spark dependency in your project itself, such as:\n\n```scala\nlibraryDependencies += \"org.apache.spark\" %% \"spark-core\" % \"1.4.0\" % \"provided\"\n```\n\n`\"provided\"` is recommended because Spark is large and you don't need to include it in your fat jar for deployment.\n\n### YARN\n\nIf you are running on YARN, you also need to add [spark-yarn](http://mvnrepository.com/artifact/org.apache.spark/spark-yarn_2.10).\nFor example:\n\n```scala\nlibraryDependencies += \"org.apache.spark\" %% \"spark-yarn\" % \"1.4.0\" % \"provided\"\n```\n\nIf you are submitting cross-platform (e.g. from Windows to Linux), you need Hadoop 2.4+, which supports a\nplatform-neutral classpath separator. In those cases, you might need to exclude Hadoop dependencies from Spark first. For example:\n```scala\nlibraryDependencies ++= Seq(\n  \"org.apache.spark\" %% \"spark-yarn\" % \"1.4.0\" % \"provided\" excludeAll ExclusionRule(organization = \"org.apache.hadoop\"),\n  \"org.apache.hadoop\" % \"hadoop-client\" % \"2.4.0\" % \"provided\",\n  \"org.apache.hadoop\" % \"hadoop-yarn-client\" % \"2.4.0\" % \"provided\"\n)\n```\n\nFinally, you should use \n```scala\nenablePlugins(SparkSubmitYARN)\n```\nto enable default YARN settings. This defaults the master to `yarn-cluster` whenever appropriate and appends \n`HADOOP_CONF_DIR/YARN_CONF_DIR` to the launcher classpath so the YARN resource manager can be correctly determined. 
\nSee below for more details.\n\n## Define Custom SparkSubmit Tasks\n\nTo create multiple tasks, you can wrap them with `SparkSubmitSetting` again like this:\n```scala\n  lazy val settings = SparkSubmitSetting(\n    SparkSubmitSetting(\"spark1\",\n      Seq(\"--class\", \"Main1\")\n    ),\n    SparkSubmitSetting(\"spark2\",\n      Seq(\"--class\", \"Main2\")\n    ),\n    SparkSubmitSetting(\"spark2Other\",\n      Seq(\"--class\", \"Main2\"),\n      Seq(\"hello.txt\")\n    )\n  )\n```\n\nNotice that two differently named tasks here run the same class but with different application arguments.\n\nOf course, you can still append additional arguments to these tasks. For example:\n```shell\nsbt \"spark2 hello.txt\"\nsbt spark2Other\n```\nwould be equivalent.\n\n`SparkSubmitSetting` has three `apply` functions:\n```scala\ndef apply(name: String): SparkSubmitSetting\ndef apply(name: String, sparkArgs: Seq[String] = Seq(), appArgs: Seq[String] = Seq()): SparkSubmitSetting\ndef apply(sparkSubmitSettings: SparkSubmitSetting*): Seq[Def.Setting[_]]\n```\nThe first creates a simple `SparkSubmitSetting` object with a custom task name. The object itself has a `setting` function\nthat allows you to blend in additional settings that are specific to this task.\n\nBecause the most common use case of a custom task is to provide custom default Spark and Application arguments,\nthe second variant allows you to provide those directly.\n\nThere is already an implicit conversion from `SparkSubmitSetting` to `Seq[Def.Setting[_]]` which allows you to\nappend it to your project. 
When there are multiple settings, the third variant allows you to aggregate all\nof them without additional type hinting for the implicit conversion to work.\n\nSee [`src/sbt-test/sbt-spark-submit/multi-main`](src/sbt-test/sbt-spark-submit/multi-main) for examples.\n\n\n## Multi-project builds\n\nIf you are awesome enough to have a multi-project build, be careful: `sbt sparkSubmit` will trigger aggregation,\nthus firing one instance for every project. You can do `sbt projectA/sparkSubmit` to restrict the project\nscope.\n\nHowever, if you define additional sparkSubmit tasks with unique names, this becomes very friendly. For example,\nsay you have two projects `A` and `B`, for which you define `sparkA1`, `sparkA2` and `sparkB` tasks respectively.\nAs long as you attach `sparkA1` and `sparkA2` to project `A` and `sparkB` to project `B`, `sbt sparkA1` and `sbt sparkA2`\nwill correctly trigger a build on project `A` while `sbt sparkB` will do the same for project `B` even though you didn't\nselect any specific project. \n\nOf course, the `sparkB` task won't even trigger a build on `A` unless `B` depends on `A`, thanks to the magic of sbt.\n\nSee [`src/sbt-test/sbt-spark-submit/multi-project`](src/sbt-test/sbt-spark-submit/multi-project) for examples.\n\n\n## Customization\n\nBelow we go into details about the various keys that control the default behavior of this task.\n\n\n### Application JAR\n`sparkSubmitJar` specifies the application JAR used in submission. By default this is simply the JAR created by\nthe `package` task. This will be sufficient to run in local mode.\n\nMore advanced techniques include, but are not limited to:\n\n1. Use one-jar plugins such as `sbt-assembly` to create a fat jar for deployment.\n2. While YARN automatically uploads the application jar, this doesn't seem to be the case for a Spark Standalone\ncluster. So you can inject a JAR uploading process inside this key and return the uploaded JAR instead. 
See\n[sbt-assembly-on-ec2](examples/sbt-assembly-on-ec2) for an example.\n\n### Spark and Application Arguments\n`sparkSubmitSparkArgs` and `sparkSubmitAppArgs` represent the arguments for Spark and the Application respectively.\nSpark arguments are things like `--class`, `--conf`, etc. Application arguments are for the Spark application\nbeing submitted.\n\n### Application Master\n`sparkSubmitMaster` specifies the default master to use if `--master` is not already supplied. This takes a function\nof the form `(sparkArgs: Seq[String], appArgs: Seq[String]) =\u003e String`. By default it blindly maps to `local`.\n\nMore interesting ones may be:\n\n1. If there is `--help` in `appArgs` you will want to run as `local` to see the usage information immediately.\n2. For YARN deployment, `yarn-cluster` is appropriate, especially if you are submitting to a remote cluster from an IDE.\n3. For EC2 deployment, you can use the `spark-ec2` script to figure out the correct address of the Spark master. See\n[sbt-assembly-on-ec2](examples/sbt-assembly-on-ec2) for an example.\n\n### Default Properties File\n`sparkSubmitPropertiesFile` specifies the default properties file to use if `--properties-file` is not already supplied.\n\nThis can be especially useful for YARN deployment by pointing the Spark assembly to a JAR on HDFS via the `spark.yarn.jar`\nproperty so as to avoid the overhead of uploading the Spark assembly jar every time an application is submitted. See\n[sbt-assembly-on-yarn](examples/sbt-assembly-on-yarn) for an example.\n\nOther interesting settings include driver/executor memory/cores, RDD compression/serialization, etc.\n\n### Classpath\n`sparkSubmitClasspath` sets the classpath to use for Spark application deployment. 
Currently this is only relevant for\nYARN deployment as I couldn't get `yarn-site.xml` correctly picked up even when `HADOOP_CONF_DIR` is properly set.\nIn this case, you can add:\n```scala\nsparkSubmitClasspath := {\n  new File(sys.env.getOrElse(\"HADOOP_CONF_DIR\", \"\")) +:\n    data((fullClasspath in Compile).value)\n}\n```\nNote: This is already automatically injected once you `enablePlugins(SparkSubmitYARN)`.\n\n### SparkSubmit inputKey\n`sparkSubmit` is a generic `inputKey`; the sections above show how to define additional tasks that have\ndifferent default behavior in terms of parameters. As for the inputKey itself, it parses\nspace-delimited arguments. If `--` is present, the former part gets appended to `sparkSubmitSparkArgs` and\nthe latter part gets appended to `sparkSubmitAppArgs`. If `--` is missing, then all arguments are assumed\nto be application arguments.\n\nIf `--master` is missing in `sparkSubmitSparkArgs`, then `sparkSubmitMaster` is used to assign a default\napplication master.\n\nIf `--properties-file` is missing in `sparkSubmitSparkArgs` and `sparkSubmitPropertiesFile` is not `None`,\nthen it will be included.\n\nFinally, it runs the Spark application deploy process using the specified classpath and specified JAR with the\nabove-mentioned arguments.\n\n\n## Resources\n\nFor more information and working examples, see projects under [`examples`](examples) and [`src/sbt-test`](src/sbt-test).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaurfang%2Fsbt-spark-submit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaurfang%2Fsbt-spark-submit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaurfang%2Fsbt-spark-submit/lists"}