{"id":15045522,"url":"https://github.com/hindog/grid-executor","last_synced_at":"2026-03-16T16:36:10.472Z","repository":{"id":57724973,"uuid":"78470043","full_name":"hindog/grid-executor","owner":"hindog","description":"Library for remote JVM ExecutorService with only dependency being password-less SSH -- Run clustered Hadoop/Spark jobs from IDE -- IDE-pimped Spark shell with full auto-completion!","archived":false,"fork":false,"pushed_at":"2021-02-11T08:35:18.000Z","size":206,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-20T13:23:46.197Z","etag":null,"topics":["cloud","grid","hadoop","ide","jvm","spark-shell"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hindog.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-09T21:19:51.000Z","updated_at":"2021-02-11T08:35:55.000Z","dependencies_parsed_at":"2022-09-10T23:56:34.397Z","dependency_job_id":null,"html_url":"https://github.com/hindog/grid-executor","commit_stats":null,"previous_names":[],"tags_count":34,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hindog%2Fgrid-executor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hindog%2Fgrid-executor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hindog%2Fgrid-executor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hindog%2Fgrid-executor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/hindog","download_url":"https://codeload.github.com/hindog/grid-executor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243451605,"owners_count":20293168,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud","grid","hadoop","ide","jvm","spark-shell"],"created_at":"2024-09-24T20:51:58.687Z","updated_at":"2025-12-29T16:43:41.599Z","avatar_url":"https://github.com/hindog.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Spark and JVM Remote Execution ##\n\nThis project essentially allows you to replicate a local JVM process on one or more remote hosts using only SSH.  You can then execute any local code on the remote hosts with full STDOUT/STDERR streaming back to the local process. \n\n**This is extremely handy for Spark jobs, because it allows running Spark jobs from your local IDE on a remote cluster just like any other application.** \n\n\n#### Features ####\n\n* Zero-deployment remote JVM execution.  Automatically replicates the local classpath to the remote target(s) while also caching JARs on the remote host for faster execution on repeated runs. 
\n* Support for remote Spark/Hadoop execution from IDE for fast, iterative development and feedback (i.e., `spark-submit` or `hadoop` on a Hadoop gateway box, without manually uploading jars).\n* Implements `ExecutorService` to support submitting `Runnable` and/or `Callable[T]` to the grid nodes.\n* Contains hooks for Scala `Future[T]` to allow for transparent grid execution by wrapping the `GridExecutor` in a Scala `ExecutionContext`.\n* By default, the library will bind remote STDOUT/STDERR to local STDOUT/STDERR and optionally STDIN can be bound as well.\n* Can be integrated with [JClouds](https://jclouds.apache.org/) to provision grids on-the-fly \n* Open-Source, Apache 2.0 License\n* Support for \"IDE-pimped\" `spark-shell` that gives you the full power of the IDE's completion/import/copy-paste support while interacting with a Spark shell running remotely on the cluster! (See [SparkShellExample.scala](https://github.com/hindog/grid-executor/blob/master/grid-executor-examples/src/main/scala/com/hindog/grid/examples/SparkShellExample.scala) for instructions)\n\n\n#### Import ####\n\nUsing SBT:\n\n```\nlibraryDependencies += \"com.hindog.grid\" %% \"grid-executor-core\" % \"2.0.7\"\n```\n\nUsing Maven:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.hindog.grid\u003c/groupId\u003e\n    \u003cartifactId\u003egrid-executor-core_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e2.0.7\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nPackage Import:\n\n```scala\nimport com.hindog.grid._\n```\n\n#### Configuration ####\n\nConfiguration is provided via the `GridConfig` builder methods or a properties file (or both).  Properties are scoped by the `grid.` prefix.  
\n\n```\n\n# Configures your remote username (if different from your local username)\n\ngrid.remote\\:account.\u003clocal username\u003e=\u003cremote username\u003e\n\n# Adds JVM arg to ALL remote executions\ngrid.jvm\\:xx\\:permgen=-XX:MaxPermSize=768M\n \n# Sets GridKit's \"remote-runtime:jar-cache\" property to determine where to store jars remotely (here we override the default [/tmp/nanocloud] to the user's [~/.jar-cache] on the remote box)\ngrid.remote-runtime\\:jar-cache=.jar-cache\n \n# Sets GridKit's \"node:config-trace\" property to dump ViEngine config on startup\ngrid.node\\:config-trace=true\n \n# Adds JVM args specific to remote executions on 'myGrid' nodes\ngrid.jvm\\:xx\\:mx.myGrid=-Xmx8g\ngrid.jvm\\:exec-command.myGrid=/path/to/java\n \n# User override to enable remote debug server that client can connect to\n#grid.jvm\\:xx\\:debug.myGrid.ahiniker=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5004\n \n# User override to enable remote debug client that will connect to our IDE on startup (replace the IP with your laptop's IP)\n#grid.jvm\\:xx\\:debug.myGrid.ahiniker=-agentlib:jdwp=transport=dt_socket,server=n,address=10.170.1.45:5004,suspend=y\n```\n\n#### Examples ####\n\nAll of the examples below assume the following imports/base trait to provide some default grid definitions.  Also, it references a classpath resource of `grid.properties` that contains any configuration properties as outlined above.\n\nNOTE: the grid definitions refer to `server1.example.com` and `server2.example.com`, these should be replaced with hostnames configured on your network.  Password-less SSH needs to be configured for each host. 
\n\n```scala\npackage com.hindog.grid.examples\n\nimport java.lang.management.ManagementFactory\nimport java.util.concurrent.Callable\n\nimport com.hindog.grid.GridConfigurable.Hook\nimport com.hindog.grid._\nimport scala.concurrent.duration._\nimport scala.concurrent._\n\ntrait GridExampleApp extends App {\n\n\tdef message(msg: String = \"Hello!\") = s\"$msg [thread: \" + Thread.currentThread().getId + \" within process: \" + ManagementFactory.getRuntimeMXBean().getName() + \"]\"\n\n\tval configOneRemote: GridConfig = GridConfig(\n\t\t\"myGrid\",\n\t\tRemoteNodeConfig(\"server1.example.com\", \"server1\") // host + alias\n\t).withPropertyOverrides(System.getProperties).withPropertyOverrides(properties)\n\n\tval configTwoRemote: GridConfig = GridConfig(\n\t\t\"myGrid\",\n\t\tRemoteNodeConfig(\"server1.example.com\", \"server1\"), // host + alias\n\t\tRemoteNodeConfig(\"server2.example.com\", \"server2\") // host + alias\n\t).withPropertyOverrides(System.getProperties).withPropertyOverrides(properties)\n}\n\n```\n\n#### Scala ExecutionContext / Future Example ####\n\nDemonstrates how we can use GridExecutor with Scala's `Future[T]` natively without any GridExecutor specific code.  
Parallel collections are not supported.\n\n```scala\nobject GridExecutorScalaFutureExample extends GridExampleApp {\n\n\tval remoteNodeConfig = configOneRemote.nodes.head\n\n\tval remoteNode1 = remoteNodeConfig.withName(remoteNodeConfig.name + \"-1\")\n\tval remoteNode2 = remoteNodeConfig.withName(remoteNodeConfig.name + \"-2\")\n\tval localNode1 =  LocalNodeConfig(\"local-1\")\n\tval localNode2 =  LocalNodeConfig(\"local-2\")\n\n\t// Set our config to use 2 remote and 2 local execution slots (4 total)\n\tval config2 = configOneRemote.withNodes(remoteNode1, remoteNode2, localNode1, localNode2)\n\n\t// create an implicit ExecutionContext to execute against\n\timplicit val ec = ExecutionContext.fromExecutorService(GridExecutor(config2))\n\n\t// No references to GridExecutor are present in the following code\n\t// This will use scala's Future to run tasks in parallel using remote JVMs\n\t// We add a Thread.sleep to simulate real work; the total time should show that parallel execution completed\n\t// the work in less time than sequential execution\n\tval start = System.currentTimeMillis()\n\n\tval futures = (0 to 20).map(i =\u003e Future {\n\t\tprintln(message(s\"executing task $i\"))\n\t\tThread.sleep(1000)\n\t\ts\"result $i\"\n\t})\n\n\tval results = Await.result(Future.sequence(futures), Duration.Inf)\n\tprintln(s\"results = $results\")\n\tprintln(\"total time: \" + (System.currentTimeMillis() - start) + \"ms\")\n\tec.shutdown()\n}\n```\n\n#### Single-use Example ####\n\nDemonstrates how to initialize a grid whose life-cycle is scoped to a single task.\n\nNOTE: the overhead of instantiating the cloud will be incurred on each invocation (as part of the future), but once the jars have synced, subsequent invocations will have reduced overhead.\n\n```scala\nobject GridExecutorSingleFutureExample extends GridExampleApp {\n\n\timport scala.collection.JavaConverters._\n\timport scala.concurrent.ExecutionContext.Implicits.global\n\n\tval fut = 
GridExecutor.future(configOneRemote) {\n\t\tprintln(message())\n\t\tSystem.getenv().asScala.toSeq.sortBy(_._1)\n\t}\n\n\tAwait.result(fut, Duration.Inf).foreach(kv =\u003e println(kv._1 + \"=\" + kv._2))\n}\n```\n\n#### Multi-use Example ####\n\nDemonstrates how to initialize a grid whose life-cycle is scoped to `thunk`.  Can be used to submit multiple tasks in an ad-hoc fashion.\n\n```scala\nobject GridExecutorScopedMultiUseExample extends GridExampleApp {\n\n\t// Submit 2 tasks and print their results\n\tGridExecutor.withInstance(configTwoRemote) { executor =\u003e\n\t\tval fut1 = executor.submit(new Callable[String] {\n\t\t\toverride def call(): String = {\n\t\t\t\tprintln(\"started task A\")\n\t\t\t\tThread.sleep(5000)\n\t\t\t\tmessage(\"result A\")\n\t\t\t}\n\t\t})\n\n\t\tval fut2 = executor.submit(new Callable[String] {\n\t\t\toverride def call(): String = {\n\t\t\t\tprintln(\"started task B\")\n\t\t\t\tThread.sleep(5000)\n\t\t\t\tmessage(\"result B\")\n\t\t\t}\n\t\t})\n\n\t\tprintln(\"Future 1 result: \" + fut1.get())\n\t\tprintln(\"Future 2 result: \" + fut2.get())\n\t}\n\n}\n```\n\n#### Startup/Shutdown Hooks Example ####\n\nStartup/Shutdown hooks allow arbitrary code to be registered for execution as part of each node's startup or shutdown sequence.\n\n```scala\nobject GridExecutorScopedWithInitializationExample extends Logging {\n\tvar globalValue: String = \"default value\"\n\n\tdef main(args: Array[String]) = {\n\n\t\t// Define an initialization process by using 'addStartupHook(new Hook(\"name\") {...})'\n\n\t\t// below we will add a hook to set the 'globalValue' on the remote box on startup\n\t\tval baseConfig: GridConfig = GridConfig(\n\t\t\t\"myGrid\",\n\t\t\tRemoteNodeConfig(\"server1.example.com\", \"server1\")\n\t\t).withPropertyOverrides(System.getProperties)\n\n\t\tval config = baseConfig.addStartupHook(new Hook(\"my init hook\") {\n\t\t\toverride def run(): Unit = {\n\t\t\t\t// modify our global variable\n\t\t\t\tprintln(\"running 
initialization...\")\n\t\t\t\tprintln(GridExecutorScopedWithInitializationExample.globalValue)\n\t\t\t\tGridExecutorScopedWithInitializationExample.globalValue = \"initialized value\"\n\t\t\t}\n\t\t}).addShutdownHook(new Hook(\"my shutdown hook\") {\n\t\t\toverride def run(): Unit = {\n\t\t\t\tinfo(\"running delay\")\n\t\t\t\tThread.sleep(1000)\n\t\t\t}\n\t\t})\n\n\t\tval fut = GridExecutor.future(config) {\n\t\t\t// should return the value set via our initialization hook\n\t\t\tGridExecutorScopedWithInitializationExample.globalValue\n\t\t}\n\n\t\t// should reflect the init'ed value\n\t\tprintln(\"remote globalValue: \" + Await.result(fut, Duration.Inf))\n\t\t// local value should be unchanged\n\t\tprintln(\"local globalValue: \" + globalValue)\n\t}\n}\n```\n\n#### Local Fork Example ####\n\nDemonstrates how to configure a local node that can be used for running code in a forked JVM.\n\n```scala\nobject GridExecutorLocalForkExample extends App {\n\timport scala.concurrent.ExecutionContext.Implicits.global\n\n\tprintln(\"host jvm: \" + ManagementFactory.getRuntimeMXBean.getName)\n\n\tval config1: GridConfig = GridConfig.localFork(\"fork 1\").withMaxHeap(\"20m\").withMinHeap(\"20m\")\n\tval config2: GridConfig = GridConfig.localFork(\"fork 2\").withMaxHeap(\"40m\").withMinHeap(\"40m\")\n\n\tval fut1: Future[Unit] = GridExecutor.future(config1) {\n\t\tprintln(\"forked jvm 1: \" + ManagementFactory.getRuntimeMXBean.getName)\n\t\tprintln(\"total memory 1: \" + Runtime.getRuntime.totalMemory())\n\t\tThread.sleep(5000)\n\t}\n\n\tval fut2: Future[Unit] = GridExecutor.future(config2) {\n\t\tprintln(\"forked jvm 2: \" + ManagementFactory.getRuntimeMXBean.getName)\n\t\tprintln(\"total memory 2: \" + Runtime.getRuntime.totalMemory())\n\t\tThread.sleep(5000)\n\t}\n\n\tAwait.result(Future.sequence(Seq(fut1, fut2)), 10 seconds)\n\n}\n```\n\n### Gotchas ###\n\n#### Spark / Hadoop Dependencies ####\nFor remote Spark/Hadoop execution, if your `App` class contains method signatures 
that reference classes from `provided` cluster jars, then the execution will fail unless those libraries are configured for `compile` scope.  Another work-around is to remove all traces of such classes in your `App` class method/field signatures and delegate to another class with your job's logic from within the body of the `run` method (method bodies aren't validated by the JVM on startup).  This will be addressed in an upcoming `2.0` release.\n\n#### Auth Errors ####\nIf you experience a `JSchAuthCancelException` or similar when running, it is most likely because your SSH key is not of the required minimum length (2048 bits).  Try generating a new key that is at least 2048 bits in length. \n\n### TODO ###\n\n#### Spark Shell ####\nTutorial, video or animated GIF that shows how to configure the IDE-pimped shell.\n\n#### TypeSafe Config Support ####\nUpcoming `2.x` release will have an overhauled configuration process that allows for nested/inherited grid configs.  This will minimize the effort required for configuration while also providing good flexibility for per-grid, per-host, per-job, per-user configuration options, etc.\n\n#### Tutorials / Documentation ####\n***Coming Soon***","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhindog%2Fgrid-executor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhindog%2Fgrid-executor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhindog%2Fgrid-executor/lists"}