https://github.com/hindog/grid-executor
Library for remote JVM ExecutorService with only dependency being password-less SSH -- Run clustered Hadoop/Spark jobs from IDE -- IDE-pimped Spark shell with full auto-completion!
- Host: GitHub
- URL: https://github.com/hindog/grid-executor
- Owner: hindog
- License: apache-2.0
- Created: 2017-01-09T21:19:51.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2021-02-11T08:35:18.000Z (almost 4 years ago)
- Last Synced: 2024-11-19T10:57:15.096Z (3 months ago)
- Topics: cloud, grid, hadoop, ide, jvm, spark-shell
- Language: Scala
- Homepage:
- Size: 201 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Spark and JVM Remote Execution ##
This project essentially allows you to replicate a local JVM process on one or more remote hosts using only SSH. You can then execute any local code on the remote hosts with full STDOUT/STDERR streaming back to the local process.
**This is extremely handy for Spark jobs, because it allows running Spark jobs from your local IDE on a remote cluster just like any other application.**
#### Features ####
* Zero-deployment remote JVM execution. Automatically replicates the local classpath to the remote target(s) while also caching JARs on the remote host for faster execution on repeated runs.
* Support for remote Spark/Hadoop execution from the IDE for fast, iterative development and feedback (i.e. running `spark-submit` or `hadoop` on a Hadoop gateway box without manually uploading JARs).
* Implements `ExecutorService` to support submitting `Runnable` and/or `Callable[T]` to the grid nodes.
* Contains hooks for Scala `Future[T]` to allow for transparent grid execution by wrapping the `GridExecutor` in a Scala `ExecutionContext`.
* By default, the library will bind remote STDOUT/STDERR to local STDOUT/STDERR and optionally STDIN can be bound as well.
* Can be integrated with [JClouds](https://jclouds.apache.org/) to provision grids on-the-fly
* Open-Source, Apache 2.0 License
* Support for "IDE-pimped" `spark-shell` that gives you full power of the IDE's completion/import/copy-paste support while interacting with a Spark shell running remotely on the cluster! (See [SparkShellExample.scala](https://github.com/hindog/grid-executor/blob/master/grid-executor-examples/src/main/scala/com/hindog/grid/examples/SparkShellExample.scala) for instructions.)
#### Import ####
Using SBT:
```
libraryDependencies += "com.hindog.grid" %% "grid-executor-core" % "2.0.7"
```
Using Maven:
```xml
<dependency>
  <groupId>com.hindog.grid</groupId>
  <artifactId>grid-executor-core_2.11</artifactId>
  <version>2.0.7</version>
</dependency>
```
Package Import:
```scala
import com.hindog.grid._
```
#### Configuration ####
Configuration is provided via the `GridConfig` builder methods, a properties file, or both. All property keys are scoped under the `grid.` prefix.
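The keys in this section escape each colon as `\:` because `java.util.Properties` otherwise treats `:` as a key/value separator. The following is a minimal sketch, not grid-executor's own loading code: `GridPropertiesSketch`, `load`, and `gridScoped` are hypothetical names, and it assumes the `grid.` scope is a plain key prefix.

```scala
import java.io.StringReader
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical helper for illustration only; not part of grid-executor.
object GridPropertiesSketch {

  // Parse properties-file text into an immutable Map.
  // Properties.load un-escapes "\:" so the resulting key contains a plain ":".
  def load(text: String): Map[String, String] = {
    val props = new Properties()
    props.load(new StringReader(text))
    props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap
  }

  // Keep only the keys under the `grid.` prefix (assumption: the scope is a key prefix).
  def gridScoped(all: Map[String, String]): Map[String, String] =
    all.filter { case (key, _) => key.startsWith("grid.") }

  def main(args: Array[String]): Unit = {
    // Triple-quoted strings keep the backslashes exactly as they would appear in a file.
    val text =
      """grid.jvm\:xx\:mx.myGrid=-Xmx8g
        |grid.node\:config-trace=true
        |unrelated.key=ignored
        |""".stripMargin

    gridScoped(load(text)).toSeq.sorted.foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

After loading, the first key is `grid.jvm:xx:mx.myGrid` with the backslashes removed, which is how colon-separated GridKit names such as `node:config-trace` can appear as keys in the properties listing below.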
```
# Configures your remote username (if different from your local username)
grid.remote\:account.=
# Adds JVM arg to ALL remote executions
grid.jvm\:xx\:permgen=-XX:MaxPermSize=768M
# Sets GridKit's "remote-runtime:jar-cache" property to determine where to store jars remotely (here we override the default [/tmp/nanocloud] to the user's [~/.jar-cache] on the remote box)
grid.remote-runtime\:jar-cache=.jar-cache
# Sets GridKit's "node:config-trace" property to dump ViEngine config on startup
grid.node\:config-trace=true
# Adds JVM args specific to remote executions on 'myGrid' nodes
grid.jvm\:xx\:mx.myGrid=-Xmx8g
grid.jvm\:exec-command.myGrid=/path/to/java
# User override to enable remote debug server that client can connect to
#grid.jvm\:xx\:debug.myGrid.ahiniker=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5004
# User override to enable remote debug client that will connect to our IDE on startup (replace the IP with your laptop's IP)
#grid.jvm\:xx\:debug.myGrid.ahiniker=-agentlib:jdwp=transport=dt_socket,server=n,address=10.170.1.45:5004,suspend=y
```
#### Examples ####
All of the examples below assume the following imports and base trait, which provide some default grid definitions. The trait also references a classpath resource named `grid.properties` that contains any configuration properties as outlined above.
NOTE: the grid definitions refer to `server1.example.com` and `server2.example.com`; replace these with hostnames configured on your network. Password-less SSH needs to be configured for each host.
```scala
package com.hindog.grid.examples

import java.lang.management.ManagementFactory
import java.util.concurrent.Callable

import com.hindog.grid.GridConfigurable.Hook
import com.hindog.grid._
import scala.concurrent.duration._
import scala.concurrent._

trait GridExampleApp extends App {
  def message(msg: String = "Hello!") = s"$msg [thread: " + Thread.currentThread().getId + " within process: " + ManagementFactory.getRuntimeMXBean().getName() + "]"

  val configOneRemote: GridConfig = GridConfig(
    "myGrid",
    RemoteNodeConfig("server1.example.com", "server1") // host + alias
  ).withPropertyOverrides(System.getProperties).withPropertyOverrides(properties)

  val configTwoRemote: GridConfig = GridConfig(
    "myGrid",
    RemoteNodeConfig("server1.example.com", "server1"), // host + alias
    RemoteNodeConfig("server2.example.com", "server2") // host + alias
  ).withPropertyOverrides(System.getProperties).withPropertyOverrides(properties)
}
```
#### Scala ExecutionContext / Future Example ####
Demonstrates how GridExecutor can be used with Scala's `Future[T]` natively, without any GridExecutor-specific code. Parallel collections are not supported.
```scala
object GridExecutorScalaFutureExample extends GridExampleApp {

  val remoteNodeConfig = configOneRemote.nodes.head
  val remoteNode1 = remoteNodeConfig.withName(remoteNodeConfig.name + "-1")
  val remoteNode2 = remoteNodeConfig.withName(remoteNodeConfig.name + "-2")
  val localNode1 = LocalNodeConfig("local-1")
  val localNode2 = LocalNodeConfig("local-2")

  // Set our config to use 2 remote and 2 local execution slots (4 total)
  val config2 = configOneRemote.withNodes(remoteNode1, remoteNode2, localNode1, localNode2)

  // create an implicit ExecutionContext to execute against
  implicit val ec = ExecutionContext.fromExecutorService(GridExecutor(config2))

  // No references to GridExecutor are present in the following code
  // This will use scala's Future to run tasks in parallel using remote JVMs
  // We throw a Thread.sleep to simulate real work, total time should reflect parallel execution completed
  // the work in less time than sequential execution
  val start = System.currentTimeMillis()

  val futures = (0 to 20).map(i => Future {
    println(message(s"executing task $i"))
    Thread.sleep(1000)
    s"result $i"
  })

  val results = Await.result(Future.sequence(futures), Duration.Inf)
  println(s"results = $results")
  println("total time: " + (System.currentTimeMillis() - start) + "ms")
  ec.shutdown()
}
```
#### Single-use Example ####
Demonstrates how to initialize a grid whose life-cycle is scoped to a single task.
NOTE: the overhead of instantiating the cloud is incurred on each invocation (as part of the future), but once the jars have synced, subsequent invocations will have reduced overhead.
```scala
object GridExecutorSingleFutureExample extends GridExampleApp {

  import scala.collection.JavaConverters._
  import scala.concurrent.ExecutionContext.Implicits.global

  val fut = GridExecutor.future(configOneRemote) {
    println(message())
    System.getenv().asScala.toSeq.sortBy(_._1)
  }

  Await.result(fut, Duration.Inf).foreach(kv => println(kv._1 + "=" + kv._2))
}
```
#### Multi-use Example ####
Demonstrates how to initialize a grid whose life-cycle is scoped to `thunk`. Can be used to submit multiple tasks in an ad-hoc fashion.
```scala
object GridExecutorScopedMultiUseExample extends GridExampleApp {

  // Submit 2 tasks and print their results
  GridExecutor.withInstance(configTwoRemote) { executor =>
    val fut1 = executor.submit(new Callable[String] {
      override def call(): String = {
        println("started task A")
        Thread.sleep(5000)
        message("result A")
      }
    })

    val fut2 = executor.submit(new Callable[String] {
      override def call(): String = {
        println("started task B")
        Thread.sleep(5000)
        message("result B")
      }
    })

    println("Future 1 result: " + fut1.get())
    println("Future 2 result: " + fut2.get())
  }
}
```
#### Startup/Shutdown Hooks Example ####
Startup/Shutdown hooks allow arbitrary code to be registered for execution as part of each node's startup or shutdown sequence.
```scala
object GridExecutorScopedWithInitializationExample extends Logging {
  var globalValue: String = "default value"

  def main(args: Array[String]) = {
    // Define an initialization process by using 'addStartupHook(new Hook("name") {...})'
    // below we will add a hook to set the 'globalValue' on the remote box on startup
    val baseConfig: GridConfig = GridConfig(
      "myGrid",
      RemoteNodeConfig("server1.example.com", "server1")
    ).withPropertyOverrides(System.getProperties)

    val config = baseConfig.addStartupHook(new Hook("my init hook") {
      override def run(): Unit = {
        // modify our global variable
        println("running initialization...")
        println(GridExecutorScopedWithInitializationExample.globalValue)
        GridExecutorScopedWithInitializationExample.globalValue = "initialized value"
      }
    }).addShutdownHook(new Hook("my shutdown hook") {
      override def run(): Unit = {
        info("running delay")
        Thread.sleep(1000)
      }
    })

    val fut = GridExecutor.future(config) {
      // should return the value set via our initialization hook
      GridExecutorScopedWithInitializationExample.globalValue
    }

    // should reflect the init'ed value
    println("remote globalValue: " + Await.result(fut, Duration.Inf))
    // local value should be unchanged
    println("local globalValue: " + globalValue)
  }
}
```
#### Local Fork Example ####
Demonstrates how to configure a local node that can be used for running code in a forked JVM.
```scala
object GridExecutorLocalForkExample extends App {
  import scala.concurrent.ExecutionContext.Implicits.global

  println("host jvm: " + ManagementFactory.getRuntimeMXBean.getName)
  val config1: GridConfig = GridConfig.localFork("fork 1").withMaxHeap("20m").withMinHeap("20m")
  val config2: GridConfig = GridConfig.localFork("fork 2").withMaxHeap("40m").withMinHeap("40m")

  val fut1: Future[Unit] = GridExecutor.future(config1) {
    println("forked jvm 1: " + ManagementFactory.getRuntimeMXBean.getName)
    println("total memory 1: " + Runtime.getRuntime.totalMemory())
    Thread.sleep(5000)
  }

  val fut2: Future[Unit] = GridExecutor.future(config2) {
    println("forked jvm 2: " + ManagementFactory.getRuntimeMXBean.getName)
    println("total memory 2: " + Runtime.getRuntime.totalMemory())
    Thread.sleep(5000)
  }

  // dot notation avoids the postfix-operator language feature that `10 seconds` needs
  Await.result(Future.sequence(Seq(fut1, fut2)), 10.seconds)
}
```
### Gotchas ###
#### Spark / Hadoop Dependencies ####
For remote Spark/Hadoop execution, if your `App` class contains method signatures that reference classes from `provided` cluster jars, then the execution will fail unless those libraries are configured for `compile` scope. Another work-around is to remove all traces of such classes in your `App` class method/field signatures and delegate to another class with your job's logic from within the body of the `run` method (method bodies aren't validated by the JVM on startup). This will be addressed in an upcoming `2.0` release.
#### Auth Errors ####
If you experience a `JSchAuthCancelException` or similar when running, it is most likely because your SSH key is not of the required minimum length (2048 bits). Try generating a new key that is at least 2048 bits in length.
### TODO ###
#### Spark Shell ####
Tutorial, video or animated GIF that shows how to configure the IDE-pimped shell.
#### TypeSafe Config Support ####
Upcoming `2.x` release will have an overhauled configuration process that allows for nested/inherited grid configs. This will minimize the effort required for configuration while also providing good flexibility for per-grid, per-host, per-job, per-user configuration options, etc.
#### Tutorials / Documentation ####
***Coming Soon***