# docker-spark-experimental

Experimental Spark master and worker with `spark-shell` and `pyspark`.

https://github.com/mendhak/docker-spark-experimental
A Docker container with the `spark-shell` and `pyspark` shells, based on Ubuntu 16.04, with Spark 2.3.1 and Hadoop 2.7.
### Prepare containers
First, build the image:

    docker build -t spark .
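For context, here is a rough sketch of what such an image might contain, based only on the stack named above (Ubuntu 16.04, Spark 2.3.1, Hadoop 2.7). It is an illustration, not the actual Dockerfile from this repo:

    # Illustrative sketch only -- not the actual Dockerfile from this repo.
    FROM ubuntu:16.04
    # Spark needs a JRE; pyspark also needs Python.
    RUN apt-get update && apt-get install -y openjdk-8-jdk python curl
    # Spark 2.3.1 prebuilt against Hadoop 2.7, from the Apache archive.
    RUN curl -fsSL https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz \
        | tar -xz -C /opt
    ENV SPARK_HOME=/opt/spark-2.3.1-bin-hadoop2.7
    ENV PATH="$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin"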
In one terminal, start the master and worker:

    docker-compose up
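The compose file wires the two services together. As a rough sketch of what it might look like (assumed from the service name, master URL, and ports used in this README, not copied from the repo):

    # Illustrative sketch -- not the actual docker-compose.yml from this repo.
    version: "2"
    services:
      spark-master:
        image: spark
        hostname: spark-master   # so the master binds to a name the worker can resolve
        command: spark-class org.apache.spark.deploy.master.Master
        ports:
          - "8080:8080"   # master web UI
          - "7077:7077"   # port that workers and shells connect to
      spark-worker:
        image: spark
        command: spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
        ports:
          - "8081:8081"   # worker web UI
        depends_on:
          - spark-master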
Browse to the master: http://127.0.0.1:8080/
Browse to the worker: http://127.0.0.1:8081/
### Try spark-shell
In another terminal, open a bash shell inside the master container:

    docker-compose exec spark-master bash
From the master's bash, start a Spark shell, pointing it at the master:

    spark-shell --master spark://spark-master:7077
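Here `spark-master` is the compose service name and 7077 is the standalone master's default port; the full master URL is also shown at the top of the master's web UI.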
Try this code sample to see it working:

    // Estimate pi by Monte Carlo: sample random points in the unit square
    // and count how many fall inside the quarter circle of radius 1.
    val NUM_SAMPLES = 100000000
    val count = sc.parallelize(1 to NUM_SAMPLES).filter { _ =>
      val x = math.random
      val y = math.random
      x*x + y*y < 1
    }.count()
    println(s"Pi is roughly ${4.0 * count / NUM_SAMPLES}")
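The estimate works because the fraction of points landing inside the quarter circle tends to its area, π/4, so multiplying the observed fraction by 4 approximates π.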
While it runs, watch it on the master's web page.
Exit the Scala shell using `:quit`.
### Try pyspark
From the master's bash, start a pyspark shell, again pointing it at the master:

    pyspark --master spark://spark-master:7077
Try this code sample to see it working:

    import random

    # Same Monte Carlo estimate of pi as the Scala example above.
    num_samples = 100000000

    def inside(p):
        x, y = random.random(), random.random()
        return x*x + y*y < 1

    count = sc.parallelize(range(0, num_samples)).filter(inside).count()
    pi = 4.0 * count / num_samples  # 4.0 keeps the division floating-point on Python 2
    print(pi)
    sc.stop()
While it runs, watch it on the master's web page.
Exit the pyspark shell using `quit()`.
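When you are finished, stop the master and worker with Ctrl+C in the first terminal, or remove the containers entirely:

    docker-compose down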