Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/melezhik/sparrowdo-spark
Quick Spark Installer for CentOS and Docker
- Host: GitHub
- URL: https://github.com/melezhik/sparrowdo-spark
- Owner: melezhik
- Created: 2017-10-18T09:48:48.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2017-10-18T15:05:41.000Z (about 7 years ago)
- Last Synced: 2024-11-05T21:50:01.179Z (2 months ago)
- Topics: centos, spark, sparrowdo
- Homepage:
- Size: 18.6 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Synopsis
Install a Spark cluster with Docker, Sparrowdo and CentOS.
# Caveat
CentOS is the only supported platform for the moment. Tested against the [official CentOS Docker image](https://hub.docker.com/_/centos/).
# Install
$ zef install Sparrowdo Sparrowdo::RemoteFile Sparrowdo::Archive
$ git clone https://github.com/melezhik/sparrowdo-spark.git
$ cd sparrowdo-spark # all the following commands will be run from here
# Usage
## Setup docker
First of all, you should create a dedicated Docker network for all Spark instances:
$ docker network create --subnet=172.18.0.0/16 spark-net
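Optionally, you can verify that the network exists and got the expected subnet; with a stock Docker CLI this prints the network details, including the 172.18.0.0/16 range:
$ docker network inspect spark-net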
## Install master
Run a Docker container for the master. You must name the Spark master container `master`; this is mandatory:
$ docker run --entrypoint init --net spark-net --ip 172.18.0.2 -t -d --name master centos
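If you want to double-check that the container is up and received the static IP you asked for, something like the following should work with a standard Docker CLI (the format template is just one way to print the address):
$ docker ps --filter name=master
$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' master # should print 172.18.0.2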
Deploy the Spark master on the running container:
$ sparrowdo \
--docker=master \
--no_sudo \
--sparrowfile=sparrowfile-master \
--format=production --bootstrap
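As a quick sanity check (assuming the master web UI on 172.18.0.2:8080 is reachable from the host, as in the firefox example at the end of this README), you can poke it from the command line before moving on:
$ curl -s http://172.18.0.2:8080 > /dev/null && echo "Spark master UI is up"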
## Install slaves
Run a Docker container for a slave:
$ docker run --privileged --entrypoint init -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--net spark-net --ip 172.18.0.4 -t -d --name worker1 centos
Deploy the Spark slave on the running container:
$ sparrowdo \
--docker=worker1 \
--no_sudo \
--sparrowfile=sparrowfile-slave \
--format=production --bootstrap
And so on: launch as many slaves as you wish (an example of adding a second worker is sketched below).
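For example, a second worker can be started the same way, only with a different container name and IP (worker2 and 172.18.0.5 here, matching the addresses used in config.pl6 below):
$ docker run --privileged --entrypoint init -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--net spark-net --ip 172.18.0.5 -t -d --name worker2 centos
$ sparrowdo \
--docker=worker2 \
--no_sudo \
--sparrowfile=sparrowfile-slave \
--format=production --bootstrap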
## Picking up new slaves
Once you've created a master and some slaves, you need to run the [cluster launch script](https://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts)
so that the master picks up its new slaves.
$ nano config.pl6
{
master => '172.18.0.2',
workers => (
'172.18.0.4',
'172.18.0.5',
'172.18.0.6'
)
}
$ sparrowdo \
--docker=master \
--no_sudo \
--sparrowfile=sparrowfile-cluster-launch \
--format=production --bootstrap
Wait for a while, let Spark do its job, and then visit the Spark web UI to check that both the master and the slaves are running successfully:
$ firefox 172.18.0.2:8080
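On a headless box, curl works just as well; the master UI page lists the registered workers, so something along these lines (the exact page wording is an assumption) tells you whether the slaves have joined:
$ curl -s http://172.18.0.2:8080 | grep -i "alive workers"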
# See also
https://spark.apache.org/docs/latest/spark-standalone.html