Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Apache Spark cluster lab.
- Host: GitHub
- URL: https://github.com/savvydatainsights/spark
- Owner: savvydatainsights
- License: MIT
- Created: 2019-05-22T09:43:11.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-04-27T13:33:13.000Z (almost 2 years ago)
- Last Synced: 2024-11-17T07:21:32.996Z (2 months ago)
- Topics: ansible, apache-spark, apache-spark-cluster, vagrant
- Language: Java
- Size: 7.1 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Spark
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[Apache Spark](https://spark.apache.org) cluster lab.
## Setup
`vagrant up`
## WebUIs
| Node | URL |
| ---- | --- |
| master | |
| slave1 | |
| slave2 | |

## Submitting the sample application
The sample application counts how many times each word appears in a [lorem ipsum](https://www.lipsum.com) text.
To submit it, run:
`ansible-playbook submit-spark-application.yml`
The output can then be found under the *output* folder.
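For context, here is a minimal sketch of what such a word count could look like using Spark's Java RDD API (Java is the repository's language per the metadata above). The package and class names are taken from the spark-submit command in the next section; the body and the output path are illustrative assumptions, not the repository's actual code:

```java
package uk.co.savvydatainsights;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Hypothetical sketch only: the real WordCount lives in this repository
// and is built into /vagrant/target/spark-examples-1.0-SNAPSHOT.jar.
public final class WordCount {

    public static void main(String[] args) {
        // args[0] is the input text, e.g. /vagrant/input/lorem-ipsum.txt.
        SparkConf conf = new SparkConf().setAppName("WordCount");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(args[0]);

            // Split lines into words, then count occurrences of each word.
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.toLowerCase().split("\\W+")).iterator())
                    .filter(word -> !word.isEmpty())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            // Writing to /vagrant/output is an assumption about how the
            // playbook collects results into the repository's output folder.
            counts.saveAsTextFile("/vagrant/output");
        }
    }
}
```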
### Manually from the master
The application can also be submitted manually from the master host of the cluster.
First, SSH into the master: `vagrant ssh master`.
Once inside the master, become root: `sudo su -`.
After that, build the application: `mvn install -f /vagrant`.
Finally, submit the application:
```bash
/opt/spark/bin/spark-submit --master spark://192.168.33.10:7077 \
--conf spark.driver.host=192.168.33.10 \
--class uk.co.savvydatainsights.WordCount \
/vagrant/target/spark-examples-1.0-SNAPSHOT.jar \
/vagrant/input/lorem-ipsum.txt
```

## Submitting your own Java application
You can also submit your own Java application to the Spark cluster by setting the *repo* and *class* parameters, as in this example:
```bash
ansible-playbook submit-spark-application.yml \
-e "repo=https://github.com/project/repo.git" \
-e "class=com.domain.spark.JavaApplication"
```

You will be prompted for your repository credentials. The application will then be cloned, built, and submitted to the Spark master instance.
Requirements:
- The project must be a [Maven](https://maven.apache.org) project;
- Exactly one input file is expected under the *src/main/resources* folder (see the sketch below).
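For illustration, a hypothetical skeleton that satisfies these requirements. The package and class names echo the example command above; that the input file arrives as the first program argument is an assumption modelled on the sample application:

```java
package com.domain.spark;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical skeleton; com.domain.spark.JavaApplication matches the
// -e "class=..." example above and is not a real project.
public final class JavaApplication {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("JavaApplication");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Assumed: the single file from src/main/resources is passed
            // to the application as its first argument.
            JavaRDD<String> lines = sc.textFile(args[0]);
            System.out.println("Lines in input: " + lines.count());
        }
    }
}
```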