https://github.com/smola/spark-glusterfs-example
An example of Apache Spark integration with GlusterFS.
- Host: GitHub
- URL: https://github.com/smola/spark-glusterfs-example
- Owner: smola
- License: apache-2.0
- Created: 2018-10-18T14:51:26.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-18T15:05:56.000Z (over 7 years ago)
- Last Synced: 2025-04-03T12:14:51.564Z (about 1 year ago)
- Topics: example-project, glusterfs, maven, scala, spark
- Language: Scala
- Size: 11.7 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# spark-glusterfs-example
This is a simple example of [Apache Spark](https://spark.apache.org/) working with [Gluster](https://www.gluster.org/), using [glusterfs-hadoop](https://github.com/gluster/glusterfs-hadoop).
## Build
To build the project, just run:
```
./mvnw package
```
The application jar will be written to `target/spark-gluster-example-<version>.jar` (for example, `target/spark-gluster-example-0.1.0-SNAPSHOT.jar`).
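Because the jar name embeds the project version, a script that calls `spark-submit` may prefer to locate the jar instead of hard-coding its name. The following is a minimal POSIX shell sketch that is not part of the project; the function name `find_app_jar` is hypothetical, and it assumes the build left exactly one matching jar in `target/`:

```sh
# Hypothetical helper (not part of the project): print the path of the
# application jar produced by ./mvnw package.
# Assumes exactly one jar matches <dir>/spark-gluster-example-*.jar.
find_app_jar() {
  dir="${1:-target}"
  # Expand the glob into the function's positional parameters.
  set -- "$dir"/spark-gluster-example-*.jar
  # If nothing matched, the pattern is left unexpanded and is not a file.
  if [ ! -f "$1" ]; then
    echo "no application jar found in $dir; run ./mvnw package first" >&2
    return 1
  fi
  if [ "$#" -ne 1 ]; then
    echo "expected one application jar in $dir, found $#" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

For example: `jar=$(find_app_jar) && "$SPARK_HOME/bin/spark-submit" --master 'local[2]' "$jar"`.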
## Run
A working Gluster cluster is required. If you are looking for a simple way to test locally, we recommend using [carmstrong/multinode-glusterfs-vagrant](https://github.com/carmstrong/multinode-glusterfs-vagrant).
For this example, we will assume that:
* `SPARK_HOME` environment variable contains the path to an Apache Spark distribution.
* `HADOOP_CONF_DIR` points to a directory with a Hadoop `core-site.xml` containing the Gluster configuration. In [`conf/core-site.xml`](https://github.com/smola/spark-glusterfs-example/blob/master/conf/core-site.xml) you will find a minimal working example, which assumes an existing Gluster volume named `gv0` mounted at `/mnt/gv0`.
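These assumptions can be checked before submitting the job. Below is a hypothetical POSIX shell helper (`check_prereqs` is an illustrative name, not part of the project). The search for the text `glusterfs` in `core-site.xml` is only a heuristic, and the mount point check merely confirms that the directory exists, not that the `gv0` volume is actually mounted there:

```sh
# Hypothetical pre-flight check (not part of the project) for the assumptions above.
# Usage: check_prereqs [mount_point]   (defaults to /mnt/gv0)
check_prereqs() {
  mnt="${1:-/mnt/gv0}"
  status=0
  if [ -z "${SPARK_HOME:-}" ] || [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "SPARK_HOME must point to a Spark distribution containing bin/spark-submit" >&2
    status=1
  fi
  conf="${HADOOP_CONF_DIR:-}/core-site.xml"
  if [ -z "${HADOOP_CONF_DIR:-}" ] || [ ! -r "$conf" ]; then
    echo "HADOOP_CONF_DIR must contain a readable core-site.xml" >&2
    status=1
  elif ! grep -q glusterfs "$conf"; then
    # Heuristic only: a Gluster-enabled core-site.xml is expected to mention glusterfs.
    echo "$conf does not mention glusterfs" >&2
    status=1
  fi
  if [ ! -d "$mnt" ]; then
    # Only checks that the directory exists, not that the volume is mounted there.
    echo "mount point $mnt does not exist" >&2
    status=1
  fi
  return "$status"
}
```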
Now you can run:
```
$SPARK_HOME/bin/spark-submit \
--master 'local[2]' \
target/spark-gluster-example-0.1.0-SNAPSHOT.jar
```
This will generate the numbers from 1 to 100000, write them to Gluster in Parquet format, then read them back and compare them with the original data. If they match, it prints `OK` and exits with status 0; otherwise, it prints `KO` and exits with status 1.
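Because the result is reported through the exit status, a script or CI job can branch on it rather than parse the printed `OK`/`KO`. A minimal sketch, assuming `SPARK_HOME` is set as described above and that the exit status reaches the caller of `spark-submit` for this `local[2]` run, as the paragraph above states; `run_gluster_check` is a hypothetical name. Status 1 is not unique to `KO`: a job that fails with an exception can also exit non-zero, so check the output when it fails.

```sh
# Hypothetical wrapper (not part of the project) around the spark-submit command above.
# Runs the example and reports the outcome from its exit status.
run_gluster_check() {
  jar="${1:-target/spark-gluster-example-0.1.0-SNAPSHOT.jar}"
  rc=0
  # '|| rc=$?' records a failure without aborting callers that use 'set -e'.
  "$SPARK_HOME/bin/spark-submit" --master 'local[2]' "$jar" || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "round trip through Gluster succeeded"
  else
    # 1 is the status the example uses for KO, but Spark errors can also exit non-zero.
    echo "round trip failed (exit status $rc); check the output above for KO or an exception" >&2
  fi
  return "$rc"
}
```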
## License
This example is released under the terms of the [Apache 2 License](LICENSE).