https://github.com/smola/spark-glusterfs-example

An example of Apache Spark integration with GlusterFS.

example-project glusterfs maven scala spark

Host: GitHub
URL: https://github.com/smola/spark-glusterfs-example
Owner: smola
License: apache-2.0
Created: 2018-10-18T14:51:26.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-10-18T15:05:56.000Z (over 7 years ago)
Last Synced: 2025-04-03T12:14:51.564Z (about 1 year ago)
Topics: example-project, glusterfs, maven, scala, spark
Language: Scala
Size: 11.7 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# spark-glusterfs-example

This is a simple example of [Apache Spark](https://spark.apache.org/) working with [Gluster](https://www.gluster.org/), using [glusterfs-hadoop](https://github.com/gluster/glusterfs-hadoop).

## Build

To build the project, just run:

```
./mvnw package
```

The application jar is written to `target/spark-gluster-example-<version>.jar`.
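
Because the jar name includes the project version, a small helper can locate it after the build. This is a minimal sketch; `find_jar` is a hypothetical function, not part of the repository, and it assumes only one matching jar exists in the output directory (it returns the first match).

```shell
# find_jar: hypothetical helper, not part of the repository.
# Prints the first jar matching the project's naming pattern in the
# given directory (default: target) and returns 0, or returns 1 if none exists.
find_jar() {
  local dir="${1:-target}"
  local jar
  for jar in "$dir"/spark-gluster-example-*.jar; do
    # If the glob matches nothing, the pattern stays literal and -f fails.
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no application jar found in $dir; run ./mvnw package first" >&2
  return 1
}
```

Calling `find_jar` from the project root prints the path of the built jar, which can then be passed to `spark-submit`.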

## Run

A working Gluster cluster is required. If you are looking for a simple way to test locally, we recommend using [carmstrong/multinode-glusterfs-vagrant](https://github.com/carmstrong/multinode-glusterfs-vagrant).

For this example, we will assume that:
* The `SPARK_HOME` environment variable contains the path to an Apache Spark distribution.
* `HADOOP_CONF_DIR` points to a directory with a Hadoop `core-site.xml` that contains the Gluster configuration. [`conf/core-site.xml`](https://github.com/smola/spark-glusterfs-example/blob/master/conf/core-site.xml) is a minimal working example; it assumes an existing Gluster volume named `gv0`, mounted at `/mnt/gv0`.
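
The assumptions above can be checked before submitting the job. The following is a sketch only; `check_prereqs` is a hypothetical helper, not part of the repository, and the default mount point `/mnt/gv0` is taken from the example configuration.

```shell
# check_prereqs: hypothetical helper, not part of the repository.
# Returns 0 if the environment matches the assumptions, 1 otherwise.
check_prereqs() {
  local mount_point="${1:-/mnt/gv0}"   # mount point assumed by the example config
  local status=0

  # SPARK_HOME must point to a distribution with an executable spark-submit.
  if [ -z "${SPARK_HOME:-}" ] || [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "SPARK_HOME does not point to a Spark distribution" >&2
    status=1
  fi
  # HADOOP_CONF_DIR must contain a core-site.xml.
  if [ -z "${HADOOP_CONF_DIR:-}" ] || [ ! -f "$HADOOP_CONF_DIR/core-site.xml" ]; then
    echo "HADOOP_CONF_DIR does not contain core-site.xml" >&2
    status=1
  fi
  # Only checks that the directory exists, not that the volume is mounted.
  if [ ! -d "$mount_point" ]; then
    echo "Gluster mount point $mount_point not found" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_prereqs` with no arguments checks the default mount point; pass a different path as the first argument if the volume is mounted elsewhere.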

Now you can run:

```
$SPARK_HOME/bin/spark-submit \
--master 'local[2]' \
target/spark-gluster-example-0.1.0-SNAPSHOT.jar
```

This generates the numbers from 1 to 100000 and writes them to Gluster in Parquet format, then reads them back and compares them with the original. If the result is correct, it prints `OK` and exits with status 0; otherwise, it prints `KO` and exits with status 1.
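
Since the result is reported through the exit status, the run is easy to script. The wrapper below is a sketch; `run_example` is a hypothetical function, not part of the repository, and it assumes `SPARK_HOME` is set as described above. Note that `spark-submit` can also exit with a non-zero status for unrelated reasons (for example, a missing jar), so a failure here does not necessarily mean the comparison printed `KO`.

```shell
# run_example: hypothetical wrapper, not part of the repository.
# Submits the jar and reports success or failure based on the exit status.
run_example() {
  local jar="${1:-target/spark-gluster-example-0.1.0-SNAPSHOT.jar}"
  local status=0

  # Capture the exit status without aborting under `set -e`.
  "$SPARK_HOME/bin/spark-submit" --master 'local[2]' "$jar" || status=$?

  if [ "$status" -eq 0 ]; then
    echo "round trip succeeded"
  else
    echo "round trip failed (exit status $status)" >&2
  fi
  return "$status"
}
```

For example, `run_example && echo "Gluster is usable from Spark"` only prints the second message when the job exits with status 0.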

## License

This example is released under the terms of the [Apache 2 License](LICENSE).
