https://github.com/duyet/spark-example-scala
- Host: GitHub
- URL: https://github.com/duyet/spark-example-scala
- Owner: duyet
- Created: 2023-02-22T08:23:28.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-12-15T02:41:12.000Z (over 2 years ago)
- Last Synced: 2025-01-26T12:13:06.389Z (over 1 year ago)
- Language: Scala
- Size: 1.69 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Table of Contents
- [Prerequisites](#prerequisites)
- [Build Spark Apps](#build-spark-apps)
- [Example 1: Spark Scala](#example-1-spark-scala)
- [Example 2: Spark Scala calling Rust binary](#example-2-spark-scala-calling-rust-binary)
# Prerequisites
Make sure you have installed all of the following prerequisites on your development machine:
- Scala + SBT
```bash
brew install sbt
```
- Rust and Cargo
```bash
curl https://sh.rustup.rs -sSf | sh
```
- Spark
```bash
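# Superseded releases such as 3.3.2 are served from archive.apache.org, not dlcdn.apache.org
curl -fL -o ~/Downloads/spark-3.3.2-bin-hadoop3.tgz https://archive.apache.org/dist/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz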
tar zxvf ~/Downloads/spark-3.3.2-bin-hadoop3.tgz -C ~/Downloads
```

`spark-submit` should now be available at `~/Downloads/spark-3.3.2-bin-hadoop3/bin/spark-submit`.
# Build Spark Apps
Build all Spark apps into a single `.jar` file with:
```bash
sbt package
```
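
The jar path used in the examples below (`target/scala-2.12/spark-example_2.12-0.1.0-SNAPSHOT.jar`) follows from the project name, version, and Scala binary version in the sbt build. A minimal `build.sbt` that would produce that file name might look like the sketch below. It is not copied from this repository: the Scala patch version, the choice of the `spark-sql` module, and the `provided` scope are assumptions.

```scala
// build.sbt (hypothetical sketch, not the repository's actual build file)
name := "spark-example"            // becomes the "spark-example" prefix of the jar
version := "0.1.0-SNAPSHOT"        // becomes the "-0.1.0-SNAPSHOT" suffix
scalaVersion := "2.12.17"          // assumed patch version; any 2.12.x gives the "_2.12" part

// Assumption: Spark is marked "provided" because spark-submit supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.2" % "provided"
```

`sbt package` only packages the project's own classes, so any dependency that Spark does not already ship would have to be supplied separately.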

# Example 1: Spark Scala
Run the simple `SparkPI` example:
```bash
~/Downloads/spark-3.3.2-bin-hadoop3/bin/spark-submit \
--class "example.SparkPI" \
--master "local[*]" \
target/scala-2.12/spark-example_2.12-0.1.0-SNAPSHOT.jar 9999
```
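
For orientation, here is a minimal sketch of what a Monte Carlo Pi job of this kind could look like. It is modeled on the well-known Spark Pi example, not on this repository's `example.SparkPI`, which may differ. The object name `SparkPiSketch`, the helper `estimatePi`, the sample count per partition, and the reading of the trailing argument (`9999` above) as the number of partitions are all assumptions.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Random

// Hypothetical sketch; not the repository's example.SparkPI.
object SparkPiSketch {
  // A unit circle covers pi/4 of its bounding square, so pi is about 4 * inside / total.
  def estimatePi(inside: Long, total: Long): Double = 4.0 * inside / total

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkPiSketch").getOrCreate()

    // Assumption: the first argument is the number of partitions (slices).
    val slices = if (args.nonEmpty) args(0).toInt else 2
    val samplesPerSlice = 100000

    // Each partition throws random darts at the square [-1, 1] x [-1, 1]
    // and counts how many land inside the unit circle.
    val inside = spark.sparkContext
      .parallelize(0 until slices, slices)
      .map { _ =>
        (1 to samplesPerSlice).count { _ =>
          val x = Random.nextDouble() * 2 - 1
          val y = Random.nextDouble() * 2 - 1
          x * x + y * y <= 1
        }.toLong
      }
      .reduce(_ + _)

    val total = slices.toLong * samplesPerSlice
    println(s"Pi is roughly ${estimatePi(inside, total)}")
    spark.stop()
  }
}
```

The master is left unset in the code so that `--master "local[*]"` on the `spark-submit` command line decides where the job runs.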

# Example 2: Spark Scala calling Rust binary
1. Build the Rust binary:
```bash
(cd rust && cargo build --release && ls -lp target/release | grep -v /)
```
Test the Rust binary by piping a single JSON line into it:
```bash
echo '{"id": "duyet", "a":1, "b": 2}' | ./rust/target/release/process-simple-line
# {"id":"duyet","result":3}
```

2. Submit the Spark job, shipping the Rust binary with `--files`:
```bash
~/Downloads/spark-3.3.2-bin-hadoop3/bin/spark-submit \
--class "example.SparkRustSimpleLine" \
--master "local[*]" \
--files rust/target/release/process-simple-line \
target/scala-2.12/spark-example_2.12-0.1.0-SNAPSHOT.jar
```
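
One way the Scala side of this example could be wired up is with `RDD.pipe`, which writes each element of a partition to an external process's stdin, one per line, and reads the process's stdout lines back as a new RDD. The sketch below is an illustration under assumptions, not the repository's `example.SparkRustSimpleLine`: the object name `SparkRustPipeSketch`, the helper `toJsonLine`, and the sample records are hypothetical, and it assumes the Rust binary keeps reading lines until end of input.

```scala
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

// Hypothetical sketch; not the repository's example.SparkRustSimpleLine.
object SparkRustPipeSketch {
  // Builds one input record in the shape used by the README's echo test.
  def toJsonLine(id: String, a: Int, b: Int): String =
    s"""{"id":"$id","a":$a,"b":$b}"""

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkRustPipeSketch").getOrCreate()

    // Hypothetical sample input: one JSON document per RDD element.
    val input = spark.sparkContext.parallelize(Seq(
      toJsonLine("duyet", 1, 2),
      toJsonLine("spark", 10, 20)
    ))

    // Resolve the local copy of the file shipped with --files.
    // Assumption: with --master "local[*]" the path resolved here is also
    // valid for the tasks; on a cluster the executor-side path may differ.
    val binary = SparkFiles.get("process-simple-line")

    // Each partition is streamed through the binary: elements go to stdin,
    // and every stdout line becomes one element of the resulting RDD.
    val output = input.pipe(Seq(binary))

    output.collect().foreach(println)
    spark.stop()
  }
}
```

Going by the echo test above, the first record would be expected to come back as `{"id":"duyet","result":3}`. Passing the command as a `Seq` avoids having Spark split the path on whitespace.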
