https://github.com/librity/rtjvm_spark_optimizations
Rock The JVM - Spark Optimizations with Scala
optimization scala spark
Rock The JVM - Spark Optimizations with Scala
- Host: GitHub
- URL: https://github.com/librity/rtjvm_spark_optimizations
- Owner: librity
- License: MIT
- Created: 2022-11-13T19:16:29.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-11-18T03:10:57.000Z (about 2 years ago)
- Last Synced: 2024-11-10T20:14:49.147Z (3 months ago)
- Topics: optimization, scala, spark
- Language: Scala
- Homepage: https://rockthejvm.com/p/spark-optimization
- Size: 25.9 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Rock The JVM - Spark Optimizations with Scala
Master Spark optimization techniques with Scala.
- https://rockthejvm.com/p/spark-optimization
- https://github.com/rockthejvm/spark-optimization
- https://github.com/rockthejvm/spark-optimization/releases/tag/start

## Certificate
![Certificate of Completion](.github/certificate.png)
## Sections
1. [Scala and Spark Recap](src/main/scala/section1)
2. [Spark Performance Foundations](src/main/scala/section2)
3. [Optimizing DataFrame Transformations](src/main/scala/section3)
4. [Optimizing RDD Transformations](src/main/scala/section4)
5. [Optimizing Key-Value RDDs](src/main/scala/section5)

## Setup
### IntelliJ IDEA
Install IntelliJ IDEA with the Scala plugin.
- https://www.jetbrains.com/idea/
### Docker
Install Docker:
- https://docs.docker.com/desktop/install/ubuntu/
- https://docs.docker.com/engine/install/ubuntu/#set-up-the-repository

Build images:
```bash
$ cd spark-cluster
$ chmod +x build-images.sh
$ ./build-images.sh
```

Start dockerized Spark cluster:
```bash
$ docker compose up --scale spark-worker=3
```

Access each container:
```bash
# List active containers
$ docker ps
# Get a shell in any container
$ docker exec -it CONTAINER_NAME bash
```
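
With three scaled workers, finding the right `CONTAINER_NAME` by eye can get tedious. The helpers below are a minimal sketch, not part of the course material. The function names `list_spark_workers` and `first_worker_shell` are hypothetical, and the filter assumes that worker container names contain `spark-worker` (the service name passed to `--scale`). Adjust the filter if your container names differ.

```bash
# List the names of running worker containers.
# Assumption: container names contain "spark-worker".
list_spark_workers() {
  docker ps --filter "name=spark-worker" --format '{{.Names}}'
}

# Open a shell in the first worker container found.
first_worker_shell() {
  local name
  name="$(list_spark_workers | head -n 1)"
  if [ -z "$name" ]; then
    echo "no running spark-worker container found" >&2
    return 1
  fi
  docker exec -it "$name" bash
}
```

After sourcing these definitions in your shell, run `list_spark_workers` to print the worker names, or `first_worker_shell` to open a shell in the first one.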