https://github.com/joomcode/spark-platform
Basic Spark utilities
- Host: GitHub
- URL: https://github.com/joomcode/spark-platform
- Owner: joomcode
- License: MIT
- Created: 2022-08-25T13:20:26.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-20T14:34:08.000Z (11 months ago)
- Last Synced: 2025-05-07T20:07:56.676Z (9 months ago)
- Language: Scala
- Size: 190 KB
- Stars: 11
- Watchers: 35
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Joom Spark Platform
This repository is the home of the foundational Spark tools used by the Analytics Platform at Joom.
## Installation
Packages built for Spark 3.2.2, for both Scala 2.12 and Scala 2.13, are available from the Maven Central repository.
When using sbt:
```
libraryDependencies += "com.joom.spark" % "spark-platform_2.12" % "0.1.2"
```
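Because packages exist for both Scala 2.12 and 2.13, sbt's `%%` operator can select the artifact suffix from the project's Scala version instead of hard-coding it. The following `build.sbt` fragment is a minimal sketch: the `scalaVersion` value is only illustrative, and it assumes the 2.13 artifact follows the same naming pattern (`spark-platform_2.13`) and is published at version `0.1.2`.

```scala
// build.sbt (sketch): the scalaVersion value below is illustrative
scalaVersion := "2.13.8"

// %% appends the Scala binary version to the artifact name,
// so with Scala 2.13 this resolves to spark-platform_2.13 (assumed to exist at 0.1.2)
libraryDependencies += "com.joom.spark" %% "spark-platform" % "0.1.2"
```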
When using Gradle:
```
implementation group: 'com.joom.spark', name: 'spark-platform_2.12', version: '0.1.2'
```
## Explicit repartitioning
The `ExplicitRepartition` class lets you control exactly which partition each row
is placed in, and it is immune to hash collisions. Usage is as simple as:
```
val rdf = df
  .withColumn("desired_partition", ...your expression...)
  .explicitRepartition(8, $"desired_partition")
```
Please see the [test case](https://github.com/joomcode/spark-platform/blob/main/lib/src/test/scala/com/joom/spark/ExplicitRepartitionSpec.scala) for
a complete example. For details and motivation, see the companion blog post:
[Spark Partitioning: Full Control](https://medium.com/@vladimir.prus/spark-partitioning-full-control-3c72cea2d74d)
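
To observe the hash collisions mentioned above, the sketch below repartitions `n` distinct keys with Spark's built-in `repartition` and counts how many partitions actually receive rows. It deliberately uses only the stock Spark API, not `ExplicitRepartition`. It assumes `spark-sql` is on the classpath and that a local session can be started. The object and method names are hypothetical and chosen only for this illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.spark_partition_id

// Hypothetical helper for illustration; not part of spark-platform.
object HashRepartitionDemo {
  // Counts how many of `n` partitions receive rows when `n` distinct keys
  // are repartitioned with Spark's built-in hash partitioning.
  def partitionsUsed(n: Int): Long = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("hash-repartition-demo")
      .getOrCreate()
    import spark.implicits._

    // One row per key: 0, 1, ..., n - 1.
    val df = spark.range(n).withColumnRenamed("id", "desired_partition")

    // Built-in repartition hashes the key and takes the result modulo `n`,
    // so two different keys can be sent to the same partition.
    val placed = df
      .repartition(n, $"desired_partition")
      .withColumn("actual_partition", spark_partition_id())

    placed.select("actual_partition").distinct().count()
  }

  def main(args: Array[String]): Unit = {
    val used = partitionsUsed(8)
    println(s"8 distinct keys landed in $used of 8 partitions")
  }
}
```

A result below 8 means that at least two keys were hashed into the same partition, which is the situation explicit repartitioning is meant to avoid.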