https://github.com/joomcode/spark-platform

Basic Spark utilities

Host: GitHub
URL: https://github.com/joomcode/spark-platform
Owner: joomcode
License: MIT
Created: 2022-08-25T13:20:26.000Z
Default Branch: main
Last Pushed: 2025-02-20T14:34:08.000Z
Last Synced: 2025-05-07T20:07:56.676Z
Language: Scala
Size: 190 KB
Stars: 11
Watchers: 35
Forks: 4
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# Joom Spark Platform

This repository is the home for foundational Spark tools used by the analytics platform at Joom.

## Installation

Packages for Spark 3.2.2, built for both Scala 2.12 and Scala 2.13, are available from Maven Central.

When using sbt:

```scala
// %% appends the build's Scala binary version (_2.12 or _2.13)
libraryDependencies += "com.joom.spark" %% "spark-platform" % "0.1.2"
```

When using Gradle:

```groovy
// For Scala 2.13, use name: 'spark-platform_2.13'
implementation group: 'com.joom.spark', name: 'spark-platform_2.12', version: '0.1.2'
```
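For context, a minimal `build.sbt` sketch for a project that runs on a cluster providing Spark 3.2.2. The Scala patch version and the `provided` scope for `spark-sql` are assumptions about a typical setup, not requirements stated by this project.

```scala
// build.sbt (sketch): the Scala patch version is an assumption; %% picks the _2.12 or _2.13 artifact
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // Spark itself is assumed to be supplied by the cluster at runtime
  "org.apache.spark" %% "spark-sql" % "3.2.2" % "provided",
  "com.joom.spark" %% "spark-platform" % "0.1.2"
)
```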

## Explicit repartitioning

The `ExplicitRepartition` class lets you control explicitly which partition each row goes to, and, unlike hash-based repartitioning, it is immune to hash collisions. Usage is as simple as:
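```scala
// Assumes `import spark.implicits._` for $"..."; see the test case for the import that enables explicitRepartition
val rdf = df
  // Hypothetical expression: assumes a non-negative `id` column, yielding a partition index in [0, 8)
  .withColumn("desired_partition", ($"id" % 8).cast("int"))
  .explicitRepartition(8, $"desired_partition")
```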

See the [test case](https://github.com/joomcode/spark-platform/blob/main/lib/src/test/scala/com/joom/spark/ExplicitRepartitionSpec.scala) for a complete example. For details and motivation, see the companion blog post:
[Spark Partitioning: Full Control](https://medium.com/@vladimir.prus/spark-partitioning-full-control-3c72cea2d74d)
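To see why hash collisions matter, here is a toy illustration in plain Scala. It does not use Spark and is not this library's implementation. The hash-style assignment mimics the idea behind Spark's RDD `HashPartitioner` (a non-negative `hashCode` modulo the partition count); `Dataset.repartition` uses a Murmur3 hash instead, but distinct keys can collide there as well. The names `PartitioningSketch`, `hashAssign` and `explicitAssign` are hypothetical.

```scala
object PartitioningSketch {
  // Non-negative modulo, the same idea Spark's RDD HashPartitioner applies to key.hashCode
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Hash-style assignment: the partition is derived from the key's hash,
  // so distinct keys may land in the same partition
  def hashAssign(keys: Seq[Any], numPartitions: Int): Map[Any, Int] =
    keys.map(k => k -> nonNegativeMod(k.hashCode, numPartitions)).toMap

  // Explicit assignment: the caller supplies each key's partition index directly
  def explicitAssign(desired: Map[Any, Int], numPartitions: Int): Map[Any, Int] = {
    require(desired.values.forall(p => p >= 0 && p < numPartitions), "partition index out of range")
    desired
  }

  def main(args: Array[String]): Unit = {
    val keys: Seq[Any] = Seq(0, 8, 16, 24)
    val n = 4
    val hashed = hashAssign(keys, n)
    // Give key i (by position) partition i
    val explicit = explicitAssign(keys.zipWithIndex.toMap, n)
    println(s"hash-based:  $hashed")
    println(s"explicit:    $explicit")
    println(s"partitions used (hash-based): ${hashed.values.toSet.size}")
    println(s"partitions used (explicit):   ${explicit.values.toSet.size}")
  }
}
```

With keys 0, 8, 16 and 24 and four partitions, every key hashes to partition 0 (an `Int`'s `hashCode` is its own value), leaving three partitions empty; the explicit mapping puts exactly one key in each partition.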