https://github.com/hammerlab/spark-util
low-level helpers for Apache Spark libraries and tests
- Host: GitHub
- URL: https://github.com/hammerlab/spark-util
- Owner: hammerlab
- License: apache-2.0
- Created: 2016-11-13T16:29:40.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-12-29T00:18:51.000Z (about 6 years ago)
- Last Synced: 2024-09-29T07:04:09.091Z (3 months ago)
- Topics: hadoop, kryo, scala, spark
- Language: Scala
- Homepage:
- Size: 77.1 KB
- Stars: 16
- Watchers: 11
- Forks: 7
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# spark-util
[![Build Status](https://travis-ci.org/hammerlab/spark-util.svg?branch=master)](https://travis-ci.org/hammerlab/spark-util)
[![Coverage Status](https://coveralls.io/repos/github/hammerlab/spark-util/badge.svg)](https://coveralls.io/github/hammerlab/spark-util)
[![Maven Central](https://img.shields.io/maven-central/v/org.hammerlab/spark-util_2.11.svg?maxAge=1800)](http://search.maven.org/#search%7Cga%7C1%7Cspark-util)

Spark, Hadoop, and Kryo utilities

## Kryo registration
Classes that implement the [Registrar](src/main/scala/org/hammerlab/kryo/Registrar.scala) interface can use various shorthands for registering classes with Kryo.
Adapted from [RegistrationTest](src/test/scala/org/hammerlab/kryo/RegistrationTest.scala):

```scala
register(
  cls[A],                  // comes with an AlsoRegister that loops in other classes
  arr[Foo],                // register a class and an Array of that class
  cls[B] → BSerializer(),  // use a custom Serializer
  CDRegistrar              // register all of another Registrar's registrations
)
```

- custom `Serializer`s and [`AlsoRegister`s](src/main/scala/org/hammerlab/kryo/AlsoRegister.scala) are picked up implicitly if not provided explicitly.
- `AlsoRegister`s are recursive, which makes it much easier to account for what is registered and why, and helps ensure that needed registrations aren't overlooked.

## Configuration/Context wrappers
- [`Configuration`](src/main/scala/org/hammerlab/hadoop/Configuration.scala): serializable Hadoop-`Configuration` wrapper
- [`Context`](src/main/scala/org/hammerlab/spark/Context.scala): `SparkContext` wrapper that is also a Hadoop `Configuration`, for unification of "global configuration access" patterns
- [`Conf`](src/main/scala/org/hammerlab/spark/Conf.scala): load a `SparkConf` with settings from file(s) specified in the `SPARK_PROPERTIES_FILES` environment variable

## Spark Configuration
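For orientation, loading settings from properties files named in an environment variable (as `Conf` does with `SPARK_PROPERTIES_FILES`) might look roughly like the sketch below. It is not the library's implementation: the object `PropertiesFilesSketch` and its methods are hypothetical names, and both the comma separator and the "later files override earlier ones" precedence are assumptions. It uses only the JDK's `java.util.Properties`, so it runs without Spark on the classpath.

```scala
import java.io.FileInputStream
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical sketch, not part of spark-util.
object PropertiesFilesSketch {

  /** Split the environment-variable value into file paths (comma separation is an assumption). */
  def paths(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  /** Read one properties file into an immutable key-value map. */
  def load(path: String): Map[String, String] = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap
  }

  /** Merge all listed files; later files override earlier ones (assumed precedence). */
  def loadAll(value: String): Map[String, String] =
    paths(value).map(load).foldLeft(Map.empty[String, String])(_ ++ _)

  def main(args: Array[String]): Unit = {
    val settings =
      sys.env
        .get("SPARK_PROPERTIES_FILES")
        .map(loadAll)
        .getOrElse(Map.empty[String, String])

    settings.foreach { case (k, v) => println(s"$k=$v") }
    // With Spark on the classpath, the pairs could then be applied with
    // `new SparkConf().setAll(settings)`.
  }
}
```

Keeping the parsing separate from `SparkConf` makes the merge order easy to test on its own.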
- [`SparkConfBase`](src/main/scala/org/hammerlab/spark/SparkConfBase.scala): trait that brokers setting config key-values and creating a `SparkConf`
- many mix-ins for common spark-configuration groups:
  - [kryo registration](src/main/scala/org/hammerlab/spark/confs/Kryo.scala)
  - [dynamic allocation](src/main/scala/org/hammerlab/spark/confs/DynamicAllocation.scala)
  - [event-logging](src/main/scala/org/hammerlab/spark/confs/EventLog.scala)
  - [task-speculation](src/main/scala/org/hammerlab/spark/confs/Speculation.scala)

## Misc
- [`KeyPartitioner`](src/main/scala/org/hammerlab/spark/KeyPartitioner.scala) / [`Partitioner`](src/main/scala/org/hammerlab/spark/Partitioner.scala): shorthands for common Spark-`Partitioner`-creation patterns
  - from the first field of tuple-like objects
  - from a partial function
  - from a function
- [`Histogram` accumulator](src/main/scala/org/hammerlab/spark/accumulator/Histogram.scala)
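
To make the `Partitioner` shorthands above more concrete, here is a minimal sketch of two of the patterns (from a function, and from the first field of a tuple-like key) written directly against Spark's `org.apache.spark.Partitioner` class. The object `PartitionerSketch` and its methods `fromFunction` and `fromFirstField` are hypothetical names for illustration, not this library's API. The assumption that the first field already holds an `Int` partition index is also mine.

```scala
import org.apache.spark.Partitioner

// Hypothetical sketch, not part of spark-util.
object PartitionerSketch {

  /** Build a Partitioner from a function mapping each key to a partition index. */
  def fromFunction[K](n: Int)(f: K => Int): Partitioner =
    new Partitioner {
      override def numPartitions: Int = n
      override def getPartition(key: Any): Int = f(key.asInstanceOf[K])
    }

  /** Build a Partitioner that reads the index from the first field of a tuple-like key. */
  def fromFirstField(n: Int): Partitioner =
    new Partitioner {
      override def numPartitions: Int = n
      override def getPartition(key: Any): Int = key match {
        case p: Product if p.productArity > 0 =>
          p.productElement(0) match {
            case i: Int => i
            case other =>
              throw new IllegalArgumentException(s"First field is not an Int: $other")
          }
        case other =>
          throw new IllegalArgumentException(s"Key is not tuple-like: $other")
      }
    }

  def main(args: Array[String]): Unit = {
    val byLength = fromFunction[String](4)(_.length % 4)
    println(byLength.getPartition("spark"))        // prints 1 (5 % 4)

    val byFirst = fromFirstField(4)
    println(byFirst.getPartition((2, "payload")))  // prints 2
  }
}
```

Either partitioner can be passed to pair-RDD operations such as `partitionBy`. The caller is responsible for keeping returned indices within `[0, numPartitions)`.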