https://github.com/hammerlab/spark-tests
Utilities for writing tests that use Apache Spark.
- Host: GitHub
- URL: https://github.com/hammerlab/spark-tests
- Owner: hammerlab
- License: apache-2.0
- Created: 2016-11-13T17:28:38.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2018-12-29T00:18:46.000Z (over 7 years ago)
- Last Synced: 2025-04-04T09:44:40.817Z (about 1 year ago)
- Language: Scala
- Homepage:
- Size: 96.7 KB
- Stars: 24
- Watchers: 9
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# spark-tests
[Build status](https://travis-ci.org/hammerlab/spark-tests)
[Coverage](https://coveralls.io/github/hammerlab/spark-tests)
[Maven Central](http://search.maven.org/#search%7Cga%7C1%7Cspark-tests)
Utilities for writing tests that use Apache Spark.
## [`SparkSuite`](https://github.com/hammerlab/spark-tests/blob/master/src/main/scala/org/hammerlab/spark/test/suite/SparkSuite.scala): a `SparkContext` for each test suite
Add configuration options in subclasses using `sparkConf(…)`, cf. [`KryoSparkSuite`][]:
```scala
sparkConf(
  // Register this class as its own KryoRegistrator
  "spark.kryo.registrator" → getClass.getCanonicalName,
  "spark.serializer" → "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.referenceTracking" → referenceTracking.toString,
  "spark.kryo.registrationRequired" → registrationRequired.toString
)
```
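
To illustrate the per-suite pattern with nothing but Spark and ScalaTest, here is a minimal sketch. `LocalSparkSuite`, `extraConf`, and `WordCountSuite` are hypothetical names invented for this example and are not part of this library; the sketch also assumes ScalaTest 3.0.x (newer releases replace `FunSuite` with `org.scalatest.funsuite.AnyFunSuite`).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Hypothetical stand-in for a per-suite SparkContext base class (not this library's SparkSuite).
abstract class LocalSparkSuite extends FunSuite with BeforeAndAfterAll {
  // Subclasses override this to add Spark settings (assumed hook name).
  protected def extraConf: Map[String, String] = Map.empty

  protected var sc: SparkContext = _

  // One SparkContext is created before the first test of the suite...
  override def beforeAll(): Unit = {
    super.beforeAll()
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName(getClass.getSimpleName)
      .set("spark.ui.enabled", "false")
      .setAll(extraConf)
    sc = new SparkContext(conf)
  }

  // ...and stopped after the last one.
  override def afterAll(): Unit = {
    try {
      if (sc != null) sc.stop()
    } finally {
      super.afterAll()
    }
  }
}

class WordCountSuite extends LocalSparkSuite {
  override protected def extraConf: Map[String, String] =
    Map("spark.serializer" -> "org.apache.spark.serializer.KryoSerializer")

  test("counts words") {
    val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
    assert(counts == Map("a" -> 2L, "b" -> 1L))
  }
}
```

Sharing one context per suite avoids paying Spark's startup cost for every test case, at the price of tests in the same suite seeing the same context state.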
### [`PerCaseSuite`](https://github.com/hammerlab/spark-tests/blob/master/src/main/scala/org/hammerlab/spark/test/suite/PerCaseSuite.scala): a `SparkContext` for each test case
## [`KryoSparkSuite`][]
`SparkSuite` implementation that provides hooks for Kryo registration:
```scala
register(
  classOf[Foo],
  "org.foo.Bar",
  classOf[Bar] → new BarSerializer
)
```
Also useful for subclassing once per project to fill in that project's default Kryo registrar, then having concrete tests subclass that; see [hammerlab/guacamole](https://github.com/hammerlab/guacamole/blob/9d330aeb3a7a040c174b851511f19b42d7717508/src/test/scala/org/hammerlab/guacamole/util/GuacFunSuite.scala) and [hammerlab/pageant](https://github.com/ryan-williams/pageant/blob/d063db292cad3c68222c38c964d7dda3c7258720/src/test/scala/org/hammerlab/pageant/utils/PageantSuite.scala) for examples.
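
For readers unfamiliar with the underlying mechanism, the sketch below shows the kind of registration these hooks save you from writing by hand, using Spark's own `KryoRegistrator` and `KryoSerializer`. `Point`, `PointSerializer`, `PointRegistrator`, and `KryoRoundTrip` are hypothetical names for illustration only, and the `read` signature assumes the Kryo 3/4 API that Spark 2.x and 3.x bundle.

```scala
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.{KryoRegistrator, KryoSerializer}

// Hypothetical class that we want Kryo to handle.
case class Point(x: Int, y: Int)

// Hypothetical custom serializer: writes the two fields, reads them back in the same order.
class PointSerializer extends Serializer[Point] {
  override def write(kryo: Kryo, output: Output, p: Point): Unit = {
    output.writeInt(p.x)
    output.writeInt(p.y)
  }

  override def read(kryo: Kryo, input: Input, cls: Class[Point]): Point =
    Point(input.readInt(), input.readInt())
}

// A plain Spark KryoRegistrator: one class with a custom serializer, one with Kryo's default.
class PointRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Point], new PointSerializer)
    kryo.register(classOf[Array[Point]])
  }
}

object KryoRoundTrip {
  // Serialize and deserialize a value with the registrator above; no SparkContext is needed.
  def roundTrip(p: Point): Point = {
    val conf = new SparkConf(false)
      .set("spark.serializer", classOf[KryoSerializer].getName)
      .set("spark.kryo.registrator", classOf[PointRegistrator].getName)
      .set("spark.kryo.registrationRequired", "true")
    val ser = new KryoSerializer(conf).newInstance()
    ser.deserialize[Point](ser.serialize(p))
  }
}
```

Setting `spark.kryo.registrationRequired` to `true` makes Kryo fail on any unregistered class, which is what lets a test suite catch missing registrations early.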
## Miscellaneous RDD / Job / Stage utilities
- [`rdd.Util`](https://github.com/hammerlab/spark-tests/blob/master/src/main/scala/org/hammerlab/spark/test/rdd/Util.scala): make an RDD with specific elements in specific partitions.
- [`NumJobsUtil`](https://github.com/hammerlab/spark-tests/blob/master/src/main/scala/org/apache/spark/scheduler/test/NumJobsUtil.scala): verify the number of Spark jobs that have been run.
- [`RDDSerialization`](https://github.com/hammerlab/spark-tests/blob/master/src/main/scala/org/hammerlab/spark/test/rdd/RDDSerialization.scala): interface for verifying that a serialization/deserialization round trip on an RDD yields the same RDD.
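
As a rough illustration of the first item, one way to place specific elements in specific partitions with plain Spark is to union single-partition RDDs. This is a sketch under that assumption, not necessarily how `rdd.Util` does it; `ExplicitPartitions`, `rddWithPartitions`, and `demo` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object ExplicitPartitions {
  // Hypothetical helper: each inner Seq becomes exactly one partition, in order.
  // Parallelizing with numSlices = 1 yields a single partition, even for an empty Seq,
  // and a union of such RDDs concatenates their partitions.
  def rddWithPartitions[T: ClassTag](sc: SparkContext, partitions: Seq[Seq[T]]): RDD[T] =
    sc.union(partitions.map(p => sc.parallelize(p, numSlices = 1)))

  // Builds a three-partition RDD (the middle one empty) and returns its per-partition contents.
  def demo(): Seq[Seq[Int]] = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("explicit-partitions")
      .set("spark.ui.enabled", "false")
    val sc = new SparkContext(conf)
    try {
      rddWithPartitions(sc, Seq(Seq(1, 2), Seq.empty[Int], Seq(3)))
        .glom()       // one array per partition
        .collect()
        .map(_.toSeq)
        .toSeq
    } finally {
      sc.stop()
    }
  }
}
```

Controlling partition contents this way makes it possible to write deterministic tests for code whose behavior depends on partition boundaries, including empty partitions.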
[`KryoSparkSuite`]: https://github.com/hammerlab/spark-tests/blob/master/src/main/scala/org/hammerlab/spark/test/suite/KryoSparkSuite.scala