https://github.com/mrpowers/great-spark

# great-spark

Curated collection of Spark libraries and example applications.

## Scala

Scala apps should use the [sbt](https://github.com/sbt/sbt) build tool, [spark-daria](https://github.com/MrPowers/spark-daria/) helper methods, and [spark-fast-tests](https://github.com/MrPowers/spark-fast-tests/) for unit testing.

[spark-sbt.g8](https://github.com/MrPowers/spark-sbt.g8) is a template that makes it easy to create a new Spark application with sbt.
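
For a project that was not generated from the template, the dependencies recommended above might be declared in `build.sbt` roughly as follows. This is a minimal sketch, not the template's actual output: the project name is hypothetical, and every version number is an assumption that should be checked against each project's README and against the Spark version on your cluster.

```scala
// build.sbt: a minimal sketch; the project name and all versions are assumptions
name := "my-spark-app" // hypothetical project name

scalaVersion := "2.12.18" // assumed; must match the Scala version your Spark build uses

val sparkVersion = "3.5.0" // assumed; match your cluster

libraryDependencies ++= Seq(
  // Spark itself is usually supplied by the cluster at runtime, hence "provided"
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  // helper methods for application code
  "com.github.mrpowers" %% "spark-daria" % "1.2.3",
  // DataFrame comparison helpers, needed only when running tests
  "com.github.mrpowers" %% "spark-fast-tests" % "1.1.0" % "test"
)
```

Scoping spark-fast-tests to `test` keeps it out of the packaged application JAR.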

## PySpark

PySpark apps should use the [quinn](https://github.com/MrPowers/quinn/) helper methods and the [chispa](https://github.com/MrPowers/chispa) test helper library.

## Data storage

* [Delta Lake](https://github.com/delta-io/delta) adds a transaction log on top of Parquet files, which makes it a good fit for workloads that need ACID writes or time travel.
* Parquet
* Snowflake
* MemSQL (now SingleStore)
* Redshift
* Postgres
* Cassandra
* Iceberg

## Data unit testing

## String similarity / phonetic algorithms