https://github.com/mrpowers/great-spark
Curated collection of Spark libraries and example applications
- Host: GitHub
- URL: https://github.com/mrpowers/great-spark
- Owner: MrPowers
- Created: 2020-10-06T10:37:46.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2020-10-06T11:40:45.000Z (about 4 years ago)
- Last Synced: 2024-10-12T00:12:25.353Z (3 months ago)
- Size: 0 Bytes
- Stars: 5
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# great-spark
Curated collection of Spark libraries and example applications.
## Scala
Scala apps should use the [sbt](https://github.com/sbt/sbt) build tool, [spark-daria](https://github.com/MrPowers/spark-daria/) helper methods, and [spark-fast-tests](https://github.com/MrPowers/spark-fast-tests/) for unit testing.
[spark-sbt.g8](https://github.com/MrPowers/spark-sbt.g8) makes it easy to create a new Spark application with sbt.
## PySpark
PySpark apps should use the [quinn](https://github.com/MrPowers/quinn/) helper methods and the [chispa](https://github.com/MrPowers/chispa) test helper library.
## Data storage
* [Delta Lake](https://github.com/delta-io/delta) is a good data store for certain query patterns.
* Parquet
* Snowflake
* MemSQL
* Redshift
* Postgres
* Cassandra
* Iceberg
## Data unit testing
## String similarity / phonetic algorithms