https://github.com/databricks/learningsparkv2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

apache-spark delta-lake mlflow mllib spark spark-mllib spark-sql structured-streaming



# Learning Spark 2nd Edition

Welcome to the GitHub repo for Learning Spark 2nd Edition.

Chapters [2](chapter2/README.md), [3](chapter3/README.md), [6](chapter6/README.md), and [7](chapter7/README.md) contain stand-alone Spark applications. To build the JAR files for all of these chapters at once, run the Python script: `python build_jars.py`. Alternatively, `cd` into a chapter directory and build its JARs as described in that chapter's README. Also add `$SPARK_HOME/bin` to your `$PATH` so that you don't have to type the full `$SPARK_HOME/bin/spark-submit` path when running these stand-alone applications.
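
The `$PATH` setup mentioned above can be sketched as follows for a POSIX-style shell. This is only a sketch: the `/opt/spark` fallback is a hypothetical location, not something this repo prescribes, so substitute the directory where your Spark distribution actually lives.

```shell
# Assumption: SPARK_HOME should point at your Spark installation.
# The /opt/spark fallback is a hypothetical example path, used only if SPARK_HOME is unset.
export SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Prepend Spark's bin directory so the shell can find spark-submit by name.
export PATH="$SPARK_HOME/bin:$PATH"
```

Adding these lines to your shell startup file (for example `~/.bashrc`) makes the setting persist across sessions, so the applications can be launched with `spark-submit` instead of `$SPARK_HOME/bin/spark-submit`.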

For all the other chapters, we have provided notebooks in the [notebooks](notebooks) folder. We have also included notebook equivalents for a few of the stand-alone Spark applications in the aforementioned chapters.

Have Fun, Cheers!