https://github.com/databricks/learningsparkv2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
apache-spark delta-lake mlflow mllib spark spark-mllib spark-sql structured-streaming
Last synced: about 21 hours ago
- Host: GitHub
- URL: https://github.com/databricks/learningsparkv2
- Owner: databricks
- License: apache-2.0
- Created: 2019-02-10T05:17:50.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2025-01-28T04:30:40.000Z (5 days ago)
- Last Synced: 2025-02-01T07:41:44.619Z (about 22 hours ago)
- Topics: apache-spark, delta-lake, mlflow, mllib, spark, spark-mllib, spark-sql, structured-streaming
- Language: Scala
- Homepage: https://learning.oreilly.com/library/view/learning-spark-2nd/9781492050032/
- Size: 75.2 MB
- Stars: 1,240
- Watchers: 41
- Forks: 750
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Learning Spark 2nd Edition
Welcome to the GitHub repo for Learning Spark 2nd Edition.
Chapters [2](chapter2/README.md), [3](chapter3/README.md), [6](chapter6/README.md), and [7](chapter7/README.md) contain stand-alone Spark applications. You can build the JAR files for all of these chapters by running the Python script: `python build_jars.py`. Alternatively, you can `cd` into a chapter directory and build the JARs as specified in that chapter's README. Also, add `$SPARK_HOME/bin` to your `$PATH` so that you don't have to prefix `spark-submit` with `$SPARK_HOME/bin/` when running these stand-alone applications.
For all the other chapters, we have provided notebooks in the [notebooks](notebooks) folder. We have also included notebook equivalents for a few of the stand-alone Spark applications in the aforementioned chapters.
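
For the stand-alone applications, the `PATH` setup described above might look like the following minimal sketch. The install location `/opt/spark` is only an assumed placeholder, used when `SPARK_HOME` is not already set; substitute the directory where your Spark distribution actually lives.

```shell
# Assumption: Spark is installed at /opt/spark (hypothetical path).
# An existing SPARK_HOME value is kept as-is.
export SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Prepend Spark's bin directory so that spark-submit resolves
# without typing the full $SPARK_HOME/bin/ prefix.
export PATH="$SPARK_HOME/bin:$PATH"

# Then, from the repository root, build the JARs for every chapter:
#   python build_jars.py
```

Adding the two `export` lines to your shell profile would make the setting persist across sessions.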
Have Fun, Cheers!