https://github.com/dimajix/spark-training
Repository used for Spark Trainings
hadoop hadoop-training hive pyspark python scala spark spark-ml spark-streaming spark-training sqoop
- Host: GitHub
- URL: https://github.com/dimajix/spark-training
- Owner: dimajix
- Created: 2015-12-28T15:08:55.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2023-04-21T20:46:45.000Z (over 1 year ago)
- Last Synced: 2024-03-26T20:24:59.268Z (9 months ago)
- Topics: hadoop, hadoop-training, hive, pyspark, python, scala, spark, spark-ml, spark-streaming, spark-training, sqoop
- Language: Jupyter Notebook
- Homepage: http://www.dimajix.de
- Size: 9 MB
- Stars: 53
- Watchers: 5
- Forks: 67
- Open Issues: 5
Metadata Files:
- Readme: README.md
README
# Spark Training Repository
This repository contains examples, exercises and tutorials for the Spark and Hadoop trainings run by dimajix.
You can always find the latest version on GitHub at https://github.com/dimajix/spark-training
## Contents
The repository contains different types of documents:
* Source Code for Spark/Scala
* Jupyter Notebooks for PySpark
* Zeppelin Notebooks for Spark/Scala
* Hive SQL scripts
* Pig scripts
* ...and much more

## External Dependencies
Some notebooks require test data that dimajix provides on S3 at `s3://dimajix-training/data/`.
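
As a rough illustration of how a notebook might address that data, the sketch below builds a dataset URI under the S3 root given above. The dataset name `example-dataset` is hypothetical (the README does not list individual datasets), and the `s3a` scheme is an assumption: it is what Hadoop's S3A connector uses, while other environments such as EMR may accept plain `s3`.

```python
from urllib.parse import urlparse

# Root location of the training data, as stated in this README.
TRAINING_DATA_ROOT = "s3://dimajix-training/data/"


def dataset_uri(name, root=TRAINING_DATA_ROOT, scheme="s3a"):
    """Return a URI for a dataset folder below the training data root.

    `name` is a hypothetical dataset folder; `scheme` is an assumption
    ("s3a" for Hadoop's S3A connector, "s3" in some other environments).
    """
    parsed = urlparse(root)
    bucket = parsed.netloc            # bucket name taken from the root URI
    prefix = parsed.path.strip("/")   # key prefix without surrounding slashes
    parts = [p for p in (prefix, name.strip("/")) if p]
    return f"{scheme}://{bucket}/" + "/".join(parts)


# "example-dataset" is a placeholder, not a real folder name from the repository.
print(dataset_uri("example-dataset"))
```

The resulting string could then be passed to whatever reader the notebook uses, provided the cluster has S3 access configured.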
## Building Executables
The source code can be built using Maven by running `mvn install` from the root directory.
## Running Examples
Most code is provided either as interactive notebooks (Jupyter and/or Zeppelin) or as compilable programs. Programs
that build jar files always come with start scripts, which take care of setting the required environment variables
and Spark configuration properties.
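
To give an idea of what such a start script does, here is a minimal sketch in Python that assembles a `spark-submit` command line and an environment. It is not taken from the repository: the jar path, main class, `SPARK_HOME` location and the property values are all hypothetical, and the actual start scripts may be structured differently. Only the `spark-submit` options `--class`, `--master` and `--conf` are standard.

```python
import os
import shlex


def build_spark_submit(jar, main_class, master="local[*]", conf=None, app_args=()):
    """Assemble a spark-submit argument list (the command is only built, not run)."""
    cmd = ["spark-submit", "--class", main_class, "--master", master]
    # Each Spark configuration property becomes one "--conf key=value" pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(jar)
    cmd.extend(app_args)
    return cmd


def build_env(extra=None, base=None):
    """Return a copy of the environment with extra variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(extra or {})
    return env


cmd = build_spark_submit(
    jar="target/example.jar",             # hypothetical jar path
    main_class="com.example.ExampleApp",  # hypothetical main class
    conf={
        "spark.executor.memory": "2g",          # illustrative values only
        "spark.sql.shuffle.partitions": "8",
    },
)
env = build_env({"SPARK_HOME": "/opt/spark"})  # hypothetical install location

# Print the command in shell-quoted form for inspection.
print(shlex.join(cmd))
```

To actually launch the job, the list and environment could be handed to `subprocess.run(cmd, env=env, check=True)` on a machine where `spark-submit` is on the `PATH`.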