https://github.com/dimajix/spark-training

Repository used for Spark Trainings

Host: GitHub
URL: https://github.com/dimajix/spark-training
Owner: dimajix
Created: 2015-12-28T15:08:55.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2023-04-21T20:46:45.000Z (almost 2 years ago)
Last Synced: 2024-03-26T20:24:59.268Z (10 months ago)
Topics: hadoop, hadoop-training, hive, pyspark, python, scala, spark, spark-ml, spark-streaming, spark-training, sqoop
Language: Jupyter Notebook
Homepage: http://www.dimajix.de
Size: 9 MB
Stars: 53
Watchers: 5
Forks: 67
Open Issues: 5
Metadata Files:
- Readme: README.md


# Spark Training Repository

This repository contains examples, exercises, and tutorials for the Spark and Hadoop trainings held
by dimajix. The latest version is always available on GitHub at

https://github.com/dimajix/spark-training

## Contents

The repository contains several types of documents:
* Source Code for Spark/Scala
* Jupyter Notebooks for PySpark
* Zeppelin Notebooks for Spark/Scala
* Hive SQL scripts
* Pig scripts
* ...and much more

## External Dependencies

Some notebooks require test data that dimajix provides on S3 at `s3://dimajix-training/data/`.
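
As a rough illustration of how a PySpark notebook might address this data, the sketch below rewrites the bucket URI to the `s3a://` scheme used by the Hadoop S3A connector and reads a CSV file beneath it. The helper names `to_s3a` and `read_training_csv`, the `relative_path` argument, and the `header=True` option are assumptions made for this example; they are not taken from the notebooks. It also assumes your Spark installation has the S3A connector available and that you have access to the bucket. On platforms where `s3://` works directly (for example Amazon EMR), the rewrite is unnecessary.

```python
# Bucket location as stated in this README.
TRAINING_DATA = "s3://dimajix-training/data/"


def to_s3a(uri: str) -> str:
    """Rewrite s3:// and s3n:// URIs to s3a://; return other URIs unchanged."""
    for prefix in ("s3://", "s3n://"):
        if uri.startswith(prefix):
            return "s3a://" + uri[len(prefix):]
    return uri


def read_training_csv(spark, relative_path: str):
    """Read a CSV file below the training bucket.

    `spark` is an existing SparkSession. `relative_path` is a hypothetical
    placeholder; use the paths referenced in the individual notebooks.
    The header option is an assumption about the file layout.
    """
    return spark.read.csv(to_s3a(TRAINING_DATA) + relative_path, header=True)
```

Only `read_training_csv` needs a running Spark session; `to_s3a` is plain string handling and can be tried in any Python interpreter.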

## Building Executables

The source code can be built with Maven by running

    mvn install

from the root directory.

## Running Examples

Most code is provided either as interactive notebooks (Jupyter and/or Zeppelin) or as compilable programs. Programs
that build jar files always include start scripts, which set the required environment variables and Spark
configuration properties.
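
To give a feel for what such a start script does, here is a minimal sketch that assembles a `spark-submit` command line from a jar, a main class, and a set of Spark configuration properties. It is not one of the repository's scripts: the function name `build_spark_submit` and every jar path, class name, and property value passed to it are hypothetical. Only the `spark-submit` options `--class`, `--master`, and `--conf` are standard.

```python
def build_spark_submit(jar, main_class, master="local[*]", conf=None, app_args=()):
    """Return the spark-submit command as a list of arguments.

    jar        -- path to the application jar (hypothetical in this sketch)
    main_class -- fully qualified name of the class to run (hypothetical)
    master     -- Spark master URL; local[*] is an assumed default
    conf       -- dict of Spark configuration properties
    app_args   -- arguments passed through to the application itself
    """
    cmd = ["spark-submit", "--class", main_class, "--master", master]
    # Sort the properties so the generated command is reproducible.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(jar)
    cmd.extend(app_args)
    return cmd
```

The resulting list can be passed to `subprocess.run`, whose `env` parameter is one way to supply the environment variables that a start script would otherwise export.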