https://github.com/ceteri/intro_spark

Code examples supporting the "Introduction to Apache Spark" video published by O'Reilly Media

Introduction to Apache Spark
============================

The material here supports the O'Reilly Media video by Paco Nathan:
[Introduction to Apache Spark](http://shop.oreilly.com/product/0636920036807.do)

Please see the code examples in the `src` directory here, which are numbered
in the sequence used in the video.

This material assumes that you have downloaded a pre-compiled release of
Apache Spark to your laptop from http://spark.apache.org/downloads.html

Outline
-------

* Pre-Flight Check
* Spark Deconstructed: Log Mining Example
* Word Count
* Join
* Coding Exercise
* Pi Approximation
* Spark Streaming Example
* Network Word Count in Python
* Network Word Count in Python -- Stateful
* GraphX Example
* Build and Run SimpleApp.java with Maven
* Build and Run SimpleApp.scala with SBT
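
As a rough orientation for the Word Count item above, here is a minimal
plain-Python sketch of the split / pair / sum-by-key shape that a word count
job typically follows. It is not the code from the `src` directory and it does
not use Spark's API; the names `word_count` and `sample` are made up for
illustration.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def word_count(lines):
    """Count words with a split -> pair -> sum-by-key pipeline (no Spark involved)."""
    # Step 1 (analogous to a flat-map): split every line into words.
    words = [word for line in lines for word in line.split()]
    # Step 2 (analogous to a map): pair each word with a count of 1.
    pairs = [(word, 1) for word in words]
    # Step 3 (analogous to a reduce-by-key): group the counts per word, then sum them.
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return {word: reduce(add, counts) for word, counts in grouped.items()}


# Hypothetical sample input, not taken from the video or the repository.
sample = ["to be or not to be", "to see or not to see"]
counts = word_count(sample)
print(sorted(counts.items()))
```

The numbered examples in `src` express the same idea against a running Spark
installation, where the three steps are distributed across partitions rather
than executed in a single Python process.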

Updates
-------

See the `bikeshare` directory for the Spark 1.3 update, showing DataFrames,
MLlib, and GraphX with examples based on Capital Bikeshare data.

---

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/4.0/