https://github.com/sumitarora/awesome-spark

Apache Spark Awesome List

List: awesome-spark

apache-spark spark spark-fundamentals spark-resources

spark

A curated list of [Apache Spark](http://spark.apache.org/) resources that developers may find useful. It focuses on Apache Spark resources for different use cases, and entries are ordered alphabetically within each category.

*Inspired by the Awesome thing.*

[![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
[![Circle CI](https://circleci.com/gh/sumitarora/awesome-spark.svg?style=svg)](https://circleci.com/gh/sumitarora/awesome-spark)

## Table of Contents
* [What is Spark?](#what-is-spark)
* [Why Spark?](#why-spark)
* [Books](#books)
* [Courses](#courses)
* [Links & Tutorials](#links--tutorials)
* [Tools](#tools)
* [Videos](#videos)

# What is Spark?

Apache Spark is a cluster computing platform designed to be a fast, general-purpose engine for large-scale data processing.

# Why Spark?

* Spark supports a wide range of workloads, including MapReduce-style batch processing, machine learning, and graph processing.
* Spark is built on the RDD (Resilient Distributed Dataset), its basic abstraction.
* RDDs are immutable, partitioned collections of elements that can be operated on in parallel.
* Spark ships with a rich standard library.
* Spark offers APIs in several programming languages (Scala, Java, Python, and R), with a unified development and deployment environment for all of them.
* Whichever of these languages you are most comfortable with, you can use the same clustered runtime environment for prototyping.
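
To make the RDD description above concrete, here is a minimal pure-Python sketch of an immutable, partitioned collection whose partitions are processed in parallel. It is a toy model, not Spark's API: the `ToyRDD` class and its method names are hypothetical and only loosely mirror the style of RDD operations, and it uses local threads where Spark would use a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

_EMPTY = object()  # marker for partitions that hold no elements


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, NOT the Spark API)."""

    def __init__(self, partitions):
        # Store partitions as tuples so the collection cannot be mutated.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, data, num_partitions=4):
        items = list(data)
        # Split into contiguous chunks so element order is preserved.
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        return cls(items[i * size:(i + 1) * size] for i in range(num_partitions))

    def _per_partition(self, fn):
        # Apply fn to every partition, each on a worker thread.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(fn, self._partitions))

    def map(self, fn):
        # Transformations return a new ToyRDD; the original is untouched.
        return ToyRDD(self._per_partition(lambda part: [fn(x) for x in part]))

    def filter(self, pred):
        return ToyRDD(self._per_partition(lambda part: [x for x in part if pred(x)]))

    def reduce(self, fn):
        # Reduce within each partition, then combine the partial results.
        # Assumes the collection holds at least one element.
        partials = self._per_partition(lambda part: reduce(fn, part) if part else _EMPTY)
        return reduce(fn, [p for p in partials if p is not _EMPTY])

    def collect(self):
        # Gather all partitions back into a single list.
        return [x for part in self._partitions for x in part]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
even_squares = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
total = even_squares.reduce(lambda a, b: a + b)

print(even_squares.collect())  # [4, 16, 36, 64, 100]
print(total)                   # 220
print(numbers.collect())       # original data is unchanged: 1 through 10
```

Because every transformation builds a new collection instead of modifying the old one, each partition can be processed independently. That independence is the property the bullets above describe.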

---

### Books
* [Databricks Spark Reference Applications](https://www.gitbook.com/book/databricks/databricks-spark-reference-applications/details)
* [Databricks Spark Knowledge Base](https://www.gitbook.com/book/databricks/databricks-spark-knowledge-base/details)
* [Getting Started with Apache Spark](https://www.mapr.com/ebooks/spark/)
* [Mastering Apache Spark](https://www.gitbook.com/book/jaceklaskowski/mastering-apache-spark/details)

---

### Courses
* [Advanced Distributed Machine Learning with Spark](https://www.edx.org/course/advanced-distributed-machine-learning-uc-berkeleyx-cs125x)
* [Advanced Spark for Data Science and Data Engineering](https://www.edx.org/course/advanced-spark-data-science-data-uc-berkeleyx-cs115x)
* [Big Data Analysis with Spark](https://www.edx.org/course/big-data-analysis-spark-uc-berkeleyx-cs110x)
* [Distributed Machine Learning with Spark](https://www.edx.org/course/distributed-machine-learning-spark-uc-berkeleyx-cs120x)
* [Introduction to Spark](https://www.edx.org/course/introduction-spark-uc-berkeleyx-cs105x)
* [Spark Fundamentals I](http://bigdatauniversity.com/courses/spark-fundamentals/)
* [Spark Fundamentals II](http://bigdatauniversity.com/courses/spark-fundamentals-ii/)
* [Spark Mini Course](http://ampcamp.berkeley.edu/big-data-mini-course/)
* [Spark Overview](http://bigdatauniversity.com/courses/spark-overview/)
* [Spark Programming with Python](http://bigdatauniversity.com/courses/spark-programming-with-python/)
* [Spark Summit Training](https://databricks-training.s3.amazonaws.com/index.html)

---

### Links & Tutorials
* [Data Scientists Guide](https://github.com/Jay-Oh-eN/data-scientists-guide-apache-spark)
* [Intro to Apache Spark](http://stanford.edu/~rezab/sparkclass/slides/itas_workshop.pdf)
* [Spark CLI - AmpCamp](http://ampcamp.berkeley.edu/3/exercises/index.html)
* [Spark Internals](https://github.com/JerryLead/SparkInternals)
* [Spark RDD Examples](http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html)
* [Spark Resources](https://wegetsignal.wordpress.com/2015/02/25/spark-resources/)
* [Spark Tutorial](http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html)
* [Sparkhub](https://sparkhub.databricks.com/)

---

### Tools
* [Deep Spark](https://github.com/Stratio/deep-spark)
* [Developer Resources](https://databricks.com/spark/developer-resources)
* [FiloDB](https://github.com/tuplejump/FiloDB)
* [Spark Cookbook](https://github.com/clearstorydata-cookbooks/apache_spark)
* [Spark IndexedRDD](https://github.com/amplab/spark-indexedrdd)
* [Spark OpenTSDB](https://github.com/achak1987/opentsdb-spark)
* [Spark Packages](http://spark-packages.org/)
* [Spark Timeseries](https://github.com/sryza/spark-timeseries)
* [Sparkle](https://github.com/tweag/sparkle)
* [Sparkling](https://github.com/gorillalabs/sparkling)

---

### Videos
* [Databricks - Channel](https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA)
* [Edureka - Apache Spark & Scala Tutorial](https://www.youtube.com/watch?v=7k_9sdTOdX4&list=PL9ooVrP1hQOGyFc60sExNX1qBWJyV5IMb)
* [Getting Started - Apache Spark](https://www.youtube.com/playlist?list=PLf0swTFhTI8rjBS9zJGReO1IWLf7Lpi7g&nohtml5=False)
* [The Apache Spark - Channel](https://www.youtube.com/user/TheApacheSpark)