Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Apache Spark Awesome List
https://github.com/sumitarora/awesome-spark
List: awesome-spark
apache-spark spark spark-fundamentals spark-resources
Apache Spark Awesome List
- Host: GitHub
- URL: https://github.com/sumitarora/awesome-spark
- Owner: sumitarora
- License: mit
- Created: 2016-04-05T03:50:36.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-04-17T16:17:37.000Z (over 8 years ago)
- Last Synced: 2024-04-29T21:19:41.903Z (8 months ago)
- Topics: apache-spark, spark, spark-fundamentals, spark-resources
- Homepage:
- Size: 10.7 KB
- Stars: 14
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: contributing.md
- License: LICENSE
Awesome Lists containing this project
- ultimate-awesome - awesome-spark - Apache Spark Awesome List. (Other Lists / Monkey C Lists)
README
A curated list of [Apache Spark](http://spark.apache.org/) resources that developers may find useful, covering a range of use cases. Entries are ordered alphabetically within each category.
*Inspired by the [Awesome](https://github.com/sindresorhus/awesome) lists.*
[![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
[![Circle CI](https://circleci.com/gh/sumitarora/awesome-spark.svg?style=svg)](https://circleci.com/gh/sumitarora/awesome-spark)

## Table of Contents
* [What is Spark](#what-is-spark)
* [Books](#books)
* [Courses](#courses)
* [Links & Tutorials](#links--tutorials)
* [Tools](#tools)
* [Videos](#videos)

# What is Spark?
Apache Spark is a cluster computing platform designed to be a fast, general-purpose engine for large-scale data processing.
# Why Spark?
* Spark supports a wide range of workloads, including MapReduce-style batch processing, machine learning, graph processing, and more.
* Apache Spark's basic abstraction is the RDD (Resilient Distributed Dataset).
* RDDs are immutable, partitioned collections of elements that can be operated on in parallel.
* Spark ships with a rich standard library.
* Spark provides APIs in several programming languages - Scala, Java, Python, and R - with a unified development and deployment environment for all of them.
* Regardless of which language you work in, be it Scala, Java, Python, or R, you use the same clustered runtime environment, from prototyping onward.

---
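The map/filter/reduce workflow that RDDs generalize can be sketched in a few lines. The plain-Python version below runs anywhere; the equivalent PySpark pipeline (assuming a running `SparkContext` named `sc`, not part of this list) is shown in the comments.

```python
from functools import reduce

# Equivalent PySpark pipeline, assuming a SparkContext `sc` is available:
#   rdd = sc.parallelize(range(1, 11))           # immutable, partitioned collection
#   result = rdd.map(lambda x: x * x) \
#               .filter(lambda x: x % 2 == 0) \
#               .reduce(lambda a, b: a + b)

# Plain-Python analogue of the three RDD operations:
data = range(1, 11)
squared = map(lambda x: x * x, data)           # transformation: map
evens = filter(lambda x: x % 2 == 0, squared)  # transformation: filter
result = reduce(lambda a, b: a + b, evens)     # action: reduce
print(result)  # sum of the even squares of 1..10 -> 220
```

In Spark, the `map` and `filter` steps are lazy transformations; nothing executes until an action such as `reduce` is called.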
### Books
* [Databricks Spark Reference Applications](https://www.gitbook.com/book/databricks/databricks-spark-reference-applications/details)
* [Databricks Spark Knowledge Base](https://www.gitbook.com/book/databricks/databricks-spark-knowledge-base/details)
* [Getting Started with Apache Spark](https://www.mapr.com/ebooks/spark/)
* [Mastering Apache Spark](https://www.gitbook.com/book/jaceklaskowski/mastering-apache-spark/details)

---
### Courses
* [Advanced Distributed Machine Learning with Spark](https://www.edx.org/course/advanced-distributed-machine-learning-uc-berkeleyx-cs125x)
* [Advanced Spark for Data Science and Data Engineering](https://www.edx.org/course/advanced-spark-data-science-data-uc-berkeleyx-cs115x)
* [Big Data Analysis with Spark](https://www.edx.org/course/big-data-analysis-spark-uc-berkeleyx-cs110x)
* [Distributed Machine Learning with Spark](https://www.edx.org/course/distributed-machine-learning-spark-uc-berkeleyx-cs120x)
* [Introduction to Spark](https://www.edx.org/course/introduction-spark-uc-berkeleyx-cs105x)
* [Spark Fundamentals I](http://bigdatauniversity.com/courses/spark-fundamentals/)
* [Spark Fundamentals II](http://bigdatauniversity.com/courses/spark-fundamentals-ii/)
* [Spark Mini Course](http://ampcamp.berkeley.edu/big-data-mini-course/)
* [Spark Overview](http://bigdatauniversity.com/courses/spark-overview/)
* [Spark Programming with Python](http://bigdatauniversity.com/courses/spark-programming-with-python/)
* [Spark Summit Training](https://databricks-training.s3.amazonaws.com/index.html)

---
### Links & Tutorials
* [Data Scientists Guide](https://github.com/Jay-Oh-eN/data-scientists-guide-apache-spark)
* [Intro to Apache Spark](http://stanford.edu/~rezab/sparkclass/slides/itas_workshop.pdf)
* [Spark CLI - AmpCmp](http://ampcamp.berkeley.edu/3/exercises/index.html)
* [Spark Internals](https://github.com/JerryLead/SparkInternals)
* [Spark RDD Examples](http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html)
* [Spark Resources](https://wegetsignal.wordpress.com/2015/02/25/spark-resources/)
* [Spark Tutorial](http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html)
* [Sparkhub](https://sparkhub.databricks.com/)

---
### Tools
* [Deep Spark](https://github.com/Stratio/deep-spark)
* [Developer Resources](https://databricks.com/spark/developer-resources)
* [FiloDB](https://github.com/tuplejump/FiloDB)
* [Spark Cookbook](https://github.com/clearstorydata-cookbooks/apache_spark)
* [Spark Indexedrdd](https://github.com/amplab/spark-indexedrdd)
* [Spark OpenTSDB](https://github.com/achak1987/opentsdb-spark)
* [Spark Packages](http://spark-packages.org/)
* [Spark Timeseries](https://github.com/sryza/spark-timeseries)
* [Sparkle](https://github.com/tweag/sparkle)
* [Sparkling](https://github.com/gorillalabs/sparkling)

---
### Videos
* [Databricks - Channel](https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA)
* [Edureka - Apache Spark & Scala Tutorial](https://www.youtube.com/watch?v=7k_9sdTOdX4&list=PL9ooVrP1hQOGyFc60sExNX1qBWJyV5IMb)
* [Getting Started - Apache Spark](https://www.youtube.com/playlist?list=PLf0swTFhTI8rjBS9zJGReO1IWLf7Lpi7g&nohtml5=False)
* [The Apache Spark - Channel](https://www.youtube.com/user/TheApacheSpark)