


# SparkNotes

[![Build Status](https://travis-ci.com/MyXOF/SparkNotes.svg?branch=master)](https://travis-ci.com/MyXOF/SparkNotes)
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)

Spark 2.0 study notes

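These notes draw mainly on the Spark 2.3.2 source code and the book 《图解Spark 核心技术与案例实战》 (roughly, "Illustrated Spark: Core Technology and Case Studies"), and record some reflections on the Spark system.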

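While reading, I found that many descriptions in 《图解Spark 核心技术与案例实战》 do not match the source code. Where they differ, these notes follow the actual source code.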

For drawing diagrams, I recommend the [ProcessOn](https://www.processon.com/) website; it works very well.

Without further ado, please start [here](https://github.com/MyXOF/SparkNotes/blob/master/doc/markdown/README.md).

## References

[1]. Apache Spark. http://spark.apache.org/

[2]. 郭景瞻 (Guo Jingzhan). 《图解Spark 核心技术与案例实战》.

[3]. Zaharia M, Chowdhury M, Das T, et al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012: 2-2.