https://github.com/waylau/apache-spark-tutorial

Apache Spark Tutorial: "Learning Apache Spark with Lao Wei" (《跟老卫学Apache Spark》)
Host: GitHub
URL: https://github.com/waylau/apache-spark-tutorial
Owner: waylau
Created: 2021-07-12T15:09:13.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2024-11-11T00:35:10.000Z (11 months ago)
Last Synced: 2025-01-06T06:13:00.002Z (9 months ago)
Size: 241 KB
Stars: 22
Watchers: 3
Forks: 6
Open Issues: 0

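# Apache Spark Tutorial: Source Code for *Learning Apache Spark Development with Lao Wei* (《跟老卫学Apache Spark开发》) and *Step-by-Step Spark Big Data Application Development* (《循序渐进Spark大数据应用开发》)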

![](images/spark-logo-trademark.png)

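*Apache Spark Tutorial* is a book about how to develop Apache Spark applications.

*Learning Apache Spark Development with Lao Wei* (《跟老卫学Apache Spark开发》) is an open-source tutorial on Apache Spark application development. It shows how to build Apache Spark applications from scratch and covers the new features in the latest Apache Spark 3.x releases. The book is richly illustrated and uses many worked examples to bring you into the world of Apache Spark.

This book was written in the author's spare time. Given limited expertise and a tight schedule, omissions are inevitable; corrections are welcome.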

## Summary

* [Downloading and Installing Spark](https://developer.huawei.com/consumer/cn/forum/topic/0202568822299090741?fid=23)

* [A First Look at Spark Applications](https://developer.huawei.com/consumer/cn/forum/topic/0201568823403320732?fid=23)

* [Using the Spark LongAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622461925310080?fid=23)

* [Using the Spark DoubleAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622590853530085?fid=23)

* [Using the Spark CollectionAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622591182960086?fid=23)

* [Ways to Launch a Spark Application](https://developer.huawei.com/consumer/cn/forum/topic/0202623507783170122?fid=23)

* [Spark Broadcast Variables](https://developer.huawei.com/consumer/cn/forum/topic/0202624224916630149?fid=23)

* [Getting Started with Spark RDDs](https://developer.huawei.com/consumer/cn/forum/topic/0201624386890690172?fid=23)

* [Basic Spark RDD Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201627152644060234?fid=23)

* [Spark RDD Shuffle Operations](https://developer.huawei.com/consumer/cn/forum/topic/0202627152820110215?fid=23)

* [Understanding Spark RDD Internals in Depth](https://developer.huawei.com/consumer/cn/forum/topic/0202628556358740265?fid=23)

* [Spark Scheduling: Resource Allocation](https://developer.huawei.com/consumer/cn/forum/topic/0202629577348060308?fid=23)

* [Spark Scheduling: Job Scheduling](https://developer.huawei.com/consumer/cn/forum/topic/0201629622395410333?fid=23)

* [Spark SQL Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202630480491580330?fid=23)

* [Spark SQL: Dataset and DataFrame](https://developer.huawei.com/consumer/cn/forum/topic/0202630480727520331?fid=23)

* [Spark SQL: Getting Started with DataFrame Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201633012983700432?fid=23)

* [Spark SQL: Getting Started with Dataset Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201633040938970437?fid=23)

* [Spark SQL: Creating a Temporary View from a DataFrame](https://developer.huawei.com/consumer/cn/forum/topic/0202633194774890394?fid=23)

* [Spark SQL: Converting an RDD to a Dataset](https://developer.huawei.com/consumer/cn/forum/topic/0201633208926640450?fid=23)

* [Introduction to the Apache Parquet Columnar Storage Format](https://waylau.com/about-apache-parquet/)

* [Spark SQL: Reading and Writing Apache Parquet Data Sources](https://developer.huawei.com/consumer/cn/forum/topic/0202634018676920418?fid=23)

* [Introduction to the Apache Hive Data Warehouse](https://developer.huawei.com/consumer/cn/forum/topic/0201634752549850505?fid=23)

* [Spark SQL: Using Apache Hive](https://developer.huawei.com/consumer/cn/forum/topic/0202635471716910045?fid=23)

* [Spark SQL: Working with Databases over JDBC](https://developer.huawei.com/consumer/cn/forum/topic/0202635607847820058?fid=23)

* [Spark SQL: Reading Binary Files](https://developer.huawei.com/consumer/cn/forum/topic/0202635626764400066?fid=23)

* [Exporting Data from Spark to a CSV File](https://developer.huawei.com/consumer/cn/forum/topic/0202620883150950010?fid=23)

* [Spark SQL: Time Zone Handling](https://developer.huawei.com/consumer/cn/forum/topic/0202665874275260083?fid=23)

* [Spark Streaming Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202636427881730132?fid=23)

* [Spark Streaming: Counting Word Frequencies in a Socket Data Stream](https://developer.huawei.com/consumer/cn/forum/topic/0201639135765210068?fid=23)

* [Spark Streaming: Window Operations](https://developer.huawei.com/consumer/cn/forum/topic/0202639686793340267?fid=23)

* [Spark Structured Streaming Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202639990757790283?fid=23)

* [Spark Structured Streaming: Counting Word Frequencies in a Socket Data Stream](https://developer.huawei.com/consumer/cn/forum/topic/0201640617749310121?fid=23)

* [Spark Structured Streaming: Window Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201647684921030332?fid=23)

* [Customizing the Log4j Configuration in Spark](https://developer.huawei.com/consumer/cn/forum/topic/0201647777007740340?fid=23)

* [Overview of the Spark MLlib Machine Learning Library](https://developer.huawei.com/consumer/cn/forum/topic/0201648414415760370?fid=23)

* [Spark MLlib: ML Pipeline in Detail](https://developer.huawei.com/consumer/cn/forum/topic/0202652669139340720?fid=23)

* [Spark MLlib: Estimator, Transformer, and Param Usage Example](https://developer.huawei.com/consumer/cn/forum/topic/0201648630447880382?fid=23)

* [Spark MLlib: ML Pipeline Usage Example](https://developer.huawei.com/consumer/cn/forum/topic/0202648630694530630?fid=23)

* [Overview of Graph Processing with Spark GraphX](https://developer.huawei.com/consumer/cn/forum/topic/0202652669536950721?fid=23)

* [Spark GraphX Graph Computation Example](https://developer.huawei.com/consumer/cn/forum/topic/0201652741940200499?fid=23)

* [Resolving the spark-shell Startup Error "WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped"](https://developer.huawei.com/consumer/cn/forum/topic/0204726396055740595?fid=23)

* [Spark Cluster Deployment: Cluster Overview](https://developer.huawei.com/consumer/cn/forum/topic/0203729942975270557?fid=23)

* [Spark Cluster: Submitting Applications to a Cluster](https://developer.huawei.com/consumer/cn/forum/topic/0203729943247780558?fid=23)

* [Spark Cluster: Deploying a Cluster in Standalone Mode](https://developer.huawei.com/consumer/cn/forum/topic/0204730620151950827?fid=23)

* [Spark Cluster: High Availability for Standalone Mode Clusters](https://developer.huawei.com/consumer/cn/forum/topic/0204730620408550828?fid=23)

* [Spark Series 044: Deploying a Cluster in YARN Mode](https://developer.huawei.com/consumer/cn/forum/topic/0203732228615380806?fid=23)

* [Spark Series 045: Resolving "java.lang.NoClassDefFoundError"](https://developer.huawei.com/consumer/cn/forum/topic/0201775600270330248?fid=23)

* To be continued...

## Samples

* [Using the Spark LongAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/LongAccumulatorSample.java)

* [Using the Spark DoubleAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/DoubleAccumulatorSample.java)

* [Using the Spark CollectionAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/CollectionAccumulatorSample.java)

* [SparkLauncher sample](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/launcher/SparkLauncherSample.java)

* [InProcessLauncherSample sample](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/launcher/InProcessLauncher.java)

* [Broadcast sample](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/broadcast/BroadcastSample.java)

* [Basic RDD operations sample](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddBasicSample.java)

* [Basic RDD Transformation and Action operations sample](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddBasicOperationSample.java)

* [Basic DataFrame operations sample](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataFrameBasicExample.java)

* [Basic Dataset operations sample](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DatasetBasicExample.java)

* [Creating a temporary view from a DataFrame](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataFrameTempViewExample.java)

* [Converting an RDD to a Dataset](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DatasetSchemaExample.java)

* [Reading and writing Apache Parquet data sources](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceParquetExample.java)

* [Using Apache Hive](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceHiveExample.java)

* [Working with databases over JDBC](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceJDBCExample.java)

* [Reading binary files](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceBinaryFile.java)

* [Exporting data from Spark to a CSV file](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/WriteCVSExample.java)

* [Spark SQL time zone handling](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/TimeZoneExample.java)

* [Spark Streaming: counting word frequencies in a Socket data stream](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/streaming/SparkStreamingSocketSample.java)

* [Spark Streaming window operations](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/streaming/SparkStreamingWimdowSample.java)

* [Structured Streaming: counting word frequencies in a Socket data stream](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/streaming/StructuredStreamingSocketSample.java)

* [Structured Streaming window operations](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/streaming/StructuredStreamingWindowSample.java)

* [Estimator, Transformer, and Param usage example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/ml/EstimatorTransformerParamExample.java)

* [ML Pipeline usage example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/ml/PipelineExample.java)

* [GraphX graph computation sample](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddGraphXSample.java)

* To be continued...

## Get Started

Choose one of the entry points below:

* 

* 

## Code

All sample source code from the book is in the `samples` directory. The code follows the [Java Coding Conventions]().

## Book

If you like this open-source book, you are also welcome to support its official print edition, available in bookstores and from major online retailers.

* [*Step-by-Step Spark Big Data Application Development* (《循序渐进Spark大数据应用开发》)](https://waylau.com/about-harmonyos-mobile-application-development-book) (Tsinghua University Press)

  * [JD.com](https://search.jd.com/Search?keyword=%E5%BE%AA%E5%BA%8F%E6%B8%90%E8%BF%9BSpark%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91&enc=utf-8&wq=%E5%BE%AA%E5%BA%8F%E6%B8%90%E8%BF%9BSpark%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91&pvid=32d2112ca641476d9fc5323cf6113f60)

  * [Dangdang](https://search.jd.com/Search?keyword=%E5%BE%AA%E5%BA%8F%E6%B8%90%E8%BF%9BSpark%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91&enc=utf-8&wq=%E5%BE%AA%E5%BA%8F%E6%B8%90%E8%BF%9BSpark%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91&pvid=90f7a002994847d08196d4d3e77761a1)

## Issue

Corrections, comments, and suggestions are all welcome.

## Contact

* Blog: [waylau.com](http://waylau.com)

* Gmail: [waylau521(at)gmail.com](mailto:waylau521@gmail.com)

* Weibo: [waylau521](http://weibo.com/waylau521)

* Twitter: [waylau521](https://twitter.com/waylau521)

* GitHub: [waylau](https://github.com/waylau)

## Support Me (Buy Lao Wei a Drink)

![Open-source donation](https://waylau.com/images/showmethemoney-sm.jpg)