https://github.com/waylau/apache-spark-tutorial
Apache Spark Tutorial.《跟老卫学Apache Spark》
- Host: GitHub
- URL: https://github.com/waylau/apache-spark-tutorial
- Owner: waylau
- Created: 2021-07-12T15:09:13.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-06-12T07:22:40.000Z (7 months ago)
- Last Synced: 2024-10-12T06:47:32.824Z (3 months ago)
- Size: 238 KB
- Stars: 20
- Watchers: 3
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Apache Spark Tutorial.《跟老卫学Apache Spark开发》
![](images/spark-logo-trademark.png)
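*Apache Spark Tutorial* is a book about how to develop Apache Spark applications.
《跟老卫学Apache Spark开发》 is an open source tutorial on Apache Spark application development. It mainly covers how to develop Apache Spark applications from scratch and includes the new features of the latest Apache Spark 3.x releases. It is richly illustrated and uses many examples to bring you into the world of Apache Spark!
This book was written in the author's spare time. Given limited expertise and a tight schedule, omissions are inevitable; corrections are welcome.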
## Summary
* [Downloading and Installing Spark](https://developer.huawei.com/consumer/cn/forum/topic/0202568822299090741?fid=23)
* [A First Look at Spark Applications](https://developer.huawei.com/consumer/cn/forum/topic/0201568823403320732?fid=23)
* [Using the Spark Accumulator LongAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622461925310080?fid=23)
* [Using the Spark Accumulator DoubleAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622590853530085?fid=23)
* [Using the Spark Accumulator CollectionAccumulator](https://developer.huawei.com/consumer/cn/forum/topic/0202622591182960086?fid=23)
* [Ways to Launch a Spark Application](https://developer.huawei.com/consumer/cn/forum/topic/0202623507783170122?fid=23)
* [Spark Broadcast Variables](https://developer.huawei.com/consumer/cn/forum/topic/0202624224916630149?fid=23)
* [Getting Started with Spark RDD](https://developer.huawei.com/consumer/cn/forum/topic/0201624386890690172?fid=23)
* [Basic Spark RDD Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201627152644060234?fid=23)
* [Spark RDD Shuffle Operations](https://developer.huawei.com/consumer/cn/forum/topic/0202627152820110215?fid=23)
* [Understanding Spark RDD Internals in Depth](https://developer.huawei.com/consumer/cn/forum/topic/0202628556358740265?fid=23)
* [Spark Scheduling: Resource Allocation](https://developer.huawei.com/consumer/cn/forum/topic/0202629577348060308?fid=23)
* [Spark Scheduling: Job Scheduling](https://developer.huawei.com/consumer/cn/forum/topic/0201629622395410333?fid=23)
* [Spark SQL Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202630480491580330?fid=23)
* [Spark SQL: Dataset and DataFrame](https://developer.huawei.com/consumer/cn/forum/topic/0202630480727520331?fid=23)
* [Spark SQL: Getting Started with DataFrame Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201633012983700432?fid=23)
* [Spark SQL: Getting Started with Dataset Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201633040938970437?fid=23)
* [Spark SQL: Creating a Temporary View from a DataFrame](https://developer.huawei.com/consumer/cn/forum/topic/0202633194774890394?fid=23)
* [Spark SQL: Converting an RDD to a Dataset](https://developer.huawei.com/consumer/cn/forum/topic/0201633208926640450?fid=23)
* [Introduction to the Apache Parquet Columnar Storage Format](https://waylau.com/about-apache-parquet/)
* [Spark SQL: Reading and Writing the Apache Parquet Data Source](https://developer.huawei.com/consumer/cn/forum/topic/0202634018676920418?fid=23)
* [Introduction to the Apache Hive Data Warehouse](https://developer.huawei.com/consumer/cn/forum/topic/0201634752549850505?fid=23)
* [Spark SQL: Using Apache Hive](https://developer.huawei.com/consumer/cn/forum/topic/0202635471716910045?fid=23)
* [Spark SQL: Working with Databases via JDBC](https://developer.huawei.com/consumer/cn/forum/topic/0202635607847820058?fid=23)
* [Spark SQL: Reading Binary Files](https://developer.huawei.com/consumer/cn/forum/topic/0202635626764400066?fid=23)
* [Exporting Data from Spark to CSV Files](https://developer.huawei.com/consumer/cn/forum/topic/0202620883150950010?fid=23)
* [Spark SQL: Time Zone Handling](https://developer.huawei.com/consumer/cn/forum/topic/0202665874275260083?fid=23)
* [Spark Streaming Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202636427881730132?fid=23)
* [Spark Streaming: Counting Word Frequencies from a Socket Data Stream](https://developer.huawei.com/consumer/cn/forum/topic/0201639135765210068?fid=23)
* [Spark Streaming Window Operations](https://developer.huawei.com/consumer/cn/forum/topic/0202639686793340267?fid=23)
* [Spark Structured Streaming Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202639990757790283?fid=23)
* [Spark Structured Streaming: Counting Word Frequencies from a Socket Data Stream](https://developer.huawei.com/consumer/cn/forum/topic/0201640617749310121?fid=23)
* [Spark Structured Streaming Window Operations](https://developer.huawei.com/consumer/cn/forum/topic/0201647684921030332?fid=23)
* [Customizing the Log4j Configuration in Spark](https://developer.huawei.com/consumer/cn/forum/topic/0201647777007740340?fid=23)
* [Spark MLlib Machine Learning Library Overview](https://developer.huawei.com/consumer/cn/forum/topic/0201648414415760370?fid=23)
* [Spark MLlib: ML Pipeline in Detail](https://developer.huawei.com/consumer/cn/forum/topic/0202652669139340720?fid=23)
* [Spark MLlib: Estimator, Transformer, and Param Usage Example](https://developer.huawei.com/consumer/cn/forum/topic/0201648630447880382?fid=23)
* [Spark MLlib: ML Pipeline Usage Example](https://developer.huawei.com/consumer/cn/forum/topic/0202648630694530630?fid=23)
* [Spark GraphX Graph Computation Overview](https://developer.huawei.com/consumer/cn/forum/topic/0202652669536950721?fid=23)
* [Spark GraphX Graph Computation Example](https://developer.huawei.com/consumer/cn/forum/topic/0201652741940200499?fid=23)
* [Fixing the spark-shell startup error "WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped"](https://developer.huawei.com/consumer/cn/forum/topic/0204726396055740595?fid=23)
* [Spark Cluster Deployment: Cluster Overview](https://developer.huawei.com/consumer/cn/forum/topic/0203729942975270557?fid=23)
* [Spark Cluster: Submitting Applications to a Cluster](https://developer.huawei.com/consumer/cn/forum/topic/0203729943247780558?fid=23)
* [Spark Cluster: Deploying a Cluster in Standalone Mode](https://developer.huawei.com/consumer/cn/forum/topic/0204730620151950827?fid=23)
* [Spark Cluster: High Availability for Standalone Mode Clusters](https://developer.huawei.com/consumer/cn/forum/topic/0204730620408550828?fid=23)
* [Spark Series 044: Deploying a Cluster in YARN Mode](https://developer.huawei.com/consumer/cn/forum/topic/0203732228615380806?fid=23)
* [Spark Series 045: Fixing the "java.lang.NoClassDefFoundError" Problem](https://developer.huawei.com/consumer/cn/forum/topic/0201775600270330248?fid=23)
* To be continued...

## Samples
* [Using the Spark Accumulator LongAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/LongAccumulatorSample.java)
* [Using the Spark Accumulator DoubleAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/DoubleAccumulatorSample.java)
* [Using the Spark Accumulator CollectionAccumulator](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/util/CollectionAccumulatorSample.java)
* [SparkLauncher example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/launcher/SparkLauncherSample.java)
* [InProcessLauncherSample example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/launcher/InProcessLauncher.java)
* [Broadcast example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/broadcast/BroadcastSample.java)
* [Basic RDD operations example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddBasicSample.java)
* [Basic RDD Transformation and Action operations example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddBasicOperationSample.java)
* [Basic DataFrame operations example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataFrameBasicExample.java)
* [Basic Dataset operations example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DatasetBasicExample.java)
* [Creating a temporary view from a DataFrame](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataFrameTempViewExample.java)
* [Converting an RDD to a Dataset](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DatasetSchemaExample.java)
* [Reading and writing the Apache Parquet data source](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceParquetExample.java)
* [Using Apache Hive](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceHiveExample.java)
* [Working with databases via JDBC](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceJDBCExample.java)
* [Reading binary files](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/DataSourceBinaryFile.java)
* [Exporting data from Spark to CSV files](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/WriteCVSExample.java)
* [Spark SQL time zone handling](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/TimeZoneExample.java)
* [Spark Streaming: counting word frequencies from a socket data stream](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/streaming/SparkStreamingSocketSample.java)
* [Spark Streaming window operations](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/streaming/SparkStreamingWimdowSample.java)
* [Structured Streaming: counting word frequencies from a socket data stream](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/streaming/StructuredStreamingSocketSample.java)
* [Structured Streaming window operations](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/sql/streaming/StructuredStreamingWindowSample.java)
* [Estimator, Transformer, and Param usage example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/ml/EstimatorTransformerParamExample.java)
* [ML Pipeline usage example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/ml/PipelineExample.java)
* [GraphX graph computation example](samples/spark-java-samples/src/main/java/com/waylau/spark/java/samples/rdd/JavaRddGraphXSample.java)
* To be continued...

## Get Started
Choose one of the entry points below:
*
*

## Code
All sample source code from the book is in the `samples` directory. The code follows the [Java Coding Conventions]().
## Issue
Errata, comments, and suggestions are all welcome.
## Contact
* Blog: [waylau.com](http://waylau.com)
* Gmail: waylau521(at)gmail.com
* Weibo: [waylau521](http://weibo.com/waylau521)
* Twitter: [waylau521](https://twitter.com/waylau521)
* GitHub: [waylau](https://github.com/waylau)

## Support Me
![Open source donation](https://waylau.com/images/showmethemoney-sm.jpg)