{"id":13458860,"url":"https://github.com/heibaiying/BigData-Notes","last_synced_at":"2025-03-24T16:31:07.509Z","repository":{"id":37334881,"uuid":"174765647","full_name":"heibaiying/BigData-Notes","owner":"heibaiying","description":"大数据入门指南  :star:","archived":false,"fork":false,"pushed_at":"2024-01-05T03:00:32.000Z","size":24052,"stargazers_count":16263,"open_issues_count":38,"forks_count":4272,"subscribers_count":448,"default_branch":"master","last_synced_at":"2025-03-18T21:22:04.384Z","etag":null,"topics":["azkaban","big-data","bigdata","flume","hadoop","hbase","hdfs","hive","kafka","mapreduce","phoenix","scala","spark","sqoop","storm","yarn","zookeeper"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/heibaiying.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-10T01:40:01.000Z","updated_at":"2025-03-18T20:54:40.000Z","dependencies_parsed_at":"2023-02-17T05:01:21.714Z","dependency_job_id":"59d4d9ec-9bc1-4fcf-aa91-fe8d0358c074","html_url":"https://github.com/heibaiying/BigData-Notes","commit_stats":{"total_commits":545,"total_committers":9,"mean_commits":60.55555555555556,"dds":0.563302752293578,"last_synced_commit":"3898939aca387c25b3eb4e51ef49dfccca8543ed"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heibaiying%2FBigData-Notes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heibaiying%2FBigData-Notes/tags","releases_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/heibaiying%2FBigData-Notes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heibaiying%2FBigData-Notes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/heibaiying","download_url":"https://codeload.github.com/heibaiying/BigData-Notes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245308477,"owners_count":20594257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azkaban","big-data","bigdata","flume","hadoop","hbase","hdfs","hive","kafka","mapreduce","phoenix","scala","spark","sqoop","storm","yarn","zookeeper"],"created_at":"2024-07-31T09:00:58.903Z","updated_at":"2025-03-24T16:31:06.131Z","avatar_url":"https://github.com/heibaiying.png","language":"Java","readme":"# BigData-Notes\n\n\n\n\u003cdiv align=\"center\"\u003e \u003cimg width=\"444px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/bigdata-notes-icon.png\"/\u003e \u003c/div\u003e\n\u003cbr/\u003e\n\n**Big Data Beginner's Guide**\n\n\n\n\u003ctable\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop.jpg\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hive.jpg\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/spark.jpg\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg 
width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/storm.png\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/flink.png\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hbase.png\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/kafka.png\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/zookeeper.jpg\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/flume.png\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/sqoop.png\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/azkaban.png\"\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cimg width=\"50px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/scala.jpg\"\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#一hadoop\"\u003eHadoop\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#二hive\"\u003eHive\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#三spark\"\u003eSpark\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#四storm\"\u003eStorm\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#五flink\"\u003eFlink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#六hbase\"\u003eHBase\u003c/a\u003e\u003c/td\u003e\n      \u003ctd 
align=\"center\"\u003e\u003ca href=\"#七kafka\"\u003eKafka\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#八zookeeper\"\u003eZookeeper\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#九flume\"\u003eFlume\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#十sqoop\"\u003eSqoop\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#十一azkaban\"\u003eAzkaban\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"#十二scala\"\u003eScala\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/table\u003e\n\u003cbr/\u003e\n\n\u003cdiv align=\"center\"\u003e\n\t\u003ca href = \"https://github.com/heibaiying/Full-Stack-Notes\"\u003e \n\t\u003cimg width=\"150px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin.jpg\"/\u003e \n\t\u003c/a\u003e \n\u003c/div\u003e\n\u003cdiv align=\"center\"\u003e \u003cstrong\u003e For offline reading, send “bigdata” to the WeChat official account to get the offline edition of the Big Data Beginner's Guide! \u003c/strong\u003e \u003c/div\u003e\n\n\u003cbr/\u003e\n\n## :black_nib: Preface\n\n1. [Big Data Learning Roadmap](notes/大数据学习路线.md)\n2. [Big Data Technology Stack Mind Map](notes/大数据技术栈思维导图.md)\n3. [Installation Guide for Common Big Data Software](notes/大数据常用软件安装指南.md)\n\n## 一、Hadoop\n\n1. [Distributed File Storage System: HDFS](notes/Hadoop-HDFS.md)\n2. [Distributed Computing Framework: MapReduce](notes/Hadoop-MapReduce.md)\n3. [Cluster Resource Manager: YARN](notes/Hadoop-YARN.md)\n4. [Hadoop Single-Node Pseudo-Cluster Setup](notes/installation/Hadoop单机环境搭建.md)\n5. [Hadoop Cluster Setup](notes/installation/Hadoop集群环境搭建.md)\n6. [Common HDFS Shell Commands](notes/HDFS常用Shell命令.md)\n7. [Using the HDFS Java API](notes/HDFS-Java-API.md)\n8. [Building a Highly Available Hadoop Cluster with Zookeeper](notes/installation/基于Zookeeper搭建Hadoop高可用集群.md)\n\n## 二、Hive\n\n1. [Hive Introduction and Core Concepts](notes/Hive简介及核心概念.md)\n2. [Installing and Deploying Hive on Linux](notes/installation/Linux环境下Hive的安装部署.md)\n4. [Basic Usage of the Hive CLI and Beeline Command Lines](notes/HiveCLI和Beeline命令行的基本使用.md)\n6. [Common Hive DDL Operations](notes/Hive常用DDL操作.md)\n7. [Hive Partitioned and Bucketed Tables](notes/Hive分区表和分桶表.md)\n8. 
[Hive Views and Indexes](notes/Hive视图和索引.md)\n9. [Common Hive DML Operations](notes/Hive常用DML操作.md)\n10. [Hive Data Queries in Detail](notes/Hive数据查询详解.md)\n\n## 三、Spark\n\n**Spark Core:**\n\n1. [Introduction to Spark](notes/Spark简介.md)\n2. [Spark Development Environment Setup](notes/installation/Spark开发环境搭建.md)\n3. [Resilient Distributed Datasets (RDD)](notes/Spark_RDD.md)\n4. [Common RDD Operators in Detail](notes/Spark_Transformation和Action算子.md)\n5. [Spark Run Modes and Job Submission](notes/Spark部署模式与作业提交.md)\n6. [Spark Accumulators and Broadcast Variables](notes/Spark累加器与广播变量.md)\n7. [Building a Highly Available Spark Cluster with Zookeeper](notes/installation/Spark集群环境搭建.md)\n\n**Spark SQL:**\n\n1. [DataFrame and Dataset](notes/SparkSQL_Dataset和DataFrame简介.md)\n2. [Basic Usage of the Structured API](notes/Spark_Structured_API的基本使用.md)\n3. [Spark SQL External Data Sources](notes/SparkSQL外部数据源.md)\n4. [Common Spark SQL Aggregate Functions](notes/SparkSQL常用聚合函数.md)\n5. [Spark SQL JOIN Operations](notes/SparkSQL联结操作.md)\n\n**Spark Streaming:**\n\n1. [Introduction to Spark Streaming](notes/Spark_Streaming与流处理.md)\n2. [Basic Spark Streaming Operations](notes/Spark_Streaming基本操作.md)\n3. [Integrating Spark Streaming with Flume](notes/Spark_Streaming整合Flume.md)\n4. [Integrating Spark Streaming with Kafka](notes/Spark_Streaming整合Kafka.md)\n\n## 四、Storm\n\n1. [Introduction to Storm and Stream Processing](notes/Storm和流处理简介.md)\n2. [Storm Core Concepts in Detail](notes/Storm核心概念详解.md)\n3. [Storm Single-Node Setup](notes/installation/Storm单机环境搭建.md)\n4. [Storm Cluster Setup](notes/installation/Storm集群环境搭建.md)\n5. [Storm Programming Model in Detail](notes/Storm编程模型详解.md)\n6. [Comparison of Three Packaging Methods for Storm Projects](notes/Storm三种打包方式对比分析.md)\n7. [Storm Integration with Redis in Detail](notes/Storm集成Redis详解.md)\n8. [Storm Integration with HDFS/HBase](notes/Storm集成HBase和HDFS.md)\n9. [Storm Integration with Kafka](notes/Storm集成Kakfa.md)\n\n## 五、Flink\n\n1. [Overview of Flink Core Concepts](notes/Flink核心概念综述.md)\n2. [Flink Development Environment Setup](notes/Flink开发环境搭建.md)\n3. [Flink Data Source](notes/Flink_Data_Source.md)\n4. [Flink Data Transformation](notes/Flink_Data_Transformation.md)\n5. [Flink Data Sink](notes/Flink_Data_Sink.md)\n6. [Flink Window Model](notes/Flink_Windows.md)\n7. [Flink State Management and Checkpointing](notes/Flink状态管理与检查点机制.md)\n8. [Flink Standalone Cluster Deployment](notes/installation/Flink_Standalone_Cluster.md)\n\n\n## 六、HBase\n\n1. [Introduction to HBase](notes/Hbase简介.md)\n2. 
[HBase System Architecture and Data Structures](notes/Hbase系统架构及数据结构.md)\n3. [HBase Basic Environment Setup (Standalone / Pseudo-Distributed Mode)](notes/installation/HBase单机环境搭建.md)\n4. [HBase Cluster Setup](notes/installation/HBase集群环境搭建.md)\n5. [Common HBase Shell Commands](notes/Hbase_Shell.md)\n6. [HBase Java API](notes/Hbase_Java_API.md)\n7. [HBase Filters in Detail](notes/Hbase过滤器详解.md)\n8. [HBase Coprocessors in Detail](notes/Hbase协处理器详解.md)\n9. [HBase Disaster Recovery and Backup](notes/Hbase容灾与备份.md)\n10. [Phoenix: A SQL Middle Layer for HBase](notes/Hbase的SQL中间层_Phoenix.md)\n11. [Integrating Spring/Spring Boot with MyBatis + Phoenix](notes/Spring+Mybtais+Phoenix整合.md)\n\n## 七、Kafka\n\n1. [Introduction to Kafka](notes/Kafka简介.md)\n2. [Building a Highly Available Kafka Cluster with Zookeeper](notes/installation/基于Zookeeper搭建Kafka高可用集群.md)\n3. [Kafka Producers in Detail](notes/Kafka生产者详解.md)\n4. [Kafka Consumers in Detail](notes/Kafka消费者详解.md)\n5. [Understanding Kafka's Replication Mechanism in Depth](notes/Kafka深入理解分区副本机制.md)\n\n## 八、Zookeeper\n\n1. [Zookeeper Introduction and Core Concepts](notes/Zookeeper简介及核心概念.md)\n2. [Zookeeper Single-Node and Cluster Setup](notes/installation/Zookeeper单机环境和集群环境搭建.md)\n3. [Common Zookeeper Shell Commands](notes/Zookeeper常用Shell命令.md)\n4. [Zookeeper Java Client: Apache Curator](notes/Zookeeper_Java客户端Curator.md)\n5. [Zookeeper ACL Access Control](notes/Zookeeper_ACL权限控制.md)\n\n## 九、Flume\n\n1. [Flume Introduction and Basic Usage](notes/Flume简介及基本使用.md)\n2. [Installing and Deploying Flume on Linux](notes/installation/Linux下Flume的安装.md)\n3. [Integrating Flume with Kafka](notes/Flume整合Kafka.md)\n\n## 十、Sqoop\n\n1. [Sqoop Introduction and Installation](notes/Sqoop简介与安装.md)\n2. [Basic Usage of Sqoop](notes/Sqoop基本使用.md)\n\n## 十一、Azkaban\n\n1. [Introduction to Azkaban](notes/Azkaban简介.md)\n2. [Building and Deploying Azkaban 3.x](notes/installation/Azkaban_3.x_编译及部署.md)\n3. [Using Azkaban Flow 1.0](notes/Azkaban_Flow_1.0_的使用.md)\n4. [Using Azkaban Flow 2.0](notes/Azkaban_Flow_2.0_的使用.md)\n\n## 十二、Scala\n\n1. [Scala Introduction and Development Environment Setup](notes/Scala简介及开发环境配置.md)\n2. [Basic Data Types and Operators](notes/Scala基本数据类型和运算符.md)\n3. [Control Flow Statements](notes/Scala流程控制语句.md)\n4. [Arrays: Array](notes/Scala数组.md)\n5. [Overview of Collection Types](notes/Scala集合类型.md)\n6. [Common Collection Types: List \u0026 Set](notes/Scala列表和集.md)\n7. [Common Collection Types: Map \u0026 Tuple](notes/Scala映射和元组.md)\n8. [Classes and Objects](notes/Scala类和对象.md)\n9. 
[Inheritance and Traits](notes/Scala继承和特质.md)\n10. [Functions \u0026 Closures \u0026 Currying](notes/Scala函数和闭包.md)\n11. [Pattern Matching](notes/Scala模式匹配.md)\n12. [Type Parameters](notes/Scala类型参数.md)\n13. [Implicit Conversions and Implicit Parameters](notes/Scala隐式转换和隐式参数.md)\n\n## 十三、Common Content\n\n1. [Common Packaging Methods for Big Data Applications](notes/大数据应用常用打包方式.md)\n\n\u003cbr\u003e\n\n## :bookmark_tabs: Afterword\n\n[Shared Resources and Recommended Development Tools](notes/资料分享与工具推荐.md)\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n\t\u003ca href = \"https://blog.csdn.net/m0_37809146\"\u003e \n\t\u003cimg width=\"200px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/blog-logo.png\"/\u003e \n\t\u003c/a\u003e \n\u003c/div\u003e\n\u003cdiv align=\"center\"\u003e \u003ca  href = \"https://blog.csdn.net/m0_37809146\"\u003e Follow my blog: https://blog.csdn.net/m0_37809146\u003c/a\u003e \u003c/div\u003e\n","funding_links":[],"categories":["Java","学习资料","Java/Kotlin","Tutorial","Java (504)","数据库管理系统","大数据"],"sub_categories":["Big Data","网络服务_其他"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheibaiying%2FBigData-Notes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheibaiying%2FBigData-Notes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheibaiying%2FBigData-Notes/lists"}