{"id":18006624,"url":"https://github.com/aikuyun/bigdata-doc","last_synced_at":"2025-11-07T06:03:46.359Z","repository":{"id":43651345,"uuid":"153401851","full_name":"aikuyun/bigdata-doc","owner":"aikuyun","description":"Big data study notes, learning roadmaps, and a collection of technical case studies.","archived":false,"fork":false,"pushed_at":"2023-01-04T15:27:52.000Z","size":2492,"stargazers_count":39,"open_issues_count":20,"forks_count":21,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-03-05T13:22:33.699Z","etag":null,"topics":["bigdata","flink","hadoop","hdfs","hive","kafka","mapreduce"],"latest_commit_sha":null,"homepage":"https://data.cuteximi.com","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aikuyun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-17T05:46:50.000Z","updated_at":"2023-02-24T03:45:42.000Z","dependencies_parsed_at":"2023-02-02T18:02:30.460Z","dependency_job_id":null,"html_url":"https://github.com/aikuyun/bigdata-doc","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aikuyun%2Fbigdata-doc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aikuyun%2Fbigdata-doc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aikuyun%2Fbigdata-doc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aikuyun%2Fbigdata-doc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aikuyun","download_url":"https://codeload.github.com/aikuyun/bigdata-doc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"ht
tps://github.com","kind":"github","repositories_count":222144798,"owners_count":16938457,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigdata","flink","hadoop","hdfs","hive","kafka","mapreduce"],"created_at":"2024-10-30T01:09:06.129Z","updated_at":"2025-11-07T06:03:41.341Z","avatar_url":"https://github.com/aikuyun.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Self-study Big Data](https://img.shields.io/badge/%E8%87%AA%E5%AD%A6-%E5%A4%A7%E6%95%B0%E6%8D%AE-brightgreen.svg)\n![Self-study Machine Learning](https://img.shields.io/badge/%E8%87%AA%E5%AD%A6-%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0-brightgreen.svg)\n![The Road to Big Data](https://img.shields.io/badge/%E8%87%AA%E5%AD%A6-%E5%A4%A7%E6%95%B0%E6%8D%AE%E8%BF%9B%E5%87%BB%E4%B9%8B%E8%B7%AF-blue.svg)\n\n# Big Data Learning Resources\n\nNotes on big data and machine learning, updated continuously.\n\n# Article Categories\n- Big Data Technology Weekly\n  - [Big Data Technology Weekly, updated every week](https://mp.weixin.qq.com/mp/appmsgalbum?__biz=MzU0OTgxNjMyNA==\u0026action=getalbum\u0026album_id=2052897255342309378\u0026scene=173\u0026from_msgid=2247484324\u0026from_itemidx=1\u0026count=3\u0026nolastread=1#wechat_redirect)\n\n- Machine Learning\n  - [Starting with Machine Learning](https://github.com/aikuyun/bigdata-doc/blob/master/docs/ml/ml-guid.md)\n  - [Machine Learning Terminology](https://github.com/aikuyun/bigdata-doc/blob/master/docs/ml/ml-term.md)\n  - [Machine Learning Roadmap](https://github.com/aikuyun/bigdata-doc/blob/master/docs/ml/study-road.md)\n  - [Two Recommended Websites for Gauging Your Current Level](https://github.com/aikuyun/bigdata-doc/blob/master/docs/ml/study-website.md)\n\n- Distributed Systems Fundamentals\n\n  - [Distributed Systems Fundamentals](https://github.com/aikuyun/ziyuan/blob/master/docs/distribute/distribute.md)\n\n- Big Data Ecosystem\n  - 
[HDFS](https://github.com/aikuyun/ziyuan/tree/master/docs/ziyuan01#hdfs)\n\n  - [MapReduce](https://github.com/aikuyun/ziyuan/tree/master/docs/ziyuan01#mapreduce)\n\n  - [Hive](https://github.com/aikuyun/ziyuan/blob/master/docs/ziyuan01/Hive.md)\n\n- Digging into the Internals\n  - [Hadoop HA Mechanism](https://github.com/aikuyun/ziyuan/tree/master/docs/ziyuan02#hadoop-ha-%E6%9C%BA%E5%88%B6)\n\n  - [MapReduce Principles and Execution Flow](https://github.com/aikuyun/ziyuan/blob/master/docs/ziyuan02/MRyuanli.md)\n\n  - [NameNode Internals](https://github.com/aikuyun/ziyuan/blob/master/docs/ziyuan02/MRyuanli.md)\n\n  - [Secondary Sort](https://github.com/aikuyun/ziyuan/blob/master/docs/ziyuan02/secondarySort.md)\n\n  - [Kafka](https://github.com/aikuyun/ziyuan/blob/master/docs/ziyuan02/kafka-01.md)\n\n- Solutions\n  - [Solutions from Major Tech Companies](https://github.com/aikuyun/ziyuan/blob/master/docs/It-chat/case.md)\n  - [How to Process Trillions of Records per Day: How iQIYI's Real-Time Computing Platform Does It](https://mp.weixin.qq.com/s/DKP08aUSNMOySNcs_y6ODA)\n  - [How WeChat Top Stories (看一看) Makes Recommendations for You](https://mp.weixin.qq.com/s/Regv8UUc5PH9HcnUq_zq3A)\n\n- Technical Article Collection\n\n  - [Technical Article Collection](https://github.com/aikuyun/ziyuan/blob/master/docs/artical/artical.md)\n\n- Spark\n  - [Spark Tuning](https://mp.weixin.qq.com/s/iNovecaYkKrytNgQMvIMZw)\n  - [Spark Shuffle Addressing Flow](https://mp.weixin.qq.com/s/0eQPmVnXCbEr1ziPAW569A)\n  - [Spark Shuffle Tuning](https://mp.weixin.qq.com/s/keJnU0trtTW9W-zBWPKD5A)\n  - [Spark Data Locality Levels](https://mp.weixin.qq.com/s/kF4zjiambBohSJG9gZW8_g)\n  - [Spark's Core RDD, Stage Division Details, and a Summary of Run Modes](https://mp.weixin.qq.com/s/aPwsPTkFakBwv3MIioaOOg)\n\n- Kafka\n  - [Kafka + Spark Streaming](https://mp.weixin.qq.com/s/wKjSalxFdVkRXGPnNVg_2g)\n  - [Kafka Data Loss and Duplicate Consumption](https://mp.weixin.qq.com/s/ROoVOVgNW8jzdCZeAwLTDQ)\n\n- HBase\n  - [HBase Architecture](https://mp.weixin.qq.com/s/j2Kbi003Etzw_15KwV0TyQ)\n  - [HBase Architecture, Supplement](https://mp.weixin.qq.com/s/7yRequ0pqGN_00zi704wwA)\n\n- Hadoop\n  - [An Analysis of How Hadoop HA Works](https://mp.weixin.qq.com/s/BmVvoi8k0mU9pmGQCl2Sug)\n  - [Hadoop Series: 1.0 and 2.0 Architecture](https://mp.weixin.qq.com/s/B_wOtK1gSVlmB4cF5hZG2A)\n  - [Hadoop Series: 
Hive](https://mp.weixin.qq.com/s/fWKX6NR908fLbVUMFwpj8A)\n  - [Hadoop Series: MapReduce](https://mp.weixin.qq.com/s/JDDTTy6QfZtwz547M88GMQ)\n  - [Hadoop Series: HDFS](https://mp.weixin.qq.com/s/Dcsat0-iRB_xYRBoMfhoXg)\n\n- Flink\n  - [Flink Community E-book](https://mp.weixin.qq.com/s?__biz=MzIwMjA2MTk4Ng==\u0026mid=2247485438\u0026idx=1\u0026sn=2bb7f82402dc4607f94cdb78e48cd48b\u0026chksm=96e52633a192af25a5c6b2371dfed395aa46168639c01bb49dbc36381f2b3dd889bfe9256d6a\u0026xtrack=1\u0026scene=0\u0026subscene=91\u0026sessionid=1555230598\u0026clicktime=1555230760\u0026ascene=7\u0026devicetype=android-27\u0026version=27000334\u0026nettype=cmnet\u0026abtest_cookie=BAABAAoACwASABMABQAjlx4AVpkeAMeZHgDRmR4A3JkeAAAA\u0026lang=zh_CN\u0026pass_ticket=cRjrq%2F8EqXfIhZvDoJO4rqTvtx1hEu4fyHiignznzsezMHPtQ83VFn8G02ozwToC\u0026wx_header=1)\n  - [A Milestone Flink Release Is Coming Soon: Get Started Now](https://mp.weixin.qq.com/s/OmTmPHaP0vSPT128eAf2Ig)\n  - [Released: Apache Flink Top Ten Technical Challenges in Practice, to Help You Handle Technical Problems in Production](https://mp.weixin.qq.com/s/U3c4oXFLPuc4XiUNUY55gg)\n  - [2020 Flink Learning Materials Roundup, Worth Bookmarking](https://mp.weixin.qq.com/s/wuKBvNbkO-pTWZEMSvGLNg)\n\n- Data Warehouse\n  - [Offline and Real-Time Data Warehouses (Part 1)](https://mp.weixin.qq.com/s/dpwQ4sx-IWL66m03lPa6rg)\n  - [58 Site-Wide User Behavior Data Warehouse: Construction and Practice](https://mp.weixin.qq.com/s/MnfdsLHGjK9okv020cS_Kg)\n  - [In Depth | Building Ctrip's Flight Ticket Data Warehouse](https://mp.weixin.qq.com/s/oPQFDl-A-6BnPXhNdwnePA)\n  - [In Depth | Ctrip's Cross-Data-Center Hadoop Architecture in Practice](https://mp.weixin.qq.com/s/S5SXNabYqwyUMl1ReLayKw)\n\n- Hive Basics\n  - [Summary of Hive Data Compression Formats](https://mp.weixin.qq.com/s/T6Y4vMYghb_asWdtsZnjpA)\n  - [Summary of CombineFileInputFormat File Splitting](https://mp.weixin.qq.com/s/DZ-CfrVrr7i0iA2GRdBN1g)\n  - [Hive SQL Window Functions](https://mp.weixin.qq.com/s/qhP2tOS5plxaczPN1JkWJw)\n  - [Hive SQL Analytic Functions](https://mp.weixin.qq.com/s/6nNr97z-Rj5Alofl8wCwhw)\n\n- Underlying Fundamentals\n  - [Understanding the Underlying Principles of MySQL Indexes in Depth](https://mp.weixin.qq.com/s/J7eQcwBgQEGJk4bGIa9wDA)\n  - [Solutions for Cache Breakdown, Cache Expiration, and Hot Keys](https://mp.weixin.qq.com/s/TqqTDy2YizLMwE0tyHxKVA)\n\n## Follow Our Original WeChat Official Account\n\nOfficial account: 大数据学习指南 (Big Data Learning Guide), 
focused on big data technology\n\n![Scan me](https://cdn.nlark.com/yuque/0/2021/png/199648/1631944506464-83677e15-283f-43de-b106-5ff823300c85.png?x-oss-process=image%2Fresize%2Cw_900%2Climit_0)\n\nContent is also synced to other platforms from time to time.\n\n- [Yuque](https://www.yuque.com/cuteximi/base)\n- [Zhihu](https://zhuanlan.zhihu.com/bigdata1995)\n- [Toutiao](https://www.toutiao.com/c/user/70068423102/#mid=1579500719412238)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faikuyun%2Fbigdata-doc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faikuyun%2Fbigdata-doc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faikuyun%2Fbigdata-doc/lists"}