{"id":15032989,"url":"https://github.com/turboway/bigdata_analyse","last_synced_at":"2025-05-14T21:07:00.013Z","repository":{"id":37760279,"uuid":"324295142","full_name":"TurboWay/bigdata_analyse","owner":"TurboWay","description":"Big data analysis projects","archived":false,"fork":false,"pushed_at":"2023-04-27T02:55:12.000Z","size":1211,"stargazers_count":4445,"open_issues_count":7,"forks_count":731,"subscribers_count":49,"default_branch":"main","last_synced_at":"2025-05-14T21:06:53.947Z","etag":null,"topics":["hql","python","sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TurboWay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-12-25T05:35:32.000Z","updated_at":"2025-05-14T11:54:19.000Z","dependencies_parsed_at":"2024-01-14T15:22:49.546Z","dependency_job_id":"5fd5a3a0-11ca-45e6-bd5f-5f641e157a9b","html_url":"https://github.com/TurboWay/bigdata_analyse","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fbigdata_analyse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fbigdata_analyse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fbigdata_analyse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurboWay%2Fbigdata_analyse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TurboWay","download_url":"https://codeload.github.com/TurboWay/bigdata_analyse/tar.gz
/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254227612,"owners_count":22035669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hql","python","sql"],"created_at":"2024-09-24T20:19:52.814Z","updated_at":"2025-05-14T21:06:54.998Z","avatar_url":"https://github.com/TurboWay.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bigdata_analyse\nThis repo is a collection of data analysis projects I have worked through. Each project includes a friendly write-up that explains and demonstrates the whole development process, along with the related dataset, which you can download to practice on.\n\n## wish\n\nBy analyzing datasets from different industries with different tech stacks, I hope to achieve the following:\n\n- Understand the business analysis metrics used in different domains\n- Strengthen data processing, data analysis, and data visualization skills\n- Gain hands-on experience with big data batch processing and stream processing\n- Gain hands-on experience with data mining\n\n## tip\n\n- The projects mainly use python, sql, and hql\n- .ipynb files can be opened with jupyter notebook; for installation instructions, see [jupyter notebook](http://blog.turboway.top/article/jupyter/)\n\u003ejupyter notebook is a web-based interactive python editor that can be installed directly with pip. It also supports markdown, which makes it well suited to data analysis and visualization as well as writing articles and example code.\n\n## list\n\n| Topic | Processing mode | Tech stack | Dataset download |\n| ------------ | ------------ | ------------ | ------------ |\n| [Analysis of 100 million Taobao user behavior records](https://github.com/TurboWay/bigdata_analyse/blob/main/UserBehaviorFromTaobao_Batch/用户行为数据分析.md) | Offline processing | cleaning with hive + analysis with hive + visualization with echarts | [Alibaba Cloud](https://tianchi.aliyun.com/dataset/dataDetail?dataId=649\u0026userId=1) or [Baidu Netdisk](https://pan.baidu.com/s/15Ss-nDMA120EHhuwpzYm0g), extraction code: 5ipq |\n| [Real-time analysis of 10 million Taobao user behavior records](https://github.com/TurboWay/bigdata_analyse/blob/main/UserBehaviorFromTaobao_Stream/用户行为数据实时分析.md) | Real-time processing | data source kafka + real-time analysis with flink + visualization with (es + kibana) | [Baidu Netdisk](https://pan.baidu.com/s/1CPD5jpmvOUvg1LETAVETGw), extraction code: m4mc |\n
| [Analysis of 3 million player records from the game 《野蛮时代》](https://github.com/TurboWay/bigdata_analyse/blob/main/AgeOfBarbarians/野蛮时代数据分析.md) | Offline processing | cleaning with pandas + analysis with mysql + visualization with pyecharts | [Baidu Netdisk](https://pan.baidu.com/s/1Mi5lvGDF405Nk8Y2BZDzdQ), extraction code: paq4 |\n| [Analysis of 1.3 million Shenzhen Tong card-swipe records](https://github.com/TurboWay/bigdata_analyse/blob/main/SZTcard/深圳通刷卡数据分析.md) | Offline processing | cleaning with pandas + analysis with impala + visualization with dbeaver | [Baidu Netdisk](https://pan.baidu.com/s/1WslwKXKhVH1q_6u4SvuKkQ), extraction code: t561 |\n| [Analysis of 100,000 Xiamen job postings](https://github.com/TurboWay/bigdata_analyse/blob/main/AmoyJob/2021厦门招聘数据分析.md) | Offline processing | cleaning with pandas + analysis with hive + visualization with (hue + pyecharts) + prediction with sklearn | [Baidu Netdisk](https://pan.baidu.com/s/1mco8dKb5o0qPd2kqsj7bNg), extraction code: 9wx0 |\n| [Analysis of 7,000 rental listings](https://github.com/TurboWay/bigdata_analyse/blob/main/RentFromDanke/租房数据分析.md) | Offline processing | cleaning with pandas + analysis with sqlite + visualization with matplotlib | [Baidu Netdisk](https://pan.baidu.com/s/1l1x5qurJdkyUxAuhknj_Qw), extraction code: 9en3 |\n| [Analysis of 6,000 failed companies](https://nbviewer.jupyter.org/github/TurboWay/bigdata_analyse/blob/main/DeathCompany/倒闭企业数据分析.ipynb) | Offline processing | cleaning with pandas + analysis with pandas + visualization with (jupyter notebook + pyecharts) | [Baidu Netdisk](https://pan.baidu.com/s/1I6E6i4ZadxE9IlVPe3Bqwg), extraction code: xvgm |\n| [COVID-19 pandemic data analysis](https://nbviewer.jupyter.org/github/TurboWay/bigdata_analyse/blob/main/COVID-19/新冠疫情数据分析.ipynb) | Offline processing | cleaning with pandas + analysis with pandas + visualization with (jupyter notebook + pyecharts) | [COVID-19](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series) or [Baidu Netdisk](https://pan.baidu.com/s/1b45MqPwjEWPoTOuEXquVcw), extraction code: wgmg |\n| [Analysis of 70,000 Tmall orders](https://nbviewer.jupyter.org/github/TurboWay/bigdata_analyse/blob/main/OrderFromTmall/电商订单分析.ipynb) | Offline processing | cleaning with pandas + analysis with pandas + visualization with (jupyter notebook + pyecharts) | [Baidu Netdisk](https://pan.baidu.com/s/1psK07rkNU0_OdOXG4u1VDw), extraction code: 27nr |\n\n## refer\n\n\u003e 1. [https://tianchi.aliyun.com/dataset/](https://tianchi.aliyun.com/dataset/)\n\u003e 2. 
[https://opendata.sz.gov.cn/data/api/toApiDetails/29200_00403601](https://opendata.sz.gov.cn/data/api/toApiDetails/29200_00403601)\n\u003e 3. [https://www.kesci.com/home/dataset](https://www.kesci.com/home/dataset)\n\u003e 4. [https://github.com/CSSEGISandData/COVID-19](https://github.com/CSSEGISandData/COVID-19)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fturboway%2Fbigdata_analyse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fturboway%2Fbigdata_analyse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fturboway%2Fbigdata_analyse/lists"}