{"id":13467700,"url":"https://github.com/simbafl/Data-analysis","last_synced_at":"2025-03-26T03:30:57.363Z","repository":{"id":132722003,"uuid":"164375468","full_name":"simbafl/Data-analysis","owner":"simbafl","description":"Data analysis, data mining and modeling.","archived":false,"fork":false,"pushed_at":"2022-01-02T10:43:43.000Z","size":6111,"stargazers_count":200,"open_issues_count":1,"forks_count":63,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-10-29T21:58:57.771Z","etag":null,"topics":["data-analysis","kaggle","matplotlib","numpy","pandas","python3","scipy","sklearn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simbafl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-01-07T03:53:04.000Z","updated_at":"2024-10-21T16:33:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"d0b3a6a2-88db-4eb4-b35f-40a3653c7438","html_url":"https://github.com/simbafl/Data-analysis","commit_stats":null,"previous_names":["fenglei110/data-analysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simbafl%2FData-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simbafl%2FData-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simbafl%2FData-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simbafl%2FData-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simbafl","download_url":"https://codeload.github.com/simbafl/Data-analysis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245584591,"owners_count":20639580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","kaggle","matplotlib","numpy","pandas","python3","scipy","sklearn"],"created_at":"2024-07-31T15:00:59.486Z","updated_at":"2025-03-26T03:30:56.208Z","avatar_url":"https://github.com/simbafl.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# Data-analysis\nData analysis, data mining and modeling with Python\n\n- :soccer: [\u003cfont size=+1\u003eData analysis and feature processing\u003c/font\u003e](./ch01)\n- :basketball: [\u003cfont size=+1\u003eMachine learning and data modeling\u003c/font\u003e](./ch02)\n- :apple: [\u003cfont size=+1\u003eModel evaluation\u003c/font\u003e](./ch03)\n- :hamburger: [\u003cfont size=+1\u003eNatural language processing\u003c/font\u003e](./NLP)\n- :cherries: [\u003cfont size=+1\u003eAll about web scraping\u003c/font\u003e](./Spider)\n- :fries: [\u003cfont size=+1\u003eKaggle competition projects\u003c/font\u003e](./kaggle)\n- :banana: [\u003cfont size=+1\u003eKnowledge graphs\u003c/font\u003e]()\n\n## Common tools\n- numpy\n- pandas\n- matplotlib\n  - seaborn: a plotting library built on matplotlib that produces richer statistical graphics\n- SciPy: a collection of packages for scientific computing\n  - scipy.integrate: numerical integration routines and differential equation solvers\n  - scipy.linalg: linear algebra routines and matrix decompositions\n  - scipy.optimize: function optimizers and root-finding algorithms\n  - scipy.signal: signal processing tools\n  - scipy.sparse: sparse matrices and sparse linear system solvers\n  - scipy.special: wrapper around SPECFUN, a library implementing many common mathematical functions\n  - scipy.stats: standard continuous and discrete probability distributions\n  - scipy.weave: a tool for speeding up array computations with inline C++ code (no longer part of current SciPy releases)\n- scikit-learn (sklearn for short): a machine learning library for data analysis and data mining, built on NumPy, SciPy and matplotlib.\n- Jupyter Notebook: a web application for creating and sharing literate-programming documents that combine live code, mathematical equations, visualizations and Markdown. Kaggle competition materials are commonly shared in this format. Typical uses include data cleaning and transformation, numerical simulation, statistical modeling and machine learning.\n- TensorFlow: an open-source library for numerical computation using dataflow graphs. It was originally developed at Google for machine learning and deep neural network research, but it also applies to many other computational domains.\n- Anaconda: a distribution that bundles Conda, Python and more than 180 preinstalled packages and their dependencies, such as numpy and pandas. Conda is an open-source package and environment manager that can install different versions of packages and their dependencies on the same machine and switch between environments.\n\n**jieba word segmentation demo**\n![ciyun](https://github.com/fenglei110/Data-analysis/blob/master/Spider/ch_Haiwang/images/ciyun.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimbafl%2FData-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimbafl%2FData-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimbafl%2FData-analysis/lists"}