{"id":13617881,"url":"https://github.com/oldratlee/data-science-practice","last_synced_at":"2025-04-10T00:15:52.052Z","repository":{"id":149710336,"uuid":"205848750","full_name":"oldratlee/data-science-practice","owner":"oldratlee","description":"数据科学实践 | data science practice","archived":false,"fork":false,"pushed_at":"2020-01-03T14:37:11.000Z","size":13,"stargazers_count":27,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-10T00:15:44.967Z","etag":null,"topics":["anaconda","data-science","python","statistics"],"latest_commit_sha":null,"homepage":"https://github.com/oldratlee/data-science-practice","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oldratlee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2019-09-02T12:07:09.000Z","updated_at":"2022-11-29T06:50:54.000Z","dependencies_parsed_at":"2023-06-29T13:31:19.656Z","dependency_job_id":null,"html_url":"https://github.com/oldratlee/data-science-practice","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oldratlee%2Fdata-science-practice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oldratlee%2Fdata-science-practice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oldratlee%2Fdata-science-practice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oldratlee%2Fdata-science-practice/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/oldratlee","download_url":"https://codeload.github.com/oldratlee/data-science-practice/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248131315,"owners_count":21052819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anaconda","data-science","python","statistics"],"created_at":"2024-08-01T20:01:49.781Z","updated_at":"2025-04-10T00:15:51.935Z","avatar_url":"https://github.com/oldratlee.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# Data Science Practice\n\n**Data science** is the applied discipline at the intersection of three fields:\n\n1. **Mathematics/statistics**\n1. **Computer technology (programming)**\n1. **Business domain knowledge**\n\nThe term has taken off in recent years. A landmark event was the White House's announcement in February 2015 that it had appointed _DJ Patil_, who had worked at several Silicon Valley technology companies, as its first Chief Data Scientist.\n\nThe pairwise intersections of these three fields are already well known:\n\n1. Mathematics/statistics **`+`** computer technology (programming)  \n    **`=\u003e`** **machine learning** (Bang!)\n1. Computer technology (programming) **`+`** business domain  \n    **`=\u003e`** **business software development** (what people usually mean by a software engineer)  \n1. Mathematics/statistics **`+`** business domain  \n    **`=\u003e`** **traditional research**\n\nBefore data science became popular, most people used the term \"data analysis\". It is therefore worth reading books on both topics together: typical \"data analysis\" books from earlier years actually cover data science.\n\n❤️❤️\nWelcome to the world of data science!\n❤️❤️\n\n-----------------------------\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n\n- [0. Learning materials and reading lists](#0-learning-materials-and-reading-lists)\n- [1. 
Setting up the practice and development environment](#1-setting-up-the-practice-and-development-environment)\n    - [1.1 Setting up the `Python` runtime environment](#11-setting-up-the-python-runtime-environment)\n        - [1.1.1 Installing the `Anaconda` `Python` distribution](#111-installing-the-anaconda-python-distribution)\n        - [1.1.2 Configuring mirror sources](#112-configuring-mirror-sources)\n        - [1.1.3 Using `Anaconda`](#113-using-anaconda)\n    - [1.2 Code editing environments](#12-code-editing-environments)\n        - [1.2.1 `Jupyter Notebook`](#121-jupyter-notebook)\n        - [1.2.2 `PyCharm`](#122-pycharm)\n        - [1.2.3 `VS Code`](#123-vs-code)\n- [2. My own data science learning and practice](#2-my-own-data-science-learning-and-practice)\n\n\u003c!-- END doctoc generated TOC please keep comment here to allow auto update --\u003e\n\n-----------------------------\n\n\n# 0. Learning materials and reading lists\n\n- [Data Science](https://www.douban.com/doulist/119731263/) reading list; I recommend starting with:\n    - [深入浅出数据科学](https://book.douban.com/subject/30338984/)\n    - [集体智慧编程](https://book.douban.com/subject/3288908/)\n    - [Python数据科学手册](https://book.douban.com/subject/27667378/)\n- [Data Analysis](https://www.douban.com/doulist/45963852/) reading list; I recommend starting with:\n    - [精益数据分析](https://book.douban.com/subject/26278639/)\n    - [利用Python进行数据分析 原书第2版](https://book.douban.com/subject/30283996/)\n\nAlso worth a look for background:\n\n- [What is the difference between a data analyst and a data scientist? - Zhihu](https://www.zhihu.com/question/20935297)\n\n# 1. 
Setting up the practice and development environment\n\n`Python` has become the environment of choice for practicing and developing data science/machine learning.  \n\\# You can of course also use `R` or `Excel`; data science can be practiced with different tools.  \n\\# Yes, `Excel`, you heard that right; `Excel` is probably the data analysis tool with the most users.\n\n- `Python`'s thriving, active ecosystem already offers mature support for data science/machine learning.\n- As a general-purpose programming language, `Python` is far more flexible than `R` or `Excel`.\n\n## 1.1 Setting up the `Python` runtime environment\n\n**_`Anaconda`_**!\n\n- Using the [`Anaconda` `Python` distribution](https://www.anaconda.com/) has become the best practice for setting up a `Python` runtime environment for data science/machine learning!\n- `Anaconda` quickly provides\n    - a `Python` runtime environment that bundles a wide range of data analysis and machine learning libraries\n    - isolated environments for different `Python` versions and different libraries\n- so you do not waste time on tedious, low-value chores:\n    - installing each library\n    - compatibility problems between different versions of different libraries\n\nThe following explains how to quickly set up a `Python` runtime environment for data science/machine learning.\n\n### 1.1.1 Installing the `Anaconda` `Python` distribution\n\nDownload:\n\n- https://www.anaconda.com/distribution/\n- Follow the instructions on that page to complete the installation.\n\nOnce `Anaconda` is installed, run these commands to try it out 🎉\n\n- `jupyter notebook`: starts the all-in-one interactive environment that runs in a `Web` browser\n- `ipython`: starts the enhanced `python` interpreter\n\nIf you run into problems during installation, see\n\n- The newest and most complete Anaconda installation guide of 2019 - Jianshu  \n    https://www.jianshu.com/p/e23e70fe8788\n- or just search the web :\")\n\n### 1.1.2 Configuring mirror sources\n\nIn mainland China a mirror is a must; without one, package downloads take forever.\n\n- Configure the `Anaconda` mirror\n    - Use the Tsinghua (TUNA) mirror.\n    - Edit the configuration file `~/.condarc` directly:\n\n        ```yaml\n        channels:\n          - defaults\n        show_channel_urls: true\n        default_channels:\n          - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main\n          - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free\n          - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r\n        custom_channels:\n          conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud\n          msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud\n          bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud\n          menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud\n          pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud\n          simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud\n        ```\n\n    - Or set it from the command line (`conda` is included once `Anaconda` is installed):\n\n        ```sh\n        conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/\n        conda config --add channels 
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/\n        conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r\n        conda config --set show_channel_urls yes\n        ```\n\n- Configure the `pip` mirror\n    - Edit `~/.pip/pip.conf`:\n\n        ```ini\n        [global]\n        trusted-host = pypi.douban.com\n        index-url = http://pypi.douban.com/simple\n        ```\n\nIf you run into problems, see\n\n- Anaconda | Mirror usage help | Tsinghua University Open Source Software Mirror  \n    https://mirror.tuna.tsinghua.edu.cn/help/anaconda/\n- Switching `Anaconda` to a mirror in China - Jianshu  \n    https://www.jianshu.com/p/042fd657e2d4\n- or just search the web :\")\n\n### 1.1.3 Using `Anaconda`\n\nUsing `Anaconda` after download and installation:\n\n- Common `Anaconda` usage\n- Maintaining `Python` environments\n\n👉 See the separate document: [Using `Anaconda`](anaconda-usage.md).\n\n## 1.2 Code editing environments\n\n`Jupyter Notebook` | `PyCharm` | `VS Code`.\n\n### 1.2.1 `Jupyter Notebook`\n\n`Jupyter Notebook` already ships with the `Anaconda` distribution.\n\nIt provides an all-in-one interactive environment in the `Web` browser and is very popular. Give it a try; you will like it.\n\n### 1.2.2 `PyCharm`\n\nThe professional `IDE` for `Python` development from `JetBrains`, the king of `IDE`s.\n\n- Powerful code completion\n- Built-in integration with\n    - editing the popular `Jupyter Notebook`\n    - `Anaconda`\n\nIf you are a `JetBrains`/`IntelliJ` fan, you will like it even more.\n\n### 1.2.3 `VS Code`\n\nNeeds no introduction.\n\n# 2. My own data science learning and practice\n\n- Learning\n    - 《统计学：从数据到结论》 (Statistics: From Data to Conclusion), 4th edition, _吴喜之_:  \n        [Code exercise repository - github.com/oldratlee/statistics-from-data-to-conclusion-4e](https://github.com/oldratlee/statistics-from-data-to-conclusion-4e)\n\n- Practice\n    - Scraping Douban lists (doulist) whose item counts are wrong:  \n        [github.com/oldratlee/doulist-wrong-counter-analysis](https://github.com/oldratlee/doulist-wrong-counter-analysis)  \n        Page scraping and analysis done in `Python` with `Jupyter Notebook`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foldratlee%2Fdata-science-practice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foldratlee%2Fdata-science-practice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foldratlee%2Fdata-science-practice/lists"}