{"id":23146787,"url":"https://github.com/haukzero/decisiontree-strat-from-scratch","last_synced_at":"2026-04-27T22:32:00.331Z","repository":{"id":249340700,"uuid":"831242784","full_name":"haukzero/DecisionTree-strat-from-scratch","owner":"haukzero","description":"A decision tree classifier written from scratch, following the Watermelon Book (Zhou Zhihua's Machine Learning). It uses only three third-party packages: pandas, numpy, and graphviz, with graphviz used to visualize the generated tree.","archived":false,"fork":false,"pushed_at":"2024-07-20T03:17:32.000Z","size":2343,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-04T13:28:37.735Z","etag":null,"topics":["decision-tree-classifier","machine-learning","numpy","python","start-from-scratch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mulanpsl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/haukzero.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-20T02:53:53.000Z","updated_at":"2024-07-20T07:20:13.000Z","dependencies_parsed_at":"2024-07-20T05:05:19.176Z","dependency_job_id":"7324f294-69b2-4a89-a024-3de581fdf690","html_url":"https://github.com/haukzero/DecisionTree-strat-from-scratch","commit_stats":null,"previous_names":["haukzero/decisiontree-strat-from-scratch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/haukzero/DecisionTree-strat-from-scratch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haukzero%2FDecisionTree-strat-from-scratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haukzero%2FDecisionTree-strat-from-scratch/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haukzero%2FDecisionTree-strat-from-scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haukzero%2FDecisionTree-strat-from-scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/haukzero","download_url":"https://codeload.github.com/haukzero/DecisionTree-strat-from-scratch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haukzero%2FDecisionTree-strat-from-scratch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32358509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"ssl_error","status_checked_at":"2026-04-27T20:07:00.910Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decision-tree-classifier","machine-learning","numpy","python","start-from-scratch"],"created_at":"2024-12-17T16:37:59.654Z","updated_at":"2026-04-27T22:32:00.317Z","avatar_url":"https://github.com/haukzero.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 从零开始手写一棵决策树分类器\n\n## How to start\n\n首先确保已经安装了 graphviz 并正确设置环境变量\n\n若未安装则需要先在官网选择合适版本下载安装并将其目录下的`bin`目录添加至系统 PATH 环境变量中\n\n```bash\npip install -r requirements.txt\n```\n\n```bash\npython 
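main.py\n```\n\n## Project structure\n\n- `datasets`: directory holding the datasets\n  - Datasets: Iris, Breast Cancer, and melon2, melon3, melon3alpha from the Watermelon Book\n- `out`: directory where the rendered decision tree images are saved\n- `data_class`: consolidates the input data so that the tree is easier to build later\n  - `RealData.py`: holds the `RealData` class, which stores all of the actual data\n    - Data format: features and labels are stored separately; the feature array has `shape` [n_samples, n_features]\n  - `DataIndex.py`: holds the `DataIndex` class, which stores indices into the actual data; split sub-datasets are mapped out through these indices\n    - Data format:\n      - `smple_ids`: list of indices along the sample dimension, corresponding to `i`\n      - `features_ids`: list of indices along the feature dimension, corresponding to `j`\n  - `utils.py`: utility functions for building the data objects\n    - `from_` prefix: functions that build a `RealData` from the corresponding data structure\n    - `build_index`: builds a full index for a new `RealData`\n    - `split_` prefix: functions that split the data in the given way\n- `utils`: directory holding the implementations of the related formulas\n  - `evaluate.py`: the functions from the Watermelon Book used to evaluate candidate features\n  - `chose_best.py`: selects the best feature, and for continuous features also the best split value, according to the chosen evaluation criterion\n  - To keep the return signature uniform, the discrete feature selection functions append a placeholder `None` to their return value so that it lines up with the continuous case\n- `tree`: directory holding the decision tree construction code\n  - `TreeNode.py`: holds the `TreeNode` class, which represents a tree node\n  - `DecisionTreeClassifier.py`: the decision tree classifier\n  - `visual.py`: draws the decision tree with the `graphviz` package\n- `main.py`: main entry point\n\n![data_format](./img/data_format.png)\n\n## Formulas and algorithm\n\nTaken from the decision tree chapter of the Watermelon Book.\n\n- Information entropy\n\n![entropy](./img/entropy.png)\n\n- Information gain\n\n![infoGain](./img/infoGain.png)\n\n- Gain ratio\n\n![gain_ratio](./img/gain_ratio.png)\n\n- Gini index\n\n![gini_index](./img/gini_index.png)\n\n- Decision tree construction algorithm\n\n![algorithm](./img/algorithm.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaukzero%2Fdecisiontree-strat-from-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhaukzero%2Fdecisiontree-strat-from-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaukzero%2Fdecisiontree-strat-from-scratch/lists"}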