Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/doubleplusplus/incremental_decision_tree-CART-Random_Forest
An incremental CART decision tree based on the Hoeffding tree, also known as the Very Fast Decision Tree (VFDT), proposed in the paper "Mining High-Speed Data Streams" by Domingos & Hulten (2000), together with the more recent "Extremely Fast Decision Tree" (EFDT) by Manapragada, Webb & Salehi (2018). A Random Forest implementation has also been added.
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/doubleplusplus/incremental_decision_tree-CART-Random_Forest
- Owner: doubleplusplus
- Created: 2018-05-08T10:43:04.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-10-22T19:44:19.000Z (about 4 years ago)
- Last Synced: 2024-05-21T01:04:44.782Z (8 months ago)
- Language: Python
- Homepage:
- Size: 1.57 MB
- Stars: 95
- Watchers: 7
- Forks: 28
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-decision-tree-papers - [Code
README
# incremental-decision-tree-learner
Definition from Wikipedia: [incremental decision tree](https://en.wikipedia.org/wiki/Incremental_decision_tree)
"An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods construct a tree using a complete (static) dataset. Incremental decision tree methods allow an existing tree to be updated using only new data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated (i.e. the data was not stored), the original data set is too large to process or the characteristics of the data change over time (concept drift)."
## VFDT
This implementation is a CART tree based on the Hoeffding tree, i.e. the Very Fast Decision Tree (VFDT) described in the paper "Mining High-Speed Data Streams" (Domingos & Hulten, 2000). The code was tested on data downloaded from the UCI Machine Learning Repository.
## EFDT
EFDT is the "Extremely Fast Decision Tree" by Manapragada, Webb & Salehi (2018). As new data instances arrive, EFDT can dynamically modify the existing model by re-evaluating earlier splits or removing subtrees. EFDT is now available, but it runs slower than VFDT.
## Random Forest
Added an implementation of Random Forest: `rf.py`. It is efficient because I used vectorized computation of the Gini impurity (Gini index) and a process pool for multiprocessing.
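As a rough illustration of what vectorized Gini computation can look like, the sketch below scores every candidate threshold of one numeric feature in a single pass using NumPy cumulative sums. It is not taken from `rf.py`: the function name `gini_for_thresholds` and its signature are hypothetical, and labels are assumed to be integers in `[0, n_classes)`.

```python
import numpy as np


def gini_for_thresholds(x, y, n_classes):
    """Weighted Gini impurity of every candidate split `x <= t`, without a Python loop.

    x: 1-D array holding one numeric feature.
    y: 1-D integer array of labels in [0, n_classes).
    Returns (thresholds, impurities) for splits between distinct sorted values.
    """
    order = np.argsort(x, kind="stable")
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)

    # One-hot encode labels, then accumulate class counts from the left.
    one_hot = np.eye(n_classes)[y_sorted]
    total = one_hot.sum(axis=0)
    left = np.cumsum(one_hot, axis=0)[:-1]   # counts left of each split point
    right = total - left                     # counts right of each split point

    n_left = np.arange(1, n)
    n_right = n - n_left
    gini_left = 1.0 - ((left / n_left[:, None]) ** 2).sum(axis=1)
    gini_right = 1.0 - ((right / n_right[:, None]) ** 2).sum(axis=1)
    weighted = (n_left * gini_left + n_right * gini_right) / n

    # Only midpoints between two different feature values are real splits.
    valid = x_sorted[:-1] < x_sorted[1:]
    thresholds = (x_sorted[:-1] + x_sorted[1:]) / 2.0
    return thresholds[valid], weighted[valid]
```

The best split for the feature is then `thresholds[np.argmin(impurities)]`. Because each feature (or each tree) can be scored independently, calls like this are natural units of work to hand to a process pool.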