https://github.com/sushantnair/decision_tree_gini_index
A Python implementation of the Decision Tree Algorithm
- Host: GitHub
- URL: https://github.com/sushantnair/decision_tree_gini_index
- Owner: sushantnair
- License: gpl-3.0
- Created: 2024-03-28T17:33:16.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-03-28T18:26:59.000Z (9 months ago)
- Last Synced: 2024-03-29T19:05:05.398Z (9 months ago)
- Topics: decision-tree, gini, gini-index, id3, id3-algorithm, machine-learning, machine-learning-algorithms, ml
- Language: Jupyter Notebook
- Homepage: https://colab.research.google.com/drive/1xYDPYfWOJ2AaIIHmDzqeDUvu97knYekv
- Size: 74.2 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Decision_Tree_GINI_Index
A Python implementation of the Decision Tree Algorithm

## Purpose
This code is meant to foster an in-depth understanding of the Decision Tree Algorithm used in Machine Learning.

## Implementation Details
This code has been written purely in Python to implement the ID3 Decision Tree Algorithm used in Machine Learning.

### Libraries Used
- Pandas for dataframe manipulation
- Networkx for visualization of the Decision Tree
- Combinations (from the Itertools library) to generate combinations of the values of a variable (column)
- Matplotlib for plotting graphs
- Math for mathematical operations like squaring and logarithms

### Special Note
No ML libraries such as scikit-learn, PyTorch or TensorFlow have been used. This is intentional,
so that every step of the process is coded by hand for a better understanding of the algorithm.

## Methodology
I have added the Colab notebook for the code. This is deliberate, so that each step of the code can be run
separately to promote an in-depth understanding of the algorithm.

## Dataset and Py Code
Click on the link: https://github.com/sushantnair/ML/tree/main/experiment_4

Use the BuyCarNew.csv dataset for best results.

## Issues
There are currently **three** issues, which I'd like the learner to solve for better understanding
of the algorithm by practical experience. The issues can be found in the issues section. Plus,
**ensure that you don't use ML libraries, or change the code so much that it loses its identity**.
I have deliberately not used ML libraries so that each step of the process can be exposed to enable
better understanding for learners.

## About GINI Index
The latter part of the code implements the GINI Index in order to choose an appropriate
partitioning of the dataset. Earlier in the code, the root node for a dataset is chosen as the
column that gives the maximum Information Gain, and the dataset is partitioned simply on the values
of that column, with one branch per value. However, there is a more precise method for determining
the dataset partitions, and that is the GINI Index. For example, for the BuyCarNew.csv dataset the
root node is Age. Age has three values (Young, Middleage and Senior), so the partition (as can be seen
in the tree) has one branch for Age Young, a middle branch for Age Middleage and a third branch for
Age Senior. This may not be the best way to partition the dataset.
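
The root-node selection described above can be illustrated with a minimal sketch. It is not the notebook's code: it uses only the standard library instead of Pandas, the function names are hypothetical, and the rows are invented for illustration (they are not taken from BuyCarNew.csv; only the Age values come from this README).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        # One group of target labels per distinct value of the attribute.
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_root(rows, attributes, target):
    """Return the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Invented toy rows; the Income and Buys columns are assumptions for this example.
rows = [
    {"Age": "Young", "Income": "High", "Buys": "No"},
    {"Age": "Young", "Income": "Low", "Buys": "No"},
    {"Age": "Middleage", "Income": "High", "Buys": "Yes"},
    {"Age": "Middleage", "Income": "Low", "Buys": "Yes"},
    {"Age": "Senior", "Income": "High", "Buys": "Yes"},
    {"Age": "Senior", "Income": "Low", "Buys": "Yes"},
]

root = best_root(rows, ["Age", "Income"], "Buys")
print(root)
```

On these toy rows every Age value maps to a single class, so Age removes all of the target's entropy, while Income leaves it unchanged. Age is therefore picked as the root.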
This is where the GINI Index can be used to create better partitions. However, I am only doing a concept implementation
here, so I have decided to keep the two parts (the main Decision Tree part and the GINI Index part) separate.
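
As a rough illustration of the idea, the sketch below groups the values of one column into two-way partitions with `itertools.combinations` and scores each partition by its weighted Gini impurity, as CART-style trees do. This is an assumption about how the partitions are formed, so the notebook may differ in detail. The function names and the toy rows are hypothetical, and the rows are not taken from BuyCarNew.csv.

```python
from collections import Counter
from itertools import combinations


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def binary_partitions(values):
    """Yield every split of `values` into two non-empty groups, each split once."""
    values = sorted(values)
    first, rest = values[0], values[1:]
    # Pinning `first` to the left group avoids yielding mirror-image splits.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            yield left, set(values) - left


def best_gini_split(rows, attribute, target):
    """Return (weighted_gini, left_values, right_values) for the lowest-impurity split."""
    best = None
    for left, right in binary_partitions({row[attribute] for row in rows}):
        left_labels = [r[target] for r in rows if r[attribute] in left]
        right_labels = [r[target] for r in rows if r[attribute] in right]
        weighted = (
            len(left_labels) * gini(left_labels)
            + len(right_labels) * gini(right_labels)
        ) / len(rows)
        if best is None or weighted < best[0]:
            best = (weighted, left, right)
    return best


# Invented toy rows; the Buys column is an assumption for this example.
rows = [
    {"Age": "Young", "Buys": "No"},
    {"Age": "Young", "Buys": "No"},
    {"Age": "Middleage", "Buys": "Yes"},
    {"Age": "Middleage", "Buys": "Yes"},
    {"Age": "Senior", "Buys": "Yes"},
    {"Age": "Senior", "Buys": "Yes"},
]

score, left, right = best_gini_split(rows, "Age", "Buys")
print(score, sorted(left), sorted(right))
```

On these toy rows, grouping Middleage with Senior and leaving Young on its own gives two pure groups, so two branches are enough where the one-branch-per-value split uses three.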
Of course, you may go ahead and merge the two, but please don't submit a PR that modifies the existing
decisiontree_gini.py code. You can, however, submit a PR that adds the merged code as new code to this repository.

## Documentation
Coming soon...

## Licensing
GNU General Public Licence. You may use the code for free, no attribution required (although it'd be
appreciated!).

Thank you very much. Please consider starring the repository if it has helped or delighted you.