Detailed Entropy and Information Gain Calculation for ID3 Algorithm
https://github.com/ravesandstorm/id3-python-stepwise
- Host: GitHub
- URL: https://github.com/ravesandstorm/id3-python-stepwise
- Owner: ravesandstorm
- License: apache-2.0
- Created: 2025-01-13T15:49:22.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-13T18:20:22.000Z (about 1 year ago)
- Last Synced: 2025-01-13T19:21:57.706Z (about 1 year ago)
- Topics: entropy, id3, id3-algorithm, information-gain, jupyter, jupyter-notebook, pandas, pandas-dataframe, pandas-library, script
- Language: Jupyter Notebook
- Homepage:
- Size: 18.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ID3-Python-Stepwise
### Project Description: ID3 Algorithm Implementation in Python
This project implements the **ID3 (Iterative Dichotomiser 3)** algorithm in Python for decision tree learning. The algorithm uses entropy and information gain to choose attribute splits when building a decision tree from a dataset. This implementation prints every entropy and information-gain calculation step by step, so the results are easy to verify and understand.
---
#### Major Steps:
1. **Input Data:** The script takes a string input for column names and data rows.
2. **Preprocessing:** The data is parsed into a DataFrame for easy manipulation and processing.
3. **Entropy Calculation:**
- Calculates the overall entropy of the target variable.
- Calculates conditional entropy for each unique value of the input attributes.
4. **Information Gain:**
   - Uses these entropy values to compute the information gain for each attribute (a minimal sketch of both computations follows this list).
- Outputs the formulas and intermediate steps for transparency.
5. **Decision Tree Logic:** The attribute with the highest information gain is selected for the split.
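For concreteness, here is a minimal sketch of the entropy and information-gain computations described in steps 3 and 4, assuming base-2 logarithms and a pandas DataFrame; the function and variable names are illustrative, not the script's actual API:
```python
import numpy as np
import pandas as pd

def entropy(series: pd.Series) -> float:
    """Shannon entropy (base 2) of a categorical column."""
    p = series.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    """Entropy of the target minus the weighted conditional entropy
    taken over each unique value of the attribute."""
    total = entropy(df[target])
    weights = df[attribute].value_counts(normalize=True)
    conditional = sum(
        w * entropy(df.loc[df[attribute] == value, target])
        for value, w in weights.items()
    )
    return total - conditional
```
The attribute with the largest `information_gain` value becomes the split node, exactly as step 5 describes.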
---
#### How to Use:
1. Clone this repository:
```bash
git clone https://github.com/ravesandstorm/ID3-Python-Stepwise.git
```
2. Open the script in your Python environment.
3. Modify the `columns` and `data` variables to input your dataset (an illustrative format is sketched after these steps).
4. Run the script to view the entropy and information-gain calculations for the full dataset.
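As an illustration of steps 2 and 3, `columns` and `data` might look like the following; this is a hypothetical format, and the delimiters the actual script expects may differ:
```python
import pandas as pd

# Hypothetical string inputs; the script's actual delimiters may differ.
columns = "Weather Parents Cash Exam Decision"
data = """sunny visit rich yes cinema
windy no-visit rich no shopping"""

# Parse the strings into a DataFrame, mirroring the preprocessing step.
df = pd.DataFrame(
    [row.split() for row in data.splitlines()],
    columns=columns.split(),
)
print(df)
```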
---
#### Example:
Given the following dataset (the first two rows of the 11-row example that the sample output below is based on):
| Weather | Parents  | Cash | Exam | Decision |
|---------|----------|------|------|----------|
| sunny   | visit    | rich | yes  | cinema   |
| windy   | no-visit | rich | no   | shopping |
The script calculates:
- Entropy of the dataset
- Conditional entropy for each attribute
- Information gain for each attribute
Sample output (`log` here denotes the base-2 logarithm):
```
Entropy of Data = 0.9940 = -(6/11)*log(6/11) - (5/11)*log(5/11)
Entropy for 'sunny' = 0.9183 = -(4/6)*log(4/6) - (2/6)*log(2/6)
...
Info. Gain for 'Weather' = 0.1649 = 0.9940 - [(6/11)*0.9183 + (5/11)*0.7222]
```
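Any of the printed lines can be checked by hand. For example, the `'sunny'` entropy above corresponds to a 4-to-2 class split across 6 rows:
```python
from math import log2

# Entropy of a 4-to-2 class split, matching the 'sunny' line above.
probs = [4/6, 2/6]
print(-sum(p * log2(p) for p in probs))  # ≈ 0.9183
```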
---
#### Dependencies:
- Python 3.8+
- Pandas
- NumPy
- `math` (Python standard library)
---
Feel free to contribute, report issues, or suggest improvements! 😊