https://github.com/dms-codes/decision-tree-drugs
Decision Tree Classifier with Grid Search and Visualization
- Host: GitHub
- URL: https://github.com/dms-codes/decision-tree-drugs
- Owner: dms-codes
- Created: 2024-12-08T06:36:14.000Z
- Default Branch: main
- Last Pushed: 2024-12-08T06:50:03.000Z
- Last Synced: 2025-12-26T21:52:54.483Z
- Topics: decison-trees, machine-learning, python
- Language: Python
- Homepage: https://github.com/dms-codes/decision-tree-drugs
- Size: 118 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# Decision Tree Classifier with Grid Search and Visualization
This project demonstrates how to build, train, and evaluate a **Decision Tree Classifier**, using **GridSearchCV** for hyperparameter tuning. It uses the `drug200.csv` dataset, which records patient attributes together with the drug prescribed to each patient. The code covers data preprocessing, model training, evaluation, and decision tree visualization.
---
## Features
- **Data Loading and Cleaning**: Reads data from a CSV file and handles missing values.
- **Preprocessing**:
  - Encodes categorical variables.
  - Scales numerical features (optional for a decision tree, whose splits are unaffected by this kind of monotonic rescaling).
- **Model Training**:
  - Uses `DecisionTreeClassifier` with hyperparameter tuning via `GridSearchCV`.
- **Evaluation**:
  - Computes accuracy, a classification report, and a confusion matrix.
- **Visualization**:
  - Visualizes the decision tree using `matplotlib` and scikit-learn's `plot_tree`.
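The repository's `decision_tree.py` is not reproduced here. As a rough illustration of how these steps fit together, the sketch below chains one-hot encoding, scaling, and a `DecisionTreeClassifier` in a scikit-learn `Pipeline` and tunes it with `GridSearchCV`. It assumes the column names of the commonly distributed `drug200.csv`, trains on synthetic data from the hypothetical helper `make_synthetic_drug_data` (labels follow a made-up rule), and uses a hypothetical parameter grid rather than the author's.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumed column layout (matches the commonly distributed drug200.csv).
CATEGORICAL = ["Sex", "BP", "Cholesterol"]
NUMERIC = ["Age", "Na_to_K"]
TARGET_COLUMN = "Drug"


def make_synthetic_drug_data(n=200, seed=0):
    """Hypothetical stand-in for drug200.csv; NOT real patient data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Age": rng.integers(15, 75, n),
        "Sex": rng.choice(["F", "M"], n),
        "BP": rng.choice(["HIGH", "LOW", "NORMAL"], n),
        "Cholesterol": rng.choice(["HIGH", "NORMAL"], n),
        "Na_to_K": rng.uniform(6.0, 30.0, n).round(3),
    })
    # Made-up labelling rule so the tree has a learnable structure.
    high_bp = df["BP"] == "HIGH"
    df[TARGET_COLUMN] = np.where(
        df["Na_to_K"] > 15, "drugY",
        np.where(high_bp & (df["Age"] < 50), "drugA",
        np.where(high_bp, "drugB",
        np.where((df["BP"] == "LOW") & (df["Cholesterol"] == "HIGH"), "drugC", "drugX"))))
    return df


def build_search(cv=5):
    """Preprocessing plus a decision tree, tuned by an exhaustive grid search."""
    preprocess = ColumnTransformer([
        # Encode categorical columns as one-hot indicator features.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        # Standardize numeric columns (not needed by trees, but harmless).
        ("num", StandardScaler(), NUMERIC),
    ])
    pipe = Pipeline([
        ("prep", preprocess),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ])
    # Hypothetical grid; pipeline parameters are addressed as <step>__<param>.
    param_grid = {
        "tree__criterion": ["gini", "entropy"],
        "tree__max_depth": [3, 4, 5, None],
        "tree__min_samples_split": [2, 5, 10],
    }
    return GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")


df = make_synthetic_drug_data()
X, y = df.drop(columns=TARGET_COLUMN), df[TARGET_COLUMN]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
search = build_search().fit(X_train, y_train)
print("Best hyperparameters:", search.best_params_)
print("Test accuracy:", search.score(X_test, y_test))
```

Putting the encoder and scaler inside the pipeline means they are refit on each cross-validation training fold, so no information from the validation folds leaks into preprocessing.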
---
## Requirements
To run this code, you need the following Python packages installed:
- `pandas`
- `numpy`
- `scikit-learn`
- `matplotlib`
You can install the required packages using:
```bash
pip install pandas numpy scikit-learn matplotlib
```
---
## File Structure
```plaintext
.
├── data/
│ └── drug200.csv # Dataset file
├── decision_tree.py # Main script containing the code
└── README.md # Documentation
```
---
## Usage
### 1. Dataset
Ensure the `drug200.csv` file is located in the `data/` directory. The dataset should include the following columns:
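- **Features**: Patient attributes. In the commonly distributed version of the dataset these are `Age`, `Sex`, `BP` (blood pressure), `Cholesterol`, and `Na_to_K` (sodium-to-potassium ratio in the blood).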
- **Target Column (`Drug`)**: The drug prescribed (e.g., `drugA`, `drugB`, `drugC`, `drugX`, `drugY`).
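A minimal sketch of the loading and cleaning step. It assumes that rows with missing values are simply dropped (the repository may impute them instead), and `load_dataset` is a hypothetical helper name. The in-memory sample rows are made up for illustration.

```python
import io
import os

import pandas as pd

DATA_DIR = "data"
DATA_FILE = "drug200.csv"
TARGET_COLUMN = "Drug"


def load_dataset(path=os.path.join(DATA_DIR, DATA_FILE), target=TARGET_COLUMN):
    """Read the CSV, check that the target column exists, and drop incomplete rows."""
    df = pd.read_csv(path)  # accepts a file path or a file-like object
    if target not in df.columns:
        raise KeyError(f"target column {target!r} not found; columns are {list(df.columns)}")
    n_before = len(df)
    # Assumption: rows with any missing value are dropped rather than imputed.
    df = df.dropna().reset_index(drop=True)
    print(f"Loaded {n_before} rows, kept {len(df)} after dropping missing values")
    return df


# Made-up sample rows; the second one has an empty Cholesterol field.
sample = io.StringIO(
    "Age,Sex,BP,Cholesterol,Na_to_K,Drug\n"
    "23,F,HIGH,HIGH,25.355,drugY\n"
    "47,M,LOW,,13.093,drugC\n"
    "28,F,NORMAL,HIGH,7.798,drugX\n"
)
df = load_dataset(sample)
```

With the real file in place, `load_dataset()` with no arguments reads `data/drug200.csv`.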
---
### 2. Running the Code
1. Clone this repository or download the code.
2. Navigate to the directory containing the script.
3. Run the script using:
```bash
python decision_tree.py
```
---
## Output
### 1. Console Outputs:
- **Best Hyperparameters**: Displays the best hyperparameter combination found by `GridSearchCV`.
- **Evaluation Metrics**:
  - Accuracy
  - Classification Report
  - Confusion Matrix
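These three metrics correspond to scikit-learn's `accuracy_score`, `classification_report`, and `confusion_matrix`. The sketch below shows one way to print them; `report_metrics` is a hypothetical helper, and the hand-written labels stand in for real predictions (in the script, the predictions would presumably come from the tuned model's `predict` on the held-out test set).

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def report_metrics(y_true, y_pred):
    """Print accuracy, per-class precision/recall/F1, and the confusion matrix."""
    labels = sorted(set(y_true) | set(y_pred))  # fix the row/column order explicitly
    acc = accuracy_score(y_true, y_pred)
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    print(f"Accuracy: {acc:.3f}")
    # zero_division=0 avoids warnings for classes that are never predicted.
    print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
    print("Confusion matrix (rows = true class, columns = predicted class):")
    print(labels)
    print(cm)
    return acc, cm


# Hypothetical true and predicted labels for six patients.
y_true = ["drugY", "drugX", "drugA", "drugY", "drugC", "drugB"]
y_pred = ["drugY", "drugX", "drugB", "drugY", "drugC", "drugB"]
acc, cm = report_metrics(y_true, y_pred)
```

Off-diagonal entries in the matrix show which drugs are confused with each other; here the single `drugA` patient was predicted as `drugB`.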
### 2. Visualization:
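A decision tree plot showing:
- Feature splits (the feature and threshold tested at each internal node)
- Node statistics (impurity, number of samples, and class distribution)
- Class labels (the majority class at each node)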
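scikit-learn's `sklearn.tree.plot_tree` draws this kind of figure with matplotlib. A minimal sketch, assuming a tree trained on a few made-up `[Age, Na_to_K]` rows; the output file name `decision_tree.png` is an assumption. The non-interactive Agg backend lets it run without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use plt.show() instead
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Made-up training data: [Age, Na_to_K] -> prescribed drug.
X = [[23, 25.4], [47, 13.1], [47, 10.1], [28, 7.8],
     [61, 18.0], [22, 8.6], [49, 16.3], [41, 11.0]]
y = ["drugY", "drugC", "drugC", "drugX", "drugY", "drugX", "drugY", "drugX"]

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

fig, ax = plt.subplots(figsize=(10, 6))
annotations = plot_tree(
    model,
    feature_names=["Age", "Na_to_K"],
    class_names=[str(c) for c in model.classes_],  # same order as model.classes_
    filled=True,   # color each node by its majority class
    rounded=True,  # rounded node boxes
    ax=ax,
)
out = Path("decision_tree.png")  # hypothetical output path
fig.savefig(out, dpi=150, bbox_inches="tight")
plt.close(fig)
```

Limiting `max_depth` (or passing `max_depth` to `plot_tree` itself) keeps the figure readable for deeper trees.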
---
## Customization
### File Paths:
Modify the `DATA_DIR` and `DATA_FILE` constants to change the dataset path:
```python
DATA_DIR = "data"
DATA_FILE = "drug200.csv"
```
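One way the two constants might be combined, with a readable error when the file is missing. `resolve_data_path` is a hypothetical helper and not necessarily how `decision_tree.py` builds the path.

```python
from pathlib import Path

DATA_DIR = "data"
DATA_FILE = "drug200.csv"


def resolve_data_path(data_dir=DATA_DIR, data_file=DATA_FILE):
    """Join the directory and file name, failing fast if the file is absent."""
    path = Path(data_dir) / data_file
    if not path.is_file():
        raise FileNotFoundError(f"dataset not found at {path.resolve()}")
    return path
```

Because `data` is a relative path, it resolves against the current working directory, which is why the script should be run from the directory that contains `data/`.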
### Target Column:
Update the `TARGET_COLUMN` constant to specify the target column name:
```python
TARGET_COLUMN = "Drug"
```
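Changing `TARGET_COLUMN` works only if every remaining column is a usable feature. A minimal sketch of the feature/target split under that assumption; `split_features_target` is a hypothetical helper and the rows are made up.

```python
import pandas as pd

TARGET_COLUMN = "Drug"


def split_features_target(df: pd.DataFrame, target: str = TARGET_COLUMN):
    """Return (X, y): all non-target columns as features, the target column as labels."""
    if target not in df.columns:
        raise KeyError(f"{target!r} is not a column; available: {list(df.columns)}")
    X = df.drop(columns=[target])
    y = df[target]
    return X, y


# Made-up rows for illustration.
df = pd.DataFrame({
    "Age": [23, 47, 28],
    "BP": ["HIGH", "LOW", "NORMAL"],
    "Drug": ["drugY", "drugC", "drugX"],
})
X, y = split_features_target(df)
print(list(X.columns), list(y))  # ['Age', 'BP'] ['drugY', 'drugC', 'drugX']
```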
### Class Labels:
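Update the `class_names` argument in the visualization function to match your dataset's target labels. `plot_tree` expects these names in ascending order of the class values, i.e. in the same order as the fitted model's `classes_`: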
```python
class_names=['drugA', 'drugB', 'drugC', 'drugX', 'drugY']
```
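Rather than hard-coding the list, the names can be derived from the fitted model so they always line up with `model.classes_`. The sketch below assumes the target was integer-encoded with `sklearn.preprocessing.LabelEncoder` during preprocessing (the repository may encode it differently); the labels and the single `Na_to_K` feature are made up.

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up labels and one made-up feature column (Na_to_K).
y_raw = ["drugY", "drugC", "drugX", "drugY", "drugA", "drugB"]
X = [[25.4], [13.1], [7.8], [18.0], [10.1], [11.0]]

encoder = LabelEncoder()
y = encoder.fit_transform(y_raw)  # strings -> integers 0..n_classes-1, alphabetical
model = DecisionTreeClassifier(random_state=42).fit(X, y)

# model.classes_ holds the integer codes [0, 1, 2, 3, 4]; map them back to names.
class_names = [str(name) for name in encoder.inverse_transform(model.classes_)]
print(class_names)  # ['drugA', 'drugB', 'drugC', 'drugX', 'drugY']
```

If the target is left as strings, `[str(c) for c in model.classes_]` gives the same list directly.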
---
## Dependencies and Version Information
This code has been tested with the following versions:
- Python 3.8+
- scikit-learn 1.2+
- pandas 1.4+
- matplotlib 3.5+
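To compare a local environment against these versions, a small check using the standard library's `importlib.metadata` can help. The minimums are copied from the list above; the version parsing (leading numeric components only) is a deliberate simplification, and `check_versions` is a hypothetical helper.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the list above (distribution name -> minimum version tuple).
MINIMUMS = {"scikit-learn": (1, 2), "pandas": (1, 4), "matplotlib": (3, 5)}


def parse_version(text):
    """Turn '1.4.2' or '2.0.0rc1' into a tuple of leading integers, e.g. (1, 4, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_versions(minimums=MINIMUMS):
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(f"Python {sys.version.split()[0]} is older than 3.8")
    for dist, minimum in minimums.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if parse_version(installed) < minimum:
            problems.append(f"{dist} {installed} is older than {'.'.join(map(str, minimum))}")
    return problems


if __name__ == "__main__":
    for line in check_versions() or ["All listed requirements are satisfied."]:
        print(line)
```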
---
## Acknowledgments
- **Dataset**: The `drug200.csv` dataset used in this project.
- **Libraries**: This project uses scikit-learn for modeling and matplotlib for visualization.
---
## License
This project is licensed under the MIT License. You are free to use, modify, and distribute this code for any purpose.
---
## Author
If you have any questions, feel free to reach out! 😊