Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/benami171/ml_knn_decision-trees
A machine learning implementation comparing Decision Trees and k-Nearest Neighbors (k-NN) algorithms for Iris flower classification. Features comprehensive analysis of different approaches, including brute-force and entropy-based decision trees, along with k-NN using multiple distance metrics.
classification cross-validation data-analysis decision-trees iris-dataset k-nearest-neighbours machine-learning nearest-neighbors python
Last synced: 23 days ago
- Host: GitHub
- URL: https://github.com/benami171/ml_knn_decision-trees
- Owner: benami171
- Created: 2025-01-16T18:56:27.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-01-20T13:51:05.000Z (29 days ago)
- Last Synced: 2025-01-20T14:41:17.098Z (29 days ago)
- Topics: classification, cross-validation, data-analysis, decision-trees, iris-dataset, k-nearest-neighbours, machine-learning, nearest-neighbors, python
- Language: Python
- Homepage:
- Size: 404 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Iris 🌸 Classification using Decision Trees and k-NN
A comprehensive machine learning project implementing and comparing Decision Trees and k-Nearest Neighbors (k-NN) algorithms for classifying Iris flowers. This project focuses on binary classification between Versicolor and Virginica species using their petal measurements.
## 📋 Table of Contents
- [Project Overview](#project-overview)
- [Key Features](#key-features)
- [📁 Project Structure](#project-structure)
- [Installation](#installation)
- [📊 Results and Analysis](#results-and-analysis)
  - [k-NN Performance Analysis](#k-nn-performance-analysis)
  - [Decision Tree Comparison](#decision-tree-comparison)
- [Usage](#usage)
- [🔬 Technical Details](#technical-details)
  - [Implemented Algorithms](#implemented-algorithms)
  - [Performance Metrics](#performance-metrics)
- [🤝 Contributing](#contributing)
- [📄 License](#license)

## Project Overview
This project implements and analyzes two fundamental machine learning algorithms:
1. k-Nearest Neighbors (k-NN) with various distance metrics
2. Decision Trees with two different splitting strategies (Brute-force and Binary Entropy)

The implementation uses the Iris dataset, specifically focusing on distinguishing between Versicolor and Virginica species using only their second and third features.
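The binary-entropy splitting strategy can be sketched in a few lines of plain Python. This is an illustrative sketch with made-up values, not the repository's own code; `xs` stands in for one petal feature and `ys` for 0/1 species labels.

```python
import math

def binary_entropy(p):
    """Entropy, in bits, of a binary distribution with positive-class probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def split_entropy(xs, ys, threshold):
    """Weighted average entropy of the two sides of the split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    total = len(ys)
    score = 0.0
    for side in (left, right):
        if side:
            p = sum(side) / len(side)  # fraction of positive labels on this side
            score += len(side) / total * binary_entropy(p)
    return score

# Hypothetical petal widths and 0/1 labels standing in for the two species.
xs = [1.0, 1.2, 1.4, 1.8, 2.0, 2.3]
ys = [0, 0, 0, 1, 1, 1]

# The entropy-based tree greedily picks the threshold with the lowest score.
best = min(sorted(set(xs)), key=lambda t: split_entropy(xs, ys, t))
```

A perfect split drives the weighted entropy to zero, which is what happens at 1.4 for these toy values.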
## Key Features
- **Advanced k-NN Implementation**:
  - Multiple k values (1, 3, 5, 7, 9)
  - Different distance metrics (L1, L2, L∞)
  - Comprehensive error analysis across parameters
- **Dual Decision Tree Approaches**:
  - Brute-force approach constructing all possible trees
  - Binary entropy-based splitting strategy
  - Visualizations of tree structures and decision boundaries

## 📁 Project Structure
```bash
.
├── models/                  # Core ML model implementations
│   ├── __init__.py
│   ├── decision_trees.py    # Decision tree algorithms
│   └── knn.py               # k-NN implementation
├── results/                 # Generated visualizations
│   ├── decision_tree_errors.png
│   ├── decision_tree_figure1_visualization.png
│   ├── decision_tree_figure2_visualization.png
│   └── k-NN_errors.png
├── data_utils.py            # Data handling utilities
├── main.py                  # Main execution script
├── metrics.py               # Evaluation metrics
└── visualization.py         # Visualization tools
```

## Installation
1. **Clone the repository**:
```bash
git clone https://github.com/yourusername/iris-classification.git
cd iris-classification
```

2. **Set up a virtual environment** (recommended):
```bash
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
```

3. **Install dependencies**:
```bash
pip install -r requirements.txt
```

## 📊 Results and Analysis
### k-NN Performance Analysis
The k-NN implementation was tested with various parameters:
- k values: 1, 3, 5, 7, 9
- Distance metrics: L1 (Manhattan), L2 (Euclidean), L∞ (Chebyshev)

> 💡 **Key Findings**:
> - Higher k values generally resulted in more stable predictions
> - L2 distance metric showed slightly better performance
> - Best performance achieved with k=9 using L2 distance

![k-NN Error Analysis](results/k-NN_errors1.png)
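The three distance metrics drop into a compact k-NN classifier. Below is a minimal, self-contained sketch (the repository's real implementation lives in `models/knn.py`; the training points are hypothetical stand-ins for the two petal features):

```python
from collections import Counter

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))            # Manhattan

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5   # Euclidean

def linf(a, b):
    return max(abs(x - y) for x, y in zip(a, b))            # Chebyshev

def knn_predict(train_X, train_y, query, k, dist):
    """Majority vote among the k training points nearest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (petal length, petal width) pairs: 0 = Versicolor, 1 = Virginica.
train_X = [(4.5, 1.5), (4.0, 1.3), (4.7, 1.4), (5.5, 2.1), (5.1, 1.9), (5.9, 2.3)]
train_y = [0, 0, 0, 1, 1, 1]

pred = knn_predict(train_X, train_y, (4.8, 1.6), k=3, dist=l2)  # -> 0 (Versicolor)
```

Swapping `dist=l2` for `l1` or `linf` changes only the neighbor ranking, which is why the metric choice shows up as small differences in the error curves above.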
### Decision Tree Comparison
Two decision tree implementations were compared:
1. **Brute-Force Approach** 🔨:
   - Error rate: 5.00%
2. **Entropy-Based Approach** 🎯:
   - Error rate: 7.00%

![Decision Tree Structures](results/decision_tree_figure1_visualization.png)
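At depth 1, the brute-force strategy reduces to trying every observed feature/threshold pair and keeping the split with the lowest training error. A toy sketch of that idea, with hypothetical rows (the repository enumerates full trees, not just stumps):

```python
def stump_error(X, y, feature, threshold):
    """Training error of a stump that predicts 1 when X[i][feature] > threshold."""
    preds = [1 if row[feature] > threshold else 0 for row in X]
    return sum(p != t for p, t in zip(preds, y)) / len(y)

def brute_force_stump(X, y):
    """Exhaustively score every observed value of every feature as a threshold."""
    candidates = [(f, row[f]) for row in X for f in range(len(X[0]))]
    return min(candidates, key=lambda ft: stump_error(X, y, *ft))

# Hypothetical (petal length, petal width) rows: 0 = Versicolor, 1 = Virginica.
X = [(4.5, 1.5), (4.0, 1.3), (4.7, 1.4), (5.5, 2.1), (5.1, 1.9), (5.9, 2.3)]
y = [0, 0, 0, 1, 1, 1]

feature, threshold = brute_force_stump(X, y)  # a split that separates the labels
```

The full brute-force variant extends this exhaustive search from single stumps to entire trees, which is why it is only tractable with very few features and candidate thresholds.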
![Decision Boundaries](results/decision_tree_figure2_visualization.png)

## Usage
Run the main analysis script:
```bash
python main.py
```

This will execute:
1. 📥 Load and preprocess the Iris dataset
2. 📊 Perform k-NN analysis with various parameters
3. 🌳 Generate decision trees using both approaches
4. 📈 Create visualizations and error analysis

## 🔬 Technical Details
### Implemented Algorithms
1. **k-Nearest Neighbors**:
   - Custom implementation with multiple distance metrics
   - Parameter evaluation framework
   - Cross-validation with 100 iterations
2. **Decision Trees**:
   - Brute-force tree construction
   - Entropy-based splitting
   - Visualization of tree structures and decision boundaries

### Performance Metrics
The project employs several metrics for evaluation:
- Classification error rates
- Training vs. Test set performance
- Error difference analysis

## 🤝 Contributing
We welcome contributions! Please feel free to submit a Pull Request. For major changes:
1. 🍴 Fork the repository.
2. 🌿 Create a new branch (`git checkout -b feature-branch`).
3. 💡 Commit your changes (`git commit -m 'Add new feature'`).
4. 📤 Push to the branch (`git push origin feature-branch`).
5. 🎉 Open a Pull Request.

## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.