Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/amirjahantab/iris_classification

This project analyzes the famous Iris dataset using various machine learning techniques. The goal is to classify the iris flowers into three species: Setosa, Versicolor, and Virginica based on the features provided in the dataset.
https://github.com/amirjahantab/iris_classification

classification data-science machine-learning scikit-learn

Last synced: 3 days ago
JSON representation

This project analyzes the famous Iris dataset using various machine learning techniques. The goal is to classify the iris flowers into three species: Setosa, Versicolor, and Virginica based on the features provided in the dataset.

Awesome Lists containing this project

README

        

# Iris Classification Notebook

This Jupyter notebook demonstrates the process of classifying the Iris dataset using various machine learning techniques. The Iris dataset is a classic dataset in the field of machine learning and statistics, often used for testing algorithms.

## Introduction

The Iris dataset contains 150 samples of iris flowers, each with four features: sepal length, sepal width, petal length, and petal width. The samples belong to one of three species: Iris-setosa, Iris-versicolor, and Iris-virginica. This notebook demonstrates how to load the dataset, preprocess it, train different classifiers, evaluate their performance, and visualize the results.

## Requirements

To run the notebook, you need the following Python libraries:

- `numpy`
- `matplotlib`
- `scikit-learn`

You can install these dependencies using `pip`:

```sh
pip install numpy matplotlib scikit-learn jupyter
```

## Usage

1. Clone the repository or download the notebook file.
2. Ensure you have all the required libraries installed.
3. Open the notebook using Jupyter:

```sh
jupyter notebook iris.ipynb
```

4. Run the cells in the notebook to see the data loading, model training, evaluation, and visualization steps.

## Notebook Structure

### Data Loading and Exploration

- **Loading the Iris dataset**: The dataset is loaded using `scikit-learn`'s built-in function.
- **Basic data exploration and visualization**: Initial exploration of the dataset, including summary statistics and pair plots to visualize the relationships between features.

### Data Preprocessing

- **Splitting the dataset**: The dataset is split into training and testing sets to evaluate the performance of the models.

### Model Training

- **K-Nearest Neighbors (KNN) Classifier**: Classifier implementing the k-nearest neighbors vote.
- **Multi-layer Perceptron (MLP) classifier**:This model optimizes the log-loss function using LBFGS or stochastic
gradient descent.
### Model Evaluation

- **Evaluating the models**: The models are evaluated using accuracy score on the test set.
- **Identifying incorrect predictions**: Incorrect predictions are identified and analyzed.

### Visualization

- **Plotting the classification results**: A scatter plot of the classification results is generated, highlighting the incorrectly classified samples.

### Accuracy Scores

The accuracy scores of the different classifiers on the test set are as follows:

- **K-Nearest Neighbors (KNN)**: $94.66$ %
- **Multi-layer Perceptron classifier (MLP)**: $96.00%$ %