
https://github.com/daniel-elston/k-means-clustering-algorithm

Creation and implementation of k-means unsupervised machine learning algorithm, used to classify Iris species (Iris dataset).

artificial-intelligence classification clustering data-science deep-learning iris-classification k-means-clustering machine-learning machine-learning-algorithms python unsupervised-machine-learning

# Unsupervised Machine Learning Algorithm for Iris Species Classification


## Table of contents
- [Status and Details](#status-and-details)
- [Technology](#technology)
- [Introduction](#introduction)
- [Project Description](#project-description)
- [Objectives](#objectives)
- [Data Science Methodology](#data-science-methodology)
- [Problem Formulation](#problem-formulation)
- [Data Engineering Methods](#data-engineering-methods)
- [Conclusions](#conclusions)
- [Contributing Members and Contacts](#contributing-members-and-contacts)

## Status and Details
- **Project Status**: [Completed]
- **Date Coded**: 20/10/21
- **Link to Raw Data**: The raw data is loaded in the first cell of the ipynb file

## Technology
- **Language**: Python 3.3
- **Libraries**: sys, pandas, sklearn, csv, numpy, matplotlib, seaborn, unittest, time
- **Set up File**: N/A

## Introduction
The purpose of this project is to build, from scratch, an unsupervised machine learning algorithm capable of classification. A k-means clustering algorithm has been built to decide which species of iris flower each observation belongs to. A test harness is set up to check the algorithm.

### Project Description
The dataset used for this project is the famous Iris dataset. It consists of 150 observations of four features: petal width, petal length, sepal width and sepal length. Each measurement is recorded in centimeters to one decimal place.

A k-means clustering algorithm is written from scratch to classify the dataset. When the features are plotted, the points form two blobs: one round blob on its own, and another, more distorted blob. The algorithm is iterated until the centroids no longer update, at which point each observation has been assigned to a cluster.
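The iterate-until-the-centroids-stop-moving loop can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's own code: the function names `assign_clusters`, `update_centroids` and `kmeans`, the `initial_centroids`, `max_iter` and `seed` parameters, and the toy points are all assumptions made for the example.

```python
import math    # math.dist requires Python 3.8 or later
import random


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            updated.append(old)  # an empty cluster keeps its previous centroid
    return updated


def kmeans(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Cluster `points` into `k` groups; stop once the centroids no longer move."""
    if initial_centroids is None:
        # Assumed initialisation: pick k distinct observations at random.
        initial_centroids = random.Random(seed).sample(points, k)
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        new_centroids = update_centroids(points, labels, centroids)
        if new_centroids == centroids:  # converged: nothing changed this round
            break
        centroids = new_centroids
        labels = assign_clusters(points, centroids)
    return centroids, labels


# Made-up toy data: two well-separated groups of three points each.
toy_points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centroids, labels = kmeans(toy_points, k=2, initial_centroids=[(1.0, 1.0), (5.0, 5.0)])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Passing explicit starting centroids makes the run reproducible. With random starting points, k-means can settle on different cluster numberings, or occasionally a poorer grouping, from one run to the next.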

### Objectives
- Implement an unsupervised machine learning algorithm
- Build a test harness
- Use k-means clustering algorithms to classify points
- Identify iris flower species
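
A test harness for the individual k-means steps might look like the sketch below. The helpers `nearest_centroid` and `centroid_of` and the test values are hypothetical, chosen for illustration. They are not taken from the notebook, which may organise its tests differently.

```python
import math    # math.dist requires Python 3.8 or later
import unittest


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def centroid_of(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


class KMeansStepsTest(unittest.TestCase):
    def test_point_goes_to_nearest_centroid(self):
        centroids = [(0.0, 0.0), (10.0, 10.0)]
        self.assertEqual(nearest_centroid((1.0, 2.0), centroids), 0)
        self.assertEqual(nearest_centroid((9.0, 8.0), centroids), 1)

    def test_centroid_is_mean_of_members(self):
        self.assertEqual(centroid_of([(0.0, 0.0), (2.0, 4.0)]), (1.0, 2.0))
```

Saved to a file, tests like these can be run with `python -m unittest`. Checking the assignment and update steps separately on hand-computed values makes it easier to tell which step is at fault when the full algorithm misbehaves.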

## Data Science Methodology

### Problem Formulation
Provide a means of classifying Iris plant species using Python. The algorithm constructed can be applied in many classification scenarios. It can be scaled up to allow machines to classify, or decide, what they are 'looking' at. Classification algorithms have progressed so far that they now form part of systems that can drive a person to and from destinations. The Tesla self-driving car, for example, has classification and recognition software deeply embedded in its systems.

### Data Engineering Methods
- Data Visualisation
- Machine Learning
- Classification
- Algorithm Architecture
- Test-Driven Development

## Conclusions
The algorithm is tested on a small dataset and works perfectly. When applied to the Iris dataset, it performs very well: it clearly separates 3 distinct blobs of points. Fewer than 5 observations are misclassified in the green and yellow blobs in figure 5, which gives an accuracy of approximately 97%.
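
Because k-means is unsupervised, its cluster numbers carry no species names, so an accuracy figure needs some rule for matching clusters to species. One common rule, assumed here rather than confirmed by the notebook, is to give each cluster the most frequent true label among its members. The function name `cluster_accuracy` and the sample labels below are made up for illustration.

```python
from collections import Counter


def cluster_accuracy(cluster_ids, true_labels):
    """Label each cluster with its most common true class, then score the result."""
    majority = {}
    for cid in set(cluster_ids):
        members = [t for c, t in zip(cluster_ids, true_labels) if c == cid]
        majority[cid] = Counter(members).most_common(1)[0][0]
    correct = sum(majority[c] == t for c, t in zip(cluster_ids, true_labels))
    return correct / len(true_labels)


# Made-up example: one "versicolor" point lands in the "setosa" cluster.
clusters = [0, 0, 0, 1, 1, 1]
species = ["setosa", "setosa", "versicolor", "versicolor", "versicolor", "versicolor"]
print(cluster_accuracy(clusters, species))  # 5 of 6 correct

# For scale: if 4 of the 150 Iris observations were misclassified,
# the accuracy would be 146 / 150, roughly 97.3%.
print(round(146 / 150, 3))  # → 0.973
```

Note that majority voting can map two clusters to the same species. When that happens, the score overstates how well the clusters match the true classes.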

## Contributing Members and Contacts
**Team Lead: [Daniel Elston](https://github.com/Daniel-Elston)**

|Name | GitHub Handles |
|---------|-----------------|
| Daniel Elston | [Git DE](https://github.com/Daniel-Elston) |

Please feel free to contact me if you have any questions, require any further information or wish to contribute.

Email 1: [email protected]

Email 2: [email protected]