https://github.com/jofaval/iris-flowers
Multilabel Classification of the famous Iris Flowers Dataset from Ronald Aylmer Fisher in 1936
classification data-analysis data-science data-visualization google-colab iris-flowers kaggle machine-learning python scikit-learn xgboost
- Host: GitHub
- URL: https://github.com/jofaval/iris-flowers
- Owner: jofaval
- License: gpl-3.0
- Created: 2022-07-22T20:29:11.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-07-22T20:39:37.000Z (over 2 years ago)
- Last Synced: 2024-12-09T16:53:56.607Z (2 months ago)
- Topics: classification, data-analysis, data-science, data-visualization, google-colab, iris-flowers, kaggle, machine-learning, python, scikit-learn, xgboost
- Language: Jupyter Notebook
- Homepage: https://www.kaggle.com/code/jofaval/iris-flower-analysis-and-species-identification
- Size: 369 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Iris flowers Analysis and Species Identification #
[Open in Colab](https://colab.research.google.com/github/jofaval/iris-flowers/blob/master/notebook.ipynb) [Open in Kaggle](https://www.kaggle.com/code/jofaval/iris-flower-analysis-and-species-identification)
## Table of contents
1. [Data](#data)
1. [Description](#description)
1. [Objectives](#objectives)
1. [Tech stack](#tech-stack)
1. [Algorithms](#algorithms)
1. [Visualization](#visualization)
1. [Conclusions](#conclusions)
1. [Credits](#credits)

## Data
[↑ Back to the table](#table-of-contents)

The data is available at the following link:\
[https://www.kaggle.com/datasets/arshid/iris-flower-dataset](https://www.kaggle.com/datasets/arshid/iris-flower-dataset)

Its official source is:\
[https://archive.ics.uci.edu/ml/datasets/iris](https://archive.ics.uci.edu/ml/datasets/iris)

## Description
[↑ Back to the table](#table-of-contents)

The Iris flower data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper *The use of multiple measurements in taxonomic problems*. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.
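As a quick check of the structure described above, here is a minimal sketch that loads the data and counts the samples per species. It assumes the copy bundled with scikit-learn (`load_iris`) rather than the Kaggle CSV linked in the Data section, so the column names may differ from those in the notebook.

```python
from sklearn.datasets import load_iris

# Assumption: use the copy of the dataset bundled with scikit-learn,
# not the Kaggle CSV. as_frame=True returns pandas objects.
iris = load_iris(as_frame=True)
df = iris.frame.copy()

# Map the integer target (0, 1, 2) to the species names for readability.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Four measurement columns, plus the numeric target and the species name.
print(df.shape)
print(df["species"].value_counts())
print(df.describe())
```

Each species should appear 50 times, and the four feature columns are the sepal and petal lengths and widths in centimeters.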
## Objectives
[↑ Back to the table](#table-of-contents)

Test different machine learning multi-class classification techniques to predict the species of an iris flower (setosa, versicolor, or virginica).
## Tech stack
[↑ Back to the table](#table-of-contents)

Python, that's it! I have no experience with R for the moment; it's powerful and widely used, though I'd dare say no more so than Python.

One of Python's strongest points, if not the strongest, is its libraries. The ones I've used are:
- Pandas, for easy data manipulation and exploratory data analysis.
- NumPy, a really strong linear algebra library, used in this project for its statistics utilities. SciPy may be an alternative, but I have no experience at all with it.
- Matplotlib and Seaborn, both fantastic libraries for data visualization that complement each other.
- Scikit-Learn, the library used for machine learning and statistical models: Linear Regression, SVR, Lasso, Ridge, etc.

## Algorithms
[↑ Back to the table](#table-of-contents)

I just wanted to see how different algorithms would perform on this dataset, leaving the Support Vector Machine (the recommended one) for last.

I've used: Logistic Regression, XGBoost, Random Forest, Decision Trees, K-Nearest Neighbors, the aforementioned Support Vector Machine, and LightGBM (somewhat similar to XGBoost, but open-sourced by Microsoft).

They all performed as expected: XGBoost performed better than Logistic Regression, though the latter wasn't far off. Random Forest performed "worse", but that's a matter of the seed; I wanted to stick to one seed to focus on applying the right techniques rather than chasing a high score. LightGBM was a little surprising: it didn't perform similarly to XGBoost and in fact performed the worst of them all, but, again, that's mostly a matter of the random seed.
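Since the ranking above depends a lot on the seed, one way to dampen that effect is to average each model's accuracy over several cross-validation folds. The following is a minimal sketch of that idea, not the notebook's actual code: the seed value, the hyperparameters, and the scaling step are all assumptions, and XGBoost and LightGBM are left out so the example only needs scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # hypothetical seed, not necessarily the one used in the notebook

X, y = load_iris(return_X_y=True)

# Hyperparameters are illustrative defaults, not tuned values.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=SEED),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=SEED),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}

# Stratified folds keep the 50/50/50 species balance in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

# Mean accuracy over the five folds for each model.
scores = {name: cross_val_score(model, X, y, cv=cv).mean() for name, model in models.items()}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

`XGBClassifier` and `LGBMClassifier` follow the same scikit-learn estimator interface, so they could be added to the `models` dictionary if the `xgboost` and `lightgbm` packages are installed.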
## Visualization
[↑ Back to the table](#table-of-contents)

There weren't many features here, just four, and they were all clean, without any missing values whatsoever. So the main visualizations are simple scatterplots that identify the "clusters", so to speak, of the different species by sepal and by petal, with the petal measurements being the ones that most clearly help identify each species on a plot.
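The pair of scatterplots described above could look roughly like the sketch below. It uses plain Matplotlib and the scikit-learn copy of the data; the marker choice, figure size, and the output file name `iris_scatter.png` are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
# Column order in load_iris: sepal length, sepal width, petal length, petal width.
markers = ["o", "s", "^"]  # one marker per species (illustrative choice)

fig, (ax_sepal, ax_petal) = plt.subplots(1, 2, figsize=(10, 4))
for label, (name, marker) in enumerate(zip(iris.target_names, markers)):
    mask = y == label  # rows belonging to the current species
    ax_sepal.scatter(X[mask, 0], X[mask, 1], marker=marker, label=name)
    ax_petal.scatter(X[mask, 2], X[mask, 3], marker=marker, label=name)

ax_sepal.set(title="Sepal", xlabel="length (cm)", ylabel="width (cm)")
ax_petal.set(title="Petal", xlabel="length (cm)", ylabel="width (cm)")
ax_petal.legend(title="species")
fig.tight_layout()
fig.savefig("iris_scatter.png")  # hypothetical output file name
```

Placing the sepal and petal panels side by side makes it easier to compare how well each pair of measurements separates the species.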
## Conclusions
[↑ Back to the table](#table-of-contents)

History may be boring at times, but it promises to always teach you something new, or at least the origins of something you've long known. I'm seeing a lot of research and papers being published openly, with a lot of data to play with, and it's nice to see it's a "tradition" that started long ago.

As for the machine learning models and techniques, it is a perfect dataset. There were no surprises here, but it did allow for a little bit of a playground for some models and tuning, as well as for plotting and visually identifying labels with the help of markers.
## Credits
[↑ Back to the table](#table-of-contents)

From R.A. Fisher (Ronald Aylmer Fisher), 1936.

Uploaded to Kaggle, a company owned by Google, by [https://www.kaggle.com/arshid](https://www.kaggle.com/arshid).