https://github.com/cintia0528/data_science-supervised_machine_learning_classification_mushrooms
This project aims to identify the inevitable trade-off between accuracy and safety when predicting poisonous mushrooms with ML.
- Host: GitHub
- URL: https://github.com/cintia0528/data_science-supervised_machine_learning_classification_mushrooms
- Owner: Cintia0528
- Created: 2023-12-06T19:59:05.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-06T20:23:29.000Z (almost 2 years ago)
- Last Synced: 2025-03-31T05:35:17.862Z (6 months ago)
- Topics: accuracy-analysis, classification, error-types, roc-curve, supervised-machine-learning, treshold
- Language: Jupyter Notebook
- Homepage:
- Size: 172 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Supervised Machine Learning - Classification
## Goal
To explore the practical application of ML by trying to predict poisonous mushrooms, and to observe the trade-off between accuracy and safety.
## Overview
We are interested in keeping Catalonian mushroom foragers safe from poisonous mushrooms, so our aim is to completely eliminate Type II errors (false negatives: poisonous mushrooms classified as edible).
## Context
In general, the aim of fine-tuning and perfecting algorithms is to get accuracy as close to perfect as possible. This time around, however, the emphasis is on error types and the delicate dance between accuracy and safety.
1. Are there any ML algorithms that err on the side of caution by default?
2. Can we achieve 0 hospital cases by adjusting thresholds and exploring ROC curves?
### Task:
* Import mushroom database
* Explore and analyze features
* Experiment with several ML models
* Experiment with thresholds while keeping an eye on accuracy
* Explore ROC curve
* Test our algorithm on data it has never seen before
* Rinse and repeat
## Deliverables
The **Google Colab Notebook** for trying out different ML algorithms is found [here](https://github.com/Cintia0528/Project-7-Supervised-Machine-Learning-Classification-Mushrooms/blob/feedbce3b23780986154448dda560d3e3b3fa9d8/Mushrooms.ipynb), with a supporting Medium article that outlines my thinking process and practical takeaways in more detail [here](https://medium.com/@ubp0528/poisonous-mushrooms-striking-a-balance-between-accuracy-and-safety-with-machine-learning-80b77112e6dd).
## Skills & Tools
1. Data Reading & Cleaning
2. Data Splitting
3. Building a Preprocessor
4. LazyPredict & Modelling
5. Error Analysis
6. Thresholds and ROC curve analysis
## Note to the Reader about my choice of models to try:
My aim after running LazyPredict was to **experiment with algorithms based on various mathematical models**.
RandomForest is an ensemble of decision trees, Label Propagation is a semi-supervised learning model, LGBM is a gradient boosting method, KNN classifies each sample by the labels of its nearest neighbors, and SVC looks for the hyperplane that separates the classes with the widest margin.
I explored methods based on different mathematical models because I was curious whether any one of them would be more or less prone to a certain error type.
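The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for the mushroom data, treats label `1` as "poisonous", and the helper names `count_false_negatives` and `zero_miss_threshold` are hypothetical. For each model it counts Type II errors at the default 0.5 cut-off, then lowers the cut-off to the highest value that still flags every poisonous sample in a validation split, which corresponds to the point on the ROC curve where the true positive rate first reaches 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def count_false_negatives(y_true, y_pred):
    """Count poisonous samples (label 1) that were predicted edible (label 0)."""
    _, _, fn, _ = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return int(fn)


def zero_miss_threshold(y_true, proba_poisonous):
    """Highest cut-off that still flags every poisonous sample in y_true."""
    return float(np.min(proba_poisonous[y_true == 1]))


# Synthetic stand-in for the mushroom data (assumption): 1 = poisonous, 0 = edible.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, flip_y=0.05, random_state=0
)
# 60% train, 20% validation (threshold tuning), 20% test (never seen before).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

# Two of the model families mentioned above, with illustrative settings.
models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_proba = model.predict_proba(X_val)[:, 1]
    test_proba = model.predict_proba(X_test)[:, 1]

    # Tune the cut-off on validation data only.
    threshold = zero_miss_threshold(y_val, val_proba)
    row = {
        "threshold": threshold,
        "val_fn_at_threshold": count_false_negatives(
            y_val, (val_proba >= threshold).astype(int)
        ),
        "test_auc": roc_auc_score(y_test, test_proba),
    }
    # Compare the default and the tuned cut-off on unseen test data.
    for label, cut in (("default", 0.5), ("tuned", threshold)):
        pred = (test_proba >= cut).astype(int)
        row[f"test_fn_{label}"] = count_false_negatives(y_test, pred)
        row[f"test_acc_{label}"] = accuracy_score(y_test, pred)
    results[name] = row
    print(name, row)
```

The tuned cut-off removes false negatives on the validation split by construction, but nothing guarantees zero false negatives on the test split, and the lower cut-off usually costs accuracy because more edible samples get flagged as poisonous. Comparing the `test_fn_*` and `test_acc_*` columns per model is one way to see that trade-off.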