https://github.com/annaanastasy/mushroom-binary-classification-eda-ml
Explored and modeled a competition dataset of mushroom species, focusing on data cleaning, exploratory data analysis, and building machine learning models for accurate classification of edible and poisonous mushrooms.
- Host: GitHub
- URL: https://github.com/annaanastasy/mushroom-binary-classification-eda-ml
- Owner: AnnaAnastasy
- Created: 2024-08-12T17:47:57.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-21T14:54:46.000Z (over 1 year ago)
- Last Synced: 2025-02-03T13:43:55.906Z (over 1 year ago)
- Topics: binary-classification, data, data-cleaning-and-preprocessing, data-science, exploratory-data-analysis, machine-learning-algorithms, xgboost-classifier
- Language: Jupyter Notebook
- Homepage: https://www.kaggle.com/code/annastasy/ps4e8-data-cleaning-and-eda-of-mushrooms
- Size: 4.06 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Mushroom Classification: Data Cleaning, EDA, and Machine Learning Model
This project was part of a competitive data science challenge, aiming to classify mushroom species as edible or poisonous based on various features. The focus was on rigorous data cleaning, exploratory data analysis (EDA), and extracting meaningful insights to support accurate classification models.
## Project Overview
Mushrooms are a fascinating and diverse group of organisms, but some species can be highly toxic. This competition provided an opportunity to analyze a detailed dataset and uncover patterns that differentiate edible mushrooms from poisonous ones. Our efforts focused on:
- **Data Cleaning**: Addressed missing or inconsistent values and ensured data readiness for analysis.
- **EDA**: Used advanced visualization techniques to uncover trends, patterns, and feature correlations.
- **Machine Learning Models**: Built and evaluated multiple models to classify mushrooms as edible or poisonous.
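The data cleaning step above can be pictured with a minimal sketch. It is not taken from the project notebook: the helper `clean_features`, the `min_count` threshold, and the toy column names are all hypothetical, and the imputation choices (median for numeric columns, an explicit `"missing"` label for categorical ones, rare labels folded into `"other"`) are one common approach rather than the author's confirmed method.

```python
import numpy as np
import pandas as pd


def clean_features(df: pd.DataFrame, min_count: int = 2) -> pd.DataFrame:
    """Fill missing values and fold rare category labels into 'other'."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric column: impute gaps with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical column: label missing values explicitly...
            out[col] = out[col].fillna("missing")
            # ...then fold labels seen fewer than min_count times into "other".
            counts = out[col].value_counts()
            rare = counts[counts < min_count].index
            out[col] = out[col].where(~out[col].isin(rare), "other")
    return out


# Hypothetical toy frame; the column names are illustrative only.
raw = pd.DataFrame({
    "cap_diameter": [5.0, np.nan, 7.0, 6.0, 8.0],
    "cap_color": ["n", "n", None, None, "zz9"],
})
cleaned = clean_features(raw)
print(cleaned)
```

Folding rare labels into a single bucket is a way to handle inconsistent or noisy category values without dropping rows; the threshold would need tuning on the real data.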
## Dataset
The dataset for this project comes from the Kaggle competition [Playground Series S4E8](https://www.kaggle.com/competitions/playground-series-s4e8). Download it from the competition page.
After downloading, place the data files in the same folder as the project.
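Loading the files could look like the sketch below. It assumes the download contains `train.csv` and `test.csv`; those file names and the helper `load_competition_data` are assumptions, not something this README states, so adjust them to match the actual download.

```python
from pathlib import Path

import pandas as pd


def load_competition_data(data_dir="."):
    """Read the train and test CSVs from data_dir, failing early if one is absent."""
    data_dir = Path(data_dir)
    frames = {}
    for name in ("train", "test"):  # assumed file names
        path = data_dir / f"{name}.csv"
        if not path.exists():
            raise FileNotFoundError(
                f"Expected {path}; download it from the competition page first."
            )
        frames[name] = pd.read_csv(path)
    return frames["train"], frames["test"]
```

Calling `load_competition_data()` with no argument looks in the current working directory, which matches the instruction to keep the data next to the project.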
## Results
### Exploratory Data Analysis:
- Identified key features influencing mushroom edibility (e.g., odor, gill size).
- Visualized feature distributions and correlations.
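One simple way to check whether a categorical feature influences edibility is to compare the share of poisonous samples across its levels. The sketch below runs on hypothetical toy data, not the competition set: the helper `poisonous_rate_by`, the `class` column with `e`/`p` labels, and the `odor` values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: "class" is assumed to use e (edible) / p (poisonous).
toy = pd.DataFrame({
    "class": ["p", "p", "e", "e", "e", "p"],
    "odor": ["foul", "foul", "none", "none", "almond", "none"],
})


def poisonous_rate_by(df, feature, target="class", positive="p"):
    """Share of poisonous samples within each level of a categorical feature."""
    is_poisonous = df[target].eq(positive)
    return is_poisonous.groupby(df[feature]).mean().sort_values(ascending=False)


rates = poisonous_rate_by(toy, "odor")
print(rates)
```

Levels whose rate sits far from the overall poisonous share are candidates for informative features; a bar plot of `rates` would be the visual counterpart.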
### Model Performance:
- Achieved high accuracy with a gradient boosting model (**XGBoost**).
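The train-and-evaluate loop for a gradient boosting classifier can be sketched as follows. To keep dependencies light, this sketch swaps in scikit-learn's `GradientBoostingClassifier` in place of XGBoost; `xgboost.XGBClassifier` follows the same `fit`/`predict` interface and could be substituted. The data is synthetic, the column names and labelling rule are hypothetical, and the hyperparameters are placeholders rather than the project's settings, so the printed score says nothing about the competition result.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the competition set); column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "odor": rng.choice(["none", "foul", "almond"], size=n),
    "cap_diameter": rng.uniform(1.0, 15.0, size=n),
})
# Toy labelling rule so the example has signal: foul-smelling samples are poisonous.
y = (data["odor"] == "foul").astype(int)  # 1 = poisonous, 0 = edible

# One-hot encode the categorical column; numeric columns pass through unchanged.
X = pd.get_dummies(data, columns=["odor"], dtype=float)

# Hold out a stratified validation split to estimate accuracy on unseen rows.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
model.fit(X_train, y_train)

valid_accuracy = accuracy_score(y_valid, model.predict(X_valid))
print(f"validation accuracy: {valid_accuracy:.3f}")
```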