https://github.com/arif-miad/obesity-level-classification-using-machine-learning



classification exploratory-data-analysis feature-engineering machine-learning matplotlib numpy pandas python sklearn


# Obesity Level Classification using Machine Learning

## 📌 Overview
This project predicts obesity levels in individuals from **Mexico, Peru, and Colombia** based on their eating habits and physical condition. The dataset consists of **2111 records** with **17 attributes**, including height and weight, family history, eating habits, physical activity, and transportation mode. BMI is not one of the raw attributes; it is derived from height and weight during feature engineering.

## 📊 Dataset Features
- **Gender:** Male/Female
- **Age, Height, Weight:** Physical attributes
- **Eating Habits:** Frequency of high-caloric food, vegetable intake, water consumption, snacking behavior
- **Lifestyle Factors:** Smoking, alcohol consumption, exercise frequency, screen time, transportation mode
- **Target Variable:** **Obesity Level** (7 categories: Insufficient Weight, Normal Weight, Overweight I & II, Obesity I, II & III)

---
## 🚀 Project Workflow

### **1️⃣ Exploratory Data Analysis (EDA)**
- **Univariate Analysis:** Distribution of numerical & categorical features
- **Bivariate Analysis:** Relationship between variables using scatterplots, box plots, and bar plots
- **Correlation Analysis:** Heatmaps to identify feature relationships
- **Outlier Detection:** Identifying extreme values in numerical features
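
The outlier and correlation steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`Age`, `Height`, `Weight`) are assumptions, and the 1.5 × IQR rule is one common choice for flagging extreme values.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset; column names are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.normal(25, 6, 200).clip(14, 61),
    "Height": rng.normal(1.70, 0.09, 200),
    "Weight": rng.normal(85, 25, 200).clip(39, 173),
})

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Univariate summary statistics for each numerical feature
print(df.describe())

# Number of flagged outliers per column
outlier_counts = {col: int(iqr_outliers(df[col]).sum()) for col in df.columns}
print(outlier_counts)

# Pearson correlation matrix; this is the table a heatmap would visualise
corr = df.corr()
print(corr.round(2))
```

Passing `corr` to a plotting function such as seaborn's `heatmap` would produce the heatmap view described above.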

### **2️⃣ Data Preprocessing & Feature Engineering**
- Encoding categorical variables using **Label Encoding**
- Feature creation: **Body Mass Index (BMI)**
- Train-test split (80-20 ratio)
- Scaling numerical features using **StandardScaler**
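
A minimal sketch of these four steps, in the order listed, might look like the following. The rows are synthetic and the column names (`Gender`, `Height`, `Weight`, `ObesityLevel`) and units are assumptions. The stratified split and the choice to fit the scaler on the training rows only are illustrative choices that the steps above do not specify.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in rows; real column names and values may differ.
rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], n),
    "Height": rng.normal(1.70, 0.09, n),            # metres (assumed unit)
    "Weight": rng.normal(85, 25, n).clip(40, 170),  # kilograms (assumed unit)
    "ObesityLevel": rng.choice(
        ["Normal_Weight", "Overweight_Level_I", "Obesity_Type_I"], n
    ),
})

# Feature creation: BMI = weight (kg) / height (m) squared
df["BMI"] = df["Weight"] / df["Height"] ** 2

# Label-encode each categorical column, keeping the encoder so the
# integer codes can be mapped back to the original labels later
encoders = {}
for col in ["Gender", "ObesityLevel"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

X = df.drop(columns="ObesityLevel")
y = df["ObesityLevel"]

# 80-20 split; stratify keeps class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both parts,
# so no statistics from the test set leak into training
num_cols = ["Height", "Weight", "BMI"]
scaler = StandardScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
```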

### **3️⃣ Machine Learning Model Training (10 Models)**
We implemented and compared **10 classification models:**
- Logistic Regression
- Random Forest
- Gradient Boosting
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Decision Tree
- Naïve Bayes
- XGBoost
- LightGBM
- CatBoost
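
A comparison loop over these models could be sketched as below. It is a simplified illustration under stated assumptions: the data comes from scikit-learn's `make_classification` rather than the obesity dataset, all hyperparameters are placeholders, and only the seven scikit-learn models are included. XGBoost, LightGBM, and CatBoost ship as separate packages; their scikit-learn-compatible classifiers (`XGBClassifier`, `LGBMClassifier`, `CatBoostClassifier`) could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 7-class problem standing in for the obesity dataset
X, y = make_classification(
    n_samples=700, n_features=10, n_informative=6,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Placeholder hyperparameters; not the values used in the project
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Fit every model on the same split and record test accuracy
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print models from highest to lowest test accuracy
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```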

### **4️⃣ Model Evaluation & Performance Metrics**
Each model was evaluated using:
- **Accuracy, F1 Score, ROC-AUC Score**
- **Confusion Matrix for Misclassification Analysis**
- **Classification Report for Precision, Recall, F1-Score**
- **Feature Importance Analysis** for tree-based models
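
The metrics above can be computed with scikit-learn roughly as follows. This sketch uses synthetic data and a Random Forest as a stand-in model. Weighted averaging for F1 and one-vs-rest averaging for the multiclass ROC-AUC are assumptions, since the averaging strategy is not stated above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic 7-class data standing in for the obesity dataset
X, y = make_classification(
    n_samples=700, n_features=10, n_informative=6,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)  # shape: (n_samples, n_classes)

# Headline metrics; multiclass ROC-AUC needs class probabilities
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average="weighted")
auc = roc_auc_score(y_test, y_proba, multi_class="ovr")
print(f"accuracy={acc:.3f}  f1={f1:.3f}  roc_auc={auc:.3f}")

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, and F1-score
print(classification_report(y_test, y_pred))

# Impurity-based importances, available on tree-based models
importances = model.feature_importances_
print(importances.round(3))
```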

### **5️⃣ Model Comparison & Optimization**
- Performance comparison across models
- Hyperparameter tuning using **GridSearchCV & RandomizedSearchCV**
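
As a rough sketch of the tuning step, the snippet below runs both search strategies on a Random Forest. The data is synthetic, and the parameter grid, distributions, fold count, and scoring metric are illustrative assumptions, not the settings used in this project.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic 7-class data standing in for the obesity dataset
X, y = make_classification(
    n_samples=420, n_features=10, n_informative=6,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)

# Exhaustive search: tries every combination in the grid (2 x 2 = 4 here)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)
print("grid best:", grid.best_params_, round(grid.best_score_, 3))

# Randomized search: samples n_iter combinations from the distributions,
# which scales better when the search space is large
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(20, 120),
        "max_depth": randint(2, 12),
    },
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X, y)
print("random best:", rand.best_params_, round(rand.best_score_, 3))
```

Both objects refit the best configuration on the full data by default, so `grid.best_estimator_` and `rand.best_estimator_` can be used directly for prediction.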

---
## 📌 Results & Insights
- **Best Performing Model:** Identified based on accuracy & ROC-AUC
- **Feature Importance:** Key factors influencing obesity prediction
- **Impact of Lifestyle Factors:** Strong correlation with obesity levels

---
## 📂 Repository Structure
```
📦 Obesity_Level_Classification
├── 📁 data # Dataset & Preprocessed Files
├── 📁 notebooks # Jupyter Notebooks for EDA & Modeling
├── 📁 models # Trained Machine Learning Models
├── 📜 obesity_classification.py # Main Code Implementation
├── 📜 README.md # Project Documentation
```

---
## 📌 Kaggle Notebook & LinkedIn Profile
🔗 **Kaggle Notebook:** [Check it out here](https://www.kaggle.com/code/arifmia/predicting-obesity-levels-using-machine-learning)

🔗 **LinkedIn Profile:** [Connect with me](https://www.linkedin.com/in/arif-miah-8751bb217)

---
## 💡 Future Improvements
- **Deep Learning Approach:** Experimenting with Neural Networks
- **More Features:** Incorporating more detailed dietary data, sleep patterns, and medical history
- **Deployment:** Creating a web-based prediction tool using Flask or Streamlit

### ⭐ **If you found this helpful, don't forget to star the repository!** ⭐