https://github.com/khushirajurkar/exoplanet-habitability-prediction-model
Predicts whether an exoplanet is habitable using ML. Handles class imbalance with ADASYN, tests multiple models, and saves the best one. Includes confusion matrices, ROC curves, and a clean Jupyter notebook
- Host: GitHub
- URL: https://github.com/khushirajurkar/exoplanet-habitability-prediction-model
- Owner: KhushiRajurkar
- License: mit
- Created: 2025-04-05T07:22:23.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-06-08T04:14:17.000Z (4 months ago)
- Last Synced: 2025-07-20T08:03:06.675Z (3 months ago)
- Topics: adasyn, astroinformatics, confusion-matrix, exoplanets, logistic-regression, machine-learning, multiclass-classification, python, roc-curve, scikit-learn, smote
- Language: Jupyter Notebook
- Homepage:
- Size: 7.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Exoplanet-Habitability-Prediction-Model
A machine learning pipeline that predicts exoplanet habitability from pre-processed astronomical features. The project compares several classification models, handles class imbalance with resampling techniques, and evaluates performance with macro- and weighted-averaged F1, recall, ROC AUC, confusion matrices, and ROC curves.
---
## Overview
This project trains and compares classifiers that assign each exoplanet to one of three classes:
- Not Habitable (0)
- Potentially Habitable (1)
- Habitable (2)

It includes:
- GridSearchCV for hyperparameter tuning
- Logistic Regression, SVM, MLP, and KNN models
- ADASYN, SMOTE, SMOTE-Tomek, and ClusterCentroids samplers
- ROC curves, confusion matrices, and F1/recall comparisons

---
## Dataset
- The dataset is derived from exoplanet observational features and cleaned for modeling.
- Final file: [`hwc.xlsx`](Dataset/hwc.xlsx)
- Contains ~99 features with the target label `P_HABITABLE`

To convert it to `.csv`:
```python
import pandas as pd

# Reading .xlsx files requires the openpyxl package
pd.read_excel('Dataset/hwc.xlsx').to_csv('Dataset/hwc.csv', index=False)
```

## Modeling Approach
The modeling pipeline follows these key stages:
1. **Preprocessing**
   - Missing values handled
   - Feature scaling with `StandardScaler`
2. **Resampling**
   - Addressed class imbalance using `ADASYN`
   - Oversampled rare classes to match dominant ones
3. **Model Training**
   - Best results achieved using **Logistic Regression**
   - Hyperparameters tuned via `GridSearchCV`
4. **Evaluation Metrics**
   - F1 Score (Macro & Weighted)
   - Recall
   - ROC AUC (for each class and overall)
5. **Output Artifacts**
   - Saved model
   - Confusion matrices
   - Metrics export

## 📈 Performance Summary
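The best-performing combination was **Logistic Regression + ADASYN**. It reached 99% accuracy, 99% weighted F1, and an overall AUC of 0.993, while macro-averaged F1 and recall are lower (72%) because the two rare habitable classes are much harder to predict than the dominant Not Habitable class.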
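Which combination counts as "best" depends on the score used to rank candidates. Below is a minimal scikit-learn sketch of tuning logistic regression with `GridSearchCV` ranked by macro F1, on hypothetical synthetic data standing in for `hwc.xlsx`. The parameter grid, the `f1_macro` scoring choice, and the data are assumptions, since the README does not state the ones used. The resampling step is left out: to include ADASYN without leaking synthetic samples into validation folds, the sampler would go inside an imbalanced-learn `Pipeline` rather than being applied to the data before splitting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real features: 3 heavily imbalanced classes
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=6, n_classes=3,
    weights=[0.90, 0.06, 0.04], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so it is fitted on training folds only
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # assumed grid

# Rank candidates by macro F1 so rare classes weigh as much as the majority
search = GridSearchCV(
    pipe, param_grid, scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
print("held-out macro F1:", round(search.score(X_test, y_test), 3))
```

Ranking by accuracy or weighted F1 instead would favour models that do well on the dominant class, since both sit near 99% in the table that follows.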
### 🔍 Key Results:
| Metric | Score |
|-------------------|-----------|
| **Accuracy** | 99% |
| **Macro F1 Score**| 72% |
| **Macro Recall** | 72% |
| **Weighted F1** | 99% |
| **Overall AUC** | 0.993 |

- **Class 0 (Not Habitable):** Precision = 1.00, Recall = 0.99
- **Class 1 (Potentially Habitable):** Precision = 0.20, Recall = 0.33
- **Class 2 (Habitable):** Precision = 0.62, Recall = 0.83

## Confusion Matrices
Confusion matrices for every model + sampler combination are stored in the `Confusion Matrices` folder.
Each image follows the naming format:
`__ConfusionMatrix.png`
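Each confusion matrix also determines the per-class precision and recall listed in the Performance Summary, and the gap between macro and weighted averages follows from class support: a rare class with poor F1 counts as one third of the macro average but barely moves the support-weighted one. The NumPy sketch below, on a small hypothetical label set rather than the project's data, shows that arithmetic. The function names are illustrative; scikit-learn's `confusion_matrix` and `f1_score` with `average="macro"` or `average="weighted"` compute the same quantities.

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def summarize(cm):
    """Per-class precision/recall/F1 plus macro and support-weighted F1."""
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # samples that truly belong to each class
    predicted = cm.sum(axis=0)  # samples predicted as each class
    zeros = np.zeros_like(tp)
    # Classes with no predictions (or no support) get 0 instead of dividing by 0
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, support, out=zeros.copy(), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": tp.sum() / cm.sum(),
        "macro_f1": f1.mean(),                           # every class counts equally
        "weighted_f1": np.average(f1, weights=support),  # classes weighted by support
    }


# Hypothetical labels: 8 "Not Habitable", 1 "Potentially Habitable", 1 "Habitable"
y_true = [0] * 8 + [1] + [2]
y_pred = [0] * 8 + [0] + [2]  # the single class-1 planet is misclassified as class 0
cm = confusion_matrix_np(y_true, y_pred, n_classes=3)
m = summarize(cm)
print(cm.tolist())  # [[8, 0, 0], [1, 0, 0], [0, 0, 1]]
print(float(round(m["accuracy"], 3)), float(round(m["macro_f1"], 3)),
      float(round(m["weighted_f1"], 3)))
# → 0.9 0.647 0.853
```

One misclassified rare planet leaves accuracy at 90% but drops macro F1 to about 0.65, the same pattern as the 99% accuracy versus 72% macro F1 reported for the real model.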