https://github.com/khushirajurkar/exoplanet-habitability-prediction-model

Predicts whether an exoplanet is habitable using ML. Handles class imbalance with ADASYN, tests multiple models, and saves the best one. Includes confusion matrices, ROC curves, and a clean Jupyter notebook.

adasyn astroinformatics confusion-matrix exoplanets logistic-regression machine-learning multiclass-classification python roc-curve scikit-learn smote


Host: GitHub
URL: https://github.com/khushirajurkar/exoplanet-habitability-prediction-model
Owner: KhushiRajurkar
License: mit
Created: 2025-04-05T07:22:23.000Z (6 months ago)
Default Branch: main
Last Pushed: 2025-06-08T04:14:17.000Z (4 months ago)
Last Synced: 2025-07-20T08:03:06.675Z (3 months ago)
Topics: adasyn, astroinformatics, confusion-matrix, exoplanets, logistic-regression, machine-learning, multiclass-classification, python, roc-curve, scikit-learn, smote
Language: Jupyter Notebook
Homepage:
Size: 7.7 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Exoplanet-Habitability-Prediction-Model

A machine learning pipeline that predicts exoplanet habitability from pre-processed astronomical features. The project compares several classification models, handles class imbalance with resampling techniques, and evaluates performance with F1, recall, ROC AUC, and confusion matrices.

---

## Overview

This project classifies exoplanets into three classes:

- Not Habitable (0)

- Potentially Habitable (1)

- Habitable (2)

It includes:

- GridSearchCV for hyperparameter tuning  

- Logistic Regression, SVM, MLP, KNN models  

- ADASYN, SMOTE, SMOTE-Tomek, ClusterCentroids samplers  

- ROC curves, confusion matrices, and F1/Recall comparison

---

## Dataset

- The dataset is derived from exoplanet observational features and cleaned for modeling.

- Final file: [`hwc.xlsx`](Dataset/hwc.xlsx)

- Contains ~99 features with target label: `P_HABITABLE`

To convert to `.csv`:

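```python
import pandas as pd

# Reading .xlsx files with pandas requires the openpyxl package.
# Run this from the Dataset/ folder, or adjust the paths accordingly.
pd.read_excel('hwc.xlsx').to_csv('hwc.csv', index=False)
```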
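Because the rest of the pipeline is built around class imbalance in `P_HABITABLE`, it can help to check the class distribution right after loading the data. The snippet below is a minimal sketch: the `class_balance` helper and the toy DataFrame are illustrative assumptions, not code from the notebook. With the real data, pass the DataFrame read from `hwc.csv` instead.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "P_HABITABLE") -> pd.DataFrame:
    """Return per-class counts and fractions for the target column."""
    counts = df[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "fraction": counts / counts.sum()})


# Toy stand-in for the real table (hypothetical values, for illustration only).
toy = pd.DataFrame({"P_HABITABLE": [0, 0, 0, 0, 0, 0, 0, 1, 2, 2]})
balance = class_balance(toy)
print(balance)
```

A strongly skewed `fraction` column is the signal that resampling and macro-averaged metrics are worth using.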

## Modeling Approach

The modeling pipeline follows these key stages:

1. **Preprocessing**  

   - Missing values handled

   - Feature scaling with `StandardScaler`

2. **Resampling**  

   - Addressed class imbalance using `ADASYN`

   - Oversampled rare classes to match dominant ones

3. **Model Training**  

   - Best results achieved using **Logistic Regression**

   - Hyperparameters tuned via `GridSearchCV`

4. **Evaluation Metrics**  

   - F1 Score (Macro & Weighted)

   - Recall

   - ROC AUC (for each class and overall)

5. **Output Artifacts**  

   - Saved model

   - Confusion matrices

   - Metrics export

## 📈 Performance Summary

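The best-performing combination was **Logistic Regression + ADASYN**. It reached 99% accuracy and an overall AUC of 0.993 on the imbalanced dataset, while the macro-averaged scores (72%) show that the two minority classes remain harder to classify than the majority class.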

### 🔍 Key Results:

| Metric             | Score |
|--------------------|-------|
| **Accuracy**       | 99%   |
| **Macro F1 Score** | 72%   |
| **Macro Recall**   | 72%   |
| **Weighted F1**    | 99%   |
| **Overall AUC**    | 0.993 |

- **Class 0 (Not Habitable):** Precision = 1.00, Recall = 0.99  

- **Class 1 (Potentially Habitable):** Precision = 0.20, Recall = 0.33  

- **Class 2 (Habitable):** Precision = 0.62, Recall = 0.83
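The gap between the 99% weighted scores and the 72% macro scores follows from how the averages are formed. The short sketch below recomputes macro recall from the three per-class recalls listed above, then contrasts it with a support-weighted average. The class supports are hypothetical values chosen only to illustrate a heavily imbalanced test set, since the real counts are not given in this README.

```python
def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean weighted by the number of true samples in each class."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total


# Per-class recall for classes 0, 1, 2, as reported above.
recalls = [0.99, 0.33, 0.83]

# Hypothetical class supports (assumption, not taken from the dataset).
hypothetical_supports = [1000, 3, 6]

macro_recall = macro_average(recalls)
weighted_recall = weighted_average(recalls, hypothetical_supports)

print(f"macro recall    = {macro_recall:.2f}")
print(f"weighted recall = {weighted_recall:.2f}")
```

The macro value matches the reported 72%, while the weighted value is pulled toward class 0 because it dominates the sample count.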

## Confusion Matrices

Visual evaluations for all model + sampler combinations are stored in the folder: `Confusion Matrices`

Each image follows the naming format:  

`__ConfusionMatrix.png`
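For readers who want to see how a three-class confusion matrix relates to the per-class precision and recall figures, here is a minimal sketch using `sklearn.metrics.confusion_matrix`. The label arrays are toy values made up for illustration, not predictions from the saved model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = [0, 1, 2]  # Not Habitable, Potentially Habitable, Habitable

# Toy labels and predictions (hypothetical, for illustration only).
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 2, 1])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Recall divides the diagonal by row sums; precision divides it by column sums.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
per_class_precision = cm.diagonal() / cm.sum(axis=0)

print(cm)
print("recall   :", per_class_recall.round(2))
print("precision:", per_class_precision.round(2))
```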
