https://github.com/junayed-hasan/life-satisfaction-machine-learning
This repo contains code for predicting life satisfaction using machine learning and explainable AI, as published in Heliyon. It includes a Jupyter Notebook with data processing, model building, and result visualization using Python libraries. The analysis uses the SHILD dataset to explore factors influencing life satisfaction.
data-science data-visualization explainable-ai feature-extraction feature-selection large-language-models machine-learning tabular-data
- Host: GitHub
- URL: https://github.com/junayed-hasan/life-satisfaction-machine-learning
- Owner: junayed-hasan
- License: mit
- Created: 2024-09-08T00:03:11.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-01T01:46:22.000Z (about 1 year ago)
- Last Synced: 2025-07-21T05:03:06.741Z (7 months ago)
- Topics: data-science, data-visualization, explainable-ai, feature-extraction, feature-selection, large-language-models, machine-learning, tabular-data
- Language: Jupyter Notebook
- Homepage:
- Size: 13.6 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Predicting Life Satisfaction Using Machine Learning and Explainable AI
## Table of Contents
1. [Introduction](#introduction)
2. [Repository Structure](#repository-structure)
3. [Highlights](#highlights)
4. [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
5. [Dataset](#dataset)
6. [Notebook Structure](#notebook-structure)
7. [Results and Insights](#results-and-insights)
8. [Explainable AI](#explainable-ai)
9. [Ablation Studies](#ablation-studies)
10. [Citation](#citation)
11. [Contact](#contact)
12. [License](#license)
---
## Introduction
This repository accompanies the research article *"Predicting Life Satisfaction Using Machine Learning and Explainable AI"*, published in **Heliyon**. The project shows how machine learning combined with explainable AI (XAI) techniques can predict life satisfaction, reaching 93.8% accuracy and a 73% macro F1-score. The data come from the SHILD survey in Denmark and shed light on the factors that shape well-being. The study also explores large language models (LLMs) that predict life satisfaction from the survey responses rewritten as natural-language sentences.
**Publication Link:** [Heliyon Article](https://www.sciencedirect.com/science/article/pii/S2405844024071895)
---
## Repository Structure
```
├── Figures/ # Contains visualizations used in the notebooks
├── LICENSE # License information
├── README.md # Repository documentation
├── Predicting_Life_Satisfaction.ipynb # Main Jupyter Notebook
```
---
## Highlights
- Achieved **93.8% accuracy** and **73% macro F1-score** for predicting life satisfaction.
- Used **Recursive Feature Elimination with Cross-Validation (RFECV)** to identify 27 key determinants of life satisfaction.
- Employed **Explainable AI** techniques to ensure interpretability and transparency of predictions.
- Explored **Large Language Models (LLMs)** like BERT, BioBERT, and ClinicalBERT to predict life satisfaction using natural language sentences.
- Conducted **ablation studies** on data resampling and feature selection techniques to optimize model performance.
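For the LLM experiments, each tabular survey response has to be rendered as a sentence that a model such as BERT can read. The README does not give the template, so the sketch below is an assumption: a hypothetical `row_to_sentence` helper that verbalizes each column-value pair, using made-up column names rather than real SHILD variables.

```python
from typing import Mapping


def row_to_sentence(row: Mapping[str, object]) -> str:
    """Serialize one survey row into a plain-English sentence.

    Hypothetical template: each "column: value" pair becomes a clause
    such as "The age is 54." so a text classifier can consume it.
    Missing values (None) are skipped rather than verbalized.
    """
    clauses = []
    for column, value in row.items():
        if value is None:
            continue
        # Turn snake_case column names into readable words.
        readable = column.replace("_", " ")
        clauses.append(f"The {readable} is {value}.")
    return " ".join(clauses)


# Hypothetical row; these are not real SHILD column names.
example = {"age": 54, "health_condition": "good", "employment": None}
print(row_to_sentence(example))
# The age is 54. The health condition is good.
```

The resulting strings would then be tokenized and passed to a text classifier fine-tuned on the life-satisfaction labels.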
---
## Getting Started
### Prerequisites
- **Python 3.6+**: Download [here](https://www.python.org/downloads/)
- **Jupyter Notebook**: Install via pip:
```bash
pip install notebook
```
### Installation
Clone the repository and navigate to the directory:
```bash
git clone https://github.com/junayed-hasan/life-satisfaction-machine-learning.git
cd life-satisfaction-machine-learning
```
Install required libraries:
```bash
pip install numpy pandas matplotlib scikit-learn seaborn missingno imbalanced-learn scikit-plot xgboost lightgbm
```
---
## Dataset
The dataset is sourced from the **SHILD (Survey of Health Impairment and Living Conditions in Denmark)**. It is publicly available under a **CC0 1.0 Universal Public Domain Dedication license**.
**Dataset Link:** [SHILD Dataset](https://doi.org/10.5061/dryad.qd2nj)
---
## Notebook Structure
1. **Data Importing and Preprocessing**: Handles missing values, categorical encoding, and outlier management.
2. **Exploratory Data Analysis**: Visualizations and data summaries.
3. **Model Building**: Implements ML models such as Random Forest, XGBoost, and LightGBM.
4. **Model Evaluation**: Uses metrics like accuracy, F1-score, precision, recall, and AUC-ROC.
5. **Results Visualization**: Displays model performance and insights.
6. **Explainable AI**: Interprets predictions using XAI techniques.
7. **Age Group Analysis**: Examines primary determinants across different age brackets.
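Step 1 (missing values, categorical encoding, outlier management) maps naturally onto a scikit-learn pipeline. The sketch below is a minimal version under assumed choices: median imputation plus IQR clipping for numeric columns, and most-frequent imputation plus one-hot encoding for categorical ones. The `IQRClipper` class, the column names, and the toy data are hypothetical and not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class IQRClipper(BaseEstimator, TransformerMixin):
    """Clip each numeric column to [Q1 - k*IQR, Q3 + k*IQR] learned in fit()."""

    def __init__(self, k: float = 1.5):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        iqr = q3 - q1
        self.lower_ = q1 - self.k * iqr
        self.upper_ = q3 + self.k * iqr
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


numeric_cols = ["age", "income"]          # hypothetical column names
categorical_cols = ["region", "health"]   # hypothetical column names

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("clip", IQRClipper(k=1.5)),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# Toy data with gaps and one extreme income value
df = pd.DataFrame({
    "age": [25, 40, np.nan, 95, 33],
    "income": [30_000, 45_000, 52_000, 1_000_000, np.nan],
    "region": ["north", "south", np.nan, "south", "north"],
    "health": ["good", "poor", "good", np.nan, "fair"],
})
X = preprocess.fit_transform(df)
print(X.shape)  # 2 numeric + 2 region + 3 health columns
```

Fitting the pipeline on the training split only, then calling `transform` on the test split, keeps the imputation medians and IQR bounds from leaking test information.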
---
## Results and Insights
### Key Performance Metrics:
| Model | Accuracy (%) | Macro F1-Score (%) | Precision (%) | Recall (%) |
|---------------------|--------------|--------------------|---------------|------------|
| Random Forest | **93.8** | 70.6 | **72.0** | 69.3 |
| Gradient Boosting | 92.2 | 70.3 | 67.9 | 73.7 |
| XGBoost | 93.0 | 68.5 | 68.7 | 68.2 |
| Ensemble (best F1) | 93.6 | **73.0** | 71.9 | **74.3** |
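The table pairs accuracies of 92 to 94% with F1-scores of 68 to 73%. A gap like this is what macro averaging produces when classes are imbalanced: accuracy is dominated by the majority class, while macro F1 gives every class equal weight. The from-scratch sketch below uses made-up labels (not SHILD predictions) to show the effect.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_prf(y_true, y_pred):
    """Unweighted mean of per-class precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels: 18 "satisfied" (1) vs 2 "dissatisfied" (0), two mistakes.
y_true = [1] * 18 + [0] * 2
y_pred = [1] * 17 + [0] + [0] + [1]

_, _, macro_f1 = macro_prf(y_true, y_pred)
print(f"accuracy={accuracy(y_true, y_pred):.2f}, macro F1={macro_f1:.3f}")
# accuracy=0.90, macro F1=0.722
```

The minority class scores an F1 of only 0.5, which drags the macro average down even though 90% of predictions are correct. `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` returns the same macro F1 value.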
### Insights:
- Health condition is the most critical determinant across all age groups.
- Dual data resampling (SMOTE + undersampling) improves both accuracy and F1-score.
- RFECV-based feature selection outperforms PCA-based approaches.
---
## Explainable AI
Explainable AI techniques were used to make the models transparent: they show how each input feature contributes to a prediction, giving actionable insights to stakeholders such as policymakers and healthcare professionals.
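The README does not name the XAI method used in the notebook. As one model-agnostic illustration, and not necessarily the authors' technique, scikit-learn's `permutation_importance` scores each feature by how much held-out accuracy drops when that feature's values are shuffled. Synthetic data from `make_classification` stands in for SHILD, and the feature names are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 8 features; with shuffle=False the first 3 are informative.
X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature 10 times on the held-out set and average the accuracy drop.
result = permutation_importance(model, X_test, y_test, scoring="accuracy",
                                n_repeats=10, random_state=0)

feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
ranking = sorted(zip(feature_names, result.importances_mean),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because it only needs predictions, the same call works unchanged for XGBoost, LightGBM, or an ensemble that follows the scikit-learn estimator interface.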
---
## Ablation Studies
### Data Resampling:
The dual strategy of oversampling the minority class (SMOTE) and undersampling the majority class improved both accuracy and F1-score, yielding a more balanced precision-recall trade-off.
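A minimal NumPy sketch of the dual idea for a binary label follows: SMOTE creates synthetic minority rows by interpolating between a minority sample and one of its nearest minority neighbours, and random undersampling then trims the majority class. In practice `imblearn.over_sampling.SMOTE` and `imblearn.under_sampling.RandomUnderSampler` from imbalanced-learn (listed in the install step) do this; the helper names and the ratios `over_ratio=0.5` and `under_ratio=1.0` are illustrative assumptions, not the notebook's settings.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by interpolating toward one of
    each base sample's k nearest minority-class neighbours (needs >= 2 rows)."""
    rng = np.random.default_rng(rng)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def dual_resample(X, y, minority, over_ratio=0.5, under_ratio=1.0, rng=0):
    """SMOTE the minority class up to over_ratio * majority count, then randomly
    undersample the majority so that minority / majority == under_ratio."""
    rng = np.random.default_rng(rng)
    X_min, X_maj = X[y == minority], X[y != minority]
    majority = y[y != minority][0]
    n_new = max(int(over_ratio * len(X_maj)) - len(X_min), 0)
    X_min = np.vstack([X_min, smote(X_min, n_new, rng=rng)])
    n_keep_maj = min(len(X_maj), int(len(X_min) / under_ratio))
    X_maj = X_maj[rng.choice(len(X_maj), size=n_keep_maj, replace=False)]
    X_out = np.vstack([X_min, X_maj])
    y_out = np.concatenate([np.full(len(X_min), minority),
                            np.full(len(X_maj), majority)])
    return X_out, y_out


# Toy imbalanced data: 950 majority (0) vs 50 minority (1) rows
gen = np.random.default_rng(42)
X = gen.normal(size=(1000, 4))
y = np.r_[np.zeros(950, dtype=int), np.ones(50, dtype=int)]
X_res, y_res = dual_resample(X, y, minority=1)
print(np.bincount(y_res))  # [475 475]
```

Resample only the training split: applying SMOTE before splitting lets synthetic points interpolated from test rows leak into training.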
### Feature Selection:
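RFECV selected 27 key features and outperformed PCA-based reduction in accuracy. It is also more interpretable: RFECV keeps a subset of the original survey variables, whereas each PCA component is a linear combination of all inputs and cannot be traced back to individual questions.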
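scikit-learn's `RFECV` repeatedly drops the weakest feature (ranked by the estimator's importances) and uses cross-validation to choose how many features to keep. The sketch below runs it on synthetic data; the random-forest estimator, 5-fold stratified CV, and `f1_macro` scoring are assumptions rather than the notebook's confirmed settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced stand-in for the survey table: 15 features, 4 informative
X, y = make_classification(n_samples=500, n_features=15, n_informative=4,
                           n_redundant=0, weights=[0.85, 0.15], random_state=0)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,                  # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1_macro",      # macro F1, the metric highlighted in this README
    min_features_to_select=1,
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("kept column indices:", [i for i, keep in enumerate(selector.support_) if keep])
X_reduced = selector.transform(X)  # only the selected original columns remain
```

Because `support_` indexes the original columns, the selected features keep their survey meaning, which is the interpretability advantage over PCA noted above.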
---
## Citation
If you use this repository, please cite the following:
```bibtex
@article{alifelham2024lifesatisfaction,
title={Predicting life satisfaction using machine learning and explainable AI},
author={Khan, Alif Elham and Hasan, Mohammad Junayed and Anjum, Humayra and Mohammed, Nabeel and Momen, Sifat},
journal={Heliyon},
year={2024},
doi={10.1016/j.heliyon.2024.e31158}
}
```
---
## Contact
For questions or collaboration, contact:
- **Alif Elham Khan**: [alif.khan1@northsouth.edu](mailto:alif.khan1@northsouth.edu)
- **Mohammad Junayed Hasan**: [mohammad.hasan5@northsouth.edu](mailto:mohammad.hasan5@northsouth.edu)
- **Humayra Anjum**: [humayra.anjum@northsouth.edu](mailto:humayra.anjum@northsouth.edu)
---
## License
This repository is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.