https://github.com/den1ksk/exploring-mental-health-data
Kaggle competition
catboost data-science feature-engineering kaggle machine-learning
- Host: GitHub
- URL: https://github.com/den1ksk/exploring-mental-health-data
- Owner: den1ksk
- License: mit
- Created: 2024-11-10T15:37:42.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-18T19:31:13.000Z (over 1 year ago)
- Last Synced: 2025-02-04T04:07:30.650Z (over 1 year ago)
- Topics: catboost, data-science, feature-engineering, kaggle, machine-learning
- Language: Jupyter Notebook
- Homepage: https://www.kaggle.com/competitions/playground-series-s4e11
- Size: 6.12 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
#### English Version | [Kaggle competition](https://www.kaggle.com/competitions/playground-series-s4e11)
# Mental Health and Depression Analysis
### Competition Description
This project is part of a Kaggle competition aimed at exploring factors influencing mental health and depression based on survey data. The dataset includes over 140,000 entries, covering attributes like academic and work pressure, CGPA, sleep duration, and dietary habits. The goal is to predict whether an individual is experiencing depression.
### Goal
For each `id` in the test set, predict the binary target variable `Depression` (1 = Yes, 0 = No).
### Approach
This project follows a structured approach to analyze mental health data and predict depression using machine learning. The key steps involved were:
1. **Data Exploration:**
- The training and testing datasets were analyzed to identify missing values and understand the distribution of key features such as `Academic Pressure`, `Work Pressure`, and `Dietary Habits`.
- Visualizations were created to examine the relationships between features like `Gender`, `Family History of Mental Illness`, and the target variable.
2. **Data Preprocessing:**
- Missing values were filled with appropriate placeholders.
- Categorical features such as `Gender` and `City` were identified for encoding and included in the modeling process.
3. **Model Selection and Training:**
- `CatBoostClassifier` was selected for its native handling of categorical features and its robust performance.
- Hyperparameters were configured, and the model was trained on an 80/20 train-validation split.
4. **Model Evaluation:**
- The model was evaluated using classification metrics such as precision, recall, F1-score, and ROC-AUC.
- The model achieved an accuracy of **94%** and a ROC-AUC score of **0.8903** on the validation set.
5. **Prediction and Submission:**
- The trained model was used to predict depression for the test dataset.
- Final predictions were saved in the required CSV format for Kaggle submission.
### Results
On the validation set, the model reached **94%** accuracy and a ROC-AUC score of **0.8903**, with balanced precision, recall, and F1-score.
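To make these metrics concrete, here is a small sketch of how accuracy, precision, recall, and F1-score can be computed with scikit-learn. The labels below are made-up toy values, not the project's validation data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy labels (assumed for illustration): 1 = depression, 0 = no depression.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# average="binary" reports the scores for the positive class (label 1).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

With these toy labels there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1-score all come out at 0.75 while accuracy is 0.80. Accuracy alone can hide weak performance on the positive class, which is why the per-class metrics are reported alongside it.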
### Files
- `MentalHealth.ipynb`: Jupyter Notebook containing the complete analysis and model implementation.
- `submission.csv`: File containing the predictions for the test dataset.
- `train.csv`: Training dataset used for model training and validation.
- `test.csv`: Test dataset used for final predictions.
### Libraries Used
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- catboost
### Acknowledgements
This project was completed as part of a Kaggle competition. Thanks to Kaggle and the competition organizers for providing the dataset and platform!