https://github.com/manraj29/student-dropout-attrition-risk
This project aims to predict the risk of student attrition by analyzing various features, such as academic performance, attendance, and involvement in extracurricular activities. By utilizing machine learning models, this project provides insights into potential risk factors for student dropout and suggests proactive measures for student retention.
feature-engineering hyperparameter-tuning ml ml-project model model-training-and-evaluation
- Host: GitHub
- URL: https://github.com/manraj29/student-dropout-attrition-risk
- Owner: Manraj29
- Created: 2024-10-29T18:03:12.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-10-29T18:12:24.000Z (7 months ago)
- Last Synced: 2025-01-13T16:50:35.492Z (5 months ago)
- Topics: feature-engineering, hyperparameter-tuning, ml, ml-project, model, model-training-and-evaluation
- Language: Jupyter Notebook
- Homepage:
- Size: 593 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Student Attrition Risk Prediction
Overview
This project predicts the risk of student attrition by analyzing various features, such as academic performance, attendance, and extracurricular involvement. By using machine learning models, it identifies students at potential risk of dropping out and provides insights for timely intervention.
Dataset
The dataset used is synthetic and randomly generated, aiming to simulate real-world student data with features such as academic scores, attendance percentage, part-time job status, extracurricular activities, and weekly study hours.
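The README does not show how the synthetic data is generated. The following is a minimal sketch of one way to do it in Python, the notebooks' language. The column names, value ranges, the risk formula, and the 0.35 cutoff are all hypothetical choices. The sketch produces both a continuous risk score (a regression target) and a binary dropout label (a classification target), because the project selects one model of each kind.

```python
import numpy as np
import pandas as pd


def make_synthetic_students(n=500, seed=42):
    """Generate a hypothetical synthetic student dataset.

    Column names, distributions, and the risk formula are illustrative
    assumptions, not the project's actual generator.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "academic_score": rng.uniform(30, 100, n),     # percentage
        "attendance_pct": rng.uniform(40, 100, n),     # percentage
        "part_time_job": rng.integers(0, 2, n),        # 0 = no, 1 = yes
        "extracurricular": rng.integers(0, 2, n),      # 0 = no, 1 = yes
        "study_hours_per_week": rng.uniform(0, 30, n),
    })
    # Hypothetical risk score: low scores, low attendance and little study
    # time raise risk; a part-time job adds a little, activities lower it.
    risk = (
        0.4 * (1 - df["academic_score"] / 100)
        + 0.3 * (1 - df["attendance_pct"] / 100)
        + 0.2 * (1 - df["study_hours_per_week"] / 30)
        + 0.05 * df["part_time_job"]
        - 0.05 * df["extracurricular"]
    )
    df["risk_score"] = risk.clip(0, 1)                      # regression target
    df["dropout"] = (df["risk_score"] > 0.35).astype(int)  # classification target
    return df


students = make_synthetic_students()
print(students.head())
```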
Project Structure
This project includes two main files:
- student_dropout_risk.ipynb: the final workflow for the models selected on performance. It handles the primary predictions and evaluations.
- model_selection.ipynb: trains multiple models (Random Forest, Gradient Boosting, Linear Regression, SVC, etc.) on the dataset. The best classification and regression models are selected here and then used in student_dropout_risk.ipynb for further tuning and evaluation.
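The selection step in model_selection.ipynb could look roughly like the sketch below. This is not the notebook's code. `select_best` is a hypothetical helper that ranks candidate models by mean cross-validated score. The toy data, with five uniform features and a linear risk target, is an assumption.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier, GradientBoostingRegressor,
                              RandomForestClassifier, RandomForestRegressor)
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_best(models, X, y, scoring, cv=3):
    """Return (best_name, scores), where scores maps name -> mean CV score."""
    scores = {name: cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
              for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical stand-in data: 5 features scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 5))
risk = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2]  # continuous target
y_cls = (risk > 0.5).astype(int)                       # binary target

classifiers = {
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "svc": make_pipeline(StandardScaler(), SVC()),  # SVC is scale-sensitive
}
regressors = {
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "linear_regression": LinearRegression(),
}

best_cls, cls_scores = select_best(classifiers, X, y_cls, scoring="accuracy")
best_reg, reg_scores = select_best(regressors, X, risk, scoring="r2")
print(best_cls, cls_scores)
print(best_reg, reg_scores)
```

On this deliberately linear toy target, Linear Regression reaches an essentially perfect R², so the ranking says nothing about the project's own data. The point is only the pattern: score every candidate with the same folds and metric, then keep the top one.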
Methodology
The methodology followed in this project includes:
- Data Preprocessing: Cleaning and preparing synthetic data for model training.
- Feature Engineering: Creating additional features to enhance model effectiveness.
- Model Training: Testing various machine learning models for optimal performance.
- Hyperparameter Tuning: Fine-tuning model parameters to improve accuracy and minimize overfitting.
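The four steps above can be chained into one scikit-learn pipeline, so that preprocessing and feature engineering are refit inside every cross-validation fold during tuning. The following is a minimal sketch. Several details are assumptions, not taken from the README: median imputation, the hypothetical `add_engineered_features` helper (an attendance × study-hours interaction), the column positions, the simulated missing values, and the parameter grid. The README does not list the project's actual engineered features or grid.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def add_engineered_features(X):
    """Append a hypothetical interaction feature: attendance x study hours.

    Assumes column 1 is attendance and column 2 is weekly study hours.
    """
    interaction = (X[:, 1] * X[:, 2]).reshape(-1, 1)
    return np.hstack([X, interaction])


# Toy data: col 0 = score, col 1 = attendance, col 2 = study hours, col 3 = noise.
rng = np.random.default_rng(1)
X_clean = rng.uniform(0, 1, size=(300, 4))
y = (X_clean[:, 0] + X_clean[:, 1] * X_clean[:, 2] < 0.75).astype(int)
X = X_clean.copy()
X[rng.random(X.shape) < 0.05] = np.nan  # simulate ~5% missing values

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),                # preprocessing
    ("features", FunctionTransformer(add_engineered_features)),  # feature engineering
    ("model", GradientBoostingClassifier(random_state=0)),       # trees need no scaling
])
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [2, 3],
    "model__learning_rate": [0.05, 0.1],
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)
test_acc = search.score(X_test, y_test)  # held-out accuracy of the refit best model
print(search.best_params_, round(test_acc, 3))
```

Evaluating the tuned model on a split that the grid search never saw is what keeps the tuning step from hiding overfitting.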
Models Used and Performance
Several models were tested and evaluated, including:
- Random Forest Regressor: achieved an R² score of 0.9934 with a Mean Squared Error (MSE) of 0.0007, making it the best-performing regression model.
- Gradient Boosting Classifier: achieved high classification accuracy, with a cross-validation score near 99%, and was selected as the best classifier.
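The two regression metrics quoted above follow standard definitions. MSE is the mean of the squared residuals, and R² = 1 − SS_res / SS_tot. The sketch below computes both by hand and checks them against scikit-learn's `mean_squared_error` and `r2_score`. `mse_and_r2` is a hypothetical helper, and the toy data and model settings are assumptions, so the printed numbers will not match the README's 0.9934 and 0.0007.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def mse_and_r2(y_true, y_pred):
    """Compute MSE and R^2 directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return ss_res / len(y_true), 1.0 - ss_res / ss_tot


rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(400, 5))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2]  # hypothetical risk score in [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

mse, r2 = mse_and_r2(y_test, pred)
print(f"by hand:      MSE={mse:.4f}  R2={r2:.4f}")
print(f"scikit-learn: MSE={mean_squared_error(y_test, pred):.4f}  R2={r2_score(y_test, pred):.4f}")
```

Because MSE is measured in squared target units, a value like 0.0007 is only meaningful relative to the target's scale. R² removes that dependence by normalizing with the target's variance.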
Model Performance Summary
The selected models—Random Forest Regressor for regression tasks and Gradient Boosting Classifier for classification tasks—demonstrated strong predictive performance. Random Forest obtained an R² of 0.9934, and Gradient Boosting achieved high accuracy and stability in classification tasks.
Execution Steps
- Clone the repository (or download the code) and open it in Google Colab.
- Install the required dependencies.
- Execute the cells step by step.
- When prompted, enter values for the subjects and other features to get the results.
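The last step, turning entered values into a prediction, can be sketched as follows. The sketch assumes a fitted scikit-learn classifier that exposes `predict_proba`. The names `FEATURES`, `train_demo_model`, and `predict_from_query`, the feature order, the toy training data, and the 0.5 decision threshold are all hypothetical and do not come from the notebooks.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature order; the model must see columns in this order.
FEATURES = ["academic_score", "attendance_pct", "study_hours_per_week",
            "part_time_job", "extracurricular"]


def train_demo_model(seed=0, n=400):
    """Fit a classifier on toy data so the sketch runs on its own."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(30, 100, n), rng.uniform(40, 100, n), rng.uniform(0, 30, n),
        rng.integers(0, 2, n), rng.integers(0, 2, n),
    ])
    risk = (0.4 * (1 - X[:, 0] / 100) + 0.3 * (1 - X[:, 1] / 100)
            + 0.2 * (1 - X[:, 2] / 30) + 0.05 * X[:, 3] - 0.05 * X[:, 4])
    y = (risk > 0.35).astype(int)  # 1 = at risk of dropping out
    return GradientBoostingClassifier(random_state=0).fit(X, y)


def predict_from_query(model, query):
    """Turn a dict of user-entered values into a dropout-risk prediction."""
    missing = [f for f in FEATURES if f not in query]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    row = np.array([[float(query[f]) for f in FEATURES]])
    proba = model.predict_proba(row)[0, 1]  # probability of class 1 (dropout)
    return {"at_risk": bool(proba >= 0.5), "probability": float(proba)}


model = train_demo_model()
# In a notebook the values could come from input(), e.g.
# query = {f: input(f"{f}: ") for f in FEATURES}
query = {"academic_score": 45, "attendance_pct": 55, "study_hours_per_week": 3,
         "part_time_job": 1, "extracurricular": 0}
print(predict_from_query(model, query))
```

Validating the keys before building the row keeps a mistyped or skipped input from silently shifting every later column.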
Example:
[Example output screenshot: the chart on the right shows the minimum conditions a user should meet, and the pie chart is helpful for gap analysis.]
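A gap analysis of this kind compares each entered value with a required minimum and reports the shortfall. The sketch below is plain Python. `MIN_CONDITIONS` holds made-up thresholds, and `gap_analysis` is a hypothetical helper; neither is taken from the project.

```python
# Hypothetical minimum values, not taken from the project.
MIN_CONDITIONS = {
    "academic_score": 60.0,
    "attendance_pct": 75.0,
    "study_hours_per_week": 10.0,
}


def gap_analysis(user_values, minimums=MIN_CONDITIONS):
    """Return per-feature shortfalls and counts of met/unmet conditions."""
    gaps = {}
    for feature, minimum in minimums.items():
        shortfall = minimum - float(user_values[feature])
        gaps[feature] = max(shortfall, 0.0)  # 0.0 means the condition is met
    met = sum(1 for g in gaps.values() if g == 0.0)
    return gaps, {"met": met, "unmet": len(gaps) - met}


gaps, summary = gap_analysis({"academic_score": 52, "attendance_pct": 80,
                              "study_hours_per_week": 6})
print(gaps)     # {'academic_score': 8.0, 'attendance_pct': 0.0, 'study_hours_per_week': 4.0}
print(summary)  # {'met': 1, 'unmet': 2}
```

The met/unmet counts are the kind of values a pie chart would show; for example, they could be passed to matplotlib's `plt.pie` together with labels.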