https://github.com/manraj29/student-dropout-attrition-risk

This project aims to predict the risk of student attrition by analyzing various features, such as academic performance, attendance, and involvement in extracurricular activities. By utilizing machine learning models, this project provides insights into potential risk factors for student dropout and suggests proactive measures for student retention.



Host: GitHub
URL: https://github.com/manraj29/student-dropout-attrition-risk
Owner: Manraj29
Created: 2024-10-29T18:03:12.000Z
Default Branch: main
Last Pushed: 2024-10-29T18:12:24.000Z
Last Synced: 2025-01-13T16:50:35.492Z
Topics: feature-engineering, hyperparameter-tuning, ml, ml-project, model, model-training-and-evaluation
Language: Jupyter Notebook
Homepage:
Size: 593 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

Student Attrition Risk Prediction

Overview

This project predicts the risk of student attrition by analyzing various features, such as academic performance, attendance, and extracurricular involvement. By using machine learning models, it identifies students at potential risk of dropping out and provides insights for timely intervention.

Dataset

The dataset used is synthetic and randomly generated, aiming to simulate real-world student data with features such as academic scores, attendance percentage, part-time job status, extracurricular activities, and weekly study hours.
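The generation code itself is not reproduced in this README. As a rough illustration only, a seeded generator producing records with the features listed above might look like the sketch below; the field names (`math_score`, `science_score`, `attendance_pct`, and so on) and the value ranges are hypothetical assumptions, not the project's actual schema.

```python
import random


def generate_students(n, seed=0):
    """Return n synthetic student records (hypothetical field names and ranges)."""
    rng = random.Random(seed)  # local seeded RNG so runs are reproducible
    students = []
    for _ in range(n):
        students.append({
            # Academic scores per subject, on an assumed 0-100 scale.
            "math_score": round(rng.uniform(30, 100), 1),
            "science_score": round(rng.uniform(30, 100), 1),
            # Attendance as a percentage of classes attended.
            "attendance_pct": round(rng.uniform(50, 100), 1),
            # Yes/no attributes kept as strings, as raw survey data often is.
            "part_time_job": rng.choice(["yes", "no"]),
            "extracurricular": rng.choice(["yes", "no"]),
            # Weekly self-study hours.
            "study_hours_per_week": rng.randint(0, 30),
        })
    return students


if __name__ == "__main__":
    for row in generate_students(3, seed=42):
        print(row)
```

Seeding a local `random.Random` instance (rather than the global RNG) keeps the synthetic dataset identical across notebook restarts, which makes model comparisons repeatable.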

Project Structure

This project includes two main files:

student_dropout_risk.ipynb: Contains the final workflow built on the models selected for their performance, and handles the main predictions and evaluations.

model_selection.ipynb: This file trains multiple models (Random Forest, Gradient Boosting, Linear Regression, SVC, etc.) on the dataset. The best models for classification and regression are selected from this file and then used in student_dropout_risk.ipynb for further tuning and evaluation.

Methodology

The methodology followed in this project includes:

Data Preprocessing: Cleaning and preparing synthetic data for model training.

Feature Engineering: Creating additional features to enhance model effectiveness.

Model Training: Testing various machine learning models for optimal performance.

Hyperparameter Tuning: Fine-tuning model parameters to improve accuracy and reduce overfitting.
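The preprocessing and feature-engineering steps above are not spelled out in detail. A minimal sketch of what they could involve for one record is shown below, assuming the hypothetical field names used for the synthetic data; the specific cleaning rules and engineered features (`avg_score`, `low_attendance`, the 75% cutoff) are illustrative choices, not the project's confirmed ones.

```python
def preprocess(record, low_attendance_cutoff=75.0):
    """Clean one raw student record and add engineered features.

    Field names and the default 75% cutoff are illustrative assumptions.
    """
    row = dict(record)  # shallow copy so the caller's dict is left unchanged
    # Cleaning: coerce to float and clip attendance into the valid 0-100 range.
    row["attendance_pct"] = min(max(float(row["attendance_pct"]), 0.0), 100.0)
    # Encoding: map yes/no strings to 1/0 so models receive numeric input.
    for flag in ("part_time_job", "extracurricular"):
        row[flag] = 1 if str(row[flag]).strip().lower() == "yes" else 0
    # Feature engineering: average over every "*_score" subject column.
    scores = [float(v) for k, v in record.items() if k.endswith("_score")]
    row["avg_score"] = sum(scores) / len(scores) if scores else 0.0
    # Feature engineering: flag students below the attendance cutoff.
    row["low_attendance"] = int(row["attendance_pct"] < low_attendance_cutoff)
    return row


print(preprocess({"math_score": 80, "science_score": 60, "attendance_pct": 105,
                  "part_time_job": "Yes", "extracurricular": "no",
                  "study_hours_per_week": 10}))
```

Scores are read from the original `record` rather than `row`, so the newly added `avg_score` key can never be counted in its own average.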

Models Used and Performance

Several models were tested and evaluated, including:

Random Forest Regressor: Achieved an R² score of 0.9934 and a Mean Squared Error (MSE) of 0.0007, making it the best-performing regression model.

Gradient Boosting Classifier: Achieved high classification accuracy, with a cross-validation score near 99%, and was selected as the best classifier.
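A cross-validation score like the one quoted for Gradient Boosting is the mean accuracy over several train/test splits. The plain-Python sketch below shows that computation with k contiguous folds; `fit_threshold` is a hypothetical toy model standing in for the real classifier, and the attendance values and labels are invented for illustration.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, non-overlapping folds."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_accuracy(fit, X, y, k=5):
    """Mean accuracy over k folds; fit(X_train, y_train) returns a predict function."""
    scores = []
    for train, test in k_fold_splits(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_threshold(X_train, y_train):
    """Toy stand-in model: cut at the midpoint of the two class means of one feature."""
    at_risk = [x for x, label in zip(X_train, y_train) if label == 1]
    safe = [x for x, label in zip(X_train, y_train) if label == 0]
    cut = (sum(at_risk) / len(at_risk) + sum(safe) / len(safe)) / 2
    # Assumes the at-risk class has the lower feature values (e.g. attendance).
    return lambda x: 1 if x < cut else 0


attendance = [55, 90, 62, 85, 48, 95, 66, 80, 58, 88]
labels = [1 if a < 70 else 0 for a in attendance]  # hypothetical "at risk" label
print(cross_val_accuracy(fit_threshold, attendance, labels, k=5))
```

Contiguous folds assume the rows are already shuffled and that every training split contains both classes. If the notebooks use scikit-learn (not confirmed by this README), `sklearn.model_selection.cross_val_score` performs an equivalent computation and uses stratified folds for classifiers by default.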

Model Performance Summary

The selected models, Random Forest Regressor for regression tasks and Gradient Boosting Classifier for classification tasks, demonstrated strong predictive performance. Random Forest obtained an R² of 0.9934, and Gradient Boosting achieved a cross-validation accuracy near 99% with stable classification performance.
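For reference, R² and MSE, the two regression metrics reported above, can be computed as in the plain-Python sketch below. The sample values are made up for illustration and are unrelated to the reported 0.9934 and 0.0007; `sklearn.metrics.r2_score` and `sklearn.metrics.mean_squared_error` provide library versions of the same formulas.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


# Illustrative risk scores in [0, 1]; not the project's data.
actual = [0.10, 0.40, 0.35, 0.80]
predicted = [0.12, 0.38, 0.33, 0.82]
print(f"MSE = {mse(actual, predicted):.4f}, R^2 = {r2_score(actual, predicted):.4f}")
```

An R² close to 1 means the residual error is small relative to the spread of the targets, which is why a very small MSE and a high R² tend to appear together.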

Execution Steps

Clone the repository or download the code, then open it in Google Colab.

Install the required dependencies.

Run the cells in order.

When prompted, enter the subject scores and other feature values to get the results.

Example:
![image](https://github.com/user-attachments/assets/f0a93c65-f3aa-4dba-b7c3-ac39431f55df)
![image](https://github.com/user-attachments/assets/18dc0e0d-51ea-4ff3-8d2e-b992d1fb58fd)
The chart on the right shows the minimum conditions a user should meet. The pie chart is helpful for gap analysis.
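The README does not describe how the gap analysis is computed. One plausible approach, sketched below under stated assumptions, compares a student's values with minimum conditions and converts each shortfall into a share of the total gap (the proportions a pie chart would show). `MIN_CONDITIONS`, its feature names, and its thresholds are hypothetical, and measuring each gap relative to its minimum is a design choice made here so that points, percentages, and hours can share one scale.

```python
# Hypothetical minimum conditions, chosen for illustration only (all non-zero).
MIN_CONDITIONS = {
    "math_score": 60.0,
    "attendance_pct": 75.0,
    "study_hours_per_week": 10.0,
}


def gap_analysis(student, minimums=MIN_CONDITIONS):
    """Return (relative_gaps, shares) for each feature.

    relative gap = shortfall / minimum (0.0 when the minimum is met), so
    features in different units can be compared on one scale.
    shares = each gap's fraction of the total gap, i.e. pie-chart proportions.
    """
    gaps = {name: max(minimum - student[name], 0.0) / minimum
            for name, minimum in minimums.items()}
    total = sum(gaps.values())
    shares = {name: (gap / total if total else 0.0) for name, gap in gaps.items()}
    return gaps, shares


student = {"math_score": 45, "attendance_pct": 70, "study_hours_per_week": 12}
gaps, shares = gap_analysis(student)
for name in MIN_CONDITIONS:
    print(f"{name}: gap {gaps[name]:.0%}, share of total gap {shares[name]:.0%}")
```

When at least one gap is non-zero, the `shares` values could be passed to `matplotlib.pyplot.pie` to draw the corresponding pie chart.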
