{"id":23627408,"url":"https://github.com/sameermujahid/student_performance","last_synced_at":"2025-10-04T15:04:03.623Z","repository":{"id":258400160,"uuid":"867487948","full_name":"sameermujahid/student_performance","owner":"sameermujahid","description":"Predictive model for student performance based on various factors","archived":false,"fork":false,"pushed_at":"2024-10-16T21:21:51.000Z","size":21040,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-18T18:16:33.000Z","etag":null,"topics":["data-science","education","machine-learning"],"latest_commit_sha":null,"homepage":"https://studentperformance-rwgv44z58msw4wqhtk8iou.streamlit.app/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sameermujahid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-04T06:54:28.000Z","updated_at":"2024-10-16T21:25:22.000Z","dependencies_parsed_at":"2024-10-18T20:37:48.948Z","dependency_job_id":"f1379438-d697-47a5-9259-afec92cd1e16","html_url":"https://github.com/sameermujahid/student_performance","commit_stats":null,"previous_names":["sameermujahid/student_performance"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sameermujahid/student_performance","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sameermujahid%2Fstudent_performance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sameermujahid%2Fstudent_performance/tags","releases_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/sameermujahid%2Fstudent_performance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sameermujahid%2Fstudent_performance/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sameermujahid","download_url":"https://codeload.github.com/sameermujahid/student_performance/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sameermujahid%2Fstudent_performance/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278328167,"owners_count":25968901,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","education","machine-learning"],"created_at":"2024-12-27T23:59:12.042Z","updated_at":"2025-10-04T15:04:03.584Z","avatar_url":"https://github.com/sameermujahid.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Student Performance Prediction\n\n## Table of Contents\n- [Introduction](#introduction)\n- [Project Overview](#project-overview)\n- [Data Collection](#data-collection)\n- [Data Validation](#data-validation)\n- [Exploratory Data Analysis (EDA)](#exploratory-data-analysis-eda)\n- [Data Preprocessing](#data-preprocessing)\n- [Predictive 
Modeling](#predictive-modeling)\n- [Model Evaluation](#model-evaluation)\n- [Conclusion](#conclusion)\n- [Future Work](#future-work)\n- [References](#references)\n\n## Introduction\n\nThe Indian Student Performance Prediction project aims to analyze and predict students' academic performance based on various socio-demographic and academic factors. By leveraging data analysis and machine learning techniques, this project seeks to provide insights into the factors influencing student performance and to develop a predictive model that can aid educators and policymakers in making informed decisions.\n\n## Project Overview\n\nThis project encompasses several stages, including data collection, data validation, exploratory data analysis, data preprocessing, and predictive modeling. The primary goal is to predict students' grades based on features such as parental education, study time, tutoring, extracurricular activities, and more.\n\n## Data Collection\n\nThe dataset used in this project consists of various attributes related to student performance. 
The data was collected from [source/website or method used, e.g., surveys, educational institutions, etc.], comprising information from [number of students] students.\n\n### Dataset Features\n\nThe dataset includes the following columns:\n- **StudentID**: Unique identifier for each student.\n- **Age**: Age of the student.\n- **Gender**: Gender of the student (Male/Female).\n- **Ethnicity**: Ethnic background of the student.\n- **ParentalEducation**: Education level of the parents.\n- **StudyTimeWeekly**: Time spent studying weekly (in hours).\n- **Absences**: Number of absences from school.\n- **Tutoring**: Whether the student receives tutoring (Yes/No).\n- **ParentalSupport**: Level of parental support (High/Medium/Low).\n- **Extracurricular**: Participation in extracurricular activities (Yes/No).\n- **Sports**: Participation in sports (Yes/No).\n- **Music**: Participation in music (Yes/No).\n- **Volunteering**: Participation in volunteering activities (Yes/No).\n- **GPA**: Grade Point Average of the student.\n- **GradeClass**: Predicted class based on GPA (A, B, C, D, F).\n\n## Data Validation\n\nData validation is essential to ensure the accuracy and reliability of the dataset. The following validation checks were performed:\n\n1. **Missing Values**: Identified and addressed missing values in the dataset.\n   ```python\n   # Check for missing values\n   missing_values = dataset.isnull().sum()\n   ```\n\n2. **Data Types**: Verified that each column has the correct data type (e.g., numerical, categorical).\n   ```python\n   # Check data types\n   data_types = dataset.dtypes\n   ```\n\n3. **Unique Values**: Checked for unique values in categorical columns to ensure data consistency.\n   ```python\n   # Check unique values in categorical columns\n   unique_values = dataset['Gender'].unique()\n   ```\n\n4. 
**Statistical Summary**: Generated a statistical summary of numerical features to understand their distribution.\n   ```python\n   # Statistical summary\n   statistical_summary = dataset.describe()\n   ```\n\n## Exploratory Data Analysis (EDA)\n\nEDA was performed to gain insights into the dataset and visualize relationships between variables. Key steps included:\n\n1. **Visualizing Distributions**: Created histograms and box plots to visualize the distribution of numerical features (e.g., GPA, study time).\n   ```python\n   import seaborn as sns\n   import matplotlib.pyplot as plt\n\n   sns.histplot(dataset['GPA'], bins=10, kde=True)\n   plt.title('Distribution of GPA')\n   plt.show()\n   ```\n\n2. **Correlation Analysis**: Analyzed the correlation between features and the target variable (GPA) using heatmaps.\n   ```python\n   # Correlate numeric columns only (pandas 1.5+); text columns would raise an error\n   correlation_matrix = dataset.corr(numeric_only=True)\n   sns.heatmap(correlation_matrix, annot=True)\n   plt.title('Correlation Matrix')\n   plt.show()\n   ```\n\n3. **Categorical Variable Analysis**: Explored relationships between categorical variables and GPA through bar charts.\n   ```python\n   sns.barplot(x='ParentalSupport', y='GPA', data=dataset)\n   plt.title('GPA by Parental Support')\n   plt.show()\n   ```\n\n## Data Preprocessing\n\nData preprocessing is crucial for preparing the dataset for modeling. Steps included:\n\n1. **Encoding Categorical Variables**: Converted categorical variables into numerical format using one-hot encoding or label encoding.\n   ```python\n   import pandas as pd\n\n   # One-hot encoding for categorical variables\n   dataset = pd.get_dummies(dataset, columns=['Gender', 'Ethnicity', 'ParentalEducation'], drop_first=True)\n   ```\n\n2. **Handling Missing Values**: Imputed missing values using appropriate methods (e.g., mean, median, mode).\n   ```python\n   # Fill missing values with the column mean; assign the result back instead of using inplace on a column\n   dataset['StudyTimeWeekly'] = dataset['StudyTimeWeekly'].fillna(dataset['StudyTimeWeekly'].mean())\n   ```\n\n3. 
**Feature Scaling**: Standardized numerical input features (not the GPA target) to bring them to a similar scale.\n   ```python\n   from sklearn.preprocessing import StandardScaler\n\n   # Scale input features only; GPA keeps its original scale for the grade thresholds in the next step\n   numeric_features = ['StudyTimeWeekly', 'Absences']\n   scaler = StandardScaler()\n   dataset[numeric_features] = scaler.fit_transform(dataset[numeric_features])\n   ```\n\n4. **Creating Target Variable**: Converted GPA to categorical grades (A, B, C, D, F) based on specified criteria.\n   ```python\n   def categorize_gpa(gpa):\n       if gpa \u003e= 3.5:\n           return 'A'\n       elif gpa \u003e= 3.0:\n           return 'B'\n       elif gpa \u003e= 2.5:\n           return 'C'\n       elif gpa \u003e= 2.0:\n           return 'D'\n       else:\n           return 'F'\n\n   dataset['GradeClass'] = dataset['GPA'].apply(categorize_gpa)\n   ```\n\n## Predictive Modeling\n\nThe following steps were taken to build and evaluate predictive models:\n\n1. **Splitting the Dataset**: Divided the dataset into training and testing sets.\n   ```python\n   from sklearn.model_selection import train_test_split\n\n   X = dataset.drop(['GPA', 'GradeClass', 'StudentID'], axis=1)\n   y = dataset['GradeClass']\n   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n   ```\n\n2. **Choosing Algorithms**: Selected suitable algorithms for classification (e.g., Logistic Regression, Random Forest, Decision Tree).\n   ```python\n   from sklearn.ensemble import RandomForestClassifier\n\n   model = RandomForestClassifier(n_estimators=100, random_state=42)\n   model.fit(X_train, y_train)\n   ```\n\n3. **Model Predictions**: Made predictions on the test dataset.\n   ```python\n   y_pred = model.predict(X_test)\n   ```\n\n## Model Evaluation\n\nModel performance was evaluated using metrics such as accuracy, confusion matrix, and classification report.\n\n1. 
**Accuracy**: Calculated the accuracy of the model.\n   ```python\n   from sklearn.metrics import accuracy_score\n\n   accuracy = accuracy_score(y_test, y_pred)\n   print(f'Accuracy: {accuracy:.2f}')\n   ```\n\n2. **Confusion Matrix**: Visualized the confusion matrix to understand model performance.\n   ```python\n   from sklearn.metrics import confusion_matrix\n   import matplotlib.pyplot as plt\n   import seaborn as sns\n\n   cm = confusion_matrix(y_test, y_pred)\n   sns.heatmap(cm, annot=True, fmt='d')\n   plt.title('Confusion Matrix')\n   plt.xlabel('Predicted')\n   plt.ylabel('Actual')\n   plt.show()\n   ```\n\n3. **Classification Report**: Generated a classification report for precision, recall, and F1 score.\n   ```python\n   from sklearn.metrics import classification_report\n\n   report = classification_report(y_test, y_pred)\n   print(report)\n   ```\n\n## Conclusion\n\nThis project successfully analyzed and predicted student performance based on various factors. The predictive model developed can assist educators in identifying students who may need additional support and resources to enhance their academic success.\n\n## Future Work\n\nFuture enhancements may include:\n- Exploring additional features that could influence student performance.\n- Implementing advanced machine learning techniques such as neural networks.\n- Conducting a more detailed analysis of the impact of specific variables on student performance.\n- Integrating this model into an application for real-time predictions.\n\n## References\n1. [Kaggle Dataset](https://www.kaggle.com/datasets/rabieelkharoua/students-performance-dataset)\n2. [Wiley Online Library](https://onlinelibrary.wiley.com/doi/10.1155/2024/4067721)\n3. [KTH Diva Portal](https://kth.diva-portal.org/smash/get/diva2:1795896/FULLTEXT01.pdf)\n4. 
[IRJMETS](https://www.irjmets.com/uploadedfiles/paper/issue_6_june_2023/42322/final/fin_irjmets1687680310.pdf)\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsameermujahid%2Fstudent_performance","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsameermujahid%2Fstudent_performance","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsameermujahid%2Fstudent_performance/lists"}