{"id":48147499,"url":"https://github.com/ihuzaifashoukat/student-performance-analysis","last_synced_at":"2026-04-04T17:01:42.965Z","repository":{"id":338605627,"uuid":"1158437728","full_name":"ihuzaifashoukat/student-performance-analysis","owner":"ihuzaifashoukat","description":"Professional Data Science project analyzing student performance factors using XGBoost, SHAP implementation, and K-Means Clustering for student segmentation.","archived":false,"fork":false,"pushed_at":"2026-02-15T11:26:30.000Z","size":1423,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-15T17:12:52.042Z","etag":null,"topics":["analytics","clustering","data-science","education","machine-learning","python","shap","student-performance","visualization","xgboost"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ihuzaifashoukat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-15T11:23:33.000Z","updated_at":"2026-02-15T11:27:34.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ihuzaifashoukat/student-performance-analysis","commit_stats":null,"previous_names":["ihuzaifashoukat/student-performance-analysis"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ihuzaifashoukat/student-performance-analysis","repository_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fstudent-performance-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fstudent-performance-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fstudent-performance-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fstudent-performance-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ihuzaifashoukat","download_url":"https://codeload.github.com/ihuzaifashoukat/student-performance-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fstudent-performance-analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31407391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","clustering","data-science","education","machine-learning","python","shap","student-performance","visualization","xgboost"],"created_at":"2026-04-04T17:01:40.378Z","updated_at":"2026-04-04T17:01:42.724Z","avatar_url":"https://github.com/ihuzaifashoukat.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Student Academic Performance Analysis\n\n## Overview\n\nThis repository contains an advanced data science project focused on analyzing and predicting student academic performance. 
Using a dataset of factors that influence student success, we apply machine learning techniques (XGBoost, SHAP, K-Means) to identify key performance drivers and segment the student population into actionable personas.\n\nThe project demonstrates a production-grade data science workflow, including modular code architecture, automated data acquisition, rigorous exploratory data analysis (EDA), predictive modeling with XGBoost, and model interpretability using SHAP (SHapley Additive exPlanations).\n\n## Dataset\n\nThe analysis is based on the **Student Performance Factors** dataset, sourced from Kaggle.\n\n*   **Source**: [Student Performance Dataset on Kaggle](https://www.kaggle.com/datasets/ayeshasiddiqa123/student-perfirmance)\n*   **Description**: The dataset includes variables such as attendance, hours studied, parental involvement, access to resources, and various other socio-economic factors.\n*   **Target Variable**: `Exam_Score`\n\n## Key Features\n\n*   **Automated Data Pipeline**: Scripts to automatically download, validate, and preprocess data using `kagglehub`.\n*   **Advanced EDA**: Univariate and bivariate analysis to uncover initial correlations and data distributions.\n*   **Predictive Modeling**: Ensemble methods (XGBoost, Random Forest) with hyperparameter tuning via RandomizedSearchCV to predict exam scores ($R^2 \\approx 0.75$).\n*   **Model Interpretability**: Integration of SHAP values to provide global and local explanations for model predictions, offering transparency into *why* a student is predicted to achieve a certain score.\n*   **Student Segmentation**: Unsupervised learning (K-Means Clustering) to identify distinct student profiles (e.g., \"High Potentials\", \"At Risk\") based on behavioral patterns.\n\n## Repository Structure\n\n```text\n.\n├── analysis/               # Analysis artifacts\n│   ├── plots/              # Generated visualizations (SHAP, Clustering, EDA)\n│   ├── 
student_performance.csv # Local copy of the dataset (downloaded)\n│   └── analysis_results.md # Detailed Markdown report of findings\n├── src/                    # Source code modules\n│   ├── __init__.py\n│   ├── loader.py           # Data loading and validation logic\n│   ├── preprocess.py       # Scikit-learn pipelines for transformation\n│   ├── model.py            # Model training and evaluation\n│   ├── analysis.py         # SHAP and Clustering logic\n│   └── vis.py              # Visualization utilities\n├── main.py                 # Main entry point for the analysis pipeline\n├── requirements.txt        # Project dependencies\n├── LICENSE                 # MIT License\n└── README.md               # Project documentation\n```\n\n## Installation\n\n### Prerequisites\n\n*   Python 3.8+\n*   pip\n\n### Setup\n\n1.  Clone the repository:\n    ```bash\n    git clone https://github.com/ihuzaifashoukat/student-performance-analysis.git\n    cd student-performance-analysis\n    ```\n\n2.  Install dependencies:\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n## Usage\n\nTo execute the full analysis pipeline, including data download, processing, training, and report generation, run:\n\n```bash\npython main.py\n```\n\nThe script will:\n1.  Download the dataset if not present.\n2.  Clean and preprocess the data.\n3.  Train the XGBoost regressor.\n4.  Generate performance metrics (RMSE, MAE, R2).\n5.  Save SHAP and clustering visualizations to `analysis/plots/`.\n6.  
Print a summary of cluster characteristics to the console.\n\n## Results Summary\n\nOur analysis identified **Attendance** and **Hours Studied** as the strongest predictors of exam scores.\n\n*   **Model Performance**: The XGBoost model achieved an $R^2$ of approximately 0.75.\n*   **Insights**:\n    *   Attendance has the strongest positive correlation with exam scores.\n    *   Students in the \"At Risk\" cluster (low attendance, low study hours) score lower on average (approx. 64.7) than students in the \"High Performer\" cluster (approx. 69.3).\n\nFor a detailed breakdown of findings, refer to [analysis/analysis_results.md](analysis/analysis_results.md).\n\n## Contributing\n\nContributions are welcome. Please refer to `CONTRIBUTING.md` for guidelines on how to submit improvements or bug fixes.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fihuzaifashoukat%2Fstudent-performance-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fihuzaifashoukat%2Fstudent-performance-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fihuzaifashoukat%2Fstudent-performance-analysis/lists"}