{"id":15105318,"url":"https://github.com/moizeali/student-exam-performance-predictor","last_synced_at":"2026-02-10T03:03:12.688Z","repository":{"id":257181048,"uuid":"854103717","full_name":"moizeali/student-exam-performance-predictor","owner":"moizeali","description":"An End-to-End MLOps implementation for predicting student exam performance using machine learning. Features automated pipelines for data ingestion, model training, and a Flask-based web interface for predictions.","archived":false,"fork":false,"pushed_at":"2024-09-15T21:55:52.000Z","size":2199,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-22T11:02:10.853Z","etag":null,"topics":["data-science","flask","machine-learning","mlops","mlops-workflow","prediction-model","python","vercel","vercel-deployment"],"latest_commit_sha":null,"homepage":"https://student-exam-performance-predictor.vercel.app","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/moizeali.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-09-08T12:29:31.000Z","updated_at":"2024-09-15T21:55:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"93cb5344-278d-44c7-b994-be59ff7bfade","html_url":"https://github.com/moizeali/student-exam-performance-predictor","commit_stats":null,"previous_names":["moizeali/mlproject"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/moizeali/student-exam-performance-predictor","repository_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/moizeali%2Fstudent-exam-performance-predictor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moizeali%2Fstudent-exam-performance-predictor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moizeali%2Fstudent-exam-performance-predictor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moizeali%2Fstudent-exam-performance-predictor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/moizeali","download_url":"https://codeload.github.com/moizeali/student-exam-performance-predictor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moizeali%2Fstudent-exam-performance-predictor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270820793,"owners_count":24651534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-17T02:00:09.016Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","flask","machine-learning","mlops","mlops-workflow","prediction-model","python","vercel","vercel-deployment"],"created_at":"2024-09-25T20:23:22.899Z","updated_at":"2026-02-10T03:03:12.623Z","avatar_url":"https://github.com/moizeali.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Student Exam 
Performance Predictor (End-to-End MLOps Implementation)\n\nThis project demonstrates an **End-to-End MLOps implementation** for predicting a student's math score based on various factors such as gender, race/ethnicity, parental education level, lunch type, and performance in reading and writing exams. \n\nThe project covers the entire machine learning lifecycle, from data ingestion and preprocessing to model training, evaluation, and deployment, seamlessly integrating both back-end and front-end components using Flask. The goal is to provide an interactive web-based application where users can input student details and receive real-time predictions for math scores.\n\n### Key Concepts:\n- **Machine Learning Pipeline**: The model is trained using multiple regression models (e.g., Random Forest, Decision Tree, XGBoost, and CatBoost), and the best-performing model is selected through a pipeline process based on the R-squared score.\n- **Back-end (Flask)**: Flask serves as the back-end framework, handling:\n  - **Model Training**: A training pipeline that ingests raw student data, preprocesses it, trains multiple regression models, and selects the best-performing model.\n  - **Prediction Pipeline**: After the model is trained, Flask exposes an API endpoint that allows new predictions based on user inputs.\n  - **Application Logic**: Flask orchestrates the entire workflow, from data ingestion, model training, and preprocessing to serving the trained model for predictions.\n\n- **Front-end (Flask-based Web Interface)**: \n  - The front-end is designed using Flask’s templating engine, **Jinja2**, and styled with **Bootstrap**, providing an interactive and responsive interface where users can input student information.\n  - The interface includes form fields to capture the student's gender, race/ethnicity, parental education level, lunch type, and test scores in reading and writing.\n  - Upon submission, the inputs are sent to the back-end, where the model processes the 
data and returns the predicted math score, which is then displayed on the same page.\n\n- **MLOps Automation**: \n  - The project includes an automation script (`run_app.py`) that installs the necessary dependencies, runs the model training pipeline, starts the Flask application, and automatically opens the web app in the user's browser for predictions.\n  - The MLOps aspect focuses on streamlining the model development and deployment lifecycle, ensuring that the model can be retrained, updated, and deployed seamlessly with minimal intervention.\n\n### Overall Workflow:\n1. **Front-end (User Input)**: \n   Users input the relevant student details via the web form on the homepage. The web interface is built with Flask’s HTML templating engine and Bootstrap for styling.\n\n2. **Back-end (Model Prediction)**: \n   When the form is submitted, Flask routes the data to the back-end, where the model processes it using a pre-built pipeline to generate a math score prediction.\n\n3. **Result Display**: \n   The predicted math score is displayed on the results page, providing real-time feedback to the user based on their input.\n\nThis architecture ensures a seamless interaction between users and the underlying machine learning model, making the process intuitive and interactive, while also demonstrating a full-fledged **MLOps pipeline**.\n\n## Key Features of the Project:\n- **Data Ingestion**: Read raw data, split it into training and test datasets.\n- **Data Transformation**: Handle missing values, scale numerical features, encode categorical features.\n- **Model Training**: Train multiple models such as Random Forest, Decision Tree, XGBoost, and CatBoost. 
The model with the best performance is selected based on the R-squared score.\n- **Prediction Pipeline**: Take user input through a web form, preprocess the input data, and predict the math score using the trained model.\n- **Web Interface**: A Flask web application is used to provide the interface for predictions.\n- **MLOps Automation**: The project includes automation for running the training pipeline, starting the web application, and opening the web app in a browser using a single script.\n\n## Project Structure\n\n```\n.\n├── artifacts/                    # Contains saved models and preprocessing objects\n├── notebook/                     # Jupyter notebooks for data analysis and model training\n│   └── data/                     # Raw data used for model training\n│       ├── 1. EDA STUDENT PERFORMANCE.ipynb   # Notebook for exploratory data analysis\n│       └── 2. MODEL TRAINING.ipynb            # Notebook for model training and evaluation\n├── src/                          # Main source directory for all components\n│   ├── components/               # Data transformation and model training components\n│   │   ├── data_ingestion.py      # Handles data ingestion, splitting data into train and test sets\n│   │   ├── data_transformation.py # Preprocesses data (imputation, encoding, scaling)\n│   │   └── model_trainer.py       # Trains multiple models and selects the best one\n│   ├── pipeline/                 # Pipelines for training and predicting\n│   │   ├── predict_pipeline.py    # Predicts math score based on user input data\n│   │   └── train_pipeline.py      # Executes the end-to-end training pipeline\n│   ├── exception.py              # Custom exceptions for error handling\n│   ├── logger.py                 # Logger setup to track events and errors\n│   └── utils.py                  # Utility functions (saving/loading objects, evaluating models)\n├── static/                       # Static files (images, CSS, JS, etc.)\n│   └── images/                   
# Folder for storing images used in the project\n│       ├── index.jpg             # Screenshot of the homepage\n│       └── result.jpg            # Screenshot of the result page\n├── venv/                         # Virtual environment\n├── app.py                        # Flask web app entry point\n├── run_app.py                    # Script to run training pipeline and start the Flask app\n├── README.md                     # Project description and instructions (this file)\n├── requirements.txt              # Python dependencies\n└── setup.py                      # Project setup for packaging\n```\n\n## Installation\n\n1. **Clone the repository**:\n    ```bash\n    git clone https://github.com/moizeali/student-exam-performance-predictor.git\n    ```\n\n2. **Navigate to the project directory**:\n    ```bash\n    cd student-exam-performance-predictor\n    ```\n\n3. **Create and activate a virtual environment**:\n\n   - For **Linux/macOS**:\n     ```bash\n     python3 -m venv env              # Create a virtual environment named 'env'\n     source env/bin/activate          # Activate the virtual environment\n     ```\n\n   - For **Windows**:\n     ```bash\n     python -m venv env               # Create a virtual environment named 'env'\n     .\\env\\Scripts\\activate           # Activate the virtual environment\n     ```\n\n4. **Project Dependencies**:\n\n   The project also supports dependency management using `pyproject.toml`. 
If you prefer using this approach for managing dependencies and project configuration, you can refer to the `pyproject.toml` file for setting up the project environment, as follows:\n\n   ```toml\n   [build-system]\n   requires = [\"setuptools\u003e=42\", \"wheel\"]\n   build-backend = \"setuptools.build_meta\"\n\n   [project]\n   name = \"student-exam-performance-predictor\"\n   description = \"A machine learning model to predict student exam performance based on various features.\"\n   readme = \"README.md\"\n   requires-python = \"\u003e=3.8\"\n   ```\n\n## Running the Application\n\nTo simplify the process, use the provided `run_app.py` script, which automates the following:\n- Installs the dependencies.\n- Runs the model training pipeline.\n- Starts the Flask web application.\n- Opens the web browser at `http://localhost:5000`.\n\nRun the following command to execute this script:\n```bash\npython run_app.py\n```\n\nThis will:\n- Train the model using `src/pipeline/train_pipeline.py`.\n- Start the Flask app (`app.py`).\n- Open `http://localhost:5000` in your browser automatically.\n\n## Usage\n\n### Web Interface\n1. Go to the homepage (`http://localhost:5000/`), and click **Start Prediction**.\n2. Fill out the form with student details, including gender, ethnicity, parental education, and scores for writing and reading.\n3. Submit the form to get the predicted math score.\n\n### Screenshots\n- **Homepage Screenshot**: ![Homepage](static/images/index.jpg)\n\n- **Prediction Result Screenshot**: ![Result Page](static/images/result.jpg)\n\n### Prediction Pipeline\nFor making predictions programmatically, use the `predict_pipeline.py` file. 
For example:\n```python\nfrom src.pipeline.predict_pipeline import CustomData, PredictPipeline\n\ndata = CustomData(\n    gender='male',\n    race_ethnicity='group A',\n    parental_level_of_education=\"bachelor's degree\",\n    lunch='standard',\n    test_preparation_course='completed',\n    reading_score=80,\n    writing_score=78\n)\n\npredict_pipeline = PredictPipeline()\npred_df = data.get_data_as_data_frame()\nprediction = predict_pipeline.predict(pred_df)\nprint(f\"Predicted Math Score: {prediction[0]}\")\n```\n\n## Files Overview\n\n### `app.py`\nThe main Flask application that serves the HTML pages and handles requests for predicting student math scores.\n\n### `run_app.py`\nThis script automates the process of:\n1. Running the model training using the `train_pipeline.py`.\n2. Starting the Flask application (`app.py`).\n3. Opening the app in your browser (`http://localhost:5000`).\n\n### `predict_pipeline.py`\n- **PredictPipeline**: Loads the trained model and preprocessing pipeline to make predictions on new data.\n- **CustomData**: Collects the input features, converts them into a pandas DataFrame, and prepares the data for the model.\n\n### `train_pipeline.py`\nThe pipeline for training models, which includes:\n- Data ingestion\n- Data transformation\n- Model training and selection\n- Saving the best model and preprocessing object\n\n### `data_transformation.py`\nThis file defines the preprocessing pipeline that handles:\n- Imputation for missing values\n- One-hot encoding for categorical features\n- Scaling for numerical features\n\n### `model_trainer.py`\nHandles the training of different regression models and hyperparameter tuning using `GridSearchCV`. 
It selects the best model based on the R-squared score.\n\n### `utils.py`\nUtility functions for saving/loading models and evaluating the model's performance.\n\n### `data_ingestion.py`\nHandles reading data from a CSV file, splitting it into training and test sets, and saving the datasets.\n\n## Deployment\nTo deploy the app to a platform like Heroku, you need to add the following files:\n- `Procfile` (for defining how to run the app)\n- `runtime.txt` (to specify the Python version)\n\n## Future Improvements\n- Support for more student attributes (e.g., previous academic scores).\n- Addition of more regression models to improve accuracy.\n- Enhanced feature engineering for better predictions.\n\n## License\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoizeali%2Fstudent-exam-performance-predictor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoizeali%2Fstudent-exam-performance-predictor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoizeali%2Fstudent-exam-performance-predictor/lists"}