{"id":18888464,"url":"https://github.com/sahiltiwariiii/dssp","last_synced_at":"2026-03-27T04:26:09.488Z","repository":{"id":242101240,"uuid":"808684830","full_name":"sahilTiwariiii/Dssp","owner":"sahilTiwariiii","description":"Predicting student math scores ! This project utilizes advanced machine learning techniques and MLOps tools like DVC and MLflow to predict a student's math score based on various factors such as gender, race/ethnicity, parental level of education, lunch type, test preparation course, writing etc","archived":false,"fork":false,"pushed_at":"2024-06-07T05:21:46.000Z","size":1699,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-03T12:26:25.382Z","etag":null,"topics":["docker","dotenv","dvc","flask","machine-learning","mlflow","mlops","mysql","mysql-connector-python","numpy","pandas","pymysql","python","python-dotenv","scikit-learn","seaborn","sklearn-library","statistics","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sahilTiwariiii.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-31T15:35:28.000Z","updated_at":"2024-06-07T05:21:49.000Z","dependencies_parsed_at":"2024-11-08T07:44:27.647Z","dependency_job_id":"8cb8daaf-501b-4d1c-8247-78522e7d7f67","html_url":"https://github.com/sahilTiwariiii/Dssp","commit_stats":null,"previous_names":["sahiltiwariiii/dssp"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sahilTiwariiii/Dssp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahilTiwariiii%2FDssp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahilTiwariiii%2FDssp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahilTiwariiii%2FDssp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahilTiwariiii%2FDssp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sahilTiwariiii","download_url":"https://codeload.github.com/sahilTiwariiii/Dssp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahilTiwariiii%2FDssp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31018707,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-27T03:51:26.850Z","status":"ssl_error","status_checked_at":"2026-03-27T03:51:09.693Z","response_time":164,"last
_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","dotenv","dvc","flask","machine-learning","mlflow","mlops","mysql","mysql-connector-python","numpy","pandas","pymysql","python","python-dotenv","scikit-learn","seaborn","sklearn-library","statistics","streamlit"],"created_at":"2024-11-08T07:44:16.994Z","updated_at":"2026-03-27T04:26:09.449Z","avatar_url":"https://github.com/sahilTiwariiii.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# End to End Data Science Project\n# Student Exam Performance Indicator\n\nProject Overview\nThis project aims to predict a student's math score based on various factors such as gender, race/ethnicity, parental level of education, lunch type, test preparation course, writing score, and reading score. The project uses MLOps tools like DVC for data version control and MLflow for tracking experiments, models, and deployment.\n\n\nTable of Contents\n-\u003eIntroduction\n-\u003eDataset\n-\u003eProject Structure\n-\u003eMLOps Tools\n1)DVC (Data Version Control)\n2)MLflow\n-\u003eInstallation\n-\u003eUsage\n-\u003eResults\n-\u003eConclusion\n-\u003eContact\n\n\n# 📊 Project Data Description\n\n## 🗂️ Data Collection\n\n- **Dataset Source**: [Students Performance in Exams](https://www.kaggle.com/datasets/spscientist/students-performance-in-exams?datasetId=74977)\n- **Data Dimensions**: The dataset contains **1000 rows** and **8 columns**.\n\n## 📋 Data Columns\n\n1. 
**👤 gender**: Gender of the student (e.g., male, female)\n2. **🌍 race_ethnicity**: Race/ethnicity group of the student (e.g., Group A, Group B, etc.)\n3. **🎓 parental_level_of_education**: Highest level of education attained by the student's parents (e.g., high school, bachelor's degree)\n4. **🥪 lunch**: Type of lunch received (e.g., standard, free/reduced)\n5. **📚 test_preparation_course**: Participation in test preparation course (e.g., completed, none)\n6. **🧮 math_score**: Score obtained in the math exam (dependent variable)\n7. **📖 reading_score**: Score obtained in the reading exam\n8. **✍️ writing_score**: Score obtained in the writing exam\n\n## 🔍 Independent and Dependent Variables\n\n- **Independent Variables (X)**: All columns except 'math_score'\n  - 👤 `gender`\n  - 🌍 `race_ethnicity`\n  - 🎓 `parental_level_of_education`\n  - 🥪 `lunch`\n  - 📚 `test_preparation_course`\n  - 📖 `reading_score`\n  - ✍️ `writing_score`\n\n- **Dependent Variable (y)**: 🧮 `math_score`\n\nThis dataset will be used to analyze and predict student performance in mathematics based on various demographic and educational factors.\n\n## 🚀 Model Training and Evaluation\n\nFor this project, we trained and evaluated several regression models to predict student performance in mathematics. 
The models used are:\n\n- **Linear Regression**\n- **Lasso**\n- **Ridge**\n- **K-Neighbors Regressor**\n- **Decision Tree**\n- **Random Forest Regressor**\n- **XGBRegressor**\n- **CatBoosting Regressor**\n- **AdaBoost Regressor**\n\n### 📊 Performance Metrics\n\nTo assess the performance of these models, we used the following metrics:\n- **Root Mean Squared Error (RMSE)**\n- **Mean Absolute Error (MAE)**\n- **R² Score**\n\n### 🏆 Results\n\nHere are the R² scores for each model:\n\n| Model Name               | R² Score  |\n|--------------------------|-----------|\n| Ridge                    | 0.880593  |\n| Linear Regression        | 0.880345  |\n| CatBoosting Regressor    | 0.851632  |\n| AdaBoost Regressor       | 0.849847  |\n| Random Forest Regressor  | 0.847291  |\n| Lasso                    | 0.825320  |\n| XGBRegressor             | 0.821589  |\n| K-Neighbors Regressor    | 0.783813  |\n| Decision Tree            | 0.760313  |\n\nThese results indicate that the **Ridge** regression model performed the best, closely followed by **Linear Regression**.\n\n\n\n\n\n## Introduction\n\nPredicting student performance is a critical aspect of educational systems. 
By leveraging machine learning techniques, this project aims to provide insights into how various factors influence student performance, particularly in mathematics.\n\n\n### Dataset\n\nThe dataset used in this project includes the following features:\n\n-\u003eGender: The gender of the student.\n-\u003eRace/Ethnicity: The race or ethnicity of the student.\n-\u003eParental Level of Education: The highest level of education attained by the student's parents.\n-\u003eLunch Type: The type of lunch the student receives.\n-\u003eTest Preparation Course: Whether the student completed a test preparation course.\n-\u003eWriting Score: The student's writing score (out of 100).\n-\u003eReading Score: The student's reading score (out of 100).\n\n\n  The target variable is the **Math Score**.\n\n\n## Project Structure\n\nStudent-Exam-Performance-Indicator/\n├── data/\n│   ├── raw/\n│   ├── processed/\n│   └── DVCfile\n├── notebooks/\n│   └── EDA.ipynb\n├── src/\n│   ├── data_processing.py\n│   ├── model_training.py\n│   ├── model_evaluation.py\n│   └── prediction.py\n├── models/\n├── reports/\n│   └── report.html\n├── dvc.yaml\n├── mlflow/\n│   ├── experiments/\n│   ├── models/\n│   └── mlflow.db\n├── README.md\n├── requirements.txt\n└── setup.py\n\n\n\n# MLOps Tools\n\n## DVC (Data Version Control)\n\nDVC is used for versioning datasets and machine learning models. It allows you to track data files and model files, enabling reproducibility and efficient collaboration in your machine learning projects.\n\n**Benefits of DVC:**\n\n-\u003e**Versioning**: Keep track of changes in data and models over time.\n-\u003e**Reproducibility**: Ensure that experiments can be reproduced with the same data and model versions.\n-\u003e**Collaboration**: Facilitate collaboration among team members by sharing data and model versions.\n\n# MLflow\n\nMLflow is used for managing the end-to-end machine learning lifecycle, including experimentation, reproducibility, and deployment. 
It tracks experiments, records models, and manages model deployment.\n\n## Benefits of MLflow:\n\n-\u003e **Experiment Tracking**: Record and query experiments, including code, data, config, and results.\n-\u003e **Model Management**: Register, annotate, and deploy models from a centralized model repository.\n-\u003e **Reproducibility**: Ensure that experiments can be reproduced with the same parameters and data.\n\n# Installation\n\nTo install the necessary dependencies, run:\n\npip install -r requirements.txt\n\n\n## Usage\n1. **Data Processing**: Process the raw data and store the processed data.\n\n python src/data_processing.py\n\n2. **Model Training**: Train the machine learning model.\n\n python src/model_training.py\n\n3. **Model Evaluation**: Evaluate the trained model.\n\n python src/model_evaluation.py\n\n\n4. **Prediction**: Make predictions using the trained model.\n\n python src/prediction.py\n\n## Results\n\nModel performance is evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²) score. Detailed results and visualizations are provided in reports/report.html.\n\n\n# Conclusion\n\nThis project demonstrates the application of MLOps tools like DVC and MLflow to efficiently manage the lifecycle of a machine learning project. By predicting student math scores, we gain valuable insights into factors that influence academic performance, which can be leveraged to improve educational outcomes.\n\n![Alt Text](images/web2.png)\n\n![Alt Text](images/web1.png)\n\n# MLflow\n \n![Alt Text](images/mlflow.png)\n\n\n![Alt Text](images/mlfl.png)\n\n## Contact \n \nFor any inquiries or further information, please contact:\n\nSahil Tiwari\nEmail: sahiltiwari1222@gmail.com\nGitHub: https://github.com/sahilTiwariiii\n\n---\n\nFeel free to explore the project, experiment with the data, and enhance the model. 
Your feedback and contributions are welcome!\n\n---\n\nNote: This README file provides a comprehensive overview of the project, highlighting the use of MLOps tools and the steps involved in the project workflow.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsahiltiwariiii%2Fdssp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsahiltiwariiii%2Fdssp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsahiltiwariiii%2Fdssp/lists"}