{"id":28105065,"url":"https://github.com/fernandesotero/project-data-exploration","last_synced_at":"2026-04-30T10:02:32.377Z","repository":{"id":293130646,"uuid":"983044923","full_name":"fernandesotero/project-data-exploration","owner":"fernandesotero","description":"Student Performance Prediction with Data Science","archived":false,"fork":false,"pushed_at":"2025-05-13T19:44:47.000Z","size":158,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-13T20:58:17.207Z","etag":null,"topics":["data-visualization","jupyter-notebook","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fernandesotero.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-13T19:43:24.000Z","updated_at":"2025-05-13T19:48:31.000Z","dependencies_parsed_at":"2025-05-13T21:10:29.044Z","dependency_job_id":null,"html_url":"https://github.com/fernandesotero/project-data-exploration","commit_stats":null,"previous_names":["fernandesotero/project-data-exploration"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fernandesotero/project-data-exploration","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandesotero%2Fproject-data-exploration","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandesotero%2Fproject-data-exploration/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandesotero%2Fproje
ct-data-exploration/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandesotero%2Fproject-data-exploration/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fernandesotero","download_url":"https://codeload.github.com/fernandesotero/project-data-exploration/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandesotero%2Fproject-data-exploration/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32460781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T22:27:22.272Z","status":"online","status_checked_at":"2026-04-30T02:00:05.929Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","jupyter-notebook","python"],"created_at":"2025-05-13T21:23:49.488Z","updated_at":"2026-04-30T10:02:32.360Z","avatar_url":"https://github.com/fernandesotero.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎓 Student Performance Prediction with Data Science\n\nThis project aims to apply data science techniques to predict **students' academic performance** based on their study habits, lifestyle, and socioeconomic background. 
The main idea is to explore how different behavioral and contextual variables influence final exam scores using machine learning.\n\n## 📁 Dataset\n\nThe dataset used, named `student_habits_performance.csv`, contains **1,000 student records** with the following information:\n\n- Demographic data (age, gender)  \n- Daily habits (study hours, sleep, social media, Netflix)  \n- External conditions (part-time job, internet quality, parents' education level)  \n- Health and well-being (exercise, diet, mental health)  \n- Participation in extracurricular activities  \n- Final exam score (target variable: `exam_score`)\n\n## 🎯 Objective\n\nTo develop a predictive model capable of estimating a student's exam score based on their habits and characteristics, enabling:\n\n- Understanding of the variables that most impact academic performance  \n- Support for the development of data-driven educational policies  \n- Practical demonstration of machine learning techniques\n\n## ⚙️ Project Steps\n\n### 1. Exploratory Data Analysis (EDA)\n\nTools used: `pandas`, `seaborn`, `matplotlib`\n\n- Check data types, missing values, and descriptive statistics  \n- Graphical analysis of distributions, correlations, and variable relationships  \n- Initial understanding of data patterns\n\n### 2. Preprocessing\n\nTools used: `scikit-learn` (`ColumnTransformer`, `StandardScaler`, `OneHotEncoder`)\n\n- Separation of numerical and categorical features  \n- Standardization of continuous variables (zero mean, unit variance)  \n- Encoding of categorical variables (one-hot encoding)  \n- Removal of irrelevant columns (e.g., student ID)\n\n### 3. Predictive Modeling\n\nModels used:\n- **Linear Regression** – Baseline model to evaluate simple linear relationships  \n- **Random Forest Regressor** – Robust, non-linear model to capture complex interactions\n\nBoth models were integrated into a `Pipeline`, enabling automatic execution of preprocessing and training.\n\n### 4. 
Evaluation\n\nMetrics used:\n- **R² (coefficient of determination)**: measures the proportion of variance in exam scores explained by the model  \n- **MAE (Mean Absolute Error)**: reports the average prediction error in exam-score points  \n\nThese metrics allow the two models to be compared and their predictive accuracy to be assessed.\n\n## 🧠 Conclusion\n\nThe project demonstrated how behavioral and lifestyle data can be analyzed to predict academic performance. In addition to building predictive models, it helped identify key factors that influence students' learning, which offers valuable insights for real-world educational applications.\n\n## 🛠️ Technologies and Libraries\n\n- Python  \n- pandas  \n- matplotlib  \n- seaborn  \n- scikit-learn\n\n## 📌 Future Improvements\n\n- Hyperparameter tuning with GridSearchCV  \n- Testing other models (XGBoost, LightGBM)  \n- Feature selection to reduce complexity  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffernandesotero%2Fproject-data-exploration","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffernandesotero%2Fproject-data-exploration","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffernandesotero%2Fproject-data-exploration/lists"}