https://github.com/marianamartiyns/inep-educationperfomance
Data collection, processing, exploratory analysis, and predictive modeling of school performance rates using datasets from INEP.
data-analysis data-cleaning data-science inep predictive-modeling pyhton web-scraping
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/marianamartiyns/inep-educationperfomance
- Owner: marianamartiyns
- License: mit
- Created: 2025-03-16T11:37:50.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-03-16T12:10:34.000Z (3 months ago)
- Last Synced: 2025-03-16T12:38:22.149Z (3 months ago)
- Topics: data-analysis, data-cleaning, data-science, inep, predictive-modeling, pyhton, web-scraping
- Language: Jupyter Notebook
- Homepage:
- Size: 1.29 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# INEP-EducationPerformance
> Data collection, processing, exploratory analysis, and predictive modeling of school performance rates using datasets from INEP (Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira).
## Description
This project focuses on analyzing and predicting **school performance rates** (approval, failure, and dropout) using publicly available data from **INEP's Open Data platform**. These rates are crucial for monitoring educational indicators at the school and municipal levels in Brazil.
The source data is published as `.xlsx` spreadsheets, which the project downloads, cleans, and converts to `.csv` for easier analysis. It then applies **exploratory data analysis (EDA)**, **machine learning models**, and **regression techniques** to forecast failure rates for **2023** from historical data (2019-2022).
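The collection step could be sketched roughly as below. This is a minimal, hedged example rather than the notebook's actual code: it assumes the spreadsheets arrive inside a `.zip` archive (the project lists `requests`, `zipfile`, and `glob` among its tools). The function names `download_zip` and `extract_xlsx`, and any URL passed to them, are hypothetical.

```python
import glob
import io
import os
import zipfile


def download_zip(url: str) -> bytes:
    """Fetch an archive over HTTP. `url` is a placeholder supplied by the caller."""
    import requests  # third-party; imported lazily so the rest runs without it

    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.content


def extract_xlsx(zip_bytes: bytes, dest_dir: str) -> list[str]:
    """Unpack only the .xlsx members of an in-memory zip, then list them with glob."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Skip everything that is not a spreadsheet (readme files, PDFs, ...).
        members = [name for name in archive.namelist() if name.endswith(".xlsx")]
        archive.extractall(dest_dir, members=members)
    # "**" with recursive=True also finds spreadsheets nested in subfolders.
    pattern = os.path.join(dest_dir, "**", "*.xlsx")
    return sorted(glob.glob(pattern, recursive=True))
```

Each extracted path could then be converted to `.csv`, for example with `pandas.read_excel(path).to_csv(...)`, assuming `pandas` and an Excel engine such as `openpyxl` are installed.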
## Data Processing & Modeling
- [x] **Data Collection:** Extract `.xlsx` files from the government website using `requests`, `zipfile`, and `glob`.
- [x] **Data Transformation:** Convert raw spreadsheets into `.csv` format.
- [x] **Exploratory Data Analysis (EDA):** Compare elementary school (*ensino fundamental*) and high school data.
- [x] **Data Preprocessing:** Keep only high school data, apply **Label Encoding**, and shortlist candidate models with **LazyRegressor** and **LazyClassifier**.
- [x] **Predictive Modeling:** Fit a **Linear Regression** on *failure rate* data, training on 2019-2022 and testing on 2023. The model is evaluated with R² and MAE, and its residuals are checked for normality, independence, and homoscedasticity.
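To make the modeling step more concrete, here is a dependency-free sketch of a year-based train/test split, a one-predictor least-squares fit, and three of the checks named above (R², MAE, and the Durbin-Watson statistic for residual independence). It is illustrative only. The rows are made-up toy values, not INEP data, the single predictor is a hypothetical feature, and the notebook's real model, features, and libraries may differ. Normality and homoscedasticity tests are left out because they would normally rely on a statistics package.

```python
from statistics import mean

# Made-up toy records, NOT INEP data: (year, predictor, failure_rate).
ROWS = [
    (2019, 10.0, 6.2),
    (2020, 8.0, 4.9),
    (2021, 12.0, 7.1),
    (2022, 6.0, 3.8),
    (2023, 9.0, 5.4),
    (2023, 11.0, 6.8),
]


def split_by_year(rows, test_year):
    """Train on every year before `test_year`, test on `test_year` itself."""
    train = [(x, y) for year, x, y in rows if year < test_year]
    test = [(x, y) for year, x, y in rows if year == test_year]
    return train, test


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between observed and predicted values."""
    return mean(abs(yt - yp) for yt, yp in zip(y_true, y_pred))


def durbin_watson(residuals):
    """Values near 2 suggest little first-order autocorrelation in the residuals."""
    diffs = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return diffs / sum(r ** 2 for r in residuals)


train, test = split_by_year(ROWS, test_year=2023)
x_train, y_train = zip(*train)
x_test, y_test = zip(*test)

slope, intercept = fit_line(x_train, y_train)
y_pred = [slope * x + intercept for x in x_test]
train_residuals = [y - (slope * x + intercept) for x, y in train]

print(f"R2:  {r2_score(y_test, y_pred):.3f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred):.3f}")
print(f"DW:  {durbin_watson(train_residuals):.3f}")
```

Splitting by year rather than at random keeps 2023 strictly out of training, which matches the forecasting setup described in the list.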
## Results & Insights
- Identification of trends in approval, failure, and dropout rates.
- Performance comparison between **elementary and high school**.
- Prediction of **2023 failure rates** using historical data.

> [!NOTE]
> The code and documentation are in **Portuguese 🇧🇷**, with variable names in Portuguese for consistency with the original dataset.

```py
# Author Info
# LinkedIn: https://www.linkedin.com/in/profile-mariana-martins/
# GitHub: https://github.com/marianamartiyns
# Email: [email protected]
```