
https://github.com/marianamartiyns/inep-educationperfomance

Data collection, processing, exploratory analysis, and predictive modeling of school performance rates using datasets from INEP.

data-analysis data-cleaning data-science inep predictive-modeling pyhton web-scraping




# 📊 INEP-EducationPerformance

> Data collection, processing, exploratory analysis, and predictive modeling of school performance rates using datasets from INEP (Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira).

## 📋 Description

This project focuses on analyzing and predicting **school performance rates** (approval, failure, and dropout) using publicly available data from **INEP's Open Data platform**. These rates are crucial for monitoring educational indicators at the school and municipal levels in Brazil.

The source data is published as `.xlsx` spreadsheets, which the project downloads, processes, and converts to `.csv` for easier analysis. It then applies **exploratory data analysis (EDA)**, **machine learning models**, and **regression techniques** to forecast failure rates for **2023** from historical data (2019-2022).
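
The collection and conversion steps could look roughly like the sketch below. It is not the repository's code: the helper names (`download_zip`, `extract_spreadsheets`, `xlsx_to_csv`), the URL you would pass in, and the `skiprows` default are all assumptions. `requests` and `pandas` are imported inside the functions that need them so the extraction helper runs without network access or extra packages.

```python
import glob
import io
import os
import zipfile


def download_zip(url: str, timeout: float = 60.0) -> bytes:
    """Fetch a ZIP archive and return its raw bytes (hypothetical helper)."""
    import requests  # third-party; imported here so the other helpers work offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.content


def extract_spreadsheets(zip_bytes: bytes, out_dir: str) -> list[str]:
    """Unpack an in-memory ZIP and return the sorted paths of its .xlsx files."""
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(out_dir)
    # "**" with recursive=True also matches files directly inside out_dir.
    pattern = os.path.join(out_dir, "**", "*.xlsx")
    return sorted(glob.glob(pattern, recursive=True))


def xlsx_to_csv(xlsx_path: str, skiprows: int = 0) -> str:
    """Write a .csv next to the spreadsheet and return its path."""
    import pandas as pd  # third-party; reading .xlsx also requires openpyxl

    csv_path = os.path.splitext(xlsx_path)[0] + ".csv"
    # skiprows is a guess at how many title rows precede the header, if any.
    pd.read_excel(xlsx_path, skiprows=skiprows).to_csv(csv_path, index=False)
    return csv_path
```

A typical run would chain the three helpers: download the archive, extract it into a working folder, then call `xlsx_to_csv` on each returned path.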

## 🧩 Data Processing & Modeling

- [x] **Data Collection:** Extract `.xlsx` files from the government website using `requests`, `zipfile`, and `glob`.
- [x] **Data Transformation:** Convert raw spreadsheets into `.csv` format.
- [x] **Exploratory Data Analysis (EDA):** Compare elementary (*ensino fundamental*) and high school data.
- [x] **Data Preprocessing:** Keep only high school data, apply **Label Encoding**, and shortlist candidate models with **LazyRegressor** and **LazyClassifier**.
- [x] **Predictive Modeling:** Fit a **Linear Regression** to *failure rate* data (train: 2019-2022; test: 2023), evaluate it with R² and MAE, and check the residuals for normality, independence, and homoscedasticity.
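
The preprocessing and modeling steps above can be illustrated with a dependency-free sketch. It is a simplification, not the project's pipeline: it fits a single-predictor least-squares line with the standard library instead of a full regression library, the helper names are local to this example, and the numbers are invented. The Durbin-Watson statistic stands in for the independence check. Normality and homoscedasticity tests would normally come from a statistics package such as SciPy (`scipy.stats.shapiro`) and are left out here.

```python
from statistics import fmean


def label_encode(values):
    """Map each category to an integer code, in sorted order of the categories."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def fit_simple_ols(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    x_mean, y_mean = fmean(x), fmean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between observed and predicted values."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def durbin_watson(residuals):
    """Values near 2 suggest no first-order autocorrelation in the residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / sum(e ** 2 for e in residuals)


# Made-up failure rates (%) for illustration only; these are not INEP figures.
years = [2019, 2020, 2021, 2022]
rates = [8.0, 7.0, 7.5, 6.5]

slope, intercept = fit_simple_ols(years, rates)
fitted = [intercept + slope * yr for yr in years]
residuals = [r - f for r, f in zip(rates, fitted)]
forecast_2023 = intercept + slope * 2023

# Hypothetical categorical column, encoded the way a label encoder would.
codes, mapping = label_encode(["Urbana", "Rural", "Urbana"])
```

With only four training points the diagnostics are not meaningful; the sketch is meant to show which quantities are computed, not to support any conclusion about the data.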

## 📊 Results & Insights

- Identification of trends in approval, failure, and dropout rates.
- Performance comparison between **elementary and high school**.
- Prediction of **2023 failure rates** using historical data.
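
A comparison like the one between elementary and high school can be reduced to a grouped average. The sketch below is a minimal standard-library version under stated assumptions: the column names (`stage`, `year`, `failure_rate`) are hypothetical English stand-ins for the Portuguese names in the dataset, and the rows are invented.

```python
from collections import defaultdict
from statistics import fmean


def mean_rate_by_group(rows, group_keys, value_key):
    """Average `value_key` over rows that share the same `group_keys` values."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[k] for k in group_keys)].append(row[value_key])
    return {group: fmean(values) for group, values in buckets.items()}


# Made-up rows for illustration; names and numbers are not taken from INEP.
rows = [
    {"stage": "elementary", "year": 2021, "failure_rate": 4.0},
    {"stage": "elementary", "year": 2021, "failure_rate": 6.0},
    {"stage": "high_school", "year": 2021, "failure_rate": 9.0},
    {"stage": "high_school", "year": 2021, "failure_rate": 11.0},
]
averages = mean_rate_by_group(rows, ["stage", "year"], "failure_rate")
```

This averages school-level rates without weighting by enrollment, so it will generally differ from an aggregate rate computed over all students.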

> [!NOTE]
> The code and documentation are in **Portuguese 🇧🇷**, with variable names in Portuguese for consistency with the original dataset.

```py
# Author Info

# LinkedIn: https://www.linkedin.com/in/profile-mariana-martins/
# GitHub: https://github.com/marianamartiyns
# Email: [email protected]
```