https://github.com/marianamartiyns/inep-educationperfomance

Data collection, processing, exploratory analysis, and predictive modeling of school performance rates using datasets from INEP.

Host: GitHub
URL: https://github.com/marianamartiyns/inep-educationperfomance
Owner: marianamartiyns
License: mit
Created: 2025-03-16T11:37:50.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-03-16T12:10:34.000Z (3 months ago)
Last Synced: 2025-03-16T12:38:22.149Z (3 months ago)
Topics: data-analysis, data-cleaning, data-science, inep, predictive-modeling, pyhton, web-scraping
Language: Jupyter Notebook
Homepage:
Size: 1.29 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# 📊 INEP-EducationPerformance

> Data collection, processing, exploratory analysis, and predictive modeling of school performance rates using datasets from INEP (Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira).

## 📋 Description

This project focuses on analyzing and predicting **school performance rates** (approval, failure, and dropout) using publicly available data from **INEP's Open Data platform**. These rates are crucial for monitoring educational indicators at the school and municipal levels in Brazil.

INEP publishes the data as `.xlsx` spreadsheets; the project downloads them, cleans them, and converts them to `.csv` for easier analysis. It then applies **exploratory data analysis (EDA)**, **machine learning models**, and **regression techniques** to forecast **2023** failure rates from historical data (2019-2022).
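As a rough illustration of the collection and conversion steps, here is a minimal sketch built on `requests`, `zipfile`, `glob`, and `pandas`. The function names (`download_zip`, `extract_xlsx`, `xlsx_to_csv`) are hypothetical and are not taken from the project's notebooks. The archive URL is left as a parameter because the exact INEP download link is not given here.

```python
import glob
import io
import zipfile
from pathlib import Path

import pandas as pd


def download_zip(url: str) -> bytes:
    """Download a ZIP archive and return its raw bytes."""
    import requests  # third-party; imported here so the offline helpers work without it

    response = requests.get(url, timeout=60)
    response.raise_for_status()
    return response.content


def extract_xlsx(zip_bytes: bytes, dest_dir: str) -> list[str]:
    """Extract only the .xlsx members of an archive and return their paths."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = [m for m in archive.namelist() if m.endswith(".xlsx")]
        archive.extractall(dest_dir, members=members)
    # Collect the extracted spreadsheets, including any in subfolders.
    pattern = str(Path(dest_dir) / "**" / "*.xlsx")
    return sorted(glob.glob(pattern, recursive=True))


def xlsx_to_csv(xlsx_path: str) -> str:
    """Write a .csv copy next to the spreadsheet and return its path."""
    csv_path = str(Path(xlsx_path).with_suffix(".csv"))
    # Reads the first sheet only; reading .xlsx requires the openpyxl package.
    pd.read_excel(xlsx_path).to_csv(csv_path, index=False)
    return csv_path
```

A typical call chain would be `extract_xlsx(download_zip(url), "data/raw")` followed by `xlsx_to_csv(path)` for each returned path. Real spreadsheets may contain title rows or merged headers that need extra `read_excel` arguments, which this sketch does not handle.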

## 🧩 Data Processing & Modeling

- [x] **Data Collection:** Extract `.xlsx` files from the government website using `requests`, `zipfile`, and `glob`.
- [x] **Data Transformation:** Convert raw spreadsheets into `.csv` format.
- [x] **Exploratory Data Analysis (EDA):** Compare elementary school (*ensino fundamental*) and high school (*ensino médio*) data.
- [x] **Data Preprocessing:** Keep only high school data, apply **Label Encoding**, and select the best-performing models using **LazyRegressor** and **LazyClassifier**.
- [x] **Predictive Modeling:** Fit a **Linear Regression** to *failure rate* data, training on 2019-2022 and testing on 2023. The model is evaluated with R² and MAE, and its residuals are checked for normality, independence, and homoscedasticity.
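The preprocessing and modeling steps above can be sketched as follows. This is an illustration on synthetic data, not the project's actual notebook. The column names (`network`, `location`, `approval_rate`, `failure_rate`), the feature set, and the choice of Shapiro-Wilk and Durbin-Watson as residual checks are all assumptions made for the example. A homoscedasticity test is left out for brevity.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the processed high school table (NOT real INEP values).
rng = np.random.default_rng(0)
rows = [
    {"year": year, "network": network, "location": location}
    for year in range(2019, 2024)
    for network in ["federal", "state", "municipal", "private"]
    for location in ["urban", "rural"]
]
df = pd.DataFrame(rows)
df["approval_rate"] = rng.uniform(70, 95, len(df))
df["failure_rate"] = 0.6 * (100 - df["approval_rate"]) + rng.normal(0, 1, len(df))

# Label Encoding: map each category to an integer code.
for col in ["network", "location"]:
    df[col + "_code"] = LabelEncoder().fit_transform(df[col])

# Split by year instead of at random: train on 2019-2022, test on 2023.
features = ["network_code", "location_code", "approval_rate"]
train = df[df["year"] <= 2022]
test = df[df["year"] == 2023]

model = LinearRegression().fit(train[features], train["failure_rate"])
predictions = model.predict(test[features])

# Evaluation on the held-out year.
r2 = r2_score(test["failure_rate"], predictions)
mae = mean_absolute_error(test["failure_rate"], predictions)

# Residual checks on the training fit.
residuals = (train["failure_rate"] - model.predict(train[features])).to_numpy()
shapiro_p = shapiro(residuals).pvalue  # normality (Shapiro-Wilk)
# Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print(f"R2={r2:.3f}  MAE={mae:.3f}  Shapiro p={shapiro_p:.3f}  Durbin-Watson={dw:.2f}")
```

Splitting by year keeps the 2023 rows entirely out of training, which matches the forecasting setup described above. Note that label-encoded categories enter a linear model as if they were ordered numbers, so the integer codes carry an arbitrary ordering.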

## 📊 Results & Insights

- Identification of trends in approval, failure, and dropout rates.
- Performance comparison between **elementary and high school**.
- Prediction of **2023 failure rates** using historical data.
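A comparison of rates across years and school levels, like the one listed above, could be computed along these lines. The table below is a tiny hand-made sample with illustrative values, not real INEP figures, and the column names are assumptions.

```python
import pandas as pd

# Tiny hand-made sample (illustrative values, NOT real INEP figures).
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2021, 2021],
    "level": ["elementary", "high_school"] * 3,
    "approval_rate": [92.0, 85.0, 97.0, 93.0, 95.0, 90.0],
    "failure_rate": [6.0, 9.0, 2.0, 3.0, 3.5, 5.0],
    "dropout_rate": [2.0, 6.0, 1.0, 4.0, 1.5, 5.0],
})

# Mean of each rate per year, with the two school levels side by side.
trend = df.pivot_table(
    index="year",
    columns="level",
    values=["approval_rate", "failure_rate", "dropout_rate"],
    aggfunc="mean",
)

# Gap between levels: positive means high school has the higher failure rate.
gap = trend["failure_rate"]["high_school"] - trend["failure_rate"]["elementary"]

print(trend)
print(gap)
```

The pivoted table has one row per year, so a year-over-year trend for any rate can be read down a single column.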

> [!NOTE]
> The code and documentation are in **Portuguese 🇧🇷**, with variable names in Portuguese for consistency with the original dataset.

```py
# Author Info

# LinkedIn: https://www.linkedin.com/in/profile-mariana-martins/
# GitHub: https://github.com/marianamartiyns
# Email: [email protected]
```
