https://github.com/yashasgm07/averaged-regression-imputation

A regression-based missing value imputation system that uses weighted averaging of multiple regression models, supporting both CLI and Flask-based web execution.
cli-application data-preprocessing machine-learning python regression webapplication

# Regression-Based Missing Value Imputation System

## 🎓 Degree
M.E. Computer Science – PG Mini Project

## 📌 Project Title
Regression using Averaged Regression on Single and Multi Variable Models

---

## 🧠 Problem Statement
Handling missing values is a critical challenge in data analysis. Traditional methods such as mean
or median imputation ignore relationships between variables, leading to loss of information.
This project proposes a regression-based imputation approach that leverages both single-variable
and multivariable linear regression models to produce accurate and stable imputations.
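To make the contrast concrete, here is a small sketch on made-up data. The arrays `hours` and `score` are hypothetical and are not taken from the project's dataset. Mean imputation gives every gap the same value, while a single-variable regression predicts each gap from a correlated feature.

```python
import numpy as np

# Hypothetical toy data: score rises linearly with study hours.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
score = np.array([52.0, np.nan, 56.0, 58.0, 60.0, np.nan])  # hidden truths: 54, 62

missing = np.isnan(score)

# Mean imputation: every gap receives the column mean, whatever `hours` is.
mean_fill = np.where(missing, np.nanmean(score), score)

# Regression imputation: fit score = a * hours + b on the observed rows,
# then predict each gap from its own `hours` value.
a, b = np.polyfit(hours[~missing], score[~missing], deg=1)
reg_fill = np.where(missing, a * hours + b, score)

print("mean imputation:      ", mean_fill[missing])
print("regression imputation:", reg_fill[missing])
```

On this toy data the regression recovers the linear trend, whereas the mean-filled values are identical for both gaps.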

---

## 📄 Abstract
This project implements an averaged regression-based approach to handle missing values in
structured datasets. Single-variable and multivariable linear regression models are trained for
each feature with missing data. Each model's cross-validated mean squared error (MSE) determines its
weight in the averaged prediction, and a nearest-neighbor refinement step then adjusts the result to improve stability and accuracy.
The system supports both command-line and web-based execution.
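
The weighting rule is not spelled out here. One common choice, assumed purely for illustration, is inverse-MSE weighting. With $K$ candidate models for a feature, $\hat{y}_k$ the prediction of model $k$, and $\mathrm{MSE}_k$ its cross-validated error:

```latex
w_k = \frac{1/\mathrm{MSE}_k}{\sum_{j=1}^{K} 1/\mathrm{MSE}_j},
\qquad
\hat{y} = \sum_{k=1}^{K} w_k \, \hat{y}_k .
```

The weights are non-negative and sum to 1, so a model with lower error contributes more. For example, two models with MSE 1 and 4 would receive weights 0.8 and 0.2 under this rule.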

---

## ⚙️ Technologies Used
- Python
- Pandas, NumPy
- Scikit-learn
- Flask (Web Interface)
- HTML & CSS
- VS Code

---

## 🔄 System Workflow
1. Load original dataset
2. Inject missing values for experimentation
3. Apply single-variable regression
4. Apply multi-variable regression
5. Evaluate models using cross-validation (MSE)
6. Compute weighted averaged predictions
7. Refine predictions using nearest neighbor
8. Generate final imputed dataset
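
Steps 3 to 7 can be sketched for a single target column as follows. This is a minimal illustration, not the project's actual code: the function name `averaged_regression_impute`, the parameters `n_splits` and `blend`, the inverse-MSE weighting, and the specific form of the nearest-neighbor refinement are all assumptions. It also assumes the predictor columns are fully observed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def averaged_regression_impute(X, target, n_splits=5, blend=0.5):
    """Impute NaNs in column `target` of a 2-D float array X.

    Assumes all other columns are fully observed (a simplification).
    """
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X[:, target])
    if not missing.any():
        return X
    others = [c for c in range(X.shape[1]) if c != target]
    y_obs = X[~missing, target]

    # Steps 3-4: one single-variable model per predictor, plus one
    # multivariable model that uses all predictors together.
    feature_sets = [[c] for c in others]
    if len(others) > 1:
        feature_sets.append(others)

    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    preds, mses = [], []
    for cols in feature_sets:
        model = LinearRegression()
        # Step 5: cross-validated MSE (scikit-learn reports it negated).
        scores = cross_val_score(model, X[~missing][:, cols], y_obs,
                                 cv=cv, scoring="neg_mean_squared_error")
        mses.append(-scores.mean())
        model.fit(X[~missing][:, cols], y_obs)
        preds.append(model.predict(X[missing][:, cols]))

    # Step 6: inverse-MSE weights (assumed rule), normalised to sum to 1.
    weights = 1.0 / (np.array(mses) + 1e-12)
    weights /= weights.sum()
    averaged = weights @ np.vstack(preds)

    # Step 7 (assumed form): blend each prediction with the observed target
    # of the nearest complete row, measured on the predictor columns.
    dists = np.linalg.norm(
        X[missing][:, others][:, None, :] - X[~missing][:, others][None, :, :],
        axis=2)
    neighbour_vals = y_obs[dists.argmin(axis=1)]
    X[missing, target] = (1 - blend) * averaged + blend * neighbour_vals
    return X


# Usage on synthetic data: column 2 depends linearly on columns 0 and 1.
rng = np.random.default_rng(0)
A = rng.normal(size=(60, 3))
A[:, 2] = 2.0 * A[:, 0] - A[:, 1] + rng.normal(scale=0.1, size=60)
truth = A[:5, 2].copy()
A[:5, 2] = np.nan  # step 2: inject missing values
filled = averaged_regression_impute(A, target=2)
print(np.abs(filled[:5, 2] - truth))  # absolute error on the hidden cells
```

The nearest-neighbor distance here is computed on unscaled features. With columns on very different scales, standardising them first would be a reasonable adjustment.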

---

## 📁 Output Files
After execution, the following files are generated:

- `student_dataset_original.csv`
- `student_dataset_with_missing.csv`
- `student_dataset_imputed_final.csv`

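Keeping all three files makes each processing stage traceable: the injected gaps can be located in the second file, and the imputed values in the third can be checked against the originals in the first.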
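
Because the original values are kept, imputation quality can be measured on exactly the cells that were hidden. The sketch below shows one way to do that with pandas. The helper names `inject_missing` and `imputation_rmse` are hypothetical, and it assumes the three files share the same rows and numeric columns.

```python
import numpy as np
import pandas as pd


def inject_missing(df, frac=0.1, seed=0):
    """Return a copy of df with roughly `frac` of numeric cells set to NaN."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    numeric = out.select_dtypes(include="number").columns
    hide = rng.random((len(out), len(numeric))) < frac
    out[numeric] = out[numeric].astype(float).mask(hide)
    return out


def imputation_rmse(original, with_missing, imputed):
    """Per-column RMSE, computed only on cells that were hidden."""
    numeric = original.select_dtypes(include="number").columns
    hidden = with_missing[numeric].isna() & original[numeric].notna()
    sq_err = (imputed[numeric] - original[numeric]) ** 2
    # Columns with no hidden cells come out as NaN.
    return np.sqrt(sq_err.where(hidden).mean())


# With the project's outputs this might look like (paths assumed to be
# relative to the working directory):
#   original = pd.read_csv("student_dataset_original.csv")
#   masked   = pd.read_csv("student_dataset_with_missing.csv")
#   imputed  = pd.read_csv("student_dataset_imputed_final.csv")
#   print(imputation_rmse(original, masked, imputed))

# Self-contained demo with a mean-imputation baseline:
demo = pd.DataFrame({"a": np.arange(20.0), "b": np.arange(20.0) * 2})
masked = inject_missing(demo, frac=0.2)
mean_filled = masked.fillna(masked.mean())
print(imputation_rmse(demo, masked, mean_filled))
```

Running the same function on a mean-imputed baseline and on the regression-imputed file gives a direct, like-for-like comparison.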

---

## ▶️ How to Run (Command Line)

```bash
pip install -r requirements.txt
python src/main.py
```

---

## 🌐 How to Run (Web Application)

```bash
python app.py
```

Open a browser and navigate to `http://127.0.0.1:5000`.

Steps:
1. Upload the CSV dataset
2. Click Run Imputation
3. Download the final imputed dataset

---

## 🎯 Key Features
- Intelligent handling of missing values using regression
- Weighted averaging based on model performance
- Preservation of all data processing stages
- Supports both CLI and web-based execution
- Easy to demonstrate and explain during viva

---

## 📌 Conclusion
The averaged regression-based imputation method produces more reliable and stable results
compared to traditional mean-based approaches. The dual execution modes make the system
interactive, practical, and suitable for real-world data processing scenarios.

---

## 🔮 Future Scope
- Extend to non-linear regression models
- Apply deep learning-based imputation techniques
- Support large-scale datasets
- Add visualization dashboards for results

---

## 👨‍💻 Developed By
Yashas G M
M.E. Computer Science