https://github.com/yashasgm07/averaged-regression-imputation
A regression-based missing value imputation system that uses weighted averaging of multiple regression models, supporting both CLI and Flask-based web execution.
cli-application data-preprocessing machine-learning python regression webapplication
- Host: GitHub
- URL: https://github.com/yashasgm07/averaged-regression-imputation
- Owner: Yashasgm07
- License: mit
- Created: 2025-12-22T09:02:48.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-12-22T19:08:27.000Z (6 months ago)
- Last Synced: 2025-12-23T20:35:52.006Z (6 months ago)
- Topics: cli-application, data-preprocessing, machine-learning, python, regression, webapplication
- Language: Python
- Size: 23.4 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Regression-Based Missing Value Imputation System
## 🎓 Degree
M.E. Computer Science – PG Mini Project
## 📌 Project Title
Regression using Averaged Regression on Single and Multi Variable Models
---
## 🧠 Problem Statement
Handling missing values is a critical challenge in data analysis. Traditional methods such as mean
or median imputation ignore relationships between variables, so information carried by correlated
features is lost. This project proposes a regression-based imputation approach that combines
single-variable and multivariable linear regression models to produce accurate and stable imputations.
---
## 📄 Abstract
This project implements an averaged regression-based approach to handle missing values in
structured datasets. Single-variable and multivariable linear regression models are trained for
each feature with missing data. Each model's cross-validated mean squared error (MSE) determines its
weight in the averaged prediction, and a nearest-neighbor refinement then adjusts the result to
improve stability and accuracy.
The system supports both command-line and web-based execution.
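One common way to turn cross-validated error into weights is inverse-MSE weighting. It is shown here as an assumption, since the README does not state the exact formula. With $K$ candidate models, where model $k$ has cross-validated error $\mathrm{MSE}_k$ and prediction $\hat{y}_k$, the averaged prediction $\hat{y}$ would be as given below. Under this rule, two models with MSE 1 and 3 would receive weights 0.75 and 0.25.

$$
w_k = \frac{1/\mathrm{MSE}_k}{\sum_{j=1}^{K} 1/\mathrm{MSE}_j}, \qquad \hat{y} = \sum_{k=1}^{K} w_k \, \hat{y}_k
$$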
---
## ⚙️ Technologies Used
- Python
- Pandas, NumPy
- Scikit-learn
- Flask (Web Interface)
- HTML & CSS
- VS Code
---
## 🔄 System Workflow
1. Load original dataset
2. Inject missing values for experimentation
3. Apply single-variable regression
4. Apply multivariable regression
5. Evaluate models using cross-validated MSE
6. Compute the weighted average of the model predictions
7. Refine predictions using a nearest-neighbor step
8. Generate final imputed dataset
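Steps 3 to 7 could be sketched roughly as follows for a single target column. This is an illustration under stated assumptions, not the project's actual code. The function names `cv_mse`, `averaged_regression_impute`, and `nearest_neighbor_refine` are hypothetical. The predictors are assumed to be fully observed, the weights are assumed to be proportional to the inverse of each model's cross-validated MSE, and the refinement is assumed to blend each imputed value with the target of the closest observed row. The README does not specify any of these details. A caller would pass the result of `averaged_regression_impute(X, y)` together with the mask `np.isnan(y)` to `nearest_neighbor_refine`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def cv_mse(X, y, folds=5):
    """Mean cross-validated MSE of a linear regression fitted on (X, y)."""
    scores = cross_val_score(
        LinearRegression(), X, y, cv=folds, scoring="neg_mean_squared_error"
    )
    return -scores.mean()  # scikit-learn reports the negated MSE


def averaged_regression_impute(X, y, folds=5):
    """Fill NaNs in y with an MSE-weighted average of linear model predictions.

    X: (n, p) array of predictors, assumed to have no missing values.
    y: (n,) target column in which NaN marks a missing entry.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    if not missing.any():
        return y.copy()
    X_obs, y_obs = X[~missing], y[~missing]

    # Steps 3-4: one single-variable model per predictor, plus one
    # multivariable model that uses every predictor.
    n_features = X.shape[1]
    column_sets = [[j] for j in range(n_features)] + [list(range(n_features))]

    predictions, weights = [], []
    for cols in column_sets:
        # Step 5: score the model by cross-validation on the observed rows.
        mse = cv_mse(X_obs[:, cols], y_obs, folds)
        model = LinearRegression().fit(X_obs[:, cols], y_obs)
        predictions.append(model.predict(X[missing][:, cols]))
        # Assumption: the weight is the inverse of the cross-validated MSE.
        weights.append(1.0 / (mse + 1e-12))

    # Step 6: weighted average across models for every missing row.
    filled = y.copy()
    filled[missing] = np.average(np.vstack(predictions), axis=0, weights=weights)
    return filled


def nearest_neighbor_refine(X, y_filled, missing, blend=0.5):
    """Step 7 (assumed form): blend each imputed value with the target of
    the nearest observed row, measured by Euclidean distance on X."""
    X = np.asarray(X, dtype=float)
    refined = np.asarray(y_filled, dtype=float).copy()
    X_obs, y_obs = X[~missing], refined[~missing]
    for i in np.flatnonzero(missing):
        nearest = np.argmin(np.linalg.norm(X_obs - X[i], axis=1))
        refined[i] = blend * refined[i] + (1.0 - blend) * y_obs[nearest]
    return refined
```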
---
## 📁 Output Files
After execution, the following files are generated:
- `student_dataset_original.csv`
- `student_dataset_with_missing.csv`
- `student_dataset_imputed_final.csv`
These files ensure transparency and traceability across all stages of data processing.
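The three stage files could be produced along these lines. This is a sketch rather than the repository's code: `inject_missing`, `save_stages`, the 10% default fraction, and the fixed seed are assumptions made for illustration, and only the three file names come from the README.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def inject_missing(df, columns, fraction=0.1, seed=42):
    """Return a copy of df with a random fraction of each listed column set to NaN."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Cast to float first so NaN can be stored in originally integer columns.
    out[columns] = out[columns].astype(float)
    n_missing = int(round(fraction * len(out)))
    for col in columns:
        rows = rng.choice(len(out), size=n_missing, replace=False)
        out.iloc[rows, out.columns.get_loc(col)] = np.nan
    return out


def save_stages(original, with_missing, imputed, out_dir="."):
    """Write one CSV per processing stage and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stages = {
        "student_dataset_original.csv": original,
        "student_dataset_with_missing.csv": with_missing,
        "student_dataset_imputed_final.csv": imputed,
    }
    for name, frame in stages.items():
        frame.to_csv(out_dir / name, index=False)
    return [out_dir / name for name in stages]
```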
---
## ▶️ How to Run (Command Line)
```bash
pip install -r requirements.txt
python src/main.py
```
---
## 🌐 How to Run (Web Application)
```bash
python app.py
```
Open a browser and navigate to `http://127.0.0.1:5000`.

Steps:
1. Upload the CSV dataset
2. Click Run Imputation
3. Download the final imputed dataset
---
## 🎯 Key Features
- Intelligent handling of missing values using regression
- Weighted averaging based on model performance
- Preservation of all data processing stages
- Supports both CLI and web-based execution
- Easy to demonstrate and explain during viva
---
## 📌 Conclusion
The averaged regression-based imputation method aims to produce more reliable and stable results
than traditional mean-based approaches. The dual execution modes make the system
interactive, practical, and suitable for real-world data processing scenarios.
---
## 🔮 Future Scope
- Extend to non-linear regression models
- Apply deep learning-based imputation techniques
- Support large-scale datasets
- Add visualization dashboards for results
---
## 👨‍💻 Developed By
Yashas G M
M.E. Computer Science