https://github.com/abdulrahmanaymann/data-mining
data mining project involving two tasks: a regression problem and a classification problem.
classification data-mining imputation jupyter-notebook knn linear-regression outlier-detection polynomial-regression preprocessing python regression scaling
- Host: GitHub
- URL: https://github.com/abdulrahmanaymann/data-mining
- Owner: abdulrahmanaymann
- Created: 2024-01-16T13:18:15.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-16T13:37:11.000Z (over 2 years ago)
- Last Synced: 2025-03-21T15:42:57.184Z (about 1 year ago)
- Topics: classification, data-mining, imputation, jupyter-notebook, knn, linear-regression, outlier-detection, polynomial-regression, preprocessing, python, regression, scaling
- Language: Jupyter Notebook
- Homepage:
- Size: 876 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Mining Project
## Overview
This repository contains the code and documentation for a data mining project involving two tasks: a regression problem and a classification problem. The project covers various preprocessing steps, model training, testing, visualization, and evaluation using different techniques.
## Tasks
### 1. Regression Problem
#### a. Impute a Categorical Missing Value
- Used an imputation method to fill missing categorical values in the dataset.
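
The README does not name the imputation method, so the following is only a minimal sketch of one common choice, most-frequent (mode) imputation. The toy data and the column name `city` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column name "city" is not from the project.
df = pd.DataFrame({"city": ["Cairo", "Giza", np.nan, "Cairo", np.nan]})

# Mode imputation (an assumed method): fill gaps with the most common category.
most_common = df["city"].mode(dropna=True)[0]
df["city"] = df["city"].fillna(most_common)

print(df["city"].tolist())
```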
#### b. Impute a Numerical Missing Value
- Employed [specific imputation method] to handle numerical missing values in the dataset.
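
The method placeholder above is left unfilled in the README. As an illustration only, median imputation with scikit-learn's `SimpleImputer` might look like this; the `income` column and its values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one missing value and one extreme value.
df = pd.DataFrame({"income": [30.0, 45.0, np.nan, 50.0, 1000.0]})

# Median imputation (an assumed choice): less sensitive to the
# extreme value than mean imputation would be.
imputer = SimpleImputer(strategy="median")
df[["income"]] = imputer.fit_transform(df[["income"]])

print(df["income"].tolist())
```

The median is used here rather than the mean because the single extreme value would otherwise pull the filled-in value upward.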
#### c. Identify a Scaling Problem Visually
- Visualized scaling issues in the dataset using [visualization method].
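
The visualization method is not specified in the README. One plausible way to spot a scaling problem is a side-by-side box plot of the numeric columns, sketched below on synthetic data; the column names `age` and `income` are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: two features whose ranges differ by orders of magnitude.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 200),
    "income": rng.normal(50_000, 15_000, 200),
})

# Box plots on a shared axis: the small-range feature collapses to a
# flat line near zero, which is the visual sign of a scaling problem.
fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot([df["age"].to_numpy(), df["income"].to_numpy()])
ax.set_xticks([1, 2])
ax.set_xticklabels(["age", "income"])
ax.set_ylabel("Raw value")
ax.set_title("Feature ranges before scaling")
```

In a notebook the figure renders inline; in a script, call `plt.show()` or `fig.savefig(...)`.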
#### d. Apply 2 Methods of Scaling to Treat Outliers
- Applied [scaling method 1] and [scaling method 2] to address outliers in the dataset.
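
Both scaling methods are placeholders in the README. As a sketch under that caveat, the example below compares two scikit-learn scalers chosen here for illustration, `StandardScaler` and `RobustScaler`, on a toy column containing one outlier.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Toy column with one obvious outlier (100.0); values are illustrative.
values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# StandardScaler centers on the mean and divides by the standard deviation,
# both of which the outlier distorts.
standard = StandardScaler().fit_transform(values)

# RobustScaler centers on the median and divides by the interquartile range,
# so the outlier has little effect on how the other points are scaled.
robust = RobustScaler().fit_transform(values)

print("standard:", standard.ravel().round(3))
print("robust:  ", robust.ravel().round(3))
```

Neither scaler removes the outlier. The difference is that the robust variant keeps the four typical values well spread, while standard scaling squeezes them into a narrow band.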
#### e. Convert a Categorical Variable to Number(s)
- Transformed categorical variables into numerical format.
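
The README does not say which encoding was used. A minimal sketch of one common option, one-hot encoding with `pandas.get_dummies`, is shown below; the `color` column is hypothetical.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding (an assumed choice): one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```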
#### f. Generate 2 Regression Models with MAE and R2
- Developed two regression models using [model 1] and [model 2], assessing Mean Absolute Error (MAE) and R-squared for each.
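
The two models are placeholders in the README; the repository topics mention linear and polynomial regression, so the sketch below assumes those two. The data is synthetic and the resulting scores say nothing about the project's actual results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a quadratic relationship plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(0, 0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed model pair: plain linear regression vs. degree-2 polynomial regression.
models = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(f"{name}: MAE={scores[name]['mae']:.3f}, R2={scores[name]['r2']:.3f}")
```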
#### g. Compare Both Models to Identify Which is Better
- Compared [model 1] and [model 2] on MAE and R-squared to identify the better-performing regression model.
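
One simple way to make such a comparison explicit is to rank models by lowest MAE and break ties by highest R-squared. The helper `pick_better` and the metric values below are hypothetical placeholders, not results from the project.

```python
def pick_better(scores):
    """Return the model name with the lowest MAE; ties go to the higher R2."""
    return min(scores, key=lambda name: (scores[name]["mae"], -scores[name]["r2"]))


# Placeholder metrics for illustration only (not measured on the project's data).
example_scores = {
    "model_a": {"mae": 1.20, "r2": 0.71},
    "model_b": {"mae": 0.35, "r2": 0.96},
}

print(pick_better(example_scores))
```

When the two metrics disagree, a ranking rule like this hides the trade-off, so it is worth reporting both numbers alongside the pick.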
### 2. Classification Problem
#### a. Impute a Categorical Missing Value
- Used an imputation method to fill missing categorical values in the dataset.
#### b. Impute a Numerical Missing Value
- Used an imputation method to fill missing numerical values in the dataset.
#### c. Identify a Scaling Problem Visually
- Visualized scaling issues in the dataset.
#### d. Apply 2 Methods of Scaling to Treat Outliers
- Applied two scaling methods to address outliers in the dataset.
#### e. Convert a Categorical Variable to Number(s)
- Transformed categorical variables into numerical format.
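
Preprocessing steps a through e can also be chained so they are fitted together. The sketch below assumes scikit-learn's `ColumnTransformer` with median and most-frequent imputation, robust scaling, and one-hot encoding. None of these specific choices are confirmed by the README, and the columns `age` and `plan` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical toy data with one numeric and one categorical gap.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 51.0],
    "plan": ["basic", "pro", np.nan, "basic"],
})

# Numeric branch: impute, then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", RobustScaler()),
])

# Categorical branch: impute, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [("num", numeric, ["age"]), ("cat", categorical, ["plan"])],
    sparse_threshold=0,
)

features = preprocess.fit_transform(df)
print(features.shape)
```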
#### f. Fit a Classification Model
- Fitted a classification model using k-nearest neighbors (KNN).
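
A minimal sketch of fitting KNN with scikit-learn follows. The synthetic dataset, the choice of `n_neighbors=5`, and the `StandardScaler` step are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the project's dataset.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# KNN is distance-based, so features are scaled before fitting.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

pred = knn.predict(X_test)
print(pred[:10])
```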
#### g. Evaluate Your Model
- Assessed the classification model using a confusion matrix and accuracy.
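
As an illustration of these two metrics, the sketch below uses small hand-written label vectors (not project data) with scikit-learn's `confusion_matrix` and `accuracy_score`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(acc)
```

Here six of the eight predictions match, so accuracy is the sum of the diagonal (3 + 3) divided by the total (8).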
## Visualization
- Visualized the results after fitting the models, plotting predicted values against actual values.
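
The README does not say which plot type was used. For a regression model, a common option is a scatter plot of predicted against actual values with a diagonal reference line, sketched here on synthetic numbers.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic "actual" values and noisy "predictions" for illustration.
rng = np.random.default_rng(1)
actual = rng.uniform(0, 10, 50)
predicted = actual + rng.normal(0, 0.8, 50)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(actual, predicted, alpha=0.7, label="predictions")

# Points on the dashed diagonal are perfect predictions.
lims = [min(actual.min(), predicted.min()), max(actual.max(), predicted.max())]
ax.plot(lims, lims, linestyle="--", color="gray", label="perfect prediction")

ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
ax.set_title("Predicted vs. actual")
ax.legend()
```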
## Evaluation
### Regression Problem
- Used Mean Absolute Error (MAE) and R-squared to evaluate the regression models.
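
For reference, the standard definitions of the two metrics are given below. The symbols are chosen here and do not come from the README: $y_i$ is an actual value, $\hat{y}_i$ the corresponding prediction, $\bar{y}$ the mean of the actual values, and $n$ the number of samples.

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
$$

A lower MAE and an $R^2$ closer to 1 both indicate a better fit.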
### Classification Problem
- Used a confusion matrix and accuracy to evaluate the classification model.