https://github.com/abdulrahmanaymann/data-mining

classification data-mining imputation jupyter-notebook knn linear-regression outlier-detection polynomial-regression preprocessing python regression scaling

# Data Mining Project

## Overview

This repository contains the code and documentation for a data mining project with two tasks: a regression problem and a classification problem. Both tasks cover preprocessing (imputation, scaling, and categorical encoding), model training and testing, visualization, and evaluation.

## Tasks

### 1. Regression Problem

#### a. Impute a Categorical Missing Value

- Used an imputation method to handle categorical missing values in the dataset.
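
The README does not name the imputation method, so the following is only a minimal sketch that assumes most-frequent (mode) imputation. The `impute_mode` helper and the `colors` column are hypothetical and written in plain Python to keep the example dependency-free.

```python
from collections import Counter


def impute_mode(values, missing=None):
    """Replace missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not missing]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    # most_common(1) returns [(value, count)] for the top category.
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]


# Hypothetical column; None marks a missing entry.
colors = ["red", "blue", None, "red", None, "green"]
filled = impute_mode(colors)
```

For DataFrame-based work, scikit-learn's `SimpleImputer(strategy="most_frequent")` performs the same kind of fill.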

#### b. Impute a Numerical Missing Value

- Employed [specific imputation method] to handle numerical missing values in the dataset.
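
The specific method is left unnamed above. As an illustration only, this sketch assumes median imputation, which is a common choice because the median is less sensitive to extreme values than the mean. The `impute_median` helper and the `ages` column are hypothetical.

```python
import math
from statistics import median


def is_missing(v):
    """Treat both None and NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def impute_median(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if is_missing(v) else v for v in values]


# Hypothetical numeric column with two missing entries.
ages = [22.0, None, 35.0, 41.0, float("nan"), 29.0]
filled_ages = impute_median(ages)
```

Swapping `median` for `statistics.mean` gives mean imputation with no other changes.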

#### c. Identify a Scaling Problem Visually

- Visualized scaling issues in the dataset using [visualization method].

#### d. Apply 2 Methods of Scaling to Treat Outliers

- Applied [scaling method 1] and [scaling method 2] to address outliers in the dataset.
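
The two scaling methods are not named above. The sketch below assumes min-max scaling and robust scaling (centering on the median and dividing by the interquartile range) purely for illustration; the helper names and the `incomes` column are hypothetical.

```python
from statistics import median, quantiles


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    if iqr == 0:
        return [0.0 for _ in values]
    return [(v - med) / iqr for v in values]


# Hypothetical column with one extreme value (400).
incomes = [30, 32, 35, 38, 40, 400]
scaled_min_max = min_max_scale(incomes)
scaled_robust = robust_scale(incomes)
```

On data like this, min-max scaling squeezes the non-outlier values into a narrow band near 0 because the outlier sets the maximum, while robust scaling keeps the bulk of the data spread around 0 and leaves the outlier as a large value. scikit-learn provides `MinMaxScaler` and `RobustScaler` for the same transformations.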

#### e. Convert a Categorical Variable to Number(s)

- Transformed categorical variables into numerical format.
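
The encoding scheme is not specified above. One common option is one-hot encoding, sketched here under that assumption; `one_hot_encode` and the `cities` column are hypothetical.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows


# Hypothetical categorical column.
cities = ["cairo", "giza", "cairo", "alex"]
categories, encoded = one_hot_encode(cities)
```

Each category becomes its own column, so no artificial ordering is introduced between categories. For ordered categories, mapping each level to an integer may be a better fit.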

#### f. Generate 2 Regression Models with MAE and R2

- Developed two regression models using [model 1] and [model 2], assessing Mean Absolute Error (MAE) and R-squared for each.

#### g. Compare Both Models to Identify Which is Better

- Compared [model 1] and [model 2] on MAE and R-squared to identify the better-performing regression model.
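
The two metrics and the comparison step can be sketched as follows. The functions implement the standard definitions of MAE and R-squared; the target values, the two prediction lists, and the names `model_a` and `model_b` are made up for illustration and do not come from the project.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical test targets and predictions from two models.
y_true = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "model_a": [2.5, 5.5, 7.0, 9.5],
    "model_b": [4.0, 4.0, 8.0, 7.0],
}

scores = {
    name: (mean_absolute_error(y_true, pred), r2_score(y_true, pred))
    for name, pred in predictions.items()
}
# Lower MAE is better; higher R-squared is better.
best_by_mae = min(scores, key=lambda name: scores[name][0])
best_by_r2 = max(scores, key=lambda name: scores[name][1])
```

When both metrics agree, as they do for this toy data, the choice is clear. When they disagree, the decision depends on which kind of error matters more for the task, since R-squared is based on squared errors and penalizes large misses more heavily than MAE does.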

### 2. Classification Problem

#### a. Impute a Categorical Missing Value

- Implemented an imputation method to handle categorical missing values in the dataset.

#### b. Impute a Numerical Missing Value

- Used an imputation method to handle numerical missing values in the dataset.

#### c. Identify a Scaling Problem Visually

- Visualized scaling issues in the dataset.

#### d. Apply 2 Methods of Scaling to Treat Outliers

- Employed two scaling methods to address outliers in the dataset.

#### e. Convert a Categorical Variable to Number(s)

- Transformed categorical variables into numerical format.

#### f. Fit a Classification Model

- Fitted a classification model using KNN.
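
To show what a KNN classifier does, here is a minimal from-scratch sketch: it finds the `k` training points closest to a query (Euclidean distance) and returns the majority label. The value `k=3`, the `knn_predict` helper, and the toy data are assumptions for illustration, not the project's settings.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated classes.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

label_near_a = knn_predict(train_X, train_y, (1.1, 1.0))
label_near_b = knn_predict(train_X, train_y, (5.0, 5.1))
```

Because KNN relies on distances, features on very different scales can dominate the result, which is why the scaling step precedes model fitting. scikit-learn's `KNeighborsClassifier` offers the same algorithm with a `fit`/`predict` interface.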

#### g. Evaluate Your Model

- Assessed the classification model using a confusion matrix and accuracy.
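
As a sketch of how the two measures relate, the code below builds a confusion matrix (rows are actual labels, columns are predicted labels) and derives accuracy from its diagonal. The labels and predictions are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (actual, predicted) pairs; rows = actual, columns = predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical actual and predicted labels for six test samples.
y_true = ["a", "a", "a", "b", "b", "b"]
y_pred = ["a", "a", "b", "b", "b", "a"]

cm = confusion_matrix(y_true, y_pred, labels=["a", "b"])
acc = accuracy(cm)
```

The off-diagonal cells show which classes are confused with each other, which a single accuracy number cannot reveal.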

## Visualization

- Visualized the results after fitting the models, plotting predicted values against actual values.

## Evaluation

### Regression Problem

- Used Mean Absolute Error (MAE) and R-squared to evaluate the regression models.

### Classification Problem

- Used a confusion matrix and accuracy to evaluate the classification model.