https://github.com/aneeshmurali-n/project-ml-data-preprocessing

The main objective of this project is to design and implement a robust data preprocessing system that addresses common challenges such as missing values, outliers, inconsistent formatting, and noise. By performing effective data preprocessing, the project aims to enhance the quality, reliability, and usefulness of the data for machine learning.

Host: GitHub
URL: https://github.com/aneeshmurali-n/project-ml-data-preprocessing
Owner: aneeshmurali-n
License: mit
Created: 2024-08-21T07:12:21.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-08-25T06:09:59.000Z (11 months ago)
Last Synced: 2025-01-09T02:59:24.090Z (6 months ago)
Topics: data-analysis, data-cleaning, data-encoding, data-exploration, feature-scaling, label-encoding, matplotlib, minmaxscaler, numpy, one-hot-encoding, outlier-detection, pandas, standardscaler
Language: Jupyter Notebook
Homepage:
Size: 174 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Project-ML-Data-Preprocessing
The main objective of this project is to design and implement a robust data preprocessing system that addresses common challenges such as missing values, outliers, inconsistent formatting, and noise. By performing effective data preprocessing, the project aims to enhance the quality, reliability, and usefulness of the data for machine learning.

## Fulfilled Key Components:

### Data Exploration:
Explore the data: list the unique values in each feature and count how many there are.
Compute summary statistics and rename the columns.
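
The exploration steps above could look roughly like the following pandas sketch. The sample rows and the original column names ("Company Name", "Emp Age", and so on) are assumptions made for illustration; they are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Company Name": ["TCS", "Infosys", "TCS", "CTS"],
    "Emp Age": [25, 30, 25, 41],
    "Emp Salary": [3000, 4500, 3000, 8000],
    "Place": ["Chennai", "Mumbai", "Chennai", "Delhi"],
})

# Rename the columns to short, consistent names.
df = df.rename(columns={
    "Company Name": "company",
    "Emp Age": "age",
    "Emp Salary": "salary",
    "Place": "place",
})

# List the unique values in each feature and count them.
unique_summary = {col: df[col].unique().tolist() for col in df.columns}
unique_counts = df.nunique()

for col, values in unique_summary.items():
    print(f"{col}: {len(values)} unique -> {values}")

# Summary statistics for numeric and non-numeric columns alike.
stats = df.describe(include="all")
print(stats)
```

`describe(include="all")` reports count, mean, and quartiles for numeric columns and count, unique, top, and freq for text columns, so a single call covers both kinds of feature.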

### Data Cleaning:
Find missing and inappropriate values and treat them appropriately.
Remove all duplicate rows.
Find the outliers.
Replace the value 0 in age with NaN.
Treat the null values in all columns using a suitable measure (removing the rows, or replacing the values with the mean, median, or mode).
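
A minimal sketch of these cleaning steps, assuming a small invented table with `age`, `salary`, and `place` columns. The interquartile-range (IQR) rule used to flag outliers and the choice of median or mode for imputation are assumptions; the README does not say which methods the notebook uses.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "age": [25, 0, 25, 41, 38, 29, 33, 36, 31, 95],
    "salary": [3000.0, 4500.0, 3000.0, np.nan, 5200.0,
               4100.0, 3900.0, 4700.0, 4300.0, 4000.0],
    "place": ["Chennai", "Mumbai", "Chennai", None, "Delhi",
              "Mumbai", "Delhi", "Chennai", "Mumbai", "Delhi"],
})

# Remove exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# An age of 0 is not a valid value, so mark it as missing.
df["age"] = df["age"].replace(0, np.nan)

# Flag outliers in age with the 1.5 * IQR rule (an assumed method).
q1, q3 = df["age"].quantile(0.25), df["age"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["age"] < lower) | (df["age"] > upper)]
print(outliers)

# Fill missing values: median for numeric columns, mode for the categorical one.
df["age"] = df["age"].fillna(df["age"].median())
df["salary"] = df["salary"].fillna(df["salary"].median())
df["place"] = df["place"].fillna(df["place"].mode()[0])

print(df.isna().sum())
```

The median is less sensitive than the mean to extreme values such as the flagged age, which is why it is used here; `df.dropna()` would be the alternative if removing incomplete rows is preferred.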

### Data Analysis:
Filter the data to rows with age > 40 and salary < 5000.
Plot a chart of age against salary.
Count the number of people from each place and represent the counts visually.
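
The analysis steps could be sketched as follows with pandas and matplotlib. The sample values are invented, and the chart types (a scatter plot for age against salary, a bar chart for the counts per place) are assumptions, since the README only asks for a chart.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "age": [25, 45, 52, 41, 38],
    "salary": [3000, 4500, 8000, 4800, 5200],
    "place": ["Chennai", "Mumbai", "Chennai", "Delhi", "Mumbai"],
})

# Keep rows where age > 40 and salary < 5000.
# Each condition needs its own parentheses because & binds tighter than > and <.
filtered = df[(df["age"] > 40) & (df["salary"] < 5000)]
print(filtered)

# Count the number of people from each place.
place_counts = df["place"].value_counts()

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Age against salary as a scatter plot.
ax_scatter.scatter(df["age"], df["salary"])
ax_scatter.set_xlabel("age")
ax_scatter.set_ylabel("salary")
ax_scatter.set_title("Age vs. salary")

# People per place as a bar chart.
place_counts.plot(kind="bar", ax=ax_bar)
ax_bar.set_xlabel("place")
ax_bar.set_ylabel("count")
ax_bar.set_title("People per place")

fig.tight_layout()
```

In a Jupyter notebook the figure renders inline; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.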

### Data Encoding:
Convert categorical variables into numerical representations using techniques such as one-hot encoding and label encoding, making them suitable for analysis by machine learning algorithms.
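
As a rough illustration of the two techniques, the sketch below one-hot encodes one column with `pandas.get_dummies` and label-encodes another with scikit-learn's `LabelEncoder`. The `gender` column and all values are hypothetical, and which columns the project encodes with which technique is an assumption.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data; the "gender" column is an assumption.
df = pd.DataFrame({
    "place": ["Chennai", "Mumbai", "Delhi", "Chennai"],
    "gender": ["F", "M", "M", "F"],
})

# One-hot encoding: one 0/1 indicator column per distinct place.
one_hot = pd.get_dummies(df, columns=["place"], dtype=int)
print(one_hot)

# Label encoding: each distinct category is mapped to an integer code.
encoder = LabelEncoder()
df["gender_encoded"] = encoder.fit_transform(df["gender"])
print(df)
print(list(encoder.classes_))
```

One-hot encoding avoids implying an order between categories at the cost of extra columns, while label encoding keeps a single column but introduces integer codes that some models may read as ordered.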

### Feature Scaling:
After encoding, scale the features using StandardScaler and MinMaxScaler.
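
A minimal sketch of both scalers from scikit-learn, applied to an invented numeric table. `StandardScaler` rescales each column to zero mean and unit variance, and `MinMaxScaler` maps each column to the range [0, 1] by default. The column names and values are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical numeric features; the values are invented for illustration.
df = pd.DataFrame({
    "age": [25, 30, 35, 40],
    "salary": [3000, 4000, 5000, 6000],
})

# Standardization: (x - mean) / standard deviation, per column.
standardized = pd.DataFrame(
    StandardScaler().fit_transform(df), columns=df.columns
)

# Min-max scaling: (x - min) / (max - min), per column.
normalized = pd.DataFrame(
    MinMaxScaler().fit_transform(df), columns=df.columns
)

print(standardized)
print(normalized)
```

When the data is later split into training and test sets, the usual practice is to fit the scaler on the training portion only and reuse it to transform the test portion.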
