Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/humairarizwan/datapreprocessor
Last synced: 28 days ago
- Host: GitHub
- URL: https://github.com/humairarizwan/datapreprocessor
- Owner: HumairaRizwan
- Created: 2024-02-02T07:00:35.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-02-02T10:10:50.000Z (11 months ago)
- Last Synced: 2024-02-02T11:27:49.142Z (11 months ago)
- Language: Python
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Overview
SmartEncoder is a Python class built on PySpark that automates data preprocessing, making it easier to clean and prepare large-scale datasets for analysis or model training.
Features:
- Handle missing values through imputation or deletion.
- Identify duplicates and null values.
- Encode categorical variables using one-hot encoding or label encoding.
- Balance the class distribution through random oversampling.
- Scale numerical features for better model performance.
- Reduce dimensionality with principal component analysis (PCA).
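To make three of the listed steps concrete, here is a minimal sketch of mean imputation, label encoding, and random oversampling. It is written in plain Python on a list of dictionaries so that it runs without a Spark installation; it is not the class's actual API or implementation, and the function names (`impute_mean`, `label_encode`, `random_oversample`) and the sample rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def impute_mean(rows, column):
    """Replace None in `column` with the mean of the observed values.

    Assumes at least one row has a non-missing value in `column`.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]


def label_encode(rows, column):
    """Map each distinct category to an integer index (sorted for determinism)."""
    mapping = {v: i for i, v in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: mapping[r[column]]} for r in rows], mapping


def random_oversample(rows, label, seed=0):
    """Resample minority-class rows with replacement until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[label]].append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing rows at random from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical toy data: one missing age, and class 1 is under-represented.
rows = [
    {"age": 30.0, "city": "Oslo", "label": 0},
    {"age": None, "city": "Lima", "label": 0},
    {"age": 50.0, "city": "Oslo", "label": 0},
    {"age": 40.0, "city": "Lima", "label": 1},
]

rows = impute_mean(rows, "age")
rows, city_index = label_encode(rows, "city")
balanced = random_oversample(rows, "label")
print(Counter(r["label"] for r in balanced))
```

In a PySpark setting the same steps would operate on a distributed DataFrame rather than an in-memory list; the sketch only illustrates the logic of each transformation.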