Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/humairarizwan/datapreprocessor


Last synced: 28 days ago
README

        

Overview


SmartEncoder is a Python class built on PySpark that automates data preprocessing, making it easier to clean and prepare large-scale datasets for analysis or model training.

Features:


  • Handle missing values through imputation or deletion.

  • Identify duplicates and null values.

  • Encode categorical variables using one-hot encoding or label encoding.

  • Apply random oversampling to balance the class distribution.

  • Scale numerical features for better model performance.

  • Apply principal component analysis (PCA) for dimensionality reduction.
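
To make the random oversampling feature concrete, here is a minimal plain-Python sketch of the idea. It is not the project's PySpark implementation: the function name `random_oversample`, its parameters, and the sample rows are all hypothetical and chosen only for illustration. The sketch duplicates randomly chosen rows from each smaller class until every class has as many rows as the largest one.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, label_key, seed=0):
    """Balance classes by resampling smaller classes with replacement.

    Hypothetical helper for illustration; not part of the project's API.
    """
    if not rows:
        return []

    rng = random.Random(seed)  # fixed seed keeps the result reproducible

    # Group rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)  # keep all original rows
        # Draw the missing rows at random, with replacement.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Made-up example data: four rows of class 0, two rows of class 1.
rows = [
    {"x": 1.0, "label": 0},
    {"x": 1.5, "label": 0},
    {"x": 2.0, "label": 0},
    {"x": 2.5, "label": 0},
    {"x": 9.0, "label": 1},
    {"x": 9.5, "label": 1},
]

balanced = random_oversample(rows, "label")
counts = Counter(row["label"] for row in balanced)
print(counts)
```

On a Spark DataFrame, a comparable effect could plausibly be achieved by sampling the minority class with replacement (for example with `DataFrame.sample(withReplacement=True, ...)`) and unioning the result with the original data, though whether the project does it this way is an assumption.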