https://github.com/whereishussain/data-science
Projects related to Data Visualisation, Cleaning, Preprocessing, Machine Learning, Deep Learning, ANN and CNN Projects, and Model Training and Evaluation
- Host: GitHub
- URL: https://github.com/whereishussain/data-science
- Owner: WhereisHussain
- Created: 2025-04-12T14:40:23.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-05-09T22:16:44.000Z (5 months ago)
- Last Synced: 2025-06-03T12:15:52.302Z (4 months ago)
- Topics: data-cleaning-and-preprocessing, data-science, data-visualisation, machine-learning, machine-learning-models, model-training-and-evaluation, neural-networks
- Language: Jupyter Notebook
- Homepage:
- Size: 451 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Science Projects Repository
This repository contains a collection of practical data science projects covering the complete machine learning workflow, including:
- Data Cleaning
- Data Preprocessing
- Data Visualization
- Model Building & Compilation
- Model Training & Evaluation

These projects aim to help learners and practitioners understand each phase of working with data and machine learning models.
---
## Project Structure
### 1. Data Cleaning Project
**Objective:**
Clean and standardize a raw dataset with missing values, duplicates, incorrect data types, and inconsistent formatting.

**Techniques Used:**
- Handling missing data (mean, median, drop)
- Removing duplicates
- Converting data types
- String formatting and trimming
- Date and time conversion

**Tools:** `pandas`, `numpy`

File: `data_cleaning.py`
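
The techniques listed above could look roughly like the following. This is a minimal sketch, not the contents of `data_cleaning.py`: the sample data, the column names, and the `clean` helper are all hypothetical and only serve to show each step once.

```python
import pandas as pd

# Hypothetical raw data; column names and values are illustrative only.
raw = pd.DataFrame({
    "name": ["  Alice ", "bob", "bob", "CAROL", None],
    "age": ["34", "27", "27", None, "41"],
    "signup": ["2024-01-05", "2024-02-10", "2024-02-10", "not a date", "2024-03-15"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies each cleaning step in turn."""
    out = df.copy()
    # String formatting and trimming: strip whitespace, normalize case.
    out["name"] = out["name"].str.strip().str.title()
    # Data type conversion: unparseable values become NaN / NaT.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    out["signup"] = pd.to_datetime(out["signup"], errors="coerce")
    # Remove exact duplicate rows (done after normalization so variants match).
    out = out.drop_duplicates()
    # Missing data: drop rows without a name, fill missing ages with the median.
    out = out.dropna(subset=["name"])
    out["age"] = out["age"].fillna(out["age"].median())
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Whether to fill with the mean, fill with the median, or drop a row depends on the column; the median is used here only because it is less sensitive to outliers.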
---
### 2. Data Preprocessing Project
**Objective:**
Prepare clean data for machine learning algorithms by transforming features and labels.

**Techniques Used:**
- Feature scaling (StandardScaler, MinMaxScaler)
- Encoding categorical variables (OneHotEncoder, LabelEncoder)
- Train-test split
- Data balancing (optional: SMOTE)

**Tools:** `pandas`, `scikit-learn`, `numpy`
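
A minimal sketch of scaling, encoding, and splitting with `scikit-learn` might look like this. The data frame and its column names are hypothetical, and the optional SMOTE step is left out because it needs the separate `imbalanced-learn` package.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical cleaned data; column names are illustrative only.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63, 38, 44],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Mumbai", "Pune", "Delhi"],
    "label": ["no", "yes", "yes", "no", "no", "yes", "yes", "no"],
})

X = df[["age", "city"]]
# LabelEncoder is intended for the target column: "no" -> 0, "yes" -> 1.
y = LabelEncoder().fit_transform(df["label"])

# Split before fitting any transformer so test data does not leak into scaling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["age"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_train_t = preprocess.fit_transform(X_train)  # fit on training data only
X_test_t = preprocess.transform(X_test)        # reuse the fitted statistics
print(X_train_t.shape, X_test_t.shape)
```

`MinMaxScaler` can be swapped in for `StandardScaler` in the same position when features should be mapped to a fixed range instead of standardized.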
---
### 3. Data Visualization Project
**Objective:**
Understand the dataset using visual exploration techniques and identify patterns or anomalies.

**Techniques Used:**
- Histograms, box plots, scatter plots
- Correlation heatmaps
- Pair plots
- Class distribution graphs

**Tools:** `matplotlib`, `seaborn`, `pandas`
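
Several of the plot types above can be sketched with `matplotlib` and `pandas` alone. The synthetic data, the column names, and the output file name `eda_overview.png` are assumptions made for this example; the repository's notebooks may use different data and `seaborn` helpers instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic, illustrative data: two correlated features and an imbalanced label.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"height": rng.normal(170, 10, n)})
df["weight"] = 0.9 * df["height"] - 90 + rng.normal(0, 5, n)
df["label"] = rng.choice(["a", "b"], size=n, p=[0.7, 0.3])

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Histogram: distribution of a single feature.
axes[0, 0].hist(df["height"], bins=20)
axes[0, 0].set_title("Histogram of height")

# Scatter plot: relationship between two features.
axes[0, 1].scatter(df["height"], df["weight"], s=10)
axes[0, 1].set_title("Height vs. weight")

# Correlation heatmap of the numeric columns.
corr = df[["height", "weight"]].corr()
im = axes[1, 0].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 0].set_xticks(range(len(corr.columns)))
axes[1, 0].set_xticklabels(list(corr.columns))
axes[1, 0].set_yticks(range(len(corr.columns)))
axes[1, 0].set_yticklabels(list(corr.columns))
axes[1, 0].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[1, 0])

# Class distribution: how many rows fall into each label.
counts = df["label"].value_counts()
axes[1, 1].bar(list(counts.index), counts.to_numpy())
axes[1, 1].set_title("Class distribution")

fig.tight_layout()
fig.savefig("eda_overview.png")
```

With `seaborn` installed, `seaborn.heatmap(corr, annot=True)` and `seaborn.pairplot(df, hue="label")` are shorter routes to the heatmap and pair plots.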
---
### 4. Model Compilation & Training Project
**Objective:**
Build a machine learning or deep learning model, compile it with appropriate configurations, and train it on prepared data.

**Steps Covered:**
- Defining a model (ML or DL)
- Choosing loss function, optimizer, metrics
- Model training with validation
- Accuracy and loss plots

**Tools:** `scikit-learn`, `keras` / `tensorflow`, `matplotlib`
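
The steps above can be sketched with a small neural network from `scikit-learn`. This is an illustration on synthetic data, not the repository's model: `MLPClassifier` is used here in place of Keras to keep the example lightweight, and it always optimizes log-loss, so only the optimizer and validation settings are chosen explicitly. The layer sizes and hyperparameters are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, for illustration only.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale with statistics from the training split only.
scaler = StandardScaler().fit(X_train)

# Define the model: two hidden layers, Adam optimizer.
# early_stopping=True holds out validation_fraction of the training data
# and records a validation accuracy after every epoch.
model = MLPClassifier(
    hidden_layer_sizes=(32, 16),
    solver="adam",
    learning_rate_init=1e-3,
    early_stopping=True,
    validation_fraction=0.2,
    max_iter=200,
    random_state=0,
)
model.fit(scaler.transform(X_train), y_train)

train_loss = model.loss_curve_           # training loss per epoch
val_accuracy = model.validation_scores_  # validation accuracy per epoch
test_accuracy = model.score(scaler.transform(X_test), y_test)
print(f"epochs: {len(train_loss)}, test accuracy: {test_accuracy:.3f}")
```

Passing `train_loss` and `val_accuracy` to `matplotlib.pyplot.plot` gives the loss and accuracy curves. In Keras, the loss function, optimizer, and metrics are instead chosen explicitly through `model.compile(...)`.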
---
### 5. Evaluation & Testing
**Objective:**
Evaluate model performance using appropriate metrics and visualize the results.

**Evaluation Metrics:**
- Accuracy, precision, recall, F1-score
- Confusion matrix
- ROC-AUC (for classification)

---
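
The evaluation metrics listed for section 5 can be computed with `sklearn.metrics`. The sketch below uses synthetic data and a logistic regression classifier as stand-ins; both are assumptions for the example, not the repository's data or model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, for illustration only.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    n_clusters_per_class=1, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)               # hard class labels
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary"
)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
# ROC-AUC needs scores or probabilities, not hard labels.
auc = roc_auc_score(y_test, y_score)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
print(cm)
```

All metrics are computed on the held-out test split; reporting them on the training data would overstate performance.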