https://github.com/whereishussain/data-science

Projects related to data visualisation, cleaning and preprocessing, machine learning, deep learning (ANN and CNN projects), and model training and evaluation

Host: GitHub
URL: https://github.com/whereishussain/data-science
Owner: WhereisHussain
Created: 2025-04-12T14:40:23.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-05-09T22:16:44.000Z (about 1 year ago)
Last Synced: 2025-06-03T12:15:52.302Z (about 1 year ago)
Topics: data-cleaning-and-preprocessing, data-science, data-visualisation, machine-learning, machine-learning-models, model-training-and-evaluation, neural-networks
Language: Jupyter Notebook
Homepage:
Size: 451 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# 🧠 Data Science Projects Repository

This repository contains a collection of practical data science projects covering the complete machine learning workflow, including:

- 🧹 Data Cleaning
- 🧪 Data Preprocessing
- 📊 Data Visualization
- 🏗️ Model Building & Compilation
- 🏃 Model Training & Evaluation

These projects aim to help learners and practitioners understand each phase of working with data and machine learning models.

---

## 📁 Project Structure

### 1. 🧼 Data Cleaning Project
**Objective:**
Clean and standardize a raw dataset with missing values, duplicates, incorrect data types, and inconsistent formatting.

**Techniques Used:**
- Handling missing data (mean/median imputation or dropping rows)
- Removing duplicates
- Converting data types
- String formatting and trimming
- Date and time conversion

**Tools:** `pandas`, `numpy`

📄 File: `data_cleaning.py`
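A minimal sketch of the cleaning steps listed above, assuming a small made-up DataFrame. The `name`, `age` and `signup` columns and the `clean` helper are hypothetical and are not taken from `data_cleaning.py`:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with stray whitespace, inconsistent case,
# numbers stored as strings, a missing value and a near-duplicate row
raw = pd.DataFrame({
    "name": ["  Alice", "bob ", "Bob", "Carol", None],
    "age": ["34", "29", "29", np.nan, "41"],
    "signup": ["2024-01-05", "2024-02-10", "2024-02-10", "2024-03-15", "2024-04-01"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # String formatting: trim whitespace and standardize capitalization
    df["name"] = df["name"].str.strip().str.title()
    # Type conversion: strings to numbers (invalid values become NaN)
    df["age"] = pd.to_numeric(df["age"], errors="coerce")
    # Date conversion: parse ISO date strings into datetime values
    df["signup"] = pd.to_datetime(df["signup"], format="%Y-%m-%d")
    # Remove duplicates only after normalization, so "bob " and "Bob" match
    df = df.drop_duplicates()
    # Drop rows missing a required field
    df = df.dropna(subset=["name"])
    # Impute remaining missing ages with the median of the kept rows
    df["age"] = df["age"].fillna(df["age"].median())
    return df.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

The order of operations is a design choice: deduplicating before string normalization would keep both `"bob "` and `"Bob"`, and imputing before dropping rows would let discarded rows influence the median.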

---

### 2. ⚙️ Data Preprocessing Project
**Objective:**
Prepare clean data for machine learning algorithms by transforming features and labels.

**Techniques Used:**
- Feature scaling (StandardScaler, MinMaxScaler)
- Encoding categorical variables (OneHotEncoder, LabelEncoder)
- Train-test split
- Data balancing (optional: SMOTE, from the `imbalanced-learn` package)

**Tools:** `pandas`, `scikit-learn`, `numpy`
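The preprocessing steps can be sketched with scikit-learn's `ColumnTransformer`. This assumes a toy dataset with hypothetical columns `income`, `age`, `city` and `label`; SMOTE is left out because it needs the separate `imbalanced-learn` package:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical toy dataset: two numeric features, one categorical, string labels
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "income": rng.normal(50_000, 12_000, n),
    "age": rng.integers(18, 70, n),
    "city": rng.choice(["north", "south", "east"], n),
    "label": rng.choice(["no", "yes"], n),
})

X = df.drop(columns="label")
# LabelEncoder is meant for the target column, not for input features
y = LabelEncoder().fit_transform(df["label"])

# Split first so scaling/encoding statistics come from training data only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["income", "age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X_train_t = preprocess.fit_transform(X_train)  # learn means/stds and categories
X_test_t = preprocess.transform(X_test)        # reuse the training statistics
print(X_train_t.shape, X_test_t.shape)
```

Fitting the transformer on the training split and only calling `transform` on the test split avoids leaking test-set statistics into the model. Swapping `StandardScaler` for `MinMaxScaler` is a one-line change.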

---

### 3. 📈 Data Visualization Project
**Objective:**
Understand the dataset using visual exploration techniques and identify patterns or anomalies.

**Techniques Used:**
- Histograms, box plots, scatter plots
- Correlation heatmaps
- Pair plots
- Class distribution graphs

**Tools:** `matplotlib`, `seaborn`, `pandas`
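A quick way to produce several of these plots at once, using only `matplotlib` and `pandas` on synthetic data (the column names are placeholders). With `seaborn` installed, `sns.heatmap(corr, annot=True)` and `sns.pairplot(df, hue="target")` are common shortcuts for the heatmap and the pair plot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: feature_c is deliberately correlated with feature_a
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "feature_a": rng.normal(0, 1, 200),
    "feature_b": rng.normal(5, 2, 200),
    "target": rng.choice(["class_0", "class_1"], 200, p=[0.7, 0.3]),
})
df["feature_c"] = 0.8 * df["feature_a"] + rng.normal(0, 0.5, 200)

numeric = df.select_dtypes("number")
corr = numeric.corr()
names = list(numeric.columns)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Histogram: distribution shape of a single feature
axes[0, 0].hist(df["feature_a"], bins=20)
axes[0, 0].set_title("Histogram of feature_a")

# Box plots: spread and potential outliers per numeric feature
axes[0, 1].boxplot([numeric[c].to_numpy() for c in names])
axes[0, 1].set_xticks(range(1, len(names) + 1))
axes[0, 1].set_xticklabels(names)
axes[0, 1].set_title("Box plots")

# Correlation heatmap drawn with imshow
im = axes[1, 0].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[1, 0].set_xticks(range(len(names)))
axes[1, 0].set_xticklabels(names, rotation=45)
axes[1, 0].set_yticks(range(len(names)))
axes[1, 0].set_yticklabels(names)
fig.colorbar(im, ax=axes[1, 0])
axes[1, 0].set_title("Correlation heatmap")

# Class distribution: reveals imbalance before modelling
counts = df["target"].value_counts()
axes[1, 1].bar(counts.index.astype(str).tolist(), counts.to_numpy())
axes[1, 1].set_title("Class distribution")

fig.tight_layout()
fig.savefig("eda_overview.png")
```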

---

### 4. 🧠 Model Compilation & Training Project
**Objective:**
Build a machine learning or deep learning model, compile it with appropriate configurations, and train it on prepared data.

**Steps Covered:**
- Defining a model (ML or DL)
- Choosing loss function, optimizer, metrics
- Model training with validation
- Accuracy and loss plots

**Tools:** `scikit-learn`, `keras` / `tensorflow`, `matplotlib`
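The repository lists `keras` / `tensorflow` for this step, where compiling usually means calling `model.compile(optimizer=..., loss=..., metrics=[...])` and then `model.fit(..., validation_data=...)`. To keep the sketch dependency-light, the example below swaps in scikit-learn's `MLPClassifier` instead and records training/validation loss and accuracy per epoch by hand. The synthetic data, layer sizes and learning rate are arbitrary assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for saving plots
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

# "Compile"-like configuration: architecture, optimizer, learning rate.
# MLPClassifier always minimizes cross-entropy (log-loss) for classification.
model = MLPClassifier(hidden_layer_sizes=(32, 16), solver="adam",
                      learning_rate_init=1e-2, random_state=0)

classes = np.unique(y_train)
history = {"loss": [], "val_loss": [], "acc": [], "val_acc": []}
for epoch in range(30):
    # partial_fit performs a single pass over the training data
    model.partial_fit(X_train, y_train, classes=classes)
    history["loss"].append(log_loss(y_train, model.predict_proba(X_train)))
    history["val_loss"].append(log_loss(y_val, model.predict_proba(X_val)))
    history["acc"].append(accuracy_score(y_train, model.predict(X_train)))
    history["val_acc"].append(accuracy_score(y_val, model.predict(X_val)))

# Accuracy and loss curves for training vs. validation data
fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(history["loss"], label="train")
ax_loss.plot(history["val_loss"], label="validation")
ax_loss.set_title("Log-loss")
ax_loss.set_xlabel("epoch")
ax_loss.legend()
ax_acc.plot(history["acc"], label="train")
ax_acc.plot(history["val_acc"], label="validation")
ax_acc.set_title("Accuracy")
ax_acc.set_xlabel("epoch")
ax_acc.legend()
fig.tight_layout()
fig.savefig("training_curves.png")
```

A widening gap between the training and validation curves is the usual sign of overfitting.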

---

### 5. 🔍 Evaluation & Testing
**Objective:**
Evaluate model performance using appropriate metrics and visualize the results.

**Evaluation Metrics:**
- Accuracy, precision, recall, F1-score
- Confusion matrix
- ROC-AUC (for classification)

---