https://github.com/whereishussain/data-science

Projects related to data visualisation, cleaning, preprocessing, machine learning, and deep learning (ANN and CNN), including model training and model evaluation.

# 🧠 Data Science Projects Repository

This repository contains a collection of practical data science projects covering the complete machine learning workflow, including:

- 🧹 Data Cleaning
- 🧪 Data Preprocessing
- 📊 Data Visualization
- 🏗️ Model Building & Compilation
- 🏃 Model Training & Evaluation

These projects aim to help learners and practitioners understand each phase of working with data and machine learning models.

---

## 📁 Project Structure

### 1. 🧼 Data Cleaning Project
**Objective:**
Clean and standardize a raw dataset with missing values, duplicates, incorrect data types, and inconsistent formatting.

**Techniques Used:**
- Handling missing data (mean, median, drop)
- Removing duplicates
- Converting data types
- String formatting and trimming
- Date and time conversion

**Tools:** `pandas`, `numpy`

📄 File: `data_cleaning.py`
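
The techniques listed above can be sketched in a few lines of `pandas`. This is a minimal illustration and not the contents of `data_cleaning.py`: the `raw` DataFrame, its column names, and the `clean` helper are all hypothetical, and the choice of median imputation for `age` is just one of the options mentioned.

```python
import pandas as pd

# Hypothetical raw data: column names and values are invented for illustration.
raw = pd.DataFrame({
    "name": ["  Alice ", "bob", "bob", "CAROL", None],
    "age": ["34", "27", "27", None, "41"],
    "signup": ["2023-01-15", "2023-02-01", "2023-02-01", "not a date", "2023-03-10"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies each cleaning step in turn."""
    out = df.copy()
    # String formatting and trimming: strip whitespace, normalise case.
    out["name"] = out["name"].str.strip().str.title()
    # Converting data types: values that are not numbers become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Date and time conversion: unparseable values become NaT.
    out["signup"] = pd.to_datetime(out["signup"], errors="coerce")
    # Removing duplicates, then dropping rows that have no name at all.
    out = out.drop_duplicates().dropna(subset=["name"]).reset_index(drop=True)
    # Handling missing data: fill remaining gaps in age with the median.
    return out.assign(age=out["age"].fillna(out["age"].median()))


cleaned = clean(raw)
print(cleaned)
print(cleaned.dtypes)
```

Duplicates are removed after trimming and type conversion so that rows differing only in whitespace or case are still recognised as the same record.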

---

### 2. ⚙️ Data Preprocessing Project
**Objective:**
Prepare clean data for machine learning algorithms by transforming features and labels.

**Techniques Used:**
- Feature scaling (StandardScaler, MinMaxScaler)
- Encoding categorical variables (OneHotEncoder, LabelEncoder)
- Train-test split
- Data balancing (optional: SMOTE)

**Tools:** `pandas`, `scikit-learn`, `numpy`
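
A minimal sketch of these steps with `scikit-learn`, assuming a small hypothetical dataset (the column names and values below are invented). It splits before fitting the transformers so that scaling statistics come from the training set only. SMOTE is left out because it lives in the separate `imbalanced-learn` package.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical cleaned data: column names and values are invented for illustration.
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 29, 63, 37, 50],
    "income": [28_000, 52_000, 91_000, 60_000, 39_000, 75_000, 48_000, 83_000],
    "city": ["Pune", "Delhi", "Delhi", "Mumbai", "Pune", "Mumbai", "Delhi", "Pune"],
    "label": ["no", "no", "yes", "yes", "no", "yes", "no", "yes"],
})

X = df.drop(columns="label")
# LabelEncoder sorts the classes, so "no" -> 0 and "yes" -> 1.
y = LabelEncoder().fit_transform(df["label"])

# Train-test split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale numeric columns and one-hot encode the categorical one.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Fit on the training data only, then reuse the fitted transformers on the test data.
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)

print(np.round(X_train_t, 2))
print(X_train_t.shape, X_test_t.shape)
```

`MinMaxScaler` can be swapped in for `StandardScaler` in the same position when features should be mapped to a fixed range instead of zero mean and unit variance.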

---

### 3. 📈 Data Visualization Project
**Objective:**
Understand the dataset using visual exploration techniques and identify patterns or anomalies.

**Techniques Used:**
- Histograms, box plots, scatter plots
- Correlation heatmaps
- Pair plots
- Class distribution graphs

**Tools:** `matplotlib`, `seaborn`, `pandas`
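
The plot types above can be sketched on one figure. To keep dependencies minimal, this example uses plain `matplotlib` and `pandas` rather than `seaborn`, and it runs on synthetic data. The column names and the output file name `eda_overview.png` are assumptions made for illustration.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical synthetic data: column names are invented for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"height": rng.normal(170, 10, n)})
df["weight"] = 0.9 * df["height"] - 85 + rng.normal(0, 5, n)
df["label"] = np.where(df["weight"] > df["weight"].median(), "heavy", "light")

numeric = ["height", "weight"]
corr = df[numeric].corr()
class_counts = df["label"].value_counts()

fig, axes = plt.subplots(2, 3, figsize=(14, 8))

# Histogram of a single feature.
axes[0, 0].hist(df["height"], bins=20)
axes[0, 0].set_title("Histogram: height")

# Box plots, useful for spotting outliers.
axes[0, 1].boxplot([df[c].to_numpy() for c in numeric])
axes[0, 1].set_xticklabels(numeric)
axes[0, 1].set_title("Box plots")

# Scatter plot of two features.
axes[0, 2].scatter(df["height"], df["weight"], s=10)
axes[0, 2].set_xlabel("height")
axes[0, 2].set_ylabel("weight")
axes[0, 2].set_title("Scatter: height vs weight")

# Correlation heatmap drawn with imshow.
im = axes[1, 0].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 0].set_xticks(range(len(numeric)))
axes[1, 0].set_xticklabels(numeric)
axes[1, 0].set_yticks(range(len(numeric)))
axes[1, 0].set_yticklabels(numeric)
axes[1, 0].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[1, 0])

# Class distribution as a bar chart.
axes[1, 1].bar(list(class_counts.index), class_counts.to_numpy())
axes[1, 1].set_title("Class distribution")

axes[1, 2].axis("off")  # unused panel

out_path = Path("eda_overview.png")  # hypothetical output file name
fig.tight_layout()
fig.savefig(out_path, dpi=100)
```

With `seaborn` installed, `seaborn.heatmap` and `seaborn.pairplot` are the higher-level equivalents for the heatmap and pair plots. `pandas.plotting.scatter_matrix` also produces a pair-plot grid without extra packages.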

---

### 4. 🧠 Model Compilation & Training Project
**Objective:**
Build a machine learning or deep learning model, compile it with appropriate configurations, and train it on prepared data.

**Steps Covered:**
- Defining a model (ML or DL)
- Choosing loss function, optimizer, metrics
- Model training with validation
- Accuracy and loss plots

**Tools:** `scikit-learn`, `keras` / `tensorflow`, `matplotlib`
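
The steps above can be sketched with `scikit-learn` alone. This example uses `MLPClassifier` as a stand-in for a Keras model, which is an assumption made to keep it lightweight. There is no separate compile step here: the optimizer is chosen with `solver`, the loss is fixed to log-loss, and validation accuracy is tracked through `early_stopping`. The synthetic dataset and the output file name `training_curves.png` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Defining the model and choosing the optimizer.
model = MLPClassifier(
    hidden_layer_sizes=(32, 16),
    solver="adam",
    learning_rate_init=1e-3,
    early_stopping=True,      # hold out part of the training data for validation
    validation_fraction=0.2,
    n_iter_no_change=10,
    max_iter=200,
    random_state=0,
)

# Model training with validation.
model.fit(X_train_s, y_train)
test_accuracy = model.score(X_test_s, y_test)
print(f"epochs run: {model.n_iter_}, test accuracy: {test_accuracy:.3f}")

# Accuracy and loss plots from the curves recorded during training.
fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(model.loss_curve_)
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("training loss")
ax_acc.plot(model.validation_scores_)
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("validation accuracy")
fig.tight_layout()
fig.savefig("training_curves.png", dpi=100)  # hypothetical output file name
```

In Keras the same choices would be made explicitly in `model.compile(...)`, and the `History` object returned by `model.fit(...)` would supply the curves.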

---

### 5. 🔍 Evaluation & Testing
**Objective:**
Evaluate model performance using appropriate metrics and visualize the results.

**Evaluation Metrics:**
- Accuracy, precision, recall, F1-score
- Confusion matrix
- ROC-AUC (for classification)
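
As a minimal sketch, the metrics above can be computed with `sklearn.metrics`. The logistic regression model and the synthetic dataset are assumptions chosen only to produce predictions to score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = clf.predict(X_test)               # hard class labels
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    # ROC-AUC is computed from scores, not from hard labels.
    "roc_auc": roc_auc_score(y_test, y_score),
}

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

Accuracy alone can mislead on imbalanced data, which is why precision, recall, and F1 are reported alongside it.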

---