https://github.com/jazib-2004/prediction-classification-and-clustering-on-public-expenses-dataset

An end-to-end ML pipeline on a public expenses dataset: EDA to understand the data, preprocessing to prepare it for modelling, then regression to predict one feature's value, classification to classify one feature, and K-means for clustering and cluster analysis.

Host: GitHub
URL: https://github.com/jazib-2004/prediction-classification-and-clustering-on-public-expenses-dataset
Owner: Jazib-2004
Created: 2024-11-11T13:12:24.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-11-11T15:56:09.000Z (8 months ago)
Last Synced: 2025-01-21T21:47:16.236Z (5 months ago)
Topics: data-preprocessing, exploratory-data-analysis, k-means-clustering, lasso-regression, logistic-regression, matplotlib, ml-pipeline, python, scikit-learn
Language: Jupyter Notebook
Homepage:
Size: 377 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Prediction-Classification-and-Clustering-on-Public-Expenses-Dataset

I took a public expenses dataset and applied a simple end-to-end machine learning pipeline to it. The pipeline covers exploratory data analysis to understand the data, basic preprocessing to make the dataset suitable for feeding into a model, and then three modelling tasks: REGRESSION to predict one feature's value, CLASSIFICATION to classify one feature, and K-MEANS CLUSTERING, with a cluster analysis to choose the number of clusters before clustering the data.
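
As a rough illustration of what "basic preprocessing" can look like with scikit-learn, here is a minimal sketch. It is not the notebook's actual code: the column names (`age`, `income`, `gender`) and the toy values are hypothetical, and the choice of median imputation, standard scaling and one-hot encoding is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame; the real dataset's columns are not listed in this README.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 58],
    "income": [30000, 52000, 47000, np.nan, 61000],
    "gender": ["F", "M", "M", "F", "M"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["gender"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Each column group gets its own treatment; categorical columns are one-hot encoded.
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X = preprocess.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 2 one-hot columns
```

Keeping the steps in a `ColumnTransformer` means the same transformations can be fitted on the training split only and reused on the test split.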

**Exploratory Data Analysis**

![image](https://github.com/user-attachments/assets/219a2aa8-94af-4583-ac4b-20f8ec25443d)
![image](https://github.com/user-attachments/assets/3b38d91a-a6f2-4646-87b7-ec26245a27c4)

**Lasso Regression Results**

![image](https://github.com/user-attachments/assets/921aa964-89df-423f-96e2-2aab08751b97)
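
For readers who want to reproduce this kind of experiment, here is a minimal Lasso regression sketch. It runs on synthetic data from `make_regression`, not on the expenses dataset, and `alpha=1.0` is an arbitrary value picked for illustration, so the numbers it prints are unrelated to the results shown above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the expenses data: 8 features, only 3 of them informative.
X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# alpha controls the strength of the L1 penalty; 1.0 is an arbitrary choice here.
model = Lasso(alpha=1.0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

# The L1 penalty can shrink some coefficients exactly to zero.
n_zero = int((model.coef_ == 0).sum())
print(f"R^2: {r2:.3f}  MSE: {mse:.1f}  zeroed coefficients: {n_zero} of {len(model.coef_)}")
```

Lasso is a reasonable pick when some features may be irrelevant, because zeroed coefficients act as a built-in form of feature selection.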

**Logistic Regression Results**

![image](https://github.com/user-attachments/assets/93e6447d-a8e2-40a7-8780-d0d1a1fb731b)

These poor results come from a poor choice of target feature for classification. I wanted to see whether GENDER could be classified from this data. Given these results, the other features appear to carry little information about gender, which suggests the dataset is not strongly skewed towards either gender.
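
One way to check whether a classifier has learned anything is to compare it against a trivial baseline. The sketch below is an illustration on synthetic data, not the notebook's code: the labels are generated independently of the features on purpose, so logistic regression should land near the majority-class baseline. Scores close to the baseline only show that these features do not predict the target; they are not a full fairness audit.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: 5 numeric features and a binary label unrelated to them.
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline that always predicts the most frequent class in the training set.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
clf = LogisticRegression().fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
model_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"logistic regression accuracy: {model_acc:.3f}")
```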

**Cluster Analysis**

![image](https://github.com/user-attachments/assets/4979db82-6fd3-49fe-94ab-9e3cffb0d969)
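
A common way to pick the number of clusters is to fit K-means for several values of k and compare the inertia (the elbow method) or the silhouette score. This README does not say which criterion the notebook uses, so the sketch below is an assumption, run on synthetic blobs with three known centers.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with three well-separated groups (a stand-in for the real data).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=1.0, random_state=0)

inertias = {}
silhouettes = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                         # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, km.labels_)  # higher means better separation

for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")

# Pick the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print("best k by silhouette:", best_k)
```

Inertia keeps dropping as k grows, so the elbow is read off by eye where the curve flattens, while the silhouette score gives a single value to maximize.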

**K-Means Clustering**

![image](https://github.com/user-attachments/assets/fada2427-be71-48b8-95dd-4e9027563529)
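
Once a cluster count is chosen, the final clustering and a quick per-cluster summary could look like the sketch below. The `income` and `expense` columns and their values are hypothetical, and `n_clusters=3` is assumed for the toy data rather than taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical toy data: three spending groups of 50 rows each.
centers = [(20000, 15000), (50000, 30000), (90000, 50000)]
rows = [
    (rng.normal(income, 3000), rng.normal(spend, 2000))
    for income, spend in centers
    for _ in range(50)
]
df = pd.DataFrame(rows, columns=["income", "expense"])

# K-means is distance based, so features are scaled before clustering.
X_scaled = StandardScaler().fit_transform(df)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X_scaled)

# Describe each cluster by its size and its mean values in the original units.
sizes = df["cluster"].value_counts().sort_index()
summary = df.groupby("cluster")[["income", "expense"]].mean()
print(sizes)
print(summary.round(0))
```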
