Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jazib-2004/prediction-classification-and-clustering-on-public-expenses-dataset
Applies an end-to-end ML pipeline: EDA to get to know the data, data preprocessing to prepare it for modelling, and finally REGRESSION to predict one feature's value, CLASSIFICATION to classify another feature, and K-means for clustering and cluster analysis.
Last synced: 16 days ago
- Host: GitHub
- URL: https://github.com/jazib-2004/prediction-classification-and-clustering-on-public-expenses-dataset
- Owner: Jazib-2004
- Created: 2024-11-11T13:12:24.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-11T15:56:09.000Z (3 months ago)
- Last Synced: 2024-11-21T04:15:57.830Z (3 months ago)
- Topics: data-preprocessing, exploratory-data-analysis, k-means-clustering, lasso-regression, logistic-regression, matplotlib, ml-pipeline, python, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 377 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Prediction-Classification-and-Clustering-on-Public-Expenses-Dataset
I took a public expenses dataset and applied a simple end-to-end machine learning pipeline to it. The pipeline included exploratory data analysis to get to know the data, basic data preprocessing to make the dataset suitable for feeding into a model, and finally three modelling tasks: REGRESSION to predict one feature's value, CLASSIFICATION to classify another feature, and K-MEANS CLUSTERING, with a cluster analysis to find the optimal number of clusters before clustering the data.
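The README doesn't show the preprocessing code or name the dataset's columns. As a minimal sketch of what "basic preprocessing" typically involves with scikit-learn, the block below imputes missing values, standardises numeric columns and one-hot encodes categorical ones. The column names (`amount`, `age`, `category`) and the values are hypothetical stand-ins, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for the expenses data; the real column names are not given.
df = pd.DataFrame({
    "amount": [120.0, 85.5, np.nan, 300.0, 42.0],
    "age": [34, 51, 27, np.nan, 45],
    "category": ["food", "travel", "food", np.nan, "rent"],
})

numeric_cols = ["amount", "age"]
categorical_cols = ["category"]

preprocess = ColumnTransformer(
    [
        # Numeric: fill gaps with the median, then scale to zero mean / unit variance.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric_cols),
        # Categorical: fill gaps with the most frequent value, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 3 one-hot category columns
```

Wrapping the steps in a `ColumnTransformer` means the same fitted transformations can be reused on test data, which avoids leaking test-set statistics into the imputer and scaler.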
**Exploratory Data Analysis**
![image](https://github.com/user-attachments/assets/219a2aa8-94af-4583-ac4b-20f8ec25443d)
![image](https://github.com/user-attachments/assets/3b38d91a-a6f2-4646-87b7-ec26245a27c4)
**Lasso Regression Results**
![image](https://github.com/user-attachments/assets/921aa964-89df-423f-96e2-2aab08751b97)
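The README doesn't say which feature was predicted or how Lasso was tuned. A minimal sketch with scikit-learn's `Lasso` on synthetic data (an assumption, not the repository's data) shows the property that distinguishes it from plain linear regression: the L1 penalty can shrink uninformative coefficients to exactly zero. `alpha=0.1` is an arbitrary illustrative value; in practice it would be chosen by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: 5 features, but only the first two actually drive the target.
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale first: the L1 penalty treats every coefficient equally, so features
# should share a common scale.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
coefs = model.named_steps["lasso"].coef_

print(f"R^2: {r2:.3f}  MAE: {mae:.3f}")
print("coefficients:", np.round(coefs, 3))  # irrelevant features end up at or near 0
```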
**Logistic Regression Results**
![image](https://github.com/user-attachments/assets/93e6447d-a8e2-40a7-8780-d0d1a1fb731b)
The reason for such poor results is a poor choice of target feature for classification. I wanted to see whether GENDER could be classified from the rest of the data, but these results suggest that the other features carry little information about gender, i.e. the dataset does not appear to be biased towards either gender.
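The argument above rests on comparing the classifier with chance. A minimal sketch of that check, assuming scikit-learn and synthetic data in which the label is drawn independently of the features (standing in for a target like GENDER that the other columns carry no signal about): if logistic regression's balanced accuracy is no better than a majority-class `DummyClassifier`, the features hold little usable information about the target. The README doesn't say whether such a baseline was used.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in: the binary label is independent of the features.
X = rng.normal(size=(1000, 4))
y = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Balanced accuracy averages per-class recall, so always predicting one class scores 0.5.
scores = {}
for name, model in [("logistic", clf), ("baseline", baseline)]:
    scores[name] = balanced_accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: balanced accuracy = {scores[name]:.3f}")
```

When both numbers sit around 0.5, the poor result reflects the target rather than the model.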
**Cluster Analysis**
![image](https://github.com/user-attachments/assets/4979db82-6fd3-49fe-94ab-9e3cffb0d969)
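The README doesn't state which criterion the cluster-analysis plot uses. A common approach, sketched here with scikit-learn on synthetic blobs (an assumption, not the expenses data), is to fit K-means for a range of k and record both the inertia (for an elbow plot) and the silhouette score, then pick the k where the elbow bends or the silhouette peaks. On real data the features would normally be standardised first, since K-means relies on Euclidean distance.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in with three well-separated groups.
X, _ = make_blobs(
    n_samples=600, centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0, random_state=0,
)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                         # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, km.labels_)  # in [-1, 1], higher is better

best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")
print("best k by silhouette:", best_k)
```

Inertia always falls as k grows, so on its own it only gives an "elbow" to eyeball; the silhouette score offers a single maximum to compare against.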
**K-Means Clustering**
![image](https://github.com/user-attachments/assets/fada2427-be71-48b8-95dd-4e9027563529)
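Once k is chosen, the final clustering is a single K-means fit. A minimal sketch, assuming two hypothetical numeric features on very different scales (think amount spent and age) and k=3 carried over from the cluster analysis: the data is standardised before fitting, and the centroids are mapped back with `inverse_transform` so each cluster can be described in the original units.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical two-feature data (e.g. amount, age) with three groups of 100 rows each.
X = np.vstack([
    rng.normal(loc=[50, 20], scale=[10, 2], size=(100, 2)),
    rng.normal(loc=[250, 45], scale=[20, 2], size=(100, 2)),
    rng.normal(loc=[500, 70], scale=[30, 2], size=(100, 2)),
])

# Standardise so the large-valued feature does not dominate the distance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

k = 3  # assumed to come from the cluster-analysis step
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_

# Map centroids back to original units so each cluster is interpretable.
centers = scaler.inverse_transform(kmeans.cluster_centers_)
for c in range(k):
    print(f"cluster {c}: size={np.sum(labels == c)}, centre={np.round(centers[c], 1)}")
```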