Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/amiraflak/data-mining
Data Mining Course - Spring 2024
classification clustering data-analysis data-mining decision-tree-classifier eda pca
- Host: GitHub
- URL: https://github.com/amiraflak/data-mining
- Owner: AmirAflak
- Created: 2024-04-17T11:09:57.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-09-06T19:52:18.000Z (2 months ago)
- Last Synced: 2024-09-07T00:17:22.846Z (2 months ago)
- Topics: classification, clustering, data-analysis, data-mining, decision-tree-classifier, eda, pca
- Language: Jupyter Notebook
- Homepage:
- Size: 751 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Mining Project
This project develops a machine learning model to classify customers based on their features and predict whether they will make a deposit into their newly opened account, using a dataset from a Portuguese bank's marketing campaign.
- Data Mining Course - Spring 2024

## Dataset Description
The dataset comes from a marketing campaign run by a Portuguese bank and contains customer information.
It has 11,162 samples and 16 features. The features include:
- Demographic information: age, job, marital status, education
- Financial information: default, housing, loan
- Marketing campaign information: contact, month, day of week, duration, campaign, pdays, previous, poutcome
- Target variable: deposit (binary categorical)

## Exploratory Data Analysis (EDA)
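A first pass over the data, of the kind this section covers, might look like the sketch below. It is not taken from the notebook: the small DataFrame is a made-up stand-in for the real file (which could be loaded with something like `pd.read_csv("bank.csv")`, where the file name is an assumption), and only a handful of the columns are reproduced.

```python
import pandas as pd

# Tiny synthetic stand-in for the bank marketing data (all values are made up).
# Rows 2 and 7 are identical on purpose, to show duplicate removal.
df = pd.DataFrame({
    "age": [59, 56, 41, 55, 54, 42, 56, 41],
    "job": ["admin.", "admin.", "technician", "services",
            "admin.", "management", "management", "technician"],
    "housing": ["yes", "no", "yes", "yes", "no", "yes", "yes", "yes"],
    "duration": [1042, 1467, 1389, 579, 673, 562, 1201, 1389],
    "deposit": ["yes", "yes", "yes", "no", "no", "no", "yes", "yes"],
})

# Basic checks: size of the table and missing values per column.
print(df.shape)
missing = df.isna().sum()
print(missing)

# Cleaning step: drop exact duplicate rows.
clean = df.drop_duplicates().reset_index(drop=True)

# Class balance of the target, as proportions.
target_share = clean["deposit"].value_counts(normalize=True)
print(target_share)

# A simple grouped summary: mean call duration per outcome.
mean_duration = clean.groupby("deposit")["duration"].mean()
print(mean_duration)
```

With matplotlib installed, `target_share.plot(kind="bar")` would turn the class balance into a bar chart; on the real data the same calls apply unchanged to the full set of columns.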
The EDA notebook provides an exploratory data analysis of the dataset, including:
- Data cleaning and preprocessing
- Visualization of the data using various plots and charts

## Classification
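As a rough illustration of the comparison covered in this section, the sketch below fits a KNN model and a decision tree and scores both with accuracy, precision, and recall. It is not the notebook's code: the data comes from scikit-learn's `make_classification` as a stand-in for the encoded customer features, and the hyperparameters (`n_neighbors=5`, `max_depth=5`) and the 80/20 split are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the encoded
# customer features and the deposit target.
X, y = make_classification(
    n_samples=1000, n_features=16, n_informative=6, random_state=0
)

# Hold out 20% of the rows for evaluation, keeping the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    # KNN is distance-based, so features are standardized first.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    # Trees do not need scaling; depth is capped to limit overfitting.
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
    }
    print(name, results[name])
```

On the real dataset, categorical columns such as job or marital status would need to be encoded numerically before either model can be fitted.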
The classification notebook trains models that classify customers from their observed features. Several algorithms are compared, including k-nearest neighbors (KNN) and decision trees, and each model is evaluated with metrics such as accuracy, precision, and recall.

## Clustering
The clustering notebook applies clustering algorithms to the dataset to identify patterns and group similar customers together. The algorithms used include k-means and hierarchical clustering. The results of the clustering analysis are visualized using various plots and charts.
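The two algorithms named above can be sketched as follows. This is an illustrative example rather than the notebook's code: the data is generated with `make_blobs`, the choice of three clusters and Ward linkage is an assumption, and the silhouette score and PCA projection are added here only as one plausible way to compare and plot the groupings.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data standing in for the customer features.
X, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=0)

# Both algorithms rely on distances, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

# k-means with an assumed k of 3.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Agglomerative (hierarchical) clustering with Ward linkage, cut at 3 clusters.
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_scaled)

# Silhouette score: higher values indicate tighter, better separated clusters.
scores = {
    "kmeans": silhouette_score(X_scaled, kmeans_labels),
    "hierarchical": silhouette_score(X_scaled, hier_labels),
}
print(scores)

# Project to two dimensions so the clusters can be drawn on a scatter plot.
coords = PCA(n_components=2).fit_transform(X_scaled)
print(coords.shape)
```

The two columns of `coords`, colored by either label array, give a 2D scatter plot of the clusters.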