https://github.com/philiptitus/mall-customers
K-means Model to categorize Mall customers into different clusters based on their spending habits
clustering k-means-clustering sickit-learn silhouette-score unsupervised-clustering unsupervised-learning unsupervised-machine-learning
- Host: GitHub
- URL: https://github.com/philiptitus/mall-customers
- Owner: philiptitus
- Created: 2025-03-11T11:20:07.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-05-30T18:46:35.000Z (6 months ago)
- Last Synced: 2025-06-10T04:07:42.742Z (6 months ago)
- Topics: clustering, k-means-clustering, sickit-learn, silhouette-score, unsupervised-clustering, unsupervised-learning, unsupervised-machine-learning
- Language: Jupyter Notebook
- Homepage: https://philiptitus-mall-customers-app-8gflbs.streamlit.app/
- Size: 142 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# Mall Customers K-Means Clustering Model
This project implements K-Means clustering on the "Mall Customers" dataset from Kaggle. The goal is to segment customers based on their annual income and spending score.
## Project Structure
- `model.ipynb`: Jupyter Notebook containing the implementation of the K-Means clustering model.
- `mall.csv`: Dataset used for clustering.
- `README.md`: Project documentation.
- `requirements.txt`: List of dependencies required to run the project.
## Dataset
The dataset used in this project is the "Mall Customers" dataset from Kaggle. It contains information about customers, including their annual income and spending score.
## Steps
1. **Load the Dataset**: Load the dataset using pandas and display the first few rows.
2. **Data Preprocessing**: Check for missing values and select relevant features for clustering.
3. **Standardize the Data**: Standardize the features to ensure equal contribution to distance calculations.
4. **Implement K-Means Clustering**: Initialize and fit the K-Means model, then predict the cluster for each data point.
5. **Visualize the Clusters**: Create a scatter plot to visualize the clusters.
6. **Evaluate Clustering**: Use the Elbow Method (within-cluster sum of squares plotted against the number of clusters) and the Silhouette Method to determine the optimal number of clusters.
7. **Save the Model and Clustered Data**: Save the K-Means model and the clustered data to files.
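The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names in `FEATURES`, the choice of five clusters, the output file names, and the synthetic stand-in data (used so the sketch runs without `mall.csv`) are all assumptions. The scatter plot from step 5 is left out to keep the example short.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Assumed column names; check them against the header of your copy of mall.csv.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def make_stand_in_data(n_per_group=40, seed=0):
    """Synthetic stand-in for mall.csv so the sketch runs without the file."""
    rng = np.random.default_rng(seed)
    centers = [(25, 20), (25, 80), (55, 50), (90, 20), (90, 80)]
    rows = [rng.normal(loc=c, scale=5.0, size=(n_per_group, 2)) for c in centers]
    return pd.DataFrame(np.vstack(rows), columns=FEATURES)


def cluster_customers(df, n_clusters=5, seed=42):
    """Steps 2-4: check for gaps, select features, standardize, fit K-Means."""
    X = df[FEATURES]
    if X.isnull().any().any():
        raise ValueError("missing values found; impute or drop them first")
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    # n_clusters=5 is an assumption for this sketch, not a value from the README.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X_scaled)
    return scaler, model, df.assign(Cluster=labels)


# Step 1: with the real file this would be pd.read_csv("mall.csv").
df = make_stand_in_data()
print(df.head())

scaler, model, clustered = cluster_customers(df)
print(clustered["Cluster"].value_counts().sort_index())

# Step 7: persist the fitted objects and the labelled rows.
# The file names below are hypothetical; a temp directory keeps the demo tidy.
out_dir = Path(tempfile.mkdtemp())
joblib.dump({"scaler": scaler, "model": model}, out_dir / "kmeans_model.joblib")
clustered.to_csv(out_dir / "clustered_customers.csv", index=False)
```

Saving the scaler together with the model matters because new customers must be standardized with the same mean and variance before `model.predict` is called on them.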
## Evaluation
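The performance of the K-Means clustering model is evaluated using the silhouette score, which ranges from -1 to 1. A higher score indicates better-defined clusters: each point sits close to the other members of its own cluster and far from the nearest neighboring cluster.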
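To make the score concrete, here is a small hand-rolled sketch of the silhouette coefficient using only the standard library. It is for illustration and is not taken from the notebook; the `silhouette` function and the four toy points are hypothetical. In practice `sklearn.metrics.silhouette_score` computes the same quantity.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Two tight pairs of points, ten units apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])  # labels follow the real grouping
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the grouping
print(f"good labelling: {good:.3f}")
print(f"bad labelling:  {bad:.3f}")
```

The labelling that follows the real grouping scores close to 1, while the one that cuts across it scores below 0, which is the behavior the evaluation relies on.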
## Improving the Model
To improve the performance of the K-Means clustering model, consider the following strategies:
- Feature scaling and normalization
- Dimensionality reduction (e.g., PCA)
- Optimal number of clusters (Elbow Method, Silhouette Method)
- Initialization (`k-means++`)
- Multiple runs (`n_init` parameter)
- Alternative clustering algorithms (e.g., GMM, DBSCAN)
- Domain knowledge
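A few of these strategies can be combined in one short experiment. The sketch below is an assumption-laden illustration rather than the project's code: it uses synthetic stand-in data instead of `mall.csv`, scans an arbitrary range of `k` from 2 to 8, and compares K-Means against a Gaussian mixture model only by silhouette score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the income / spending-score features (assumption).
rng = np.random.default_rng(0)
centers = [(25, 20), (25, 80), (55, 50), (90, 20), (90, 80)]
X = np.vstack([rng.normal(c, 5.0, size=(40, 2)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

# Scan candidate cluster counts with k-means++ seeding and several restarts.
results = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    # inertia_ feeds the elbow plot; the silhouette score gives a second opinion.
    results[k] = (km.inertia_, silhouette_score(X_scaled, labels))
    print(f"k={k}  inertia={results[k][0]:8.2f}  silhouette={results[k][1]:.3f}")

# Pick the k with the highest silhouette score.
best_k = max(results, key=lambda k: results[k][1])
print("best k by silhouette:", best_k)

# Alternative algorithm: a Gaussian mixture with the same number of components.
gmm = GaussianMixture(n_components=best_k, n_init=5, random_state=42)
gmm_labels = gmm.fit_predict(X_scaled)
gmm_sil = silhouette_score(X_scaled, gmm_labels)
print(f"GMM silhouette at k={best_k}: {gmm_sil:.3f}")
```

Inertia always tends to fall as `k` grows, so the elbow is read off where the drop flattens, whereas the silhouette score can simply be maximized. With only two features, dimensionality reduction such as PCA would add little here, so it is not shown.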
## Installation
This project requires Python. Install the dependencies with the following command:
```sh
pip install -r requirements.txt
```
## Usage
Open the `model.ipynb` file in Jupyter Notebook or JupyterLab to see the implementation and results of the K-Means clustering model.
## License
This project is licensed under the MIT License.