https://github.com/philiptitus/mall-customers

K-means Model to categorize Mall customers into different clusters based on their spending habits

clustering k-means-clustering sickit-learn silhouette-score unsupervised-clustering unsupervised-learning unsupervised-machine-learning

Host: GitHub
URL: https://github.com/philiptitus/mall-customers
Owner: philiptitus
Created: 2025-03-11T11:20:07.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-05-30T18:46:35.000Z (6 months ago)
Last Synced: 2025-06-10T04:07:42.742Z (6 months ago)
Topics: clustering, k-means-clustering, sickit-learn, silhouette-score, unsupervised-clustering, unsupervised-learning, unsupervised-machine-learning
Language: Jupyter Notebook
Homepage: https://philiptitus-mall-customers-app-8gflbs.streamlit.app/
Size: 142 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md

# Mall Customers K-Means Clustering Model

This project implements K-Means clustering on the "Mall Customers" dataset from Kaggle. The goal is to segment customers based on their annual income and spending score.

## Project Structure

- `model.ipynb`: Jupyter Notebook containing the implementation of the K-Means clustering model.
- `mall.csv`: Dataset used for clustering.
- `readme.md`: Project documentation.
- `requirements.txt`: List of dependencies required to run the project.

## Dataset

The dataset used in this project is the "Mall Customers" dataset from Kaggle. It contains one record per customer; the two attributes used for clustering are annual income and spending score.
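A minimal sketch of the loading and preprocessing steps (1 and 2 under Steps), assuming `mall.csv` uses the column layout of the widely used Kaggle version (`CustomerID`, `Gender`, `Age`, `Annual Income (k$)`, `Spending Score (1-100)`). The column names, the hypothetical `load_features` helper, and the sample rows are illustrative assumptions, not code or data from the repository; an in-memory CSV stands in for the file so the snippet runs on its own.

```python
import io

import pandas as pd

# Assumption: mall.csv follows the common Kaggle layout; check df.columns
# against the real file before relying on these names.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]

# Illustrative rows in that layout, used only so the sketch is runnable.
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
"""


def load_features(source) -> pd.DataFrame:
    """Hypothetical helper: read the CSV, report gaps, keep the clustering features."""
    df = pd.read_csv(source)
    print(df.head())           # step 1: inspect the first rows
    print(df.isnull().sum())   # step 2: missing values per column
    return df[FEATURES].dropna()


# Replace io.StringIO(SAMPLE_CSV) with "mall.csv" to use the real dataset.
X = load_features(io.StringIO(SAMPLE_CSV))
```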

## Steps

1. **Load the Dataset**: Load the dataset using pandas and display the first few rows.
2. **Data Preprocessing**: Check for missing values and select relevant features for clustering.
3. **Standardize the Data**: Standardize the features so that income and spending score, which have different units, contribute equally to the Euclidean distances K-Means uses.
4. **Implement K-Means Clustering**: Initialize and fit the K-Means model, then predict the cluster for each data point.
5. **Visualize the Clusters**: Create a scatter plot to visualize the clusters.
6. **Evaluate Clustering**: Use the Elbow Method (inertia plotted against the number of clusters) and the silhouette score to choose the number of clusters.
7. **Save the Model and Clustered Data**: Save the K-Means model and the clustered data to files.
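Steps 3 and 4 can be sketched with scikit-learn's `StandardScaler` and `KMeans`. This is not the notebook's code: synthetic blobs from `make_blobs` stand in for the income/spending matrix, and the blob centres, `n_clusters=5`, and `random_state` values are assumptions. The scatter plot of step 5 is omitted; `labels` and `centers` are what it would draw.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Stand-in for the two selected columns (income, spending score).
# The centres are arbitrary; swap in the real feature matrix from mall.csv.
X, _ = make_blobs(
    n_samples=200,
    centers=[[25, 20], [25, 80], [55, 50], [85, 20], [85, 80]],
    cluster_std=6.0,
    random_state=0,
)

# Step 3: standardize each feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 4: fit K-Means and assign every point to a cluster.
# n_clusters=5 is an assumption; pick it with the elbow/silhouette analysis.
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Map the centroids back to the original units so they are interpretable.
centers = scaler.inverse_transform(kmeans.cluster_centers_)
print(centers.round(1))
```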

## Evaluation

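The performance of the K-Means clustering model is evaluated using the silhouette score, which ranges from -1 to 1. Values close to 1 indicate that points sit well inside their own cluster and far from neighboring clusters; values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.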
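To make the score concrete: for each point i, let a(i) be its mean distance to the other points in its own cluster and b(i) its lowest mean distance to the points of any other cluster. Then s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the silhouette score is the mean of s(i) over all points. The helper `silhouette_by_hand` below is hypothetical, written only to illustrate the formula on synthetic data; it assumes Euclidean distance and the convention s(i) = 0 for single-point clusters, and it should agree with `sklearn.metrics.silhouette_score`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_score


def silhouette_by_hand(X: np.ndarray, labels: np.ndarray) -> float:
    """Mean of s(i) = (b - a) / max(a, b) using Euclidean distances."""
    D = pairwise_distances(X)  # n x n distance matrix
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # single-point cluster: s(i) stays 0 by convention
        # a(i): mean distance to the other members of its own cluster
        # (D[i, i] is 0, so dividing by n_same - 1 excludes the point itself)
        a = D[i, same].sum() / (n_same - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


X, _ = make_blobs(n_samples=150, centers=3, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_by_hand(X, labels), silhouette_score(X, labels))
```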

## Improving the Model

To improve the performance of the K-Means clustering model, consider the following strategies:

- Feature scaling and normalization
- Dimensionality reduction (e.g., PCA)
- Optimal number of clusters (Elbow Method, Silhouette Method)
- Initialization (k-means++)
- Multiple runs (n_init parameter)
- Alternative clustering algorithms (e.g., GMM, DBSCAN)
- Incorporate domain knowledge
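Several of these strategies fit into one sweep. The sketch below, on synthetic data, fits K-Means with k-means++ seeding and `n_init=10` restarts (scikit-learn keeps the run with the lowest inertia) for k = 2 to 10, records inertia for the elbow curve and the silhouette score for each k, then scores a `GaussianMixture` with the chosen k as an alternative algorithm. The data, the k range, and choosing k purely by maximum silhouette are assumptions; the elbow curve and domain knowledge should inform the final choice too.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the income/spending features, then scaled.
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=1.2, random_state=7)
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 11):
    # k-means++ seeding with 10 restarts; the lowest-inertia run is kept.
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = km.inertia_  # elbow method: look for the bend in this curve
    silhouettes[k] = silhouette_score(X_scaled, km.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print("inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("best k by silhouette:", best_k)

# Alternative algorithm: a Gaussian mixture with the same k, scored the same way.
gmm_labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(X_scaled)
print("GMM silhouette:", silhouette_score(X_scaled, gmm_labels))
```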

## Installation

To run this project, you need Python installed. Install the required dependencies with the following command:

```sh
pip install -r requirements.txt
```

## Usage

Open the `model.ipynb` file in Jupyter Notebook or JupyterLab to see the implementation and results of the K-Means clustering model.
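Step 7 saves the model and the clustered data, but the README does not say in which format. One possible approach, sketched here as an assumption, is to wrap the scaler and K-Means in a scikit-learn pipeline and persist it with `joblib`. The synthetic data, the column names, and the file names `kmeans_model.joblib` and `clustered_customers.csv` are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the income/spending features (assumed column names).
X, _ = make_blobs(n_samples=100, centers=3, random_state=3)
df = pd.DataFrame(X, columns=["Annual Income (k$)", "Spending Score (1-100)"])

# Bundle the scaler with K-Means so new points are scaled exactly as in training.
model = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
df["Cluster"] = model.fit_predict(df)  # fitted on the two feature columns only

out_dir = Path(tempfile.mkdtemp())
joblib.dump(model, out_dir / "kmeans_model.joblib")           # hypothetical file name
df.to_csv(out_dir / "clustered_customers.csv", index=False)   # hypothetical file name

# Later, e.g. in another script: reload and assign a new customer to a cluster.
reloaded = joblib.load(out_dir / "kmeans_model.joblib")
new_customer = pd.DataFrame([[60.0, 50.0]], columns=df.columns[:2])
print(reloaded.predict(new_customer))
```

Saving the scaler together with the model matters: a new customer's raw income and spending score must be standardized with the training mean and variance before prediction, and the pipeline applies that transformation automatically.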

## License

This project is licensed under the MIT License.
