An open API service indexing awesome lists of open source software.

https://github.com/philiptitus/mall-customers

K-means Model to categorize Mall customers into different clusters based on their spending habits
https://github.com/philiptitus/mall-customers

clustering k-means-clustering sickit-learn silhouette-score unsupervised-clustering unsupervised-learning unsupervised-machine-learning

Last synced: 5 months ago
JSON representation

K-means Model to categorize Mall customers into different clusters based on their spending habits

Awesome Lists containing this project

README

          

# Mall Customers K-Means Clustering Model

This project implements K-Means clustering on the "Mall Customers" dataset from Kaggle. The goal is to segment customers based on their annual income and spending score.

## Project Structure

- `model.ipynb`: Jupyter Notebook containing the implementation of the K-Means clustering model.
- `mall.csv`: Dataset used for clustering.
- `README.md`: Project documentation.
- `requirements.txt`: List of dependencies required to run the project.

## Dataset

The dataset used in this project is the "Mall Customers" dataset from Kaggle. It contains information about customers, including their annual income and spending score.

## Steps

1. **Load the Dataset**: Load the dataset using pandas and display the first few rows.
2. **Data Preprocessing**: Check for missing values and select relevant features for clustering.
3. **Standardize the Data**: Standardize the features to ensure equal contribution to distance calculations.
4. **Implement K-Means Clustering**: Initialize and fit the K-Means model, then predict the cluster for each data point.
5. **Visualize the Clusters**: Create a scatter plot to visualize the clusters.
6. **Evaluate Clustering**: Use the Elbow Method and Silhouette Method to determine the optimal number of clusters.
7. **Save the Model and Clustered Data**: Save the K-Means model and the clustered data to files.

## Evaluation

The performance of the K-Means clustering model is evaluated using the silhouette score. A higher silhouette score indicates better-defined clusters.

## Improving the Model

To improve the performance of the K-Means clustering model, consider the following strategies:

- Feature scaling and normalization
- Dimensionality reduction (e.g., PCA)
- Optimal number of clusters (Elbow Method, Silhouette Method)
- Initialization (k-means++)
- Multiple runs (n_init parameter)
- Alternative clustering algorithms (e.g., GMM, DBSCAN)
- Incorporate domain knowledge

## Installation

To run this project, you need to have Python installed. You can install the required dependencies using the following command:

```sh
pip install -r requirements.txt
```

## Usage

Open the `model.ipynb` file in Jupyter Notebook or JupyterLab to see the implementation and results of the K-Means clustering model.

## License

This project is licensed under the MIT License.