https://github.com/chandkund/customer-segmentation-using-k-means-clustering

Implemented K-Means Clustering to segment customers based on purchasing behavior, enabling targeted marketing strategies. Analyzed data, optimized clusters using the Elbow Method, and derived insights to enhance customer engagement and retention.
https://github.com/chandkund/customer-segmentation-using-k-means-clustering

data-science kmeans-clustering machine-learning matplotlib numpy pandas python seaborn sklearn

Last synced: 4 months ago
JSON representation

Host: GitHub
URL: https://github.com/chandkund/customer-segmentation-using-k-means-clustering
Owner: chandkund
License: mit
Created: 2024-08-16T18:21:42.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-08-16T18:44:27.000Z (11 months ago)
Last Synced: 2025-01-18T14:53:41.772Z (6 months ago)
Topics: data-science, kmeans-clustering, machine-learning, matplotlib, numpy, pandas, python, seaborn, sklearn
Language: Jupyter Notebook
Homepage:
Size: 126 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING

## Project Overview

This project focuses on customer segmentation using the K-Means Clustering algorithm. By clustering customers based on their purchasing behaviors and demographics, businesses can tailor their marketing strategies and improve customer engagement.

## Table of Contents

- [Project Overview](#project-overview)

- [Installation](#installation)

- [Usage](#usage)

- [Code Explanation](#code-explanation)

- [Model Evaluation](#model-evaluation)

- [License](#license)

## Installation

To run this project, you will need Python along with the following libraries:

- `pandas`

- `numpy`

- `matplotlib`

- `seaborn`

- `scikit-learn`

Install the required packages using `pip`:

    pip install pandas numpy matplotlib seaborn scikit-learn

Clone the repository:

    git clone https://github.com/chandkund/customer-segmentation-using-k-means-clustering.git

    cd customer-segmentation-using-k-means-clustering

## Usage

- Prepare the Dataset:

  Place your dataset (e.g., `mall_customers.csv`) in the project directory or adjust the file path in the code.

- Run the Code:

  Execute the Python scripts to perform data preprocessing, clustering, and visualization.

      python script.py

## Code Explanation

- **Import Relevant Libraries**:

    ```python

    import pandas as pd

    import numpy as np

    import matplotlib.pyplot as plt

    import seaborn as sns

    from sklearn.preprocessing import MinMaxScaler

    from sklearn.cluster import KMeans

    ```

- **Load and Preprocess Data**:

    ```python

    raw_data = pd.read_csv("path/to/your/dataset.csv")

    df = raw_data.copy()

    ```

- **Visualize Data Before Normalization**:

    ```python

    sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', data=df)

    plt.show()

    ```

- **Normalize the Data**:

    ```python

    cols = ['Annual Income (k$)', 'Spending Score (1-100)']

    scaled = MinMaxScaler()

    df[cols] = pd.DataFrame(scaled.fit_transform(df[cols]), columns=cols)

    sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', data=df)

    plt.show()

    ```

- **Determine Optimal Number of Clusters**:

    ```python

    Wcss = []

    for i in range(1, 12):

        kmeans = KMeans(n_clusters=i, random_state=0)

        kmeans.fit(df[cols])

        Wcss.append(kmeans.inertia_)

    plt.plot(range(1, 12), Wcss)

    plt.title("The Elbow Method")

    plt.xlabel("Number of Clusters")

    plt.ylabel("Wcss Values")

    plt.show()

    ```

- **Train the K-Means Model**:

    ```python

    optimal_clusters = 5

    kmeans_model = KMeans(n_clusters=optimal_clusters, random_state=0)

    df['cluster'] = kmeans_model.predict(df[cols])

    ```

- **Visualize Clusters**:

    ```python

    plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'], c=df['cluster'], cmap='rainbow')

    plt.title("Clusters of Customers")

    plt.xlabel("Annual Income (k$)")

    plt.ylabel("Spending Score (1-100)")

    plt.show()

    ```

## Model Evaluation

Evaluate the clustering results by analyzing the characteristics of each cluster and the distribution of data points within each cluster.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/chandkund/customer-segmentation-using-k-means-clustering

Awesome Lists containing this project

README