Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/xgagandeep/kmeans-from-scratch
This project implements the KMeans clustering algorithm from scratch using Python.
https://github.com/xgagandeep/kmeans-from-scratch
Last synced: 6 days ago
JSON representation
This project implements the KMeans clustering algorithm from scratch using Python.
- Host: GitHub
- URL: https://github.com/xgagandeep/kmeans-from-scratch
- Owner: xgagandeep
- Created: 2024-09-11T18:21:20.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-11T18:22:40.000Z (2 months ago)
- Last Synced: 2024-09-12T04:41:45.098Z (2 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 81.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# KMeans from Scratch
**Repository:** [xgagandeep/KMeans-from-scratch](https://github.com/xgagandeep/KMeans-from-scratch)
**Date:** 2020
**Language:** Python
**Libraries:** NumPy, Matplotlib, Scikit-learn## Description
This project implements the KMeans clustering algorithm from scratch using Python. The notebook demonstrates how to perform clustering on synthetic data generated using the `make_blobs` function from Scikit-learn. The implementation includes initializing clusters, assigning points to clusters, updating cluster centers, and visualizing the results.
## Features
- **Synthetic Data Generation:** Uses Scikit-learn's `make_blobs` to generate synthetic datasets with specified number of clusters.
- **Cluster Initialization:** Randomly initializes cluster centers.
- **Distance Calculation:** Computes Euclidean distance between points and cluster centers.
- **Cluster Assignment:** Assigns each point to the nearest cluster center.
- **Cluster Update:** Updates cluster centers based on the mean of assigned points.
- **Visualization:** Plots clusters and their centers to visualize the clustering process.## Files
- **`Kmeans.ipynb`:** Jupyter Notebook containing the implementation of the KMeans clustering algorithm from scratch.
## Installation
To run this project, you need Python and the required libraries installed. Follow these steps:
1. **Clone the repository:**
```bash
git clone https://github.com/xgagandeep/KMeans-from-scratch.git
```2. **Navigate to the project directory:**
```bash
cd KMeans-from-scratch
```3. **Install the required libraries:**
```bash
pip install numpy matplotlib scikit-learn
```4. **Run the Jupyter Notebook:**
```bash
jupyter notebook Kmeans.ipynb
```## Usage
1. **Data Generation:** Generates synthetic data with `make_blobs`.
2. **Cluster Initialization:** Initializes clusters with random centers.
3. **Clustering Process:**
- Assign points to clusters based on the nearest center.
- Update cluster centers to the mean of the assigned points.
- Repeat the assignment and update steps until convergence.
4. **Visualization:** Plots clusters and their centers to show the clustering process.## Functions
- `distance(v1, v2)`: Computes the Euclidean distance between two points.
- `assignPointsToClusters(clusters)`: Assigns each point to the nearest cluster.
- `updateClusters(clusters)`: Updates the center of each cluster based on the mean of assigned points.
- `plotClusters(clusters)`: Visualizes the clusters and their centers.## Performance
- **Initialization:** Randomly initializes cluster centers.
- **Iteration:** Repeats the assignment and update steps until convergence.
- **Visualization:** Provides visual feedback of the clustering process and final cluster centers.## Contribution
Feel free to contribute to this project by submitting issues or pull requests. For any questions or feedback, please open an issue on the GitHub repository.